Relating DTW to the HMM

The template in DTW is now replaced by a model, but otherwise the methods are conceptually very similar.

Remember this grid. I can't draw on the screen very well, but in this grid, remember, we were aligning two actual recorded examples of a word.
We called one a template because we made it in advance and labelled it.
And the other one is the unknown word we're trying to recognise.
We had to line them up and then measure local differences between them. We reduced the problem to one of alignment (stretching) and one of local distance computation, and we saw that there are lots and lots of different ways of aligning them, so we have to search for the one that is best: the alignment that gives the lowest total distance.
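To make that grid search concrete, here is a minimal sketch in Python. It is not code from the course; the function name, the use of Euclidean distance as the local distance, and the choice of allowed grid moves are assumptions for illustration.

```python
import numpy as np

def dtw(template, unknown):
    """Total distance of the best alignment of two sequences of
    feature vectors, found by dynamic programming over the grid."""
    T, U = len(template), len(unknown)
    # D[i, j] = lowest total distance aligning template[:i+1] with unknown[:j+1]
    D = np.full((T, U), np.inf)
    for i in range(T):
        for j in range(U):
            # local distance between one frame of each sequence
            local = np.linalg.norm(np.asarray(template[i]) - np.asarray(unknown[j]))
            if i == 0 and j == 0:
                D[i, j] = local
            else:
                # best predecessor on the grid: advance in one sequence, or in both
                D[i, j] = local + min(
                    D[i - 1, j] if i > 0 else np.inf,
                    D[i, j - 1] if j > 0 else np.inf,
                    D[i - 1, j - 1] if (i > 0 and j > 0) else np.inf,
                )
    return D[-1, -1]
```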
Exactly the same thing is going to happen in our hidden Markov model.
But now, instead of a template, we have a model.
We're now going to align the model with this unknown sequence of observation vectors.
So this diagram is also a way of thinking about hidden Markov models.
And if you read the older textbooks, particularly if you read Holmes and Holmes, you see pictures like this for doing recognition with hidden Markov models, and it's fine for one model and one template.
When you start joining models together, you start stacking these grids on top of each other, and it gets messy pretty quickly.
If you were doing this in the 1980s, that was how you thought about the search in connected speech recognition.
We're going to do it in a much neater, cleverer way with a nice paradigm that's in one of the readings.
It's called token passing.
So the job, then, is to compute the probability of O given W.
So we're going to get a model; it will be trained in the next lecture.
Given that model, compute the probability that it generated this observation sequence.
Now the correct way of doing that, by definition, is to add up all the different ways the model could generate it, all the different state sequences: add up their probabilities to get the total probability. That will be expensive, because it would be like trying every path in this grid.
Now we could do it in an efficient way, using dynamic programming, but it's still going to take a bit of time to do that.
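For the efficient, dynamic-programming version of that sum, here is a minimal sketch of the forward recursion. The function name, and the assumption that the state output probabilities have already been evaluated into an array, are illustrative rather than taken from the course.

```python
import numpy as np

def forward_probability(obs_likelihoods, transitions, initial):
    """P(O | W): total probability of the observation sequence,
    summed over every possible state sequence, by dynamic programming.

    obs_likelihoods[t, j] -- likelihood of the observation at time t in state j
    transitions[i, j]     -- probability of moving from state i to state j
    initial[j]            -- probability of starting in state j
    """
    T, N = obs_likelihoods.shape
    alpha = initial * obs_likelihoods[0]          # state probabilities at t = 0
    for t in range(1, T):
        # sum over all predecessor states, then apply the output probability
        alpha = (alpha @ transitions) * obs_likelihoods[t]
    return alpha.sum()
```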
So what we're going to do is approximate that, and it turns out empirically (in other words, by experiment) that this approximation is a very, very good one.
Instead of computing P(O|W) by summing together all the ways that the model could generate the observations, we're going to find the single best way: in other words, the single path that is the single most likely way the model could generate the observations.
We're going to use that to stand in for the sum, and it's going to work just as well, because when one is the biggest, the other will also be the biggest.
That's an empirical result: most of the probability is going to be on the single most likely one.
We don't need to compute all the other ones.
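The only change from the forward recursion sketched above is that a max replaces the sum over predecessor states. A sketch under the same assumed inputs:

```python
import numpy as np

def viterbi_probability(obs_likelihoods, transitions, initial):
    """Probability of the single most likely state sequence,
    used as a stand-in for the full sum P(O | W)."""
    T, N = obs_likelihoods.shape
    delta = initial * obs_likelihoods[0]
    for t in range(1, T):
        # take the best predecessor state instead of summing over all of them
        delta = (delta[:, None] * transitions).max(axis=0) * obs_likelihoods[t]
    return delta.max()
```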
