The Viterbi criterion

At the core of the Viterbi algorithm is this simple dynamic programming operation: discard all paths that cannot win.

Let's just state the Viterbi criterion for HMMs and compare it to this picture here, and see that
they're exactly the same thing, and then we'll work it through with token passing in the hidden Markov model.
We know 'hidden' means that there's more than one state sequence through the model that could generate any particular observation sequence.
That's the hidden thing: we don't really know which one.
We're imagining that the speech signal we recognise was actually generated by an HMM.
It wasn't; it was generated by a person, but we're pretending that's our model.
At the point of generation, we don't really know what the state sequence was.
The model is inside a black box, basically.
We just have to say: well, it took all possible state sequences, all with different probabilities, and we sum them together.
And summing them all together is a big operation.
Expensive.
So we just take the single best one as a proxy, an approximation, for that.
That's the hidden part.
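To make the "sum over all paths versus single best path" idea concrete, here is a minimal sketch in Python (my own illustration, not code from the video) using a toy two-state HMM with made-up numbers, enumerating every state sequence by brute force:

    import itertools

    # Hypothetical toy model: 2 states, 3 observation frames.
    init  = [0.6, 0.4]            # P(start in state i)
    trans = [[0.7, 0.3],          # trans[i][j] = P(next state j | state i)
             [0.0, 1.0]]
    emit  = [[0.5, 0.4, 0.1],     # emit[i][t] = P(observation at t | state i),
             [0.1, 0.3, 0.6]]     # pre-computed numbers for brevity

    T = 3
    total, best = 0.0, 0.0
    for seq in itertools.product(range(2), repeat=T):  # all 2**T state sequences
        p = init[seq[0]] * emit[seq[0]][0]
        for t in range(1, T):
            p *= trans[seq[t - 1]][seq[t]] * emit[seq[t]][t]
        total += p               # the exact likelihood sums over every sequence
        best = max(best, p)      # keeping only the single best sequence

    print(total, best)           # best <= total, but it's a usable proxy

The brute-force loop is exponential in the sequence length, which is exactly why we want the dynamic programming trick this video is building towards.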
The Markov part means the model has no memory, and we're going to make this assumption.
It's not really true, but we assume the probability of our whole observation sequence is just the probability of the first observation times the probability of the second observation, and so on and on.
There are no conditional probabilities in there.
We're saying that the probability of the first observation does not depend on the second or the third observations.
Statistically, they're conditionally independent.
Knowing one is not informative about the other one.
It's clearly untrue.
If I show you some little bit of a spectrogram and ask you to guess the spectrum on either side,
it definitely helps you guess what's happening just before and just after.
It's not random.
It's not conditionally independent.
However, we're going to make this rather strong assumption.
And that's because we're so in love with this model and all of its algorithms.
It's a bad assumption, but it's really hard to find a model that relaxes the assumption and works better. That conditional independence is the 'Markov' part of 'hidden Markov model'.
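Written out (my notation: o_t is the observation at time t, q_t is the state occupied at time t), the assumption is:

    P(o_1, o_2, ..., o_T | q_1, q_2, ..., q_T) = P(o_1 | q_1) * P(o_2 | q_2) * ... * P(o_T | q_T)

There is no term like P(o_2 | o_1, q_2) anywhere: given its state, each observation is independent of all the others.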
So we know what's hidden; now for Markov. For 'Markov', just imagine it says 'memoryless', and what that means is that as we go through the model... so here's a bit of model.
Okay, we take a little walk through the model to generate; generation goes like this. Okay, here we are.
We'll take a little walk through the model.
So off we go into state one.
When we arrive in a state, we emit an observation vector according to the Gaussian in that state.
And then we make a random choice of going around the self-transition or on to the next state.
So there are numbers on these.
So we toss a biased coin whose probabilities are the probabilities on these arcs.
Maybe we go around here, and then we emit the next observation.
But the probability of emitting that next observation only depends on the fact that we're in that state; we've forgotten even how we got there.
Memory is wiped, like a goldfish going around its bowl.
Everything is new again.
Okay.
It's a very strong assumption, but it greatly simplifies the maths and the algorithms, both for learning the model and for decoding speech with it.
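Here's a minimal sketch (my own illustration, with made-up parameters) of that generative walk: move left to right through a three-state HMM, emitting one value per step from a one-dimensional Gaussian attached to the current state, and tossing the biased coin to decide between the self-transition and the next state:

    import random

    means  = [0.0, 5.0, 10.0]   # hypothetical Gaussian mean for each state
    stdevs = [1.0, 1.0, 1.0]    # hypothetical standard deviations
    p_self = [0.6, 0.7, 0.5]    # probability of taking the self-transition arc

    state, path, observations = 0, [], []
    while state < 3:
        # Emission depends ONLY on the current state: the memoryless part.
        path.append(state)
        observations.append(random.gauss(means[state], stdevs[state]))
        # Toss the biased coin: go around the self-transition, or move on.
        if random.random() >= p_self[state]:
            state += 1

    print(path)           # e.g. [0, 0, 1, 2, 2, 2]
    print(observations)   # one emission per state visit

Notice the loop never looks back at earlier states or earlier observations when emitting: how we got to the current state is irrelevant.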
And what that means, then, is that as we explore the paths through this model, we can explore them in parallel.
So back to this diagram here, up on the projector.
See these two paths, A and B.
We're doing parallel exploration of all the possible paths.
Right? Imagine we're trying to find the shortest way from here to the lab as a class, and what we're going to do is just all set off together.
We'll take different routes, right? And then we'll occasionally bump into each other and compare notes.
Whoever has taken longer just retires from the race.
Eventually one person arrives at the lab, perhaps with one or two other people, and they compare notes, and there's one winner.
Every time two paths bump into each other, we only need to keep the best one at that point, because we know that the future is independent of the past, given the present. That's the diagram of dynamic time warping showing exactly that: the present means that point in the grid, the future means all the different ways of getting to the end, and the past means how we got there.
We can state exactly the same thing for the hidden Markov model.
The present is the state; the state is everything.
The state, in fact, is all we need to know in order to generate an observation.
So when I'm generating this observation here, I don't need to know anything else.
All I need to say is: I'm in this state, give me the Gaussian that belongs to this state, and we'll generate an observation from that Gaussian. That's all we need to know.
We do not need to remember how we got there.
So we don't remember what we did at the previous time steps, and we have no idea what we're going to do in the future.
So the present is the state.
The future is the transition we're gonna take next.
The past is where we came from: the preceding state, or the preceding state sequence.
And none of those things matter.
The future and the past don't matter.
Only the present matters.
That's the Markov property.
It says you just need to know what state you're in; that's all.
You don't remember anything else.
So imagine we're doing this parallel search through the hidden Markov model.
We'll see it in a minute.
One way of arriving at this state at this particular time is this way.
Another way is that we happened to be in this other state earlier, and we got there through this transition.
Those two ways of arriving in the state will have different probabilities, because they're different ways of aligning the observations with the model, and that's before we even generate the next observation.
We can just compare those two paths and say: look, sorry, but I can get here with a better probability than you can.
You can just retire from the race now.
Okay, that's the Viterbi criterion, and it's just a dynamic programming criterion that says you can eliminate paths as soon as you already know there's no way they can win.
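Putting the criterion into code, here's a minimal Viterbi sketch (my own, reusing the made-up toy model from above): at every time step, each state keeps only the single best path into it, and everything else retires from the race:

    init  = [0.6, 0.4]                # hypothetical start probabilities
    trans = [[0.7, 0.3],              # trans[i][j] = P(j | i)
             [0.0, 1.0]]
    emit  = [[0.5, 0.4, 0.1],         # emit[i][t] = P(observation at t | state i)
             [0.1, 0.3, 0.6]]
    T, N = 3, 2

    # delta[j] = probability of the best path ending in state j so far.
    delta = [init[j] * emit[j][0] for j in range(N)]
    back = []                         # back-pointers to recover the winner
    for t in range(1, T):
        new_delta, pointers = [], []
        for j in range(N):
            # Several paths arrive in state j; keep the best, drop the rest.
            best_i = max(range(N), key=lambda i: delta[i] * trans[i][j])
            new_delta.append(delta[best_i] * trans[best_i][j] * emit[j][t])
            pointers.append(best_i)
        delta = new_delta
        back.append(pointers)

    # Trace back from the best final state to read off the winning sequence.
    state = max(range(N), key=lambda j: delta[j])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    print(list(reversed(path)), max(delta))

The max inside the loop is the Viterbi criterion itself: as soon as two paths meet in the same state at the same time, only the more probable one survives.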
