Summary – training

The Baum-Welch algorithm can be applied to data comprising whole utterances transcribed with words, without needing any manual within-utterance alignment.

Let's just reiterate that this also applies in training.
So for training, the algorithm we might apply might be the Baum-Welch algorithm rather than the Viterbi algorithm.
We have sentences.
Let's assume our training data is partitioned into sentences, with silence at the beginning and at the end.
We have word transcriptions, no alignment.
We just know these words were in the sentence.
The sentence model, then, is just the transcription.
So it's like a language model that says there is only this sentence.
So imagine we have a training sentence which is labelled just with a string of words, with no alignments.
So we'll use that.
We'll look up the phone names for each of those words.
We'll look up the HMM states for those phone names.
Hierarchically, we generate the models: we construct, temporarily, the HMM of the sentence "the cat sat", and then we'll have our observation sequence for that training sentence.
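To make that concrete, here is a minimal sketch, not the course's actual toolkit, of assembling a sentence-level HMM by concatenation; the lexicon entries, phone names and three-emitting-states-per-phone layout are assumptions made purely for illustration.

```python
# Hypothetical word-to-phone lexicon, just for illustration.
lexicon = {
    "the": ["dh", "ax"],
    "cat": ["k", "ae", "t"],
    "sat": ["s", "ae", "t"],
}

def sentence_hmm_states(words, states_per_phone=3):
    """Concatenate the HMM states of every phone of every word,
    with a silence model at the start and end of the sentence."""
    phones = ["sil"]
    for word in words:
        phones.extend(lexicon[word])
    phones.append("sil")
    # Each phone contributes its emitting states, in order.
    return [(phone, state) for phone in phones for state in range(states_per_phone)]

print(sentence_hmm_states("the cat sat".split()))
```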
Given that HMM and that observation sequence, we can find the alignment with Viterbi or with Baum-Welch.
Whichever we like; we update the model parameters, and then we just go around and around and around.
The only little trickiness there, which isn't really very difficult, is that in our training data we won't just train on one sentence: we'll train on a whole bunch of sentences.
We have 1000 sentences, so this alignment step we'll run with Baum-Welch.
That's the E step.
And the E step just needs to store some information.
It won't actually update the model parameters.
What it will store is the probability of being in each state at each particular time, and therefore, effectively, which observations aligned with that state, in a probabilistic sense.
In fact, what it will really store will be the weighted sum of those observations, weighted by those probabilities.
So remember, we're doing averaging, and that involves a weighted sum.
And then the sum of the weights, which we'll need later for normalisation, we'll store too.
These two things we'll store up in something called accumulators: just buckets we'll throw numbers into.
We just accumulate during this E step without changing the model parameters.
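As a rough sketch of those two buckets for one Gaussian emitting state, assuming gamma holds that state's per-frame occupancy probabilities from the forward-backward (Baum-Welch) computation and obs holds the observation vectors, the accumulators might look like this; the names are illustrative, not any particular toolkit's.

```python
import numpy as np

def accumulate(acc, obs, gamma):
    """Add one sentence's soft-aligned statistics into the buckets."""
    acc["weighted_sum"] += (gamma[:, None] * obs).sum(axis=0)  # sum over t of gamma_t * o_t
    acc["sum_of_weights"] += gamma.sum()                       # sum over t of gamma_t
    return acc

# Empty buckets for a state with, say, 13-dimensional observations,
# filled with one toy 5-frame sentence.
acc = {"weighted_sum": np.zeros(13), "sum_of_weights": 0.0}
obs = np.random.rand(5, 13)
gamma = np.array([0.1, 0.6, 0.9, 0.7, 0.2])
acc = accumulate(acc, obs, gamma)
```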
And we just do that for all of the training sentences, hundreds or thousands of them, accumulating all of this stuff.
And at the very end, each state, having accumulated all that, says: these are the observations that aligned with me, with these weights, and this is the sum of those weights.
And then we'll perform the second step, which is the maximisation step, which essentially just divides one by the other and updates the mean, and does the equivalent computation for the variances.
So the E step, obviously, is going to be the thing that takes all the time.
The E step is computing all of the alignments across all of the data, and at the very, very end each state says: across all of those thousands of sentences, I aligned with these observations, with these weights.
We sum them up with those weights and compute the mean, and that gives us the M step, the maximisation step.
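Written out in the usual notation, with gamma denoting the probability of state j being occupied at time t of training sentence r, and o the corresponding observation vector, that "divide one by the other" is the standard re-estimation of each state's mean, and the equivalent computation gives the variance:

$$\hat{\mu}_j = \frac{\sum_{r}\sum_{t}\gamma_j^{(r)}(t)\, o_t^{(r)}}{\sum_{r}\sum_{t}\gamma_j^{(r)}(t)}, \qquad \hat{\sigma}_j^2 = \frac{\sum_{r}\sum_{t}\gamma_j^{(r)}(t)\,\bigl(o_t^{(r)} - \hat{\mu}_j\bigr)^2}{\sum_{r}\sum_{t}\gamma_j^{(r)}(t)}$$

The numerators are exactly the weighted-sum accumulators and the denominators are the sums of the weights, so the M step really is just one division per parameter.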
