What about these other terms here? This P(W): there's no O there. So it doesn't matter whether I've given you the observations or not; we can compute P(W) separately, without ever seeing O. In other words, we could compute it even before we start doing speech recognition. Before we even record the speech, we could compute this P(W) term, because we can do it in advance. It's called a prior: that means we know it before we even start doing any of this computation. And it's just a probability of the word, or more generally the word sequence.

For example, in a digit recogniser, if all of our words are equally likely, we're going to have P(W = "one") equal to 1/10, P(W = "two") equal to 1/10, and so on. So maybe that's just a uniform distribution. But for a more interesting task, some words might be more likely than others, so it might not be a uniform distribution. We can compute that ahead of time: we don't need any speech for that. We'll look at how we do that later. So that gives a bit of an idea of how we might be able to get this term: it's something to do with the probabilities of words, or sequences of words, without needing any collected speech.
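As a rough illustration of that idea (this sketch is not from the lecture, and the tiny text corpus is made up), here is what a prior P(W) might look like in code: a uniform distribution over the ten digit words, and a prior estimated from word counts in text alone, with no recorded speech involved.

```python
# A minimal sketch of a prior P(W): it can be computed from text alone,
# before any speech is ever recorded.
from collections import Counter

# Uniform prior for a 10-word digit recogniser: every word gets P(W) = 1/10.
digits = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]
uniform_prior = {w: 1.0 / len(digits) for w in digits}

# For a more interesting task, estimate P(W) from word counts in some text
# (a hypothetical example here); some words become more likely than others.
corpus = "call mum call the office call mum again".split()
counts = Counter(corpus)
total = sum(counts.values())
estimated_prior = {w: c / total for w, c in counts.items()}

print(uniform_prior["one"])      # 0.1
print(estimated_prior["call"])   # 0.375 (3 of the 8 tokens)
```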
Bayes’ rule: P(W)
The term P(W) has just appeared. What is it and how are we going to compute it?