This video just has a plain transcript, not time-aligned to the video.

THIS IS AN AUTOMATIC TRANSCRIPT WITH LIGHT CORRECTIONS TO THE TECHNICAL TERMS

We're going to work our way up. We're going to build these whole word templates. We're going to start with analysing speech in frames. We're going to extract something from individual frames. We're going to build up a sequence of feature vectors for each word that we'd like to recognise, and compare it to some stored ones.

So this is the scenario then. There's no statistical model yet. In training the system, we think of all the words we would like to recognise, and we record one example of each, and we save it with its label so we know what it is. Let's pretend they're the digits: 0, 1, 2, 3 ... to 9. Record each of them once, store it in a file, put a label next to it, so we remember what they are called. These references are often known as templates. They're just going to be single exemplars.

And then at recognition time, we have a recording of a word. We know where it starts and where it ends. We're going to match it against each of the references in turn. We're going to measure the distance to that reference, and then we're going to look at all the distances, pick the smallest one, and announce that label as the label for the unknown word.

So it's an extremely simple form of pattern matching. It's not statistical. It's just based on exemplars. So this finding of the closest match between an unknown thing and various known things is the key process. So we're going to need a measure of distance between one recorded word and another recorded word, and it's these features that we're going to use to measure this distance: this difference.
Whole word templates
Our first automatic speech recogniser stores an example ("template") of each word. Speech to be recognised is compared against each template.
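The matching procedure described in the transcript (store one labelled template per word, measure the distance from the unknown word to each template, announce the label with the smallest distance) can be sketched roughly as follows. This is only an illustration under stated assumptions: the transcript does not say how the distance between two feature vector sequences is computed, so `sequence_distance` here is a hypothetical placeholder that linearly stretches the time axis and averages Euclidean frame distances. The function names and the toy two-dimensional "features" are also invented for the example.

```python
import math


def frame_distance(a, b):
    """Euclidean distance between two feature vectors of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def sequence_distance(seq_a, seq_b):
    """Placeholder distance between two non-empty sequences of feature vectors.

    Assumption (not from the transcript): frames are paired by linearly
    stretching both sequences to a common length, and the frame distances
    are averaged.
    """
    n = max(len(seq_a), len(seq_b))
    total = 0.0
    for i in range(n):
        # Map step i onto a frame index in each sequence.
        ia = i * len(seq_a) // n
        ib = i * len(seq_b) // n
        total += frame_distance(seq_a[ia], seq_b[ib])
    return total / n


def recognise(unknown, templates):
    """Return the label of the stored template closest to the unknown word."""
    distances = {
        label: sequence_distance(unknown, template)
        for label, template in templates.items()
    }
    # Pick the smallest distance and announce its label.
    return min(distances, key=distances.get)


# Toy data: one exemplar per word, each a sequence of feature vectors.
templates = {
    "zero": [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]],
    "one": [[5.0, 5.0], [5.0, 6.0]],
}
unknown = [[0.1, 0.0], [0.9, 0.1], [1.1, 0.0], [2.0, 0.1]]

print(recognise(unknown, templates))
```

With this toy data the unknown sequence lies much closer to the "zero" template, so that label is the one announced. Note that the unknown word and the templates have different numbers of frames, which is why some way of pairing up frames is needed before distances can be summed.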