P(W) in connected speech

This topic has 1 reply, 2 voices, and was last updated 4 years, 5 months ago by Simon.

- December 11, 2020 at 13:02 #13535
  Isobel W
  Student
  I think I understand P(W) for isolated digits, but I am a bit confused about P(W) in connected speech. It is a prior probability, assigned before any observations have been made, so how do we know that, for example, someone saying "one one one one one" has a low probability whereas someone saying "three six nine" has a high probability? I might just be getting P(O|W) and P(W) confused.
  Also, for connected digits or any other specific ASR application, am I right in thinking that the dictionary defines the allowable words (i.e. the words making up the sequences we would use N-grams on)?
- December 11, 2020 at 18:37 #13537
  Simon
  Professor
  In the simple grammar used in the assignment, we assume there is an equal (= uniform) probability of each digit. For the sequences part of the assignment, we also assume all sequences have equal probability.
  
  But in the more general case of connected speech recognition, we will learn the prior P(W) from data. Usually that involves learning (= training) an N-gram language model from a corpus of text. The details of learning an N-gram are out of scope for Speech Processing, but you do need to understand that such a model is finite state and what a trained model looks like.
  
  So, the answer to “how do we know the probability of a word sequence before we observe any acoustic evidence (= speech)?” is that we pre-calculate and store it: that’s the language model. In the general case of an N-gram, we use data to estimate the probability of every possible N-gram in the language by counting how often it occurs in a text corpus.
  
  Our prior belief about W is P(W). When we receive the acoustic evidence O, we compute the likelihood P(O|W). We then revise (= update) our belief about W in the light of this new evidence, by multiplying the likelihood by the prior, which gives a quantity proportional to P(W|O). [We ignore P(O) because it is the same for every W.]
  
  P(W|O) is the posterior: it’s what we believe about the distribution of W given (= after receiving) the acoustic evidence O.
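
To make the two steps above concrete, here is a minimal sketch in Python, not the method used in the assignment. It estimates a bigram model by counting over a tiny invented corpus, uses it to compute P(W) for two candidate digit sequences, then multiplies by P(O|W) and normalises to get P(W|O). The corpus, the likelihood values, and the function names `train_bigram`, `prior` and `posterior` are all assumptions made for illustration; a real recogniser would get P(O|W) from the acoustic model, would smooth the N-gram counts, and would normally work with log probabilities.

```python
from collections import Counter


def train_bigram(sentences):
    """Maximum-likelihood bigram estimates: count(prev, cur) / count(prev)."""
    context_counts = Counter()
    bigram_counts = Counter()
    for words in sentences:
        # Pad with start and end symbols so sentence boundaries are modelled too.
        padded = ["<s>"] + words + ["</s>"]
        for prev, cur in zip(padded, padded[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    return {bg: c / context_counts[bg[0]] for bg, c in bigram_counts.items()}


def prior(words, bigram_probs):
    """P(W) as a product of bigram probabilities (unseen bigrams get 0: no smoothing)."""
    padded = ["<s>"] + words + ["</s>"]
    p = 1.0
    for prev, cur in zip(padded, padded[1:]):
        p *= bigram_probs.get((prev, cur), 0.0)
    return p


def posterior(likelihoods, priors):
    """P(W|O) over the candidates: likelihood * prior, divided by their sum.

    The sum stands in for P(O), restricted to the candidates considered here.
    """
    joint = {w: likelihoods[w] * priors[w] for w in likelihoods}
    total = sum(joint.values())
    return {w: j / total for w, j in joint.items()}


# Invented toy corpus standing in for the text used to train the language model.
corpus = [
    ["three", "six", "nine"],
    ["three", "six", "nine"],
    ["three", "six", "one"],
    ["one", "one"],
]
lm = train_bigram(corpus)

candidates = {
    "three six nine": ["three", "six", "nine"],
    "three six one": ["three", "six", "one"],
}
priors = {name: prior(words, lm) for name, words in candidates.items()}

# Made-up acoustic likelihoods P(O|W); these slightly favour "three six one".
likelihoods = {"three six nine": 0.02, "three six one": 0.03}

post = posterior(likelihoods, priors)
for name in candidates:
    print(f"{name}: P(W)={priors[name]:.3f}  P(O|W)={likelihoods[name]:.3f}  P(W|O)={post[name]:.3f}")
```

With these invented numbers the acoustic evidence alone prefers "three six one", but the prior learned from the corpus is strong enough that the posterior prefers "three six nine". With a uniform prior, as in the assignment grammar, the prior would cancel out and the likelihood alone would decide.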