Forums › Speech Synthesis › The front end › Letter-to-sound: Alignment method for the training set
This topic has 5 replies, 2 voices, and was last updated 8 years, 10 months ago by Simon.
October 11, 2015 at 12:53 #283
I don’t quite understand the alignment method for the training set described on p. 294 (section 8.2.3) of Jurafsky & Martin’s textbook.
It seems to me that the training only aligns each letter to its most probable pronunciation in the allowables list, without considering history or context. That is, if “c” is mostly realised as /k/ in English, will “c” always be aligned to /k/ in every word in the training set?
If that is the case, I find it difficult to see how a machine learning classifier could extract any other features from this training set, since there would be only a one-to-one mapping between each letter and its phone.
October 12, 2015 at 09:39 #285
This algorithm is for preparing the training set for a letter-to-sound model (e.g., a classification tree). The end result of the algorithm is a single alignment between letters and phonemes, for each word in the training set (i.e., a pre-existing pronunciation dictionary).
It’s important to realise that, across the whole training set, a particular letter (e.g., “c”) might align with different phonemes in different words (sometimes /k/, sometimes /ch/, etc.). It won’t necessarily align with the same phoneme every time.
So, how do we get to that single alignment? We use a simple unigram model of the probability of each letter aligning with each phoneme. Most of the probabilities in this model will be zero, and the only non-zero probabilities are for those letter-phoneme pairs given in the allowables list.
The key machine learning concept to understand in this algorithm is that of first initialising this unigram model and then iteratively improving the model.
To initialise, and then to improve the model, we need an alignment for all words in the training set, so that we can count how many times each phoneme aligns with each letter. The allowables lists are used to find the first alignment. The model is then updated, and then this improved model is used to find a better alignment.
If the allowables list for a particular letter only contained a single phoneme, then that letter would always have to align with that phoneme. But in general, the allowables lists will have many phonemes for each letter.
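To make the initialise-then-iterate idea concrete, here is a minimal sketch in Python. The toy dictionary, allowables lists, and all names are made up for illustration; real implementations use dynamic programming rather than enumerating every alignment, so the exhaustive search below is only workable for short words.

```python
# A minimal sketch of the iterative alignment algorithm, on made-up toy
# data. "_" stands for the empty phoneme (a letter that produces no sound).
DICT = {
    "cat":  ["k", "ae", "t"],
    "cell": ["s", "eh", "l"],
    "ice":  ["ay", "s"],
}
ALLOWABLES = {
    "c": ["k", "s"],
    "a": ["ae"],
    "t": ["t"],
    "e": ["eh", "_"],
    "l": ["l", "_"],
    "i": ["ay", "ih"],
}

def alignments(letters, phones):
    """Enumerate every alignment of letters to phones that respects the
    allowables lists (exhaustive; fine for a toy example)."""
    if not letters:
        return [[]] if not phones else []
    out = []
    letter, rest = letters[0], letters[1:]
    for p in ALLOWABLES.get(letter, []):
        if p == "_":
            out += [[(letter, "_")] + a for a in alignments(rest, phones)]
        elif phones and phones[0] == p:
            out += [[(letter, p)] + a for a in alignments(rest, phones[1:])]
    return out

def best_alignment(word, phones, prob):
    """Choose the alignment with the highest total score under the
    current unigram letter-phoneme model."""
    return max(alignments(word, phones),
               key=lambda a: sum(prob.get(pair, 0.0) for pair in a))

# Initialise: a uniform unigram model over each letter's allowables.
prob = {(l, p): 1.0 / len(ps) for l, ps in ALLOWABLES.items() for p in ps}

# Iterate: align every word with the current model, re-count how often
# each letter-phoneme pair occurs, then re-estimate the model.
for _ in range(3):
    counts, letter_totals = {}, {}
    for word, phones in DICT.items():
        for pair in best_alignment(word, phones, prob):
            counts[pair] = counts.get(pair, 0) + 1
            letter_totals[pair[0]] = letter_totals.get(pair[0], 0) + 1
    prob = {(l, p): c / letter_totals[l] for (l, p), c in counts.items()}
```

Notice that even in this toy model, “c” ends up aligned with /k/ in “cat” but with /s/ in “cell” and “ice”, because the allowables list constrains which phonemes can match at each position in the word.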
October 13, 2015 at 11:04 #288
Are the words in the training set already hand-labelled with their pronunciations before the algorithm is run?
If not, how can we find a single good alignment for each word in the training set? If we are to use unigram probabilities, say we count all the possible realisations of “c” in its allowables list (/k/, /s/, …) and conclude that P(/k/ | “c”) is the highest in the list. With that probability, how would we be able to align “c” with /s/ in a case like “cistern”?
October 13, 2015 at 12:44 #289
Yes, the words in the training set are hand-labelled with their pronunciations: this is just a dictionary. See this topic.
At synthesis time, the dictionary will be used in preference to the letter-to-sound model for all words in the dictionary. The letter-to-sound model will only be used for words not in the dictionary.
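This dictionary-first, model-as-fallback logic can be sketched in a few lines (all names here are illustrative, not any real system’s API):

```python
def pronounce(word, dictionary, lts_model):
    """Use the hand-written dictionary when possible; fall back to the
    trained letter-to-sound model only for out-of-vocabulary words."""
    if word in dictionary:
        return dictionary[word]
    return lts_model(word)

# Toy usage: a one-entry dictionary and a dummy stand-in for the model.
dictionary = {"cat": ["k", "ae", "t"]}
dummy_lts = lambda w: ["?"] * len(w)
```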
October 14, 2015 at 18:04 #319
Now that the training set is already labelled with pronunciations, I assume that every letter is already aligned with its correct phone in each word in the training set, so why do we bother implementing this algorithm to realign each letter with its phone?
October 14, 2015 at 18:19 #320
The pronunciation dictionary (written by hand) does not specify an alignment between letters and phonemes. See this topic for an extract from cmulex, showing what is contained in the dictionary.
We need to use this algorithm to find the alignment, before going on to train a classification tree.
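As a concrete (made-up) illustration of the difference: a dictionary entry pairs a whole spelling with a whole phoneme string, while the alignment algorithm adds the letter-by-letter pairing that a classification tree needs as training targets. The entry format below is illustrative, not real cmulex syntax.

```python
# What the dictionary gives us: spelling and phoneme string only.
entry = ("cistern", ["s", "ih", "s", "t", "er", "n"])   # 7 letters, 6 phonemes

# What the alignment algorithm adds: one phoneme (or the empty phoneme
# "_") per letter, so every letter has a training target.
aligned = [("c", "s"), ("i", "ih"), ("s", "s"), ("t", "t"),
           ("e", "er"), ("r", "_"), ("n", "n")]

assert len(aligned) == len(entry[0])                    # one target per letter
assert [p for _, p in aligned if p != "_"] == entry[1]  # phonemes preserved
```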