Voice onset time (VOT) is known to vary with place of articulation.
Furui et al. – Fundamental Technologies in Modern Speech Recognition
A complete issue of IEEE Signal Processing Magazine. Although a few years old, this is still a very useful survey of current techniques.
Holmes & Holmes – Chapter 11 – Improving Speech Recognition Performance
We mitigate the over-simplifications of the model using ever-more-complex algorithms.
Holmes & Holmes – Chapter 6 – Phonetic Synthesis by Rule
Mainly of historical interest.
Holmes & Holmes – Chapter 9 – Stochastic Modelling
May be helpful as a complement to the essential readings.
Johnson (Phonetics) – Chapter 2 – The Acoustic Theory of Speech Production: Deriving Schwa
Derives the acoustic properties of the vocal tract in terms of the source-filter model.
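The schwa derivation treats the vocal tract as a uniform tube that is closed at the glottis and open at the lips. Its resonances fall at odd multiples of a quarter wavelength, F_n = (2n - 1)c / 4L. Below is a minimal Python sketch that assumes a tract length of 17.5 cm and a speed of sound of 35,000 cm/s, which are typical textbook values and are not taken from this list. The function name is hypothetical.

```python
def uniform_tube_formants(length_cm: float, n_formants: int = 3,
                          speed_of_sound_cm_s: float = 35000.0) -> list[float]:
    """Resonances of a uniform tube closed at one end (glottis), open at the other (lips).

    Quarter-wavelength resonator: F_n = (2n - 1) * c / (4 * L).
    """
    return [(2 * n - 1) * speed_of_sound_cm_s / (4 * length_cm)
            for n in range(1, n_formants + 1)]


if __name__ == "__main__":
    # Assumed typical adult tract length of 17.5 cm.
    print(uniform_tube_formants(17.5))  # [500.0, 1500.0, 2500.0]
```

With these values, the first three resonances come out at 500, 1500 and 2500 Hz, which is the familiar formant pattern of schwa.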
Johnson (Phonetics) – Chapter 6.1 – Tube models of vowel production
Derives the resonances and formant structure of vowels using two- and three-tube models of the vocal tract.
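Consider an /a/-like configuration, with a narrow back tube and a wide front tube. A common first approximation treats the two tubes as acoustically decoupled. Each tube is closed at one end and open at the other, so each contributes its own quarter-wave resonances. The sketch below assumes exactly that. The tube lengths and function names are hypothetical. Other configurations, such as a narrow front tube, need different boundary conditions, and the sketch does not model them.

```python
def quarter_wave(length_cm: float, n: int, c: float = 35000.0) -> list[float]:
    """First n resonances of a tube closed at one end and open at the other."""
    return [(2 * k - 1) * c / (4 * length_cm) for k in range(1, n + 1)]


def two_tube_formants(back_cm: float, front_cm: float,
                      n_formants: int = 4, c: float = 35000.0) -> list[float]:
    """Approximate formants of a two-tube model.

    Assumes the tubes are decoupled and both act as quarter-wave resonators,
    so the formants are the pooled resonances of the two tubes, sorted by frequency.
    """
    pooled = quarter_wave(back_cm, n_formants, c) + quarter_wave(front_cm, n_formants, c)
    return sorted(pooled)[:n_formants]


if __name__ == "__main__":
    print(two_tube_formants(10.0, 7.0))  # [875.0, 1250.0, 2625.0, 3750.0]
```

Formants are numbered by frequency. Sorting the pooled resonances means that F1 or F2 can come from either tube, depending on the tube lengths.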
Jurafsky & Martin – Chapter 2 – Regular Expressions and Automata
An important technique used widely in NLP. In TTS, it can be applied to tasks such as detecting and expanding non-standard words.
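Here is an illustration of the TTS use. A regular expression finds candidate non-standard words (here, a few abbreviations and numbers of one to three digits) and passes each match to an expansion function. This is a minimal Python sketch. The abbreviation table, the number range and the British-style "and" are all assumptions made for the example.

```python
import re

# Hypothetical abbreviation table; a real TTS front end would need far more entries.
ABBREVIATIONS = {"Dr.": "Doctor", "St.": "Street", "Mr.": "Mister"}

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Expand an integer 0 <= n < 1000 as an English cardinal (British 'and')."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("" if ones == 0 else " " + ONES[ones])
    hundreds, rest = divmod(n, 100)
    words = ONES[hundreds] + " hundred"
    return words if rest == 0 else words + " and " + number_to_words(rest)


# Either a known abbreviation, or a standalone token of 1 to 3 digits.
NSW_PATTERN = re.compile(
    r"\b(?:(?P<abbr>" + "|".join(re.escape(a) for a in ABBREVIATIONS) + r")"
    r"|(?P<num>\d{1,3})\b)"
)


def expand_nsws(text: str) -> str:
    """Replace each detected non-standard word with its spoken form."""
    def replace(match: re.Match) -> str:
        if match.group("abbr"):
            return ABBREVIATIONS[match.group("abbr")]
        return number_to_words(int(match.group("num")))
    return NSW_PATTERN.sub(replace, text)


if __name__ == "__main__":
    print(expand_nsws("Dr. Smith lives at 221 Baker St."))
```

"St." is genuinely ambiguous between Street and Saint. A regular expression alone cannot resolve this, so real normalisers also use context. The word-boundary anchors deliberately leave longer digit strings, such as years, untouched.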
Jurafsky & Martin – Chapter 5 – Part-of-Speech Tagging
For our purposes, only sections 5.1 to 5.5 are needed.
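A widely used statistical approach to part-of-speech tagging is the bigram HMM tagger, decoded with the Viterbi algorithm. The sketch below is a minimal Python version. The three-tag set and all probabilities are invented toy values, not estimates from any corpus.

```python
import math

TAGS = ["DT", "NN", "VB"]
# Toy, hand-picked probabilities (hypothetical; not estimated from data).
START = {"DT": 0.6, "NN": 0.3, "VB": 0.1}
TRANS = {  # P(next tag | previous tag)
    "DT": {"DT": 0.01, "NN": 0.9, "VB": 0.09},
    "NN": {"DT": 0.1, "NN": 0.3, "VB": 0.6},
    "VB": {"DT": 0.5, "NN": 0.3, "VB": 0.2},
}
EMIT = {  # P(word | tag); unlisted words have probability 0
    "DT": {"the": 0.7, "a": 0.3},
    "NN": {"dog": 0.4, "runs": 0.1, "walk": 0.5},
    "VB": {"runs": 0.5, "walk": 0.5},
}


def log(p: float) -> float:
    return math.log(p) if p > 0 else float("-inf")


def viterbi(words: list[str]) -> list[str]:
    """Most probable tag sequence under the bigram HMM above."""
    # best[t][tag] = (log prob of best path ending in tag at position t, backpointer)
    best = [{tag: (log(START[tag]) + log(EMIT[tag].get(words[0], 0.0)), None)
             for tag in TAGS}]
    for t in range(1, len(words)):
        column = {}
        for tag in TAGS:
            prev_tag, score = max(
                ((p, best[t - 1][p][0] + log(TRANS[p][tag])) for p in TAGS),
                key=lambda item: item[1])
            column[tag] = (score + log(EMIT[tag].get(words[t], 0.0)), prev_tag)
        best.append(column)
    # Backtrace from the best final tag.
    tag = max(TAGS, key=lambda tg: best[-1][tg][0])
    path = [tag]
    for t in range(len(words) - 1, 0, -1):
        tag = best[t][tag][1]
        path.append(tag)
    return path[::-1]


if __name__ == "__main__":
    print(viterbi(["the", "dog", "runs"]))  # ['DT', 'NN', 'VB']
```

Working in log probabilities avoids numerical underflow on long sentences. Unseen word/tag pairs get a log probability of minus infinity, which a real tagger would avoid through smoothing.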
Jurafsky & Martin – Section 3.1 – English Morphology
In speech technology for English, little or no use is made of morphology. But for other languages, it is essential.
Jurafsky & Martin – Section 3.2 – Finite-State Morphological Parsing
Automatic morphological decomposition of written words is possible. However, this section does not consider the added complication of deriving a pronunciation.
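The decomposition can be illustrated with a much-simplified analyser. It combines three kinds of knowledge that a finite-state morphological parser encodes:

- a lexicon of stems;
- morphotactics, that is, which suffixes may follow which stems;
- an orthographic rule, here English e-insertion (fox + s gives foxes).

This is a hypothetical Python stand-in that uses plain string matching. It is not a finite-state transducer, and the lexicon is a toy.

```python
# Hypothetical toy lexicon: stem -> part of speech.
STEMS = {"cat": "N", "fox": "N", "kiss": "N", "walk": "V"}

# Morphotactics as continuation classes: allowed suffix -> feature string, per part of speech.
SUFFIXES = {
    "N": {"": "+N +SG", "s": "+N +PL"},
    "V": {"": "+V", "s": "+V +3SG", "ed": "+V +PAST", "ing": "+V +PROG"},
}


def needs_e_insertion(stem: str) -> bool:
    """English e-insertion: an 'e' is inserted before -s after s, x, z, sh, ch."""
    return stem.endswith(("s", "x", "z", "sh", "ch"))


def parse(word: str) -> list[str]:
    """Return every stem + feature analysis of a written word."""
    analyses = []
    for i in range(1, len(word) + 1):
        stem, rest = word[:i], word[i:]
        pos = STEMS.get(stem)
        if pos is None:
            continue
        for suffix, features in SUFFIXES[pos].items():
            # Apply the spelling rule to get the suffix's surface form.
            surface = "es" if suffix == "s" and needs_e_insertion(stem) else suffix
            if rest == surface:
                analyses.append(f"{stem} {features}")
    return analyses


if __name__ == "__main__":
    for w in ["cats", "foxes", "foxs", "walked"]:
        print(w, parse(w))
```

With this lexicon, parse("foxes") returns ["fox +N +PL"], and the misspelling "foxs" gets no analysis. As the reading note says, nothing here derives a pronunciation.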
Jurafsky & Martin – Section 3.3 – Construction of a Finite-State Lexicon
A lexicon can be represented using different data structures (finite state network, tree, lookup table,…), depending on the application.