Cho & Ladefoged – Variation and universals in VOT: evidence from 18 languages

Voice onset time (VOT) is known to vary with place of articulation; in particular, velar stops tend to have longer VOT than bilabial stops.
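VOT is measured from the release of the stop closure to the onset of vocal fold vibration. A negative value means voicing began before the release (prevoicing). The minimal Python sketch below computes VOT from two hand-labelled landmark times and assigns a coarse category. The function names and the 30 ms short-lag/long-lag boundary are illustrative assumptions, not values taken from Cho & Ladefoged.

```python
def vot_ms(release_s: float, voicing_onset_s: float) -> float:
    """Voice onset time in milliseconds.

    Positive: voicing starts after the stop release (lag).
    Negative: voicing starts before the release (prevoicing, "lead").
    """
    return (voicing_onset_s - release_s) * 1000.0


def vot_category(vot: float, long_lag_ms: float = 30.0) -> str:
    """Coarse label for a VOT value in ms.

    The 30 ms default boundary is an illustrative assumption only.
    """
    if vot < 0:
        return "prevoiced"
    if vot < long_lag_ms:
        return "short lag"
    return "long lag"


if __name__ == "__main__":
    # Hypothetical hand-labelled landmarks (seconds) for a single stop token.
    vot = vot_ms(release_s=1.250, voicing_onset_s=1.320)
    print(f"{vot:.1f} ms -> {vot_category(vot)}")  # 70.0 ms -> long lag
```

Comparing VOT across places of articulation then amounts to averaging these values over tokens grouped by place.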

Furui et al. – Fundamental Technologies in Modern Speech Recognition

A complete issue of IEEE Signal Processing Magazine. Although a few years old, this is still a very useful survey of current techniques.

Holmes & Holmes – Chapter 11 – Improving Speech Recognition Performance

We mitigate the over-simplifications of the model using ever-more-complex algorithms.

Holmes & Holmes – Chapter 6 – Phonetic Synthesis by Rule

Mainly of historical interest.

Holmes & Holmes – Chapter 9 – Stochastic Modelling

May be helpful as a complement to the essential readings.

Johnson (Phonetics) – Chapter 2 – The Acoustic Theory of Speech Production: Deriving Schwa

Derives the resonances of the vocal tract, modelled as a uniform tube, in terms of the source-filter model.
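In the simplest case, the vocal tract is a uniform tube closed at the glottis and open at the lips. Such a tube resonates at odd multiples of its quarter-wavelength frequency: F_n = (2n - 1)c / 4L. The Python sketch below evaluates that formula. The speed of sound (35000 cm/s) and tract length (17.5 cm) are assumed typical textbook values.

```python
def uniform_tube_formants(length_cm: float = 17.5, n_formants: int = 3,
                          c_cm_per_s: float = 35000.0) -> list[float]:
    """Resonances of a uniform tube closed at one end (glottis), open at the other (lips).

    F_n = (2n - 1) * c / (4 * L): odd multiples of the quarter-wavelength frequency.
    """
    return [(2 * n - 1) * c_cm_per_s / (4 * length_cm) for n in range(1, n_formants + 1)]


if __name__ == "__main__":
    print(uniform_tube_formants())  # [500.0, 1500.0, 2500.0]
```

With these values, the first three resonances fall at 500, 1500 and 2500 Hz, the familiar formant pattern of schwa. Doubling the tube length halves every resonance.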

Johnson (Phonetics) – Section 6.1 – Tube models of vowel production

Derives the resonances and formant structure of vowels using two- and three-tube models of the vocal tract.
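A common first approximation treats the two tubes as acoustically decoupled, so that each one resonates on its own. Consider a vowel like [ɑ], where a narrow back tube opens into a wide front tube. Both tubes can then be modelled as closed at one end and open at the other, so each contributes its own quarter-wave resonances. The Python sketch below assumes exactly that configuration; the tube lengths in the example are hypothetical. It ignores coupling between the tubes, which pushes nearby resonances apart. It also does not cover configurations such as [i], which need different boundary conditions.

```python
def quarter_wave(length_cm: float, n: int, c_cm_per_s: float = 35000.0) -> list[float]:
    """First n resonances of a tube closed at one end and open at the other."""
    return [(2 * k - 1) * c_cm_per_s / (4 * length_cm) for k in range(1, n + 1)]


def two_tube_formants(back_cm: float, front_cm: float, n_formants: int = 4,
                      c_cm_per_s: float = 35000.0) -> list[float]:
    """Decoupled two-tube approximation for an [ɑ]-like vowel.

    Assumes a narrow back tube (closed at the glottis) opening into a wide front
    tube (open at the lips), each treated as an independent quarter-wave
    resonator. Coupling between the tubes is ignored.
    """
    # Taking n candidates from each tube is enough to find the n lowest overall.
    candidates = (quarter_wave(back_cm, n_formants, c_cm_per_s)
                  + quarter_wave(front_cm, n_formants, c_cm_per_s))
    return sorted(candidates)[:n_formants]


if __name__ == "__main__":
    # Hypothetical tube lengths in cm.
    print([round(f) for f in two_tube_formants(back_cm=8.0, front_cm=9.0)])
    # [972, 1094, 2917, 3281]
```

Here the front tube's lowest resonance (about 972 Hz) and the back tube's (about 1094 Hz) lie close together, giving the closely spaced F1 and F2 characteristic of [ɑ].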

Jurafsky & Martin – Chapter 2 – Regular Expressions and Automata

An important technique used widely in NLP. In TTS, it can be applied to tasks such as detecting and expanding non-standard words.
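To illustrate regex-based text normalization, the Python sketch below finds short integers and a few abbreviations and expands them into words. The abbreviation table, the helper names and the restriction to integers below 1000 are assumptions for illustration. A real TTS front end also has to classify each token before expanding it: is "1995" a year or a quantity, and is "St." street or saint?

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]

# Hypothetical abbreviation table; a real system must disambiguate
# (e.g. "St." as street or saint) before expanding.
ABBREVIATIONS = {"Dr.": "doctor", "Mr.": "mister", "approx.": "approximately"}

# A known abbreviation not preceded by a letter or digit.
ABBREV_RE = re.compile(
    r"(?<!\w)(?:" + "|".join(re.escape(a) for a in ABBREVIATIONS) + r")")
# A standalone integer of 1 to 3 digits, not part of "12.5" or "1,000".
NUMBER_RE = re.compile(r"(?<![.,])\b\d{1,3}\b(?![.,]\d)")


def number_to_words(n: int) -> str:
    """Spell out 0 <= n < 1000 as English cardinal words."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("" if ones == 0 else "-" + ONES[ones])
    hundreds, rest = divmod(n, 100)
    words = ONES[hundreds] + " hundred"
    return words if rest == 0 else words + " and " + number_to_words(rest)


def expand_nsws(text: str) -> str:
    """Expand known abbreviations, then short integers, into words."""
    text = ABBREV_RE.sub(lambda m: ABBREVIATIONS[m.group(0)], text)
    return NUMBER_RE.sub(lambda m: number_to_words(int(m.group(0))), text)


if __name__ == "__main__":
    print(expand_nsws("Dr. Smith lives at 42 Elm Road."))
    # doctor Smith lives at forty-two Elm Road.
```

The lookaround assertions keep the number pattern from matching pieces of decimals or comma-grouped numbers. Tokens such as "12.5" and "1,000" are therefore left untouched rather than expanded incorrectly.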

Jurafsky & Martin – Chapter 5 – Part-of-Speech Tagging

For our purposes, only sections 5.1 to 5.5 are needed.
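One standard approach to POS tagging is a bigram hidden Markov model (HMM). The model is decoded with the Viterbi algorithm, which finds the single most probable tag sequence for a sentence. The Python sketch below implements Viterbi decoding in log space. The three-tag tagset and all probabilities are hypothetical toy values, not estimates from a corpus.

```python
import math

TAGS = ["DET", "NOUN", "VERB"]
# Hypothetical toy probabilities for illustration only.
START_P = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
TRANS_P = {
    "DET": {"NOUN": 0.9, "VERB": 0.1},
    "NOUN": {"DET": 0.1, "NOUN": 0.3, "VERB": 0.6},
    "VERB": {"DET": 0.5, "NOUN": 0.4, "VERB": 0.1},
}
EMIT_P = {
    "DET": {"the": 0.7},
    "NOUN": {"dog": 0.4, "runs": 0.1},
    "VERB": {"runs": 0.3},
}


def log_p(p: float) -> float:
    """Log probability, with log(0) represented as -inf."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(words, tags, start_p, trans_p, emit_p):
    """Most probable tag sequence for `words` under a bigram HMM (log space)."""
    if not words:
        return []
    # best[i][t] = (score of best path tagging words[i] as t, previous tag)
    best = [{t: (log_p(start_p.get(t, 0.0)) + log_p(emit_p[t].get(words[0], 0.0)), None)
             for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            prev, score = max(
                ((s, best[i - 1][s][0] + log_p(trans_p[s].get(t, 0.0))) for s in tags),
                key=lambda pair: pair[1],
            )
            column[t] = (score + log_p(emit_p[t].get(words[i], 0.0)), prev)
        best.append(column)
    # Follow backpointers from the highest-scoring final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]


if __name__ == "__main__":
    print(viterbi(["the", "dog", "runs"], TAGS, START_P, TRANS_P, EMIT_P))
    # ['DET', 'NOUN', 'VERB']
```

In this toy model "runs" can be a noun or a verb. The decoder picks VERB because the path through NOUN → VERB has the higher joint probability. Working in log space avoids numerical underflow on longer sentences.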

Jurafsky & Martin – Section 3.1 – English Morphology

In speech technology for English, little or no use is made of morphology, but for many other languages it is essential.

Jurafsky & Martin – Section 3.2 – Finite-State Morphological Parsing

Automatic morphological decomposition of written words is possible. However, this section does not consider the added complication of deriving a pronunciation.
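The Python sketch below is a toy finite-state transducer for regular English nouns. It first follows a letter-by-letter path through the stem lexicon, which is built as a tree so that shared prefixes share states. It then enters suffix states that output +N +SG or +N +PL and apply an e-insertion rule after sibilants (fox → foxes). The stems, state names and tag format are assumptions for illustration. Rather than composing separate transducers for morphotactics and spelling rules, it folds the spelling rule directly into one small machine. Like the section, it derives no pronunciation.

```python
from collections import defaultdict

# Fixed states: START begins every word; the suffix states encode the
# morphotactics "noun stem, then singular (nothing) or plural (-s / -es)".
START, SUFFIX_PLAIN, SUFFIX_SIBILANT, E_INSERTED, FINAL = range(5)
SIBILANT_ENDINGS = ("s", "x", "z", "sh", "ch")


def build_noun_fst(stems):
    """Build arcs {state: [(input_char, output_str, next_state), ...]}.

    An input of "" is an epsilon arc (consumes nothing).
    """
    arcs = defaultdict(list)
    arcs[SUFFIX_PLAIN] += [("", " +N +SG", FINAL), ("s", " +N +PL", FINAL)]
    # e-insertion: after a sibilant, the plural surfaces as "-es" (fox -> foxes).
    arcs[SUFFIX_SIBILANT] += [("", " +N +SG", FINAL), ("e", "", E_INSERTED)]
    arcs[E_INSERTED].append(("s", " +N +PL", FINAL))
    next_state = FINAL + 1
    for stem in sorted(set(stems)):
        state = START
        for ch in stem:
            # Reuse an existing arc so stems with a common prefix share states.
            match = [nxt for sym, _, nxt in arcs[state] if sym == ch]
            if match:
                state = match[0]
            else:
                arcs[state].append((ch, ch, next_state))
                state, next_state = next_state, next_state + 1
        suffix = SUFFIX_SIBILANT if stem.endswith(SIBILANT_ENDINGS) else SUFFIX_PLAIN
        arcs[state].append(("", "", suffix))  # epsilon arc into the suffix states
    return arcs


def parse(arcs, word):
    """Return every analysis the transducer outputs for `word`."""
    results = []

    def walk(state, i, out):
        if state == FINAL and i == len(word):
            results.append(out)
        for symbol, emitted, nxt in arcs.get(state, []):
            if symbol == "":
                walk(nxt, i, out + emitted)
            elif i < len(word) and word[i] == symbol:
                walk(nxt, i + 1, out + emitted)

    walk(START, 0, "")
    return results


if __name__ == "__main__":
    fst = build_noun_fst(["cat", "fox", "church"])
    for w in ["cats", "fox", "foxes", "churches", "foxs"]:
        print(w, "->", parse(fst, w))
```

Returning a list rather than a single string leaves room for ambiguous words, which a richer lexicon would produce. An ill-formed input such as "foxs" simply yields no analysis.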

Jurafsky & Martin – Section 3.3 – Construction of a Finite-State Lexicon

A lexicon can be represented using different data structures (finite-state network, tree, lookup table, …), depending on the application.
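To make the tree option concrete, the Python sketch below stores a pronunciation lexicon as a character trie. A plain lookup table (a Python dict) keys each word independently. The trie instead shares a single path for common prefixes and can enumerate every entry beginning with a given prefix. The class and method names and the ARPAbet-style pronunciations are illustrative assumptions.

```python
from typing import Optional


class TrieNode:
    def __init__(self) -> None:
        self.children: dict[str, "TrieNode"] = {}
        self.pronunciation: Optional[str] = None  # set only where a word ends


class TrieLexicon:
    """Pronunciation lexicon stored as a character tree (trie)."""

    def __init__(self) -> None:
        self.root = TrieNode()

    def add(self, word: str, pronunciation: str) -> None:
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.pronunciation = pronunciation

    def _find(self, prefix: str) -> Optional[TrieNode]:
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def lookup(self, word: str) -> Optional[str]:
        node = self._find(word)
        return None if node is None else node.pronunciation

    def words_with_prefix(self, prefix: str) -> list[str]:
        """All stored words beginning with `prefix`, in alphabetical order."""
        start = self._find(prefix)
        if start is None:
            return []
        found, stack = [], [(start, prefix)]
        while stack:
            node, spelled = stack.pop()
            if node.pronunciation is not None:
                found.append(spelled)
            for ch, child in node.children.items():
                stack.append((child, spelled + ch))
        return sorted(found)


if __name__ == "__main__":
    lex = TrieLexicon()
    # Illustrative ARPAbet-style entries.
    for word, pron in [("cat", "k ae t"), ("cats", "k ae t s"), ("car", "k aa r")]:
        lex.add(word, pron)
    print(lex.lookup("cats"))           # k ae t s
    print(lex.words_with_prefix("ca"))  # ['car', 'cat', 'cats']
```

If only exact lookups are needed, a dict is simpler and just as fast. The tree form pays off when the application needs prefix queries, for example incremental matching of characters or phones.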