Holmes & Holmes – Chapter 6 – Phonetic Synthesis by Rule

Mainly of historical interest.

Taylor – Chapter 3 – The text-to-speech problem

Discusses the differences between spoken and written forms of language, and describes the structure of a typical TTS system.

Jurafsky & Martin – Chapter 5 – Part-of-Speech Tagging

For our purposes, only sections 5.1 to 5.5 are needed.

Jurafsky & Martin – Section 3.5 – FSTs for Morphological Parsing

In Dan Jurafsky and James H. Martin “Speech and language processing: an introduction to natural language processing, computational linguistics, and speech recognition”, 2009, Pearson Prentice Hall, Upper Saddle River, N.J., Second edition, ISBN 0135041961.

Jurafsky & Martin – Section 3.3 – Construction of a Finite-State Lexicon

A lexicon can be represented using different data structures (finite-state network, tree, lookup table, ...), depending on the application.
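To make the choice of data structure concrete, here is a minimal sketch that stores the same toy lexicon as a lookup table and as a letter tree (trie). The entries, the ARPAbet-style phone strings, and the names `ENTRIES`, `build_trie` and `trie_lookup` are all illustrative assumptions, not taken from the reading.

```python
# Hypothetical toy lexicon as a lookup table: word -> phone string.
ENTRIES = {
    "cat": "k ae t",
    "cats": "k ae t s",
    "car": "k aa r",
}


def build_trie(entries):
    """Store the same entries as a letter tree; shared prefixes share nodes."""
    root = {}
    for word, pron in entries.items():
        node = root
        for letter in word:
            node = node.setdefault(letter, {})
        node["#"] = pron  # "#" marks end of word and holds the pronunciation
    return root


def trie_lookup(trie, word):
    """Walk the tree one letter at a time; return None for unknown words."""
    node = trie
    for letter in word:
        node = node.get(letter)
        if node is None:
            return None
    return node.get("#")


TRIE = build_trie(ENTRIES)
```

The lookup table gives a direct answer for whole words, while the tree shares storage for common prefixes ("ca" is stored once for all three words) and can report how far a partial match got.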

Jurafsky & Martin – Section 3.4 – Finite-State Transducers

FSTs are a powerful and general-purpose mechanism for mapping (“transducing”) an input string to an output string.
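As a rough illustration of transduction, the sketch below runs a tiny deterministic transducer that maps a lexical form such as cat +N +PL to the surface string "cats". The state numbering, the `<letter>` and `<copy>` conventions, and the function name `transduce` are assumptions made for this example; it handles only regular plurals and is not a general FST implementation.

```python
# Arcs: (state, input label) -> (next state, output).
# "<letter>" matches any single letter; "<copy>" echoes the input symbol.
TRANSITIONS = {
    (0, "<letter>"): (0, "<copy>"),
    (0, "+N"): (1, ""),
    (1, "+SG"): (2, ""),
    (1, "+PL"): (2, "s"),
}
FINALS = {2}


def transduce(symbols):
    """Map a list of input symbols to an output string, or None if rejected."""
    state, output = 0, []
    for sym in symbols:
        label = "<letter>" if len(sym) == 1 and sym.isalpha() else sym
        arc = TRANSITIONS.get((state, label))
        if arc is None:
            return None  # no arc for this symbol in this state
        state, out = arc
        output.append(sym if out == "<copy>" else out)
    # Accept only if the input ends in a final state.
    return "".join(output) if state in FINALS else None
```

For example, `transduce(["c", "a", "t", "+N", "+PL"])` yields "cats", while an input with no morphological tags is rejected because it never reaches the final state.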

Jurafsky & Martin – Section 3.2 – Finite-State Morphological Parsing

Automatic morphological decomposition of written words is possible. However, this section does not consider the added complication of deriving a pronunciation.

Taylor – Chapter 8 – Pronunciation

Covers how the lexicon is stored, letter-to-sound conversion, and compressing the lexicon.

Jurafsky & Martin – Section 3.1 – English Morphology

In speech technology for English, little or no use is made of morphology. But for other languages, it is essential.

Taylor – Chapter 4 – Text Processing

Complementary to Jurafsky & Martin, Section 8.1.

Jurafsky & Martin (2nd ed) – Section 8.2 – Phonetic Analysis

Each word in the normalised text needs a pronunciation. Most words will be found in the dictionary, but for the remainder we must predict pronunciation from spelling.
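The lookup-then-predict strategy can be sketched as follows. The two-word `LEXICON`, the ARPAbet-style phone symbols, and the one-letter-to-one-phone table `NAIVE_LTS` are hypothetical stand-ins: real systems use a large pronunciation dictionary and a trained letter-to-sound model, and this crude table will mispronounce most English words.

```python
# Hypothetical mini-dictionary: word -> list of phones.
LEXICON = {
    "speech": ["s", "p", "iy", "ch"],
    "the": ["dh", "ax"],
}

# Deliberately naive fallback: each letter maps to a fixed phone sequence.
NAIVE_LTS = {
    "a": "ae", "b": "b", "c": "k", "d": "d", "e": "eh", "f": "f", "g": "g",
    "h": "hh", "i": "ih", "j": "jh", "k": "k", "l": "l", "m": "m", "n": "n",
    "o": "aa", "p": "p", "q": "k", "r": "r", "s": "s", "t": "t", "u": "ah",
    "v": "v", "w": "w", "x": "k s", "y": "y", "z": "z",
}


def pronounce(word):
    """Return (phones, source): dictionary entry if present, else a prediction."""
    word = word.lower()
    if word in LEXICON:
        return LEXICON[word], "lexicon"
    phones = []
    for letter in word:
        # Characters with no table entry contribute no phones.
        phones.extend(NAIVE_LTS.get(letter, "").split())
    return phones, "letter-to-sound"
```

Returning the source alongside the phones makes it easy to see which words fell through to prediction, which is where most pronunciation errors are likely to come from.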

Jurafsky & Martin (2nd ed) – Section 8.1 – Text Normalisation

We need to normalise the input text so that it contains a sequence of pronounceable words.
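A minimal sketch of one normalisation step, expanding digit strings into words, is given below. The rules are assumptions chosen for brevity: one- and two-digit tokens are read as numbers, longer digit strings are read digit by digit, and punctuation is simply dropped. Real text normalisation must also handle dates, currency, abbreviations and other ambiguous tokens.

```python
import re

ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def number_to_words(n):
    """Spell out an integer in the range 0 to 99."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else TENS[tens] + " " + ONES[ones]


def normalise(text):
    """Turn raw text into a space-separated sequence of pronounceable words."""
    words = []
    for token in re.findall(r"[A-Za-z']+|\d+", text):
        if token.isdigit():
            if len(token) <= 2:
                words.append(number_to_words(int(token)))
            else:
                # Assumed rule: read longer digit strings one digit at a time.
                words.extend(ONES[int(d)] for d in token)
        else:
            words.append(token.lower())
    return " ".join(words)
```

Under these assumed rules, "Room 42, floor 7" becomes "room forty two floor seven" and "Call 999" becomes "call nine nine nine".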