Jurafsky & Martin (2nd ed) – Section 8.2 – Phonetic Analysis

Each word in the normalised text needs a pronunciation. Most words will be found in the dictionary, but for the remainder we must predict pronunciation from spelling.
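The lookup-with-fallback strategy described above can be sketched as follows. This is a minimal illustration only. It assumes a tiny hypothetical lexicon and a deliberately naive one-phone-per-letter rule table, and the phone symbols are merely illustrative. Real systems use a large pronunciation dictionary and a trained letter-to-sound model in place of the toy rules here.

```python
# Hypothetical toy lexicon; a real system would use a full pronunciation dictionary.
LEXICON = {
    "cat": ["k", "ae", "t"],
    "the": ["dh", "ax"],
}

# Hypothetical, deliberately naive letter-to-sound rules: one phone (or two for "x") per letter.
LETTER_TO_PHONE = {
    "a": "ae", "b": "b", "c": "k", "d": "d", "e": "eh", "f": "f",
    "g": "g", "h": "hh", "i": "ih", "j": "jh", "k": "k", "l": "l",
    "m": "m", "n": "n", "o": "aa", "p": "p", "q": "k", "r": "r",
    "s": "s", "t": "t", "u": "ah", "v": "v", "w": "w", "x": "k s",
    "y": "y", "z": "z",
}


def letter_to_sound(word):
    """Predict a pronunciation from spelling alone (toy rules; non-letters are skipped)."""
    phones = []
    for letter in word:
        phones.extend(LETTER_TO_PHONE.get(letter, "").split())
    return phones


def pronounce(word):
    """Look the word up in the lexicon; fall back to letter-to-sound prediction if absent."""
    word = word.lower()
    if word in LEXICON:
        return LEXICON[word], "lexicon"
    return letter_to_sound(word), "predicted"


print(pronounce("Cat"))  # found in the lexicon
print(pronounce("fox"))  # not in the lexicon, so predicted from spelling
```

Keeping the fallback separate from the lookup mirrors the two cases in the reading. Only words missing from the dictionary ever reach the prediction step.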

Jurafsky & Martin (2nd ed) – Section 8.1 – Text Normalisation

We need to normalise the input text so that it contains a sequence of pronounceable words.

Jurafsky & Martin – Chapter 2 – Regular Expressions and Automata

An important technique, widely used in NLP. In TTS, regular expressions can be applied to tasks such as detecting and expanding non-standard words.
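The sketch below shows how regular expressions can detect non-standard words and replace them with pronounceable words. It is a minimal example under several assumptions: a small hypothetical abbreviation table, numbers from 0 to 99 read as cardinals, and longer digit strings simply read digit by digit. Real normalisers must also classify each token, for example deciding whether a digit string is a year, a phone number or a quantity. This sketch does not attempt that.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]

# Hypothetical abbreviation table. Real systems need context to disambiguate
# cases such as "St." (Street or Saint).
ABBREVIATIONS = {"Dr.": "doctor", "etc.": "et cetera", "kg": "kilograms"}

# Match an abbreviation only when it is not glued to other word characters.
ABBREV_PATTERN = re.compile(
    r"(?<!\w)(" + "|".join(re.escape(a) for a in ABBREVIATIONS) + r")(?!\w)"
)
# Any run of digits counts as a (very crudely detected) number token.
DIGIT_PATTERN = re.compile(r"\d+")


def number_to_words(n):
    """Expand an integer from 0 to 99 into English words."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else TENS[tens] + " " + ONES[ones]


def expand_digits(match):
    digits = match.group(0)
    if len(digits) <= 2:
        return number_to_words(int(digits))
    # Simplification: read longer digit strings one digit at a time.
    return " ".join(ONES[int(d)] for d in digits)


def normalise(text):
    """Expand abbreviations first, then digit strings."""
    text = ABBREV_PATTERN.sub(lambda m: ABBREVIATIONS[m.group(1)], text)
    return DIGIT_PATTERN.sub(expand_digits, text)


print(normalise("Dr. Smith lost 5 kg"))
print(normalise("Room 42"))
```

Passing a function to `re.sub` keeps detection (the pattern) separate from expansion (the callback). This separation is the basic shape of regex-based normalisation.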

Holmes & Holmes – Chapter 5 – Message synthesis from stored human speech components

Pitch-synchronous overlap-and-add (PSOLA) remains a key technique in speech signal processing.

Handbook of Phonetic Sciences – Chapter 20 – Intro to Signal Processing for Speech

Written for a non-technical audience, this gently introduces some key concepts in speech signal processing.

Ladefoged (Elements) – Chapter 11 – Digital filters and LPC analysis

A brave attempt to use ‘longhand’ to spell out how LPC analysis works, but not recommended reading.

Ladefoged (Elements) – Chapter 10 – Fourier analysis

An attempt to explain Fourier analysis. Although chapters 1–9 are great, I do not recommend chapter 10.

Ladefoged (Elements) – Chapter 8 – Resonances of the vocal tract

A more detailed look at how resonance occurs in the vocal tract, and how that can be used to explain the sound quality (e.g. formant frequencies) of vowel sounds.

Ladefoged (Elements) – Chapter 7 – The production of speech

Some phonetics at last, and a first encounter with the source-filter model of speech.

Ladefoged (Elements) – Chapter 6 – Hearing

Some understanding of human hearing will be helpful for engineering suitable features to extract from the waveform for automatic speech recognition.

Ladefoged (Elements) – Chapter 5 – Resonance

Now we turn to speech production. Resonance is how the vocal tract shapes the quality of a sound, for example to produce different vowels.

Ladefoged (Elements) – Chapter 4 – Wave analysis

The waveform is rarely the best representation for analysis of sound, and this is an intuitive introduction to why.