The first task is to convert all of the text into words.
Further details
Deciding whether a token is a word
We need a general method for deciding whether a token needs normalising, or can be passed directly to the letter-to-sound module.
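As a rough illustration of what such a method might look like, the sketch below flags a token for normalisation unless it is in a lexicon or is purely alphabetic. The function name `needs_normalising`, the toy lexicon and the specific rules (for example, treating all-capitals tokens as candidates for normalisation) are assumptions made for this example, not a method prescribed by the course or the readings.

```python
import re

# A word made only of letters, optionally with an internal apostrophe (e.g. "don't").
ALPHABETIC = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")

# Hypothetical toy lexicon; a real system would use a full pronunciation dictionary.
TOY_LEXICON = {"the", "meeting", "is", "at", "on", "may", "for", "staff"}


def needs_normalising(token, lexicon=TOY_LEXICON):
    """Return True if the token should be normalised before letter-to-sound."""
    # Ignore punctuation attached to the edges of the token.
    core = token.strip(".,;:!?\"()")
    if not core:
        return False  # nothing left to pronounce
    if core.lower() in lexicon:
        return False  # known word: pass straight through
    # Unknown but purely alphabetic: leave it to the letter-to-sound module,
    # unless it is all capitals, which we assume may be an acronym.
    if ALPHABETIC.fullmatch(core) and not (core.isupper() and len(core) > 1):
        return False
    # Anything containing digits, symbols or mixed characters needs normalising.
    return True


sentence = "The meeting is at 10:35 on 3rd May, costing £20 for NATO staff."
flagged = [t for t in sentence.split() if needs_normalising(t)]
print(flagged)
```

A heuristic this simple has obvious gaps: an abbreviation such as "Dr." is purely alphabetic once the full stop is stripped, so it would be passed through unchanged. Handling such cases needs a list of known abbreviations or a trained classifier.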
Reading
Jurafsky & Martin (2nd ed) – Section 8.1 – Text Normalisation
We need to normalise the input text so that it contains a sequence of pronounceable words.
Jurafsky & Martin – Chapter 5 – Part-of-Speech Tagging
For our purposes, only sections 5.1 to 5.5 are needed.
Taylor – Chapter 4 – Text Processing
Complementary to Jurafsky & Martin, Section 8.1.
Taylor – Chapter 5 – Text decoding
Complementary to Jurafsky & Martin, Section 8.1.
Jurafsky & Martin – Chapter 2 – Regular Expressions and Automata
An important technique used widely in NLP. In TTS, it can be applied to tasks such as detecting and expanding non-standard words.
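To make the use of regular expressions for non-standard words concrete, here is a minimal sketch that detects two classes (clock times such as 10:35 and small amounts in pounds) and expands them into words. The patterns, the function names and the restriction to numbers from 0 to 99 are all assumptions chosen to keep the example short; they are not taken from either textbook.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def number_to_words(n):
    """Spell out an integer in the range 0-99 (a deliberate simplification)."""
    if not 0 <= n <= 99:
        raise ValueError("only 0-99 is supported in this sketch")
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]} {ONES[ones]}"


# Detection patterns for two classes of non-standard word.
TIME = re.compile(r"\b(\d{1,2}):(\d{2})\b")   # e.g. 10:35
POUNDS = re.compile(r"£(\d{1,2})\b")          # e.g. £20 (one or two digits only)


def expand_time(match):
    hours, minutes = int(match.group(1)), int(match.group(2))
    if minutes == 0:
        return f"{number_to_words(hours)} o'clock"
    if minutes < 10:
        return f"{number_to_words(hours)} oh {number_to_words(minutes)}"
    return f"{number_to_words(hours)} {number_to_words(minutes)}"


def expand_pounds(match):
    amount = int(match.group(1))
    unit = "pound" if amount == 1 else "pounds"
    return f"{number_to_words(amount)} {unit}"


def expand_nsw(text):
    """Detect and expand the non-standard words covered by the patterns above."""
    text = TIME.sub(expand_time, text)
    return POUNDS.sub(expand_pounds, text)


print(expand_nsw("The train leaves at 10:35 and costs £20."))
```

Each pattern both detects a class of non-standard word and captures the parts needed to expand it. Tokens that fall outside the patterns, such as £200, are left untouched here and would need further rules.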