Tokenisation and normalisation

The first task is to convert the input text into a sequence of words.

Further details

Reading

Jurafsky & Martin (2nd ed) – Section 8.1 – Text Normalisation

We need to normalise the input text so that it contains a sequence of pronounceable words.

Jurafsky & Martin – Chapter 5 – Part-of-Speech Tagging

For our purposes, only sections 5.1 to 5.5 are needed.

Taylor – Chapter 4 – Text Processing

Complementary to Jurafsky & Martin, Section 8.1.

Taylor – Chapter 5 – Text decoding

Complementary to Jurafsky & Martin, Section 8.1.

Jurafsky & Martin – Chapter 2 – Regular Expressions and Automata

Regular expressions are an important technique, widely used in NLP. In TTS, they can be applied to tasks such as detecting and expanding non-standard words.
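
To make the idea of detecting and expanding non-standard words concrete, here is a minimal sketch in Python using regular expressions. It is not taken from any of the readings: the function names, the tiny abbreviation table and the restriction to numbers from 0 to 99 are all assumptions made to keep the example short. A real TTS front end would need far broader coverage and would have to resolve ambiguity (for example, "Dr." as "doctor" or "drive").

```python
import re

# Hypothetical toy tables, for illustration only.
ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven",
    "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
    "fifteen", "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]
ABBREVIATIONS = {"Dr": "doctor", "Mr": "mister"}


def number_to_words(n: int) -> str:
    """Spell out an integer in the range 0..99 (an assumed limit)."""
    if not 0 <= n <= 99:
        raise ValueError("this sketch only handles 0..99")
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]} {ONES[ones]}"


def expand_money(match: re.Match) -> str:
    """Expand a matched amount such as '£20' into 'twenty pounds'."""
    n = int(match.group(1))
    unit = "pound" if n == 1 else "pounds"
    return f"{number_to_words(n)} {unit}"


def normalise(text: str) -> list[str]:
    """Return a list of lower-case, pronounceable words."""
    # Each pattern detects one class of non-standard word and the
    # replacement function expands it. More specific patterns go first.
    text = re.sub(r"£(\d{1,2})\b", expand_money, text)
    text = re.sub(r"\b(\d{1,2})%",
                  lambda m: number_to_words(int(m.group(1))) + " percent",
                  text)
    text = re.sub(r"\b(Dr|Mr)\.",
                  lambda m: ABBREVIATIONS[m.group(1)], text)
    text = re.sub(r"\b\d{1,2}\b",
                  lambda m: number_to_words(int(m.group())), text)
    # Tokenise: keep alphabetic words only. Anything this sketch could
    # not expand (e.g. numbers above 99) is silently dropped here.
    return re.findall(r"[a-z]+", text.lower())


print(normalise("Dr. Smith paid £20 for 3 books, a 15% discount."))
```

The order of the substitutions matters: currency and percentage patterns are applied before the plain-number pattern, so that "£20" is read as an amount of money rather than as a stray symbol followed by a number.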
