Module 3 – speech synthesis

Text processing, including normalisation of Non-Standard Words.


With only 3 short videos and 1 essential reading this week, you may have time to explore some of the recommended or extra readings. Feel free to pursue areas that interest you, and read either Jurafsky & Martin or Taylor, whichever you prefer. Taylor is the authority on Text-To-Speech, whereas Jurafsky & Martin are experts in NLP.

Reading

Jurafsky & Martin (2nd ed) – Section 8.1 – Text Normalisation

We need to normalise the input text so that it contains a sequence of pronounceable words.
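As a rough illustration of what normalisation produces, here is a minimal Python sketch. It assumes whitespace tokenisation, a tiny hypothetical abbreviation table (`ABBREVIATIONS`) and cardinal-number expansion for 0 to 999 only. A plain lookup table cannot make the context-dependent choices real text needs: "St." can be "street" or "saint", and "Dr." can be "doctor" or "drive".

```python
# Hypothetical abbreviation table, for illustration only; real systems need
# context to choose between readings (e.g. "St." as "street" or "saint").
ABBREVIATIONS = {"Dr.": "doctor", "St.": "street", "etc.": "et cetera"}

ONES = ["zero", "one", "two", "three", "four",
        "five", "six", "seven", "eight", "nine"]
TEENS = ["ten", "eleven", "twelve", "thirteen", "fourteen",
         "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty",
        "fifty", "sixty", "seventy", "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Expand 0 <= n <= 999 as a cardinal number, British style ("and")."""
    if n < 10:
        return ONES[n]
    if n < 20:
        return TEENS[n - 10]
    if n < 100:
        tens, units = divmod(n, 10)
        return TENS[tens] if units == 0 else f"{TENS[tens]} {ONES[units]}"
    hundreds, rest = divmod(n, 100)
    words = f"{ONES[hundreds]} hundred"
    return words if rest == 0 else f"{words} and {number_to_words(rest)}"


def normalise(text: str) -> str:
    """Turn raw text into a sequence of lower-case, pronounceable words."""
    words = []
    for token in text.split():
        if token in ABBREVIATIONS:              # known abbreviation
            words.append(ABBREVIATIONS[token])
            continue
        bare = token.strip(".,;:!?")            # drop surrounding punctuation
        if bare.isdecimal() and int(bare) < 1000:
            words.append(number_to_words(int(bare)))
        elif bare:                              # ordinary word (or unhandled NSW)
            words.append(bare.lower())
    return " ".join(words)


print(normalise("Dr. Smith lives at 221 Baker St."))
# doctor smith lives at two hundred and twenty one baker street
```

Numbers of 1000 or more, and other non-standard words such as dates or currency, pass through this sketch unchanged; handling them properly is the subject of the readings.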

Taylor – Chapter 3 – The text-to-speech problem

Discusses the differences between spoken and written forms of language, and describes the structure of a typical TTS system.

Jurafsky & Martin – Chapter 5 – Part-of-Speech Tagging

For our purposes, only sections 5.1 to 5.5 are needed.

Jurafsky & Martin – Chapter 2 – Regular Expressions and Automata

An important technique used widely in NLP. In TTS, it can be applied to tasks such as detecting and expanding non-standard words.
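To make this concrete, here is a minimal Python sketch using the standard `re` module. The category names and patterns are hypothetical and deliberately simple. Each token is tested against the patterns in order and the first full match wins, so a four-digit number such as 1984 is labelled `YEAR` even where it is really a quantity. A real front end would resolve that ambiguity from context. The last function shows one expansion rule applied with `re.sub`.

```python
import re

# Hypothetical NSW categories, tried in order; the first full match wins.
NSW_PATTERNS = [
    ("TIME", re.compile(r"\d{1,2}:\d{2}")),
    ("MONEY", re.compile(r"£\d+(?:\.\d{2})?")),
    ("ORDINAL", re.compile(r"\d+(?:st|nd|rd|th)")),
    ("PERCENT", re.compile(r"\d+(?:\.\d+)?%")),
    ("YEAR", re.compile(r"(?:1\d|20)\d{2}")),
    ("NUMBER", re.compile(r"\d+(?:,\d{3})*(?:\.\d+)?")),
]


def classify(token: str) -> str:
    """Return the NSW category of a token, or WORD if no pattern matches."""
    for label, pattern in NSW_PATTERNS:
        if pattern.fullmatch(token):
            return label
    return "WORD"


def tag_nsws(text: str) -> list[tuple[str, str]]:
    """Split on whitespace, strip trailing punctuation, and label each token."""
    tokens = [t.rstrip(".,;:!?") for t in text.split()]
    return [(t, classify(t)) for t in tokens if t]


def expand_percent(text: str) -> str:
    """One simple expansion rule: '15%' becomes '15 per cent'."""
    return re.sub(r"(\d+(?:\.\d+)?)%", r"\1 per cent", text)


print(tag_nsws("Meet at 12:30 on the 21st, bring £3.50."))
print(expand_percent("Prices rose 15% in 1984."))
```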

Taylor – Chapter 4 – Text Processing

Complementary to Jurafsky & Martin, Section 8.1.

Taylor – Chapter 5 – Text decoding

Complementary to Jurafsky & Martin, Section 8.1.

Jurafsky & Martin – Section 3.4 – Finite-State Transducers

FSTs (finite-state transducers) are a powerful, general-purpose mechanism for mapping ("transducing") an input string to an output string.
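A minimal sketch of the idea in Python follows. It assumes a deterministic transducer stored as a dictionary from (state, input symbol) to (next state, output string). The example machine is hypothetical and handles only the two-digit numbers 10 to 99. It reads the tens digit without emitting anything and remembers that digit in its state. It then produces the words when the units digit arrives, which is how an FST can make its output depend on earlier input.

```python
ONES = ["zero", "one", "two", "three", "four",
        "five", "six", "seven", "eight", "nine"]
TEENS = ["ten", "eleven", "twelve", "thirteen", "fourteen",
         "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty",
        "fifty", "sixty", "seventy", "eighty", "ninety"]


def build_two_digit_fst():
    """Return (transitions, final_states) for a hypothetical 10-99 expander."""
    transitions = {}
    for t in range(1, 10):
        # Tens digit: consume it, emit nothing yet, remember it in the state.
        transitions[("start", str(t))] = (f"tens{t}", "")
        for u in range(10):
            if t == 1:
                out = TEENS[u]                  # 10-19 are irregular
            elif u == 0:
                out = TENS[t]                   # 20, 30, ..., 90
            else:
                out = f"{TENS[t]} {ONES[u]}"    # 21, 22, ..., 99
            # Units digit: emit the whole number and move to the final state.
            transitions[(f"tens{t}", str(u))] = ("end", out)
    return transitions, {"end"}


def transduce(transitions, finals, text, start="start"):
    """Run the FST over `text`; return the output, or None if rejected."""
    state, output = start, []
    for symbol in text:
        if (state, symbol) not in transitions:
            return None                         # no arc for this symbol
        state, emitted = transitions[(state, symbol)]
        if emitted:
            output.append(emitted)
    return " ".join(output) if state in finals else None


fst, finals = build_two_digit_fst()
for s in ["13", "40", "21", "7", "123"]:
    print(s, "->", transduce(fst, finals, s))
```

Strings the machine has no complete path for, such as "7" or "123", are rejected (the function returns `None`), so the same machine also acts as a recogniser for the strings it can transduce.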

This is a SIGNALS tutorial to consolidate the material from Modules 1 and 2.

To prepare for the tutorial, go back over the Jupyter notebooks for those modules. With your tutorial group, agree on a list of points that you would like to go over with your tutor. Order your list carefully (by topic and priority) and bring it to the tutorial.

Here are the key points that you need to understand from each notebook, so concentrate on these when writing your list:

signals/slp-m1-1-sounds-signals – periodicity and pitch
signals/slp-m1-2-digital-signals-complex-numbers – a phasor is a sinusoid with both magnitude and phase
signals/slp-m1-3-sampling-sinusoids – sampling, Nyquist frequency, aliasing
signals/slp-m1-4-discrete-fourier-transform – the DFT decomposes any signal into a series of basis functions; each basis function is a phasor (i.e., a sinusoid with magnitude and phase)
signals/slp-m1-5-interpreting-the-dft – relating what you see in the time domain to what you see in the frequency domain
signals/sp-m2-1-impulse-as-source – an impulse train has energy at every multiple of its fundamental frequency
signals/sp-m2-2-fir-filters – FIR filters are little more than a weighted moving average; develop an intuitive understanding that changing the filter coefficients changes the frequency response
signals/sp-m2-3-iir-filters – IIR filters can exhibit resonance; the filter coefficients are not very intuitive; an IIR filter can impose a spectral envelope with resonances (formants) on its input signal; exciting an IIR filter with an impulse train can synthesise speech
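Several of these points can be checked numerically. The following is a minimal pure-Python sketch (no NumPy, so it runs anywhere but is slow for long signals). The function names are our own, not those used in the notebooks. It demonstrates three things:

- Aliasing: a 900 Hz cosine sampled at 1000 Hz gives the same samples as a 100 Hz cosine.
- The DFT of an impulse train has energy only at multiples of its fundamental.
- A two-pole IIR resonator (a single formant) has a frequency response that peaks close to its centre frequency.

The resonator assumes the common approximation r = exp(-pi * B / fs) relating pole radius r to bandwidth B.

```python
import cmath
import math


def sample_cosine(freq_hz, fs_hz, n_samples):
    """Sample cos(2*pi*f*t) at t = n / fs."""
    return [math.cos(2 * math.pi * freq_hz * n / fs_hz) for n in range(n_samples)]


def dft(x):
    """Direct O(N^2) DFT: X[k] = sum_n x[n] * exp(-2j*pi*k*n/N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]


def impulse_train(period_samples, n_samples):
    """Unit impulses every `period_samples` samples (F0 = fs / period)."""
    return [1.0 if n % period_samples == 0 else 0.0 for n in range(n_samples)]


def resonator(x, centre_hz, bandwidth_hz, fs_hz):
    """Two-pole IIR filter: y[n] = x[n] + a1*y[n-1] + a2*y[n-2]."""
    r = math.exp(-math.pi * bandwidth_hz / fs_hz)   # pole radius
    theta = 2 * math.pi * centre_hz / fs_hz          # pole angle
    a1, a2 = 2 * r * math.cos(theta), -r * r
    y = []
    for n, xn in enumerate(x):
        y1 = y[n - 1] if n >= 1 else 0.0
        y2 = y[n - 2] if n >= 2 else 0.0
        y.append(xn + a1 * y1 + a2 * y2)
    return y


# Aliasing: 900 Hz is above the Nyquist frequency (500 Hz) for fs = 1000 Hz.
low, high = sample_cosine(100, 1000, 20), sample_cosine(900, 1000, 20)
print(max(abs(a - b) for a, b in zip(low, high)))   # tiny: the samples coincide

# Impulse train with a period of 10 samples: only bins 0, 10, 20, ... are non-zero.
X = dft(impulse_train(10, 100))
print([k for k, v in enumerate(X) if abs(v) > 1e-6])

# Impulse response of a resonator at 1000 Hz (fs = 8000 Hz, 20 Hz per bin).
H = dft(resonator(impulse_train(400, 400), 1000, 100, 8000))
peak_bin = max(range(1, 200), key=lambda k: abs(H[k]))
print(peak_bin * 8000 / 400, "Hz")
```

Passing a long impulse train (rather than a single impulse) through `resonator` combines the last two points: the harmonics of F0 are shaped by a resonant spectral envelope, which is the source-filter idea behind synthesising speech this way.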

Eventually, you may be able to understand a lot more of the material in the notebooks (so come back to them in a few weeks and try again), but the above is quite an achievement and is all you really need for the course.

This is a PHON tutorial about the phoneme. Go through the following Jupyter Notebook:

phon/phon-m4-1-phoneme-tutorial

You might need to update your copy of the notebooks to get the latest version.

Exercises

Go through the following Jupyter Notebooks, which cover the topics Prosody, Decision Tree, and Learning Decision Trees:

tts/tts-m4-1-entropy – do this one on your own, then check your understanding by explaining entropy to another student in your group
tts/tts-m4-2-decision-tree-pencil-and-paper – do this one in small groups of 2 or 3 students
tts/tts-m4-3-learning-decision-trees – do this one on your own (but if the code is challenging for you, pair up with someone who can code)
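For checking your answers, here is a minimal Python sketch of the two quantities involved. The first is entropy in bits, H = -sum(p * log2(p)). The second is the information gain of a yes/no question: the entropy before the split minus the size-weighted entropy of the two partitions. The tiny data set (predicting whether a word is accented from its part of speech) and the question names are hypothetical and made up for illustration. They are not the data used in the notebooks, whose code may be organised differently.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy in bits of the empirical distribution of `labels`."""
    total = len(labels)
    return sum((c / total) * math.log2(total / c) for c in Counter(labels).values())


def information_gain(labels, partitions):
    """Entropy of `labels` minus the size-weighted entropy of its partitions."""
    total = len(labels)
    remainder = sum(len(p) / total * entropy(p) for p in partitions if p)
    return entropy(labels) - remainder


def best_question(examples, questions):
    """Greedy step of tree learning: the question with the highest gain.

    examples:  list of (features, label) pairs
    questions: dict mapping a question name to a yes/no predicate on features
    """
    labels = [label for _, label in examples]

    def gain(predicate):
        yes = [label for feats, label in examples if predicate(feats)]
        no = [label for feats, label in examples if not predicate(feats)]
        return information_gain(labels, [yes, no])

    return max(questions, key=lambda name: gain(questions[name]))


# Hypothetical toy data: is a word accented, given its part of speech?
examples = [
    ({"pos": "noun"}, "accented"), ({"pos": "noun"}, "accented"),
    ({"pos": "verb"}, "accented"), ({"pos": "det"}, "unaccented"),
    ({"pos": "det"}, "unaccented"), ({"pos": "prep"}, "unaccented"),
]
questions = {
    "is_noun": lambda f: f["pos"] == "noun",
    "is_content_word": lambda f: f["pos"] in {"noun", "verb", "adj"},
}
print(entropy([label for _, label in examples]))   # 1.0 bit: a 3/3 split
print(best_question(examples, questions))          # is_content_word
```

Applying `best_question` recursively to each partition, until the labels are pure or some stopping criterion is met, gives a basic greedy top-down tree-growing procedure.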

You might need to update your copy of the notebooks to get the latest version. You may also need to install the following dependencies if you don’t already have them:

$ cd uoe_speech_processing_course
$ conda activate slp
$ conda install -c conda-forge anytree urllib3 requests

Practical assignment

Continue the first assignment. Complete all the milestones to date. Use the forums to get help.

Prepare for the tutorial session

In your small group of 2 or 3 students, prepare your workings for the tts/tts-m4-2-decision-tree-pencil-and-paper notebook and be ready to share them with the group. If your workings are on paper, scan or photograph them before the tutorial. With the whole tutorial group, prepare questions about the other notebooks and make a structured list of questions to go through with the tutor.
