Module status: complete.
This module moves away temporarily from speech signals, and is about the text processing required for Text-To-Speech (TTS). This part of a TTS system is often called the ‘front end’.
Here’s what you’re going to learn in the videos:
Total video to watch in this section: 23 minutes
There are some extra readings on signal processing in this module, if you’d like to revisit that material from another author’s perspective. There is actually a large amount of further signal processing material in Taylor’s book, which is worth exploring.
Reading
Jurafsky & Martin – Section 8.4 – Diphone Waveform Synthesis
A simple way to generate a waveform is by concatenating speech units from a pre-recorded database. The database contains one recording of each required speech unit.
Jurafsky & Martin – Section 8.5 – Unit Selection (Waveform) Synthesis
A brief explanation. Worth reading before tackling the more substantial chapter in Taylor (Speech Synthesis course only).
Holmes & Holmes – Chapter 5 – Message synthesis from stored human speech components
Pitch-synchronous overlap-and-add (PSOLA) remains a key technique in speech signal processing.
Taylor – Section 10.1 – Analogue signals
It's easier to start by understanding physical signals, which are analogue, before approximating them digitally.
Taylor – Section 10.2 – Digital signals
Going digital involves approximations in the way an original analogue signal is represented.
Taylor – Section 12.7 – Pitch and epoch detection
Only an outline of the main approaches, with little technical detail. Useful as a summary of why these tasks are harder than you might think.
Holmes & Holmes – Chapter 6 – Phonetic Synthesis by Rule
Mainly of historical interest.
Ladefoged (Elements) – Chapter 10 – Fourier analysis
An attempt to explain Fourier analysis. Although chapters 1-9 are great, I actually do not recommend chapter 10.
Ladefoged (Elements) – Chapter 11 – Digital filters and LPC analysis
A brave attempt to use 'long hand' to spell out how LPC analysis works, but not a recommended reading.
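To make the idea in the diphone synthesis reading concrete, here is a minimal sketch of generating a waveform by concatenating stored units. It is only an illustration under stated assumptions: the function names, the toy database of made-up sample values, and the short linear cross-fade at each join are all hypothetical choices for this example, not the method described in Jurafsky & Martin or used in any particular system.

```python
from typing import Dict, List, Sequence


def crossfade_concatenate(units: Sequence[Sequence[float]], overlap: int) -> List[float]:
    """Join waveforms end to end, linearly cross-fading `overlap` samples at each join."""
    output: List[float] = []
    for unit in units:
        samples = list(unit)
        # Cross-fade only over as many samples as both sides actually have.
        n = min(overlap, len(output), len(samples))
        for i in range(n):
            w = (i + 1) / (n + 1)  # weight of the incoming unit rises across the join
            idx = len(output) - n + i
            output[idx] = (1.0 - w) * output[idx] + w * samples[i]
        # Append whatever part of the incoming unit was not blended.
        output.extend(samples[n:])
    return output


def synthesise(names: Sequence[str], database: Dict[str, Sequence[float]],
               overlap: int = 2) -> List[float]:
    """Look up one stored recording per unit name and concatenate them."""
    return crossfade_concatenate([database[name] for name in names], overlap)


# Toy database (hypothetical values): exactly one recording per diphone.
database = {
    "sil-h": [0.0, 0.1, 0.2, 0.3],
    "h-i": [0.3, 0.4, 0.5, 0.6],
    "i-sil": [0.6, 0.4, 0.2, 0.0],
}

waveform = synthesise(["sil-h", "h-i", "i-sil"], database, overlap=2)
```

Because the database holds only one recording of each unit, the pitch and duration of each unit are whatever they were in the recording. Changing them requires signal processing such as PSOLA, covered in the Holmes & Holmes reading.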
This is a SKILLS tutorial about writing up the first assignment.
Prepare for the tutorial session
There will be an exercise for which you need to bring a sample of your writing (150-200 words). Prepare this as a Word document (even if you’re writing up in LaTeX or another tool) named with your name and student number (e.g., s1234567_Jane.doc). This should be part of the draft of your Speech Processing coursework. Be ready to share it only with the tutor at the start of the session (you can send it via a direct message in Teams).
After the tutorial session
Each tutorial group will pool all their writing samples. Each student will select a writing sample from another student in the group. As far as possible, choose one written by someone who speaks a different first language to you. For example, native speakers should select one written by a non-native, a speaker of German might choose one written by a speaker of Chinese, and so on.
Below the original text that you receive, write a version that is shorter and clearer. Return it to the author.
Practical assignment
Continue the first assignment. Complete all the milestones to date. Use the forums to get help.