Bennett: Large Scale Evaluation of Corpus-based Synthesisers

An analysis of the first Blizzard Challenge, which is an evaluation of speech synthesisers using a common database.

Benoît et al: The SUS test

A method for evaluating the intelligibility of synthetic speech, which avoids the ceiling effect.

Clark et al: Festival 2 – build your own general purpose unit selection speech synthesiser

Discusses some of the design choices made when writing Festival’s unit selection engine (Multisyn) and the tools for building new voices.

Clark et al: Multisyn: Open-domain unit selection for the Festival speech synthesis system

A description of the implementation and evaluation of Festival’s unit selection engine, called Multisyn.

Clark et al: Statistical analysis of the Blizzard Challenge 2007 listening test results

Explains the types of statistical tests that are employed in the Blizzard Challenge. These are deliberately quite conservative. For example, MOS data is correctly treated as ordinal. Also includes a Multi-Dimensional Scaling (MDS) section that is not as widely used as the other types of analysis.

Fitt & Isard: Synthesis of regional English using a keyword lexicon

An extension and practical application of Wells’ keyvowels idea, which enables efficient generation of a pronunciation dictionary tailored to a specific accent or speaker.

Gurney: An introduction to neural networks

Somewhat old, but might be helpful in getting some of the basic concepts clear, if you find Nielsen’s “Neural Networks and Deep Learning” too difficult to start with.

Handbook of phonetic sciences – Ch 20 – Intro to Signal Processing for Speech

Written for a non-technical audience, this gently introduces some key concepts in speech signal processing.

Handbook of phonetic sciences – Ch 20 – Intro to Signal Processing for Speech (Sections 1-5)

Written for a non-technical audience, this gently introduces some key concepts in speech signal processing. Read sections 1-5 (up to and including ‘Fourier Analysis’).

Handbook of phonetic sciences – Ch 20 – Intro to Signal Processing for Speech (Sections 6-7)

Written for a non-technical audience, this gently introduces some key concepts in speech signal processing. Read sections 6-7.

Hunt & Black: Unit selection in a concatenative speech synthesis system using a large speech database

The classic description of unit selection, described as a search through a network.

Jurafsky & Martin – Section 8.5 – Unit Selection (Waveform) Synthesis

A brief explanation. Worth reading before tackling the more substantial chapter in Taylor (Speech Synthesis course only).