Kominek & Black: CMU ARCTIC databases for speech synthesis

Widely used, copyright-free speech databases for use in speech synthesis

Taylor – Section 17.2 – Evaluation

Testing of the system by the developers, as well as via listening tests.

Taylor – Section 17.1 – Databases

Including the important issue of labelling the data

Hunt & Black: Unit selection in a concatenative speech synthesis system using a large speech database

The classic description of unit selection, described as a search through a network.

Clark et al: Multisyn: Open-domain unit selection for the Festival speech synthesis system

A description of the implementation and evaluation of Festival’s unit selection engine, called Multisyn.

Jurafsky & Martin – Section 8.5 – Unit Selection (Waveform) Synthesis

A brief explanation. Worth reading before tackling the more substantial chapter in Taylor (Speech Synthesis course only).

Taylor – Chapter 16 – Unit-selection synthesis

A substantial chapter covering target cost, join cost and search.

Clark et al: Festival 2 – build your own general purpose unit selection speech synthesiser

Discusses some of the design choices made when writing Festival’s unit selection engine (Multisyn) and the tools for building new voices.

Furui et al: Fundamental Technologies in Modern Speech Recognition

A complete issue of IEEE Signal Processing Magazine. Although a few years old, this is still a very useful survey of current techniques.

Holmes & Holmes – Chapter 9 – Stochastic Modelling

May be helpful as a complement to the essential readings.

Holmes & Holmes – Chapter 11 – Improving Speech Recognition Performance

We mitigate the over-simplifications of the model using ever-more-complex algorithms.

Jurafsky & Martin – Section 4.4 – Perplexity

It is possible to evaluate how good an N-gram model is without integrating it into an automatic speech recognition. We simply measure how well it predicts some unseen test data.