The classic description of unit selection, framed as a search through a network.
Clark et al: Multisyn: Open-domain unit selection for the Festival speech synthesis system
A description of the implementation and evaluation of Festival’s unit selection engine, called Multisyn.
Jurafsky & Martin – Section 8.5 – Unit Selection (Waveform) Synthesis
A brief explanation. Worth reading before tackling the more substantial chapter in Taylor (Speech Synthesis course only).
Taylor – Chapter 16 – Unit-selection synthesis
A substantial chapter covering target cost, join cost and search.
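To make the three ingredients concrete, here is a minimal sketch of unit selection as a dynamic-programming search that minimises the sum of target costs and join costs. Everything in it is an assumption made for illustration: the function name `select_units`, the representation of a unit as a `(name, F0)` pair, and the two toy cost functions based only on F0. Real systems, including those described in the chapter, use much richer costs.

```python
def select_units(targets, candidates, target_cost, join_cost):
    """Return (total cost, unit sequence) minimising summed target and join costs."""
    # best holds, for each candidate at the current position,
    # (cost of the cheapest path ending in that candidate, that path).
    best = [(target_cost(targets[0], u), [u]) for u in candidates[0]]
    for t in range(1, len(targets)):
        new_best = []
        for u in candidates[t]:
            # Cheapest way to reach u from any candidate at the previous position.
            cost, path = min(
                ((c + join_cost(p[-1], u), p) for c, p in best),
                key=lambda item: item[0],
            )
            new_best.append((cost + target_cost(targets[t], u), path + [u]))
        best = new_best
    return min(best, key=lambda item: item[0])


# Toy data (hypothetical): each target is a desired F0 in Hz,
# each candidate unit is a (name, F0) pair.
targets = [100, 120, 110]
candidates = [
    [("a1", 90), ("a2", 105)],
    [("b1", 120), ("b2", 150)],
    [("c1", 100), ("c2", 112)],
]

def toy_target_cost(target_f0, unit):
    # Mismatch between the unit and the target specification.
    return abs(target_f0 - unit[1])

def toy_join_cost(left, right):
    # Mismatch between consecutive units (simplified to one F0 value per unit).
    return abs(left[1] - right[1])

total_cost, path = select_units(targets, candidates, toy_target_cost, toy_join_cost)
print(total_cost, [name for name, _ in path])
```

Because the total cost decomposes into terms that involve only one unit or one pair of adjacent units, only the cheapest path into each candidate needs to be kept at each position, which is what makes the search tractable.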
Clark et al: Festival 2 – build your own general purpose unit selection speech synthesiser
Discusses some of the design choices made when writing Festival’s unit selection engine (Multisyn) and the tools for building new voices.
Furui et al: Fundamental Technologies in Modern Speech Recognition
A complete issue of IEEE Signal Processing Magazine. Although a few years old, this is still a very useful survey of current techniques.
Holmes & Holmes – Chapter 9 – Stochastic Modelling
May be helpful as a complement to the essential readings.
Holmes & Holmes – Chapter 11 – Improving Speech Recognition Performance
We mitigate the over-simplifications of the model using ever-more-complex algorithms.
Jurafsky & Martin – Section 4.4 – Perplexity
It is possible to evaluate how good an N-gram model is without integrating it into an automatic speech recognition system. We simply measure how well it predicts some unseen test data.
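As a rough illustration of that measurement, the sketch below computes perplexity from the probability that a model assigns to each word of the test data. The function name `perplexity` and the example probabilities are assumptions made here; the model that would produce those probabilities is left out.

```python
import math

def perplexity(word_probs):
    """Perplexity of a test sequence, given the model's probability for each word."""
    if not word_probs:
        raise ValueError("need at least one word probability")
    # Sum log probabilities to avoid numerical underflow on long sequences.
    log_prob = sum(math.log(p) for p in word_probs)
    # Inverse probability of the test data, normalised by the number of words.
    return math.exp(-log_prob / len(word_probs))


# A model that spreads its probability evenly over 10 words at every step.
uniform = perplexity([0.1] * 20)

# A model that predicts the test words with higher probability.
peaked = perplexity([0.5, 0.25, 0.5, 0.25])

print(uniform, peaked)
```

The uniform model has a perplexity of about 10, matching the number of equally likely choices at each step. The second model assigns higher probability to the test words and so has lower perplexity.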
Jurafsky & Martin – Section 4.3 – Training and Test Sets
As we should already know: in machine learning it is essential to evaluate a model on data that it was not trained on.
Jurafsky & Martin – Section 4.2 – Simple (Unsmoothed) N-Grams
We can just use raw counts to estimate probabilities directly.
Jurafsky & Martin – Section 4.1 – Word Counting in Corpora
The frequency of occurrence of each N-gram in a training corpus is used to estimate its probability.
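A minimal sketch of that idea for bigrams: count each word pair in the training corpus, then divide by the count of the first word. The function name `bigram_mle` and the toy corpus are assumptions for illustration, and no smoothing is applied, so any bigram not seen in training gets no probability at all.

```python
from collections import Counter

def bigram_mle(tokens):
    """Estimate P(w2 | w1) as count(w1, w2) / count(w1) from raw counts."""
    bigram_counts = Counter(zip(tokens, tokens[1:]))
    # Count each word only where it starts a bigram, so that the
    # estimates for any given history sum to 1.
    history_counts = Counter(tokens[:-1])
    return {
        (w1, w2): count / history_counts[w1]
        for (w1, w2), count in bigram_counts.items()
    }


# Toy training corpus with sentence start and end markers.
corpus = (
    "<s> i am sam </s> "
    "<s> sam i am </s> "
    "<s> i do not like green eggs and ham </s>"
).split()

probs = bigram_mle(corpus)
print(probs[("<s>", "i")], probs[("i", "am")], probs[("am", "sam")])
```

In this corpus, "i" follows the start marker in two of the three sentences, so the estimate of P(i | &lt;s&gt;) is 2/3.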