Holmes & Holmes – Chapter 11 – Improving Speech Recognition Performance

We mitigate the over-simplifications of the model using ever-more-complex algorithms.

Jurafsky & Martin – Section 4.4 – Perplexity

It is possible to evaluate how good an N-gram model is without integrating it into an automatic speech recognition system. We simply measure how well it predicts some unseen test data.
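
To make "predicts" concrete: perplexity is the inverse probability of the test data, normalised by the number of words, so lower is better. Below is a minimal sketch for a bigram model; the model, its probabilities, and the test sentence are illustrative assumptions, not examples from the book.

```python
import math

def perplexity(sentence, bigram_prob):
    """Perplexity PP(W) = P(w_1 ... w_N) ** (-1/N) of one tokenised
    sentence, where bigram_prob maps (previous word, word) -> probability."""
    tokens = ["<s>"] + sentence + ["</s>"]
    log_prob = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        log_prob += math.log(bigram_prob[(prev, word)])
    n = len(tokens) - 1  # number of predicted tokens, including </s>
    return math.exp(-log_prob / n)

# toy model with made-up probabilities
p = {("<s>", "a"): 0.5, ("a", "b"): 0.4, ("b", "</s>"): 0.6}
print(perplexity(["a", "b"], p))  # about 2.03; lower = better prediction
```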

Taylor – Section 12.3 – The cepstrum

By using the logarithm to convert a multiplication into a sum, the cepstrum separates the source and filter components of speech.
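
As a rough illustration (a toy signal, not real speech; the pulse period and filter below are made up), the real cepstrum can be computed as the inverse FFT of the log magnitude spectrum. The filter shows up at low quefrencies, while the source's fundamental period produces a peak further out:

```python
import numpy as np

def real_cepstrum(frame):
    """Inverse FFT of the log magnitude spectrum: log|X| turns the
    source-times-filter product into a sum, so source and filter
    end up in different quefrency regions."""
    spectrum = np.fft.rfft(frame)
    log_mag = np.log(np.abs(spectrum) + 1e-10)  # epsilon avoids log(0)
    return np.fft.irfft(log_mag)

fs = 16000
frame = np.zeros(1024)
frame[::fs // 100] = 1.0                            # source: 100 Hz pulse train
frame = np.convolve(frame, np.hanning(64))[:1024]   # crude "vocal tract" filter
c = real_cepstrum(frame * np.hamming(1024))

# low quefrencies describe the filter envelope; the source's 10 ms
# period appears as a peak near fs/100 = 160 samples
print(np.argmax(c[80:400]) + 80)
```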

Taylor – Chapter 5 – Text decoding

Complementary to Jurafsky & Martin, Section 8.1.

Taylor – Chapter 6 – Prosody prediction from text

Predicting phrasing, prominence, intonation and tune from text input.

Taylor – Section 10.2 – Digital signals

Going digital involves approximations in the way an original analogue signal is represented.
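
A minimal numpy sketch of those approximations (the signal, sampling rate and bit depth are arbitrary choices for illustration): sampling discretises time, quantisation discretises amplitude.

```python
import numpy as np

# "analogue" reference: a 100 Hz sinusoid
fs = 1000                       # approximation 1: sample at 1 kHz
n = np.arange(0, 0.02, 1 / fs)  # 20 ms of discrete time points
sampled = np.sin(2 * np.pi * 100 * n)

bits = 4                        # approximation 2: 4-bit amplitude
levels = 2 ** bits              # 16 quantisation levels over [-1, 1]
quantised = np.round((sampled + 1) / 2 * (levels - 1)) / (levels - 1) * 2 - 1

print(np.max(np.abs(sampled - quantised)))  # worst-case quantisation error
```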

Taylor – Section 10.1 – Analogue signals

It’s easier to start by understanding physical signals – which are analogue – before we then approximate them digitally.

Holmes & Holmes – Chapter 6 – Phonetic Synthesis by Rule

Mainly of historical interest.

Jurafsky & Martin – Chapter 5 – Part-of-Speech Tagging

For our purposes, only Sections 5.1 to 5.5 are needed.

Jurafsky & Martin – Section 3.5 – FSTs for Morphological Parsing

In Dan Jurafsky and James H. Martin, "Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition", second edition, Pearson Prentice Hall, Upper Saddle River, NJ, 2009. ISBN 0135041961.

Jurafsky & Martin – Section 3.3 – Construction of a Finite-State Lexicon

A lexicon can be represented using different data structures (finite-state network, tree, lookup table, …), depending on the application.
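
To illustrate the trade-off, here is a sketch contrasting two of those structures on the same three-word toy lexicon (the words and pronunciations are made up for the example): a lookup table gives one-step access, while a tree (trie) shares prefixes and supports letter-by-letter traversal, as a finite-state network would.

```python
# lookup table: direct word -> pronunciation mapping
lexicon = {"cat": "k ae t", "cats": "k ae t s", "can": "k ae n"}

def build_trie(entries):
    """Tree sharing common prefixes; "#" marks end-of-word."""
    root = {}
    for word, pron in entries.items():
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node["#"] = pron
    return root

def lookup(trie, word):
    node = trie
    for ch in word:
        if ch not in node:
            return None  # prefix not in lexicon
        node = node[ch]
    return node.get("#")

trie = build_trie(lexicon)
print(lookup(trie, "cats"))  # "k ae t s"
```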

Jurafsky & Martin – Section 3.4 – Finite-State Transducers

FSTs are a powerful and general-purpose mechanism for mapping (“transducing”) an input string to an output string.
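
A minimal sketch of that idea (this toy machine and its boundary-deletion rule are illustrative, not the book's examples): a transducer is just a transition table from (state, input symbol) to (output string, next state); running it maps lexical "cat+s" to surface "cats" by deleting the morpheme boundary.

```python
import string

START, FINALS = 0, {0}
# one-state transducer: copy letters, map the boundary "+" to nothing
fst = {(0, ch): (ch, 0) for ch in string.ascii_lowercase}
fst[(0, "+")] = ("", 0)

def transduce(s):
    state, out = START, []
    for ch in s:
        if (state, ch) not in fst:
            return None  # no transition defined: input rejected
        symbol, state = fst[(state, ch)]
        out.append(symbol)
    return "".join(out) if state in FINALS else None

print(transduce("cat+s"))  # -> "cats"
```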