J. N. Holmes and Wendy Holmes, “Speech synthesis and recognition”, second edition, Taylor and Francis, London, 2002. ISBN 0748408568, 0748408576
Holmes & Holmes - Chapter 5 - Message synthesis from stored human speech components
Pitch-synchronous overlap-and-add (PSOLA) remains a key technique in speech signal processing.
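The following is a minimal pure-Python sketch of the overlap-and-add idea behind time-domain PSOLA pitch modification. It assumes a perfectly steady pitch with one analysis epoch every `period` samples. Real speech needs per-cycle pitch marks from an epoch detector, which is not shown. `td_psola` and `periodic_hann` are hypothetical helper names, not a library API. Each two-period, Hann-windowed segment centred on an analysis epoch is added back at synthesis epochs spaced `period / factor` apart, so `factor > 1` raises the pitch while keeping the duration.

```python
import math

def periodic_hann(n):
    # Periodic Hann window: copies shifted by n/2 sum exactly to 1.
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]

def td_psola(x, period, factor):
    """Change the pitch of x by `factor` while keeping its duration.

    Assumes a steady pitch with one analysis epoch every `period` samples;
    real speech needs per-cycle pitch marks from an epoch detector.
    """
    win = periodic_hann(2 * period)
    out = [0.0] * len(x)
    # Analysis epochs, kept far enough from the edges for a full segment.
    epochs = list(range(period, len(x) - period + 1, period))
    if not epochs:
        return out
    step = period / factor  # spacing of synthesis epochs (new pitch period)
    t = float(epochs[0])
    while t <= epochs[-1]:
        # Reuse the analysis epoch nearest to this synthesis epoch.
        k = min(len(epochs) - 1, int(round((t - epochs[0]) / period)))
        a, s = epochs[k], int(round(t))
        # Overlap-add the two-period windowed segment, centred on s.
        # No gain correction: raising the pitch overlaps more segments.
        for i in range(2 * period):
            j = s - period + i
            if 0 <= j < len(out):
                out[j] += win[i] * x[a - period + i]
        t += step
    return out
```

With `factor = 1.0` the synthesis epochs coincide with the analysis epochs, and because the shifted windows sum to one, the interior of the signal is reconstructed unchanged. This is a quick sanity check that the overlap-add is set up correctly.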
Holmes & Holmes - Chapter 8 - Template matching and dynamic time warping
Read up to the end of 8.5 carefully. Try to read 8.6 as part of Module 7, but rest assured that we will go over the concept of dynamic programming again in Module 9. We recommend that you skim 8.7 and 8.8, because the same general concepts carry forward into hidden Markov models (HMMs); again, we will come back to this in Module 9. You don't need to read 8.9 onwards. Methods like DTW are rarely used in state-of-the-art systems now, but they are a good way to start understanding some core ideas.
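The core dynamic-programming recursion of DTW can be sketched in a few lines of Python. This is a minimal sketch that assumes the simplest symmetric step pattern, with unweighted diagonal, horizontal and vertical moves and no slope or band constraints. The local constraints and distance weighting described in Holmes & Holmes may differ. `dtw` and `recognise` are hypothetical names used for illustration. Frames are scalars here for brevity. Real templates are sequences of feature vectors, for which you could pass something like `dist=math.dist` with tuples.

```python
import math

def dtw(a, b, dist=lambda x, y: abs(x - y)):
    """Minimum cumulative distance over all alignments of a and b."""
    n, m = len(a), len(b)
    # D[i][j]: best cost of aligning the first i frames of a with the first j of b.
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(a[i - 1], b[j - 1])
            # Simple symmetric steps: diagonal, vertical, horizontal (unweighted).
            D[i][j] = cost + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    return D[n][m]

def recognise(query, templates):
    # Isolated-word template matching: choose the template with the lowest DTW cost.
    return min(templates, key=lambda word: dtw(query, templates[word]))

templates = {"yes": [1, 3, 5, 3, 1], "no": [5, 4, 3, 2, 1]}
print(recognise([1, 1, 3, 5, 5, 3, 1], templates))  # prints "yes"
```

The query is a time-stretched copy of the "yes" template, so DTW aligns it at zero cost. The same "best predecessor plus local cost" recursion is the idea that carries forward into decoding with HMMs.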
Holmes & Holmes - Chapter 9 - Stochastic Modelling
May be helpful as a complement to the essential readings.
Holmes & Holmes - Chapter 10 - Front-end analysis for ASR
Covers filterbank and MFCC features. The material on linear prediction is out of scope.
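Below is a minimal sketch of the filterbank-to-MFCC step. It assumes that the power spectrum of one windowed frame is already available, so pre-emphasis, windowing and the FFT are omitted. It uses the widely used mel formula 2595·log10(1 + f/700), triangular filters equally spaced on the mel scale, log filter energies and an unnormalised DCT-II. The book's filter shapes, spacing and scaling may differ. All function names are hypothetical.

```python
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters equally spaced in mel, defined over n_fft // 2 + 1 bins."""
    n_bins = n_fft // 2 + 1
    lo, hi = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    # n_filters + 2 edge frequencies: filter k spans edges k-1, k, k+1.
    edges = [mel_to_hz(lo + (hi - lo) * k / (n_filters + 1))
             for k in range(n_filters + 2)]
    bin_hz = sample_rate / n_fft
    bank = []
    for k in range(1, n_filters + 1):
        left, centre, right = edges[k - 1], edges[k], edges[k + 1]
        row = []
        for b in range(n_bins):
            f = b * bin_hz
            if left < f <= centre:
                row.append((f - left) / (centre - left))    # rising slope
            elif centre < f < right:
                row.append((right - f) / (right - centre))  # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def mfcc(power_spectrum, bank, n_ceps):
    """Log filterbank energies followed by a DCT-II; keeps n_ceps coefficients."""
    # Small floor avoids log(0) for silent frames.
    energies = [math.log(sum(w * p for w, p in zip(row, power_spectrum)) + 1e-10)
                for row in bank]
    n = len(energies)
    return [sum(e * math.cos(math.pi * q * (i + 0.5) / n)
                for i, e in enumerate(energies))
            for q in range(n_ceps)]
```

For example, `mfcc(spectrum, mel_filterbank(26, 512, 16000), 13)` expects `spectrum` to hold 257 power values. In this formulation, scaling the whole spectrum by a constant changes only the zeroth coefficient, because the DCT basis vectors for q ≥ 1 sum to zero. This is one reason the higher cepstral coefficients are insensitive to overall signal level.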
Holmes & Holmes - Chapter 11 - Improving Speech Recognition Performance
We mitigate the over-simplifications of the model using ever-more-complex algorithms.