DCTTS is comparable to Tacotron, but is faster because it uses non-recurrent architectures for the encoder and decoder.
Wu et al. Merlin: An Open Source Neural Network Speech Synthesis System
Merlin is a toolkit for building Deep Neural Network models for statistical parametric speech synthesis. It is a typical frame-by-frame approach, pre-dating sequence-to-sequence models.
Ren et al. FastSpeech 2: Fast and High-Quality End-to-End Text to Speech
FastSpeech 2 improves on FastSpeech by dropping the complicated teacher-student training regime: it is trained directly on the data instead. It is very similar to FastPitch, which was released around the same time by different authors.
Łańcucki. FastPitch: Parallel Text-to-speech with Pitch Prediction
Very similar to FastSpeech 2, FastPitch has the advantage of an official open-source implementation by the author (at NVIDIA).
King et al. Speech synthesis using non-uniform units in the Verbmobil project
Of purely historical interest, this is an example of a system using a heterogeneous unit type inventory, developed shortly before Hunt & Black published their influential paper.
Jurafsky & Martin (3rd Ed) – Hidden Markov models
An overview of Hidden Markov Models, the Viterbi algorithm, and the Baum-Welch algorithm.
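As a companion to this reading, the following is a minimal sketch of Viterbi decoding for a discrete HMM. The two states, the observation symbols, and all probabilities are toy values chosen for illustration; they are assumptions of this example and are not claimed to match the chapter's notation or worked examples.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (best_path, best_prob) for a discrete HMM given as dictionaries."""
    # best[t][s]: probability of the most likely state sequence ending in s at time t
    best = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    backptr = [{}]
    for t in range(1, len(observations)):
        best.append({})
        backptr.append({})
        for s in states:
            # Pick the predecessor state that maximises the path probability into s
            prev = max(states, key=lambda r: best[t - 1][r] * trans_p[r][s])
            best[t][s] = best[t - 1][prev] * trans_p[prev][s] * emit_p[s][observations[t]]
            backptr[t][s] = prev
    # Trace back from the most probable final state
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(backptr[t][path[-1]])
    path.reverse()
    return path, best[-1][last]


# Toy model (illustrative values only): hidden weather, observed daily counts
states = ["H", "C"]
start_p = {"H": 0.8, "C": 0.2}
trans_p = {"H": {"H": 0.6, "C": 0.4}, "C": {"H": 0.5, "C": 0.5}}
emit_p = {"H": {1: 0.2, 2: 0.4, 3: 0.4}, "C": {1: 0.5, 2: 0.4, 3: 0.1}}

path, prob = viterbi([3, 1, 3], states, start_p, trans_p, emit_p)
print(path, prob)
```

For realistic sequence lengths the products underflow, so practical implementations usually work with log probabilities and sums instead.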
Wayland (Phonetics) – Chapter 9 – Hearing
Introduces basic concepts in human hearing – it may be useful to read the bits on decibels/loudness and the Mel and Bark scales.
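To make the decibel and Mel scales concrete, here is a small sketch of the usual conversions. It assumes one widely used analytic approximation of the Mel scale (2595 * log10(1 + f/700)) and the standard 20 micropascal reference for sound pressure level; the chapter may present a different variant of the Mel formula, so treat the function names and constants as illustrative.

```python
import math


def hz_to_mel(f_hz):
    """Convert frequency in Hz to mels (a common analytic approximation)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def pressure_to_db_spl(p_pa, p_ref=20e-6):
    """Sound pressure in pascals to dB SPL, relative to 20 micropascals."""
    return 20.0 * math.log10(p_pa / p_ref)


for f in (100, 500, 1000, 4000, 8000):
    print(f, "Hz is about", round(hz_to_mel(f)), "mel")
```

Under this approximation 1000 Hz maps to roughly 1000 mel, and equal steps in mels cover progressively wider ranges in Hz at higher frequencies.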
Wayland (Phonetics) – Chapter 5 – Phonemic and Morphophonemic Analysis
An introduction to the concept of phonemes, allophones and some common phonological alternations.
Plag (2003) – Word formation in English: Chapter 1 Basic Concepts
An introductory text on word structure/morphology in English. Useful to read if you come from a non-linguistic background.
Johnson (Phonetics) – Chapter 6.1 – Tube models of vowel production
Deriving the resonances and formant structures of vowels using two- and three-tube models of the vocal tract.
Wayland (Phonetics) – Chapter 8 – Acoustic Properties of Vowels and Consonants
An overview of the acoustic properties of vowels and consonants.
Johnson (Phonetics) – Chapter 2 – The Acoustic Theory of Speech Production: Deriving Schwa
Derives the acoustic features of the vocal tract in terms of the source-filter model.
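The core result behind the schwa derivation can be sketched numerically. Treating the vocal tract as a uniform tube closed at the glottis and open at the lips, its resonances fall at odd multiples of the quarter-wavelength frequency, F_n = (2n - 1) * c / (4L). The tube length (17.5 cm) and speed of sound (35,000 cm/s) below are typical textbook values assumed for illustration; the chapter's own figures may differ.

```python
def tube_resonances(length_cm, n_formants=3, speed_of_sound_cm_s=35000.0):
    """Resonances (Hz) of a uniform tube closed at one end and open at the other."""
    return [
        (2 * n - 1) * speed_of_sound_cm_s / (4.0 * length_cm)
        for n in range(1, n_formants + 1)
    ]


# Assumed adult vocal tract length of 17.5 cm
print(tube_resonances(17.5))

# A shorter tube scales every resonance up by the same factor
print(tube_resonances(15.0))
```

With these assumed values the first three resonances come out at 500, 1500 and 2500 Hz, the evenly spaced formant pattern usually associated with schwa.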