An analysis of the first Blizzard Challenge, which is an evaluation of speech synthesisers using a common database.
Hunt & Black: Unit selection in a concatenative speech synthesis system using a large speech database
The classic description of unit selection, formulated as a search through a network.
King: A beginners’ guide to statistical parametric speech synthesis
A deliberately gentle, non-technical introduction to the topic. Every item in the small and carefully-chosen bibliography is worth following up.
Kominek & Black: CMU ARCTIC databases for speech synthesis
Widely used, copyright-free speech databases for speech synthesis.
Ling et al: Deep Learning for Acoustic Modeling in Parametric Speech Generation
A key review article.
Łańcucki: FastPitch: Parallel Text-to-speech with Pitch Prediction
Very similar to FastSpeech 2, FastPitch has the advantage of an official open-source implementation by the author (at NVIDIA).
Shen et al: Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions
Tacotron 2 was one of the most successful sequence-to-sequence models for text-to-speech of its time and inspired many subsequent models.
Talkin: A Robust Algorithm for Pitch Tracking (RAPT)
The classic algorithm for estimating F0 from speech signals.
Taylor – Chapter 16 – Unit-selection synthesis
A substantial chapter covering target cost, join cost and search.
Taylor – Chapter 3 – The text-to-speech problem
Discusses the differences between spoken and written forms of language, and describes the structure of a typical TTS system.
Taylor – Section 17.1 – Databases
Includes the important issue of labelling the data.
Taylor – Section 17.2 – Evaluation
Testing of the system by the developers, as well as via listening tests.