Very similar to FastSpeech 2, FastPitch has the advantage of an official open-source implementation by its authors (at NVIDIA).
Ling et al: Deep Learning for Acoustic Modeling in Parametric Speech Generation
A key review article.
Kominek & Black: CMU ARCTIC databases for speech synthesis
Widely used, copyright-free speech databases designed for speech synthesis.
King: Measuring a decade of progress in Text-to-Speech
A distillation of the key findings of the first 10 years of the Blizzard Challenge.
King: A beginners’ guide to statistical parametric speech synthesis
A deliberately gentle, non-technical introduction to the topic. Every item in the small and carefully chosen bibliography is worth following up.
King et al: Speech synthesis using non-uniform units in the Verbmobil project
Of purely historical interest, this is an example of a system using a heterogeneous unit type inventory, developed shortly before Hunt & Black published their influential paper.
Kawahara et al: Restructuring speech representations…
The key paper about the STRAIGHT vocoder, which was originally intended for manipulating recorded natural speech.
Jurafsky & Martin – Section 8.5 – Unit Selection (Waveform) Synthesis
A brief explanation. Worth reading before tackling the more substantial chapter in Taylor (Speech Synthesis course only).
Hunt & Black: Unit selection in a concatenative speech synthesis system using a large speech database
The classic description of unit selection, formulated as a search through a network of candidate units.
Handbook of Phonetic Sciences – Ch 20 – Intro to Signal Processing for Speech (Sections 6-7)
Written for a non-technical audience, this gently introduces some key concepts in speech signal processing. Read sections 6-7.
Handbook of Phonetic Sciences – Ch 20 – Intro to Signal Processing for Speech (Sections 1-5)
Written for a non-technical audience, this gently introduces some key concepts in speech signal processing. Read sections 1-5 (up to and including ‘Fourier Analysis’).
Handbook of Phonetic Sciences – Ch 20 – Intro to Signal Processing for Speech
Written for a non-technical audience, this gently introduces some key concepts in speech signal processing.