reading

Bennett: Large Scale Evaluation of Corpus-based Synthesisers

An analysis of the first Blizzard Challenge, which is an evaluation of speech synthesisers using a common database.

The classic description of unit selection, formulated as a search through a network.

A deliberately gentle, non-technical introduction to the topic. Every item in the small and carefully chosen bibliography is worth following up.

Widely used, copyright-free speech databases for use in speech synthesis.

A key review article.

Very similar to FastSpeech 2, FastPitch has the advantage of an official open-source implementation by the author (at NVIDIA).

Tacotron 2 was one of the most successful sequence-to-sequence models for text-to-speech of its time and inspired many subsequent models.

The classic algorithm for estimating F0 from speech signals.

A substantial chapter covering target cost, join cost and search.

Discusses the differences between spoken and written forms of language, and describes the structure of a typical TTS system.

Includes the important issue of labelling the data.

Testing of the system by the developers, as well as via listening tests.