A tutorial given at Interspeech 2017
Simon King, Oliver Watts, Srikanth Ronanki, Felipe Espic
Centre for Speech Technology Research, University of Edinburgh, UK
Zhizheng Wu
Apple Inc, USA
We gratefully acknowledge the support of ISCA and the Interspeech 2017 organisers in putting on this tutorial in Stockholm.
This tutorial combines the theory and practical application of Deep Neural Networks (DNNs) for Text-to-Speech (TTS). It illustrates how DNNs are rapidly advancing the performance of all areas of TTS, including waveform generation and text processing, using a variety of model architectures. We link the theory to implementation with the open-source Merlin toolkit.
Slides
You might also be interested in the Speech Synthesis course.
Links
- Merlin
- Ossian – older static version or latest version on GitHub
- Festival
- WORLD
- Felipe Espic’s MagPhase vocoder with code available on GitHub