Deep Learning for Text-to-Speech Synthesis, using the Merlin toolkit

Simon King, Oliver Watts, Srikanth Ronanki, Felipe Espic
Centre for Speech Technology Research, University of Edinburgh, UK

Zhizheng Wu
Apple Inc, USA

We gratefully acknowledge the support of ISCA and the Interspeech 2017 organisers in putting on this tutorial in Stockholm.

This tutorial combines the theory and practical application of Deep Neural Networks (DNNs) for Text-to-Speech (TTS). It illustrates how DNNs, in a variety of model architectures, are rapidly advancing performance across all areas of TTS, including waveform generation and text processing. We link the theory to implementation using the open-source Merlin toolkit.

Slides

Links

Video: a walk through the demo