Shen et al: Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions

Tacotron 2 was one of the most successful sequence-to-sequence models for text-to-speech of its time and inspired many subsequent models.

Jonathan Shen, Ruoming Pang, Ron J. Weiss, Mike Schuster, Navdeep Jaitly, Zongheng Yang, Zhifeng Chen, Yu Zhang, Yuxuan Wang, RJ Skerry-Ryan, Rif A. Saurous, Yannis Agiomyrgiannakis, and Yonghui Wu. “Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions,” in Proc. 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). DOI: 10.1109/ICASSP.2018.8461368

Publisher’s version (preferred)

Publisher’s version (preferred for Edinburgh University students)

