Shen et al. Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions

Tacotron 2 was one of the most successful sequence-to-sequence models for text-to-speech of its time and inspired many subsequent models.

Jonathan Shen, Ruoming Pang, Ron J. Weiss, Mike Schuster, Navdeep Jaitly, Zongheng Yang, Zhifeng Chen, Yu Zhang, Yuxuan Wang, RJ Skerry-Ryan, Rif A. Saurous, Yannis Agiomyrgiannakis, and Yonghui Wu. “Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions,” in Proc. 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). DOI: 10.1109/ICASSP.2018.8461368


