Tacotron 2 was one of the most successful sequence-to-sequence models for text-to-speech of its time and inspired many subsequent models.
Jonathan Shen, Ruoming Pang, Ron J. Weiss, Mike Schuster, Navdeep Jaitly, Zongheng Yang, Zhifeng Chen, Yu Zhang, Yuxuan Wang, RJ Skerry-Ryan, Rif A. Saurous, Yannis Agiomyrgiannakis and Yonghui Wu. “Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions,” in Proc. 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). DOI: 10.1109/ICASSP.2018.8461368
Publisher’s version (preferred)
Publisher’s version (preferred for Edinburgh University students)