Tachibana et al. Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention

DCTTS is comparable to Tacotron but faster to train, because its encoder and decoder use convolutional rather than recurrent architectures.

Hideyuki Tachibana, Katsuya Uenoyama, Shunsuke Aihara. “Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention” in Proc. 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4784-4788.

DOI: 10.1109/ICASSP.2018.8461829

Publisher’s version

Publisher’s version (preferred for Edinburgh University students)

CSTR’s Ophelia is one of several open-source implementations of DCTTS.
