DNN-based synthesis

In HMM-based speech synthesis, the hard work of mapping linguistic context to acoustic model parameters is done by a regression tree. A tree is a rather crude model, so why not replace it with something more powerful? A (Deep) Neural Network is a learnable, general-purpose non-linear transform that can be used for the same regression task.

The basics
A neural network can be thought of as a general, learnable, non-linear mapping from an input vector to an output vector. In synthesis, we use it to perform regression from linguistic features to acoustic features.
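To make the idea of a "non-linear mapping used for regression" concrete, here is a minimal sketch of a forward pass through a small feed-forward network. The layer sizes, the tanh activation, the random initialisation, and the names `forward` and `random_layer` are all assumptions chosen for illustration, not details taken from this module.

```python
import math
import random

def forward(x, layers):
    """Map an input vector to an output vector.

    `layers` is a list of (weights, biases) pairs. Every layer except the
    last applies tanh; the last layer is linear, so the outputs are
    unbounded real values, which is what regression requires.
    """
    h = x
    for i, (W, b) in enumerate(layers):
        # Weighted sum of the previous layer's activations, plus a bias.
        z = [sum(w * v for w, v in zip(row, h)) + bias
             for row, bias in zip(W, b)]
        is_last = (i == len(layers) - 1)
        h = z if is_last else [math.tanh(v) for v in z]
    return h

def random_layer(n_in, n_out, rng):
    """Create one layer with small random weights and zero biases."""
    W = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    b = [0.0] * n_out
    return W, b

rng = random.Random(0)
# Hypothetical sizes: 5 input features, 8 hidden units, 3 output values.
layers = [random_layer(5, 8, rng), random_layer(8, 3, rng)]
y = forward([1.0, 0.0, 0.0, 1.0, 0.25], layers)
```

With untrained weights the output values are meaningless; the point is only that a fixed-length vector goes in and a fixed-length vector of real numbers comes out.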
Preparing the data
Since the inputs and outputs of a neural network must be vectors of numbers, we have to encode our data in an appropriate way.
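One common way to turn linguistic features into a vector of numbers is to use a one-hot code for each categorical feature and a plain number for each numerical feature. The sketch below assumes a toy phone set and a hypothetical feature layout (previous, current, and next phone, plus a position within the phone); the actual features used in this module may differ.

```python
# Hypothetical toy phone inventory, for illustration only.
PHONES = ["sil", "a", "k", "t"]

def one_hot(symbol, inventory):
    """Return a vector with a single 1.0 at the index of `symbol`."""
    if symbol not in inventory:
        raise ValueError(f"unknown symbol: {symbol!r}")
    return [1.0 if symbol == s else 0.0 for s in inventory]

def encode_frame(prev_phone, phone, next_phone, pos_in_phone):
    """Encode one frame's linguistic context as a flat list of numbers.

    Categorical features become one-hot sub-vectors; the numerical
    feature (relative position in the phone, 0.0 to 1.0) is appended as is.
    """
    return (one_hot(prev_phone, PHONES)
            + one_hot(phone, PHONES)
            + one_hot(next_phone, PHONES)
            + [pos_in_phone])

# Halfway through the "a" in the sequence k-a-t.
vec = encode_frame("k", "a", "t", 0.5)
# vec has 3 * 4 + 1 = 13 elements.
```

The output side needs the same treatment in reverse: each frame's acoustic features must also be arranged as a fixed-length vector of numbers so that the network can be trained to predict them.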
Training
An informal explanation of how the network is trained using back-propagation.
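As a rough illustration of back-propagation, the sketch below trains a tiny network (one scalar input, one tanh hidden layer, one linear output) with stochastic gradient descent on a squared-error loss. The toy data, the network size, the learning rate, and the function name `train` are assumptions made for this example only.

```python
import math
import random

def train(data, n_hidden=4, lr=0.05, epochs=200, seed=0):
    """Fit y = f(x) for scalar x and y; return the mean loss per epoch."""
    rng = random.Random(seed)
    w1 = [rng.uniform(-1, 1) for _ in range(n_hidden)]  # input -> hidden
    b1 = [0.0] * n_hidden
    w2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]  # hidden -> output
    b2 = 0.0
    losses = []
    for _ in range(epochs):
        total = 0.0
        for x, target in data:
            # Forward pass: compute the prediction for this example.
            h = [math.tanh(w1[j] * x + b1[j]) for j in range(n_hidden)]
            y = sum(w2[j] * h[j] for j in range(n_hidden)) + b2
            err = y - target          # derivative of 0.5 * err**2 w.r.t. y
            total += 0.5 * err * err
            # Backward pass: propagate the error back through each layer
            # using the chain rule, then take a small gradient step.
            for j in range(n_hidden):
                grad_h = err * w2[j]               # uses w2 before its update
                grad_z = grad_h * (1.0 - h[j] ** 2)  # tanh'(z) = 1 - tanh(z)^2
                w2[j] -= lr * err * h[j]
                w1[j] -= lr * grad_z * x
                b1[j] -= lr * grad_z
            b2 -= lr * err
        losses.append(total / len(data))
    return losses

# Toy regression problem: learn y = x^2 on 21 points in [-1, 1].
data = [(i / 10.0, (i / 10.0) ** 2) for i in range(-10, 11)]
losses = train(data)
```

Comparing `losses[0]` with `losses[-1]` shows whether the error went down during training; in practice one would also monitor the error on held-out data.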
Synthesising
Synthesis involves putting text through the front end, encoding the resulting linguistic features as vectors, running a forward pass through the network, optionally performing trajectory generation, then generating the waveform.
