Build your own neural speech synthesiser

This exercise is the replacement for building your own unit selection voice. You will use your data to train a neural sequence-to-sequence model, similar to FastSpeech 2.

Introduction
An overview of the complete process and some tips for success.
Milestones
To keep on track, check your progress against these milestones. Try to stay ahead of them if you can.
Access the compute facility
First, we need to check that we can log in to the compute facility we will be using: the Eddie computer at the Edinburgh Compute and Data Facility (ECDF).
Train the model on existing speech data
Before recording your own speech, you will train the model on some existing data.
Synthesise!
It's time to generate synthetic speech from our trained model.
Record your own speech data
The recorded speech data comprises text-speech pairs from which we will train a model. The model will therefore be influenced by both the content (e.g., words, phonetic coverage) and the speaking style of the recordings.
Evaluation
The main form of evaluation should be a listening test with multiple naive listeners. But there are other ways to evaluate, and potentially to improve, your voice.
Writing up
Because you kept such great notes in your logbook (didn't you?), writing up will be easy and painless.
