Build your own neural speech synthesiser

This exercise replaces building your own unit selection voice. You will use your own speech data to train a neural sequence-to-sequence model similar to FastSpeech 2.
  • Introduction

    An overview of the complete process and some tips for success.

  • Milestones

    To keep on track, check your progress against these milestones. Try to stay ahead of them if you can.

  • Access the compute facility

    First, we need to check that we can log in to the compute facility we will be using: the Eddie cluster at the Edinburgh Compute and Data Facility (ECDF).

  • Train the model on existing speech data

    Before recording your own speech, you will train the model on some existing data.

  • Synthesise!

    It's time to generate synthetic speech from our trained model.

  • Record your own speech data

    The recorded speech data comprises text-speech pairs from which we will train a model. The model will therefore be influenced by both the content of the recordings (e.g., words, phonetic coverage) and their speaking style.

  • Evaluation

    The main form of evaluation should be a listening test with multiple naive listeners. But there are other ways to evaluate, and potentially to improve, your voice.

  • Writing up

    Because you kept such great notes in your logbook (didn't you?), writing up will be easy and painless.