Finish

The remainder of the course will cover the state of the art. We will be reading recent research papers, so make sure you allow plenty of time to read them before each class: these are challenging papers and will require several read-throughs.

Roadmap:

  • Neural speech processing (vocoders; audio codecs; representation learning)
    • we need to revisit representations of both text and speech; the key advance will be to find a discrete representation of speech
  • Large Speech Language Models
    • a discrete representation of speech will let us use models that can only generate sequences of discrete tokens: language models
  • Beyond Text-to-Speech (voice cloning, voice conversion, anonymisation, …)
    • yes, there is more to life than TTS! We don’t have to limit ourselves to textual input!