The remainder of the course will cover the state of the art. We will be reading recent research papers, so make sure you allow plenty of time to read them before each class: these are challenging papers and will require multiple read-throughs.
Roadmap:
- Neural speech processing (vocoders; audio codecs; representation learning)
  - we need to revisit how both text and speech are represented; the key advance will be finding a discrete representation of speech
- Large Speech Language Models
  - a discrete representation of speech will let us use models that can only generate sequences of discrete symbols: language models
- Beyond Text-to-Speech (cloning, conversion, anonymisation,…)
  - yes, there is more to life than TTS! We don't have to limit ourselves to textual input!