Class

Status: see individual classes below

Slides

Demo pages

  • Example audio codec: SoundStream
  • Example Large Speech Language Models:
  • Shannon Text Generator
    • This is just a character N-gram model trained on a small amount of text, not an LLM!
    • Try training it on natural language from different domains, or even on Python code.
  • Example speech editing model: VoiceCraft
  • Example Voice Conversion models:
    • DualVC 3 (ASR+TTS-style architecture, but using an SSL model instead of explicit ASR, and configured to be causal to enable real-time use)
    • StreamVC (audio codec-based)
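
For readers curious about what a character N-gram generator like the Shannon Text Generator entry above does, here is a minimal sketch. It is not the demo's actual code: the function names `train_char_ngram` and `generate`, the default `n=3`, and the toy training string are all assumptions made for illustration. The idea is to count which character follows each (n-1)-character context in the training text, then sample new text one character at a time in proportion to those counts.

```python
import random
from collections import Counter, defaultdict


def train_char_ngram(text, n=3):
    """Count which character follows each (n-1)-character context in `text`."""
    if n < 2:
        raise ValueError("this sketch needs n >= 2 (at least one context character)")
    counts = defaultdict(Counter)
    for i in range(len(text) - n + 1):
        context = text[i:i + n - 1]
        next_char = text[i + n - 1]
        counts[context][next_char] += 1
    return counts


def generate(counts, seed, length=80, rng=None):
    """Extend `seed` by up to `length` characters sampled from the counts.

    `seed` must be exactly as long as the training context (n - 1 characters).
    Generation stops early if the current context was never seen in training.
    """
    rng = rng or random.Random()
    k = len(seed)
    out = seed
    for _ in range(length):
        followers = counts.get(out[-k:])
        if not followers:
            break
        chars, weights = zip(*followers.items())
        out += rng.choices(chars, weights=weights)[0]
    return out


if __name__ == "__main__":
    # Hypothetical toy corpus; a real experiment would use a longer text.
    corpus = "the cat sat on the mat. the dog sat on the log. "
    model = train_char_ngram(corpus, n=3)
    print(generate(model, seed="th", length=60, rng=random.Random(0)))
```

Swapping the toy corpus for text from another domain, or for Python source code, changes only the counts, which is why the output takes on the surface style of whatever it was trained on without any understanding of it.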