Class

2025-03-18 class format:

  • FastPitch – case study: model training
  • SoundStream – learning to encode speech
  • VALL-E – a Large Speech Language Model

2025-03-25 class format:

  • VALL-E – a Large Speech Language Model (continued)
  • Tasks beyond TTS, including Voice Conversion

Demo pages:

  • Example audio codec: SoundStream
  • Example Large Speech Language Models:
  • Example speech editing model: VoiceCraft
  • Example Voice Conversion models:
    • DualVC 3 (an ASR+TTS-style architecture that uses a self-supervised learning (SSL) model in place of explicit ASR and is configured to be causal, which enables real-time use)
    • StreamVC (audio codec-based)

Download the slides for the class on 2025-03-18

Download the slides for the class on 2025-03-25