Class

2025-03-18 class format:

FastPitch – case study: model training
SoundStream – learning to encode speech
VALL-E – a Large Speech Language Model

2025-03-25 class format:

VALL-E – a Large Speech Language Model (continued)
Tasks beyond TTS, including Voice Conversion

Demo pages:

Example audio codec: SoundStream
Example Large Speech Language Models:
- VALL-E
- Parler
Example speech editing model: VoiceCraft
Example Voice Conversion models:
- DualVC 3 (an ASR+TTS-style architecture that uses a self-supervised (SSL) model in place of an explicit ASR model and is configured to be causal, enabling real-time use)
- StreamVC (audio codec-based)

Download the slides for the class on 2025-03-18

Download the slides for the class on 2025-03-25