reading

King: A beginners’ guide to statistical parametric speech synthesis

A deliberately gentle, non-technical introduction to the topic. Every item in the small, carefully chosen bibliography is worth following up.

Pollet & Breen: Synthesis by Generation and Concatenation of Multiform Segments

Another way to combine waveform concatenation with statistical parametric speech synthesis (SPSS) is to alternate between waveform fragments and vocoder-generated waveforms.

Qian et al: A Unified Trajectory Tiling Approach to High Quality Speech Rendering

The term “trajectory tiling” means that trajectories from a statistical model (HMMs in this case) are not input to a vocoder, but are “covered over” or “tiled” with waveform fragments.

Taylor – Chapter 15 – Hidden-Markov-model synthesis

This chapter is written from a traditional “starting from automatic speech recognition” viewpoint, so you will need to make the connections for yourself to the more general concept of text-to-speech as a regression problem.

Zen, Black & Tokuda: Statistical parametric speech synthesis

A review article that makes some useful connections between HMM-based speech synthesis and unit selection.
