Advantages of Spontaneous Speech Database
What are the pros of building a database from spontaneous speech? I can think of two, and yet neither seems advantageous enough to make spontaneous speech preferable to studio-recorded speech:
– Data are easier to collect, so the database can be quite large (but it also contains a lot of disfluencies, co-articulations, noise, etc., which decrease the overall quality of the database).
– “Interesting” variations in prosody (but without a solid way to model and label this prosodic variation, will it not just be a lot of extra, and even confusing, information?).
Are there any other significant advantages to recording spontaneous speech rather than studio-recorded speech?
Building synthetic voices from spontaneous speech is an area of active research.
Although we might be able to gather a lot of spontaneous speech, one barrier is that we then have to manually transcribe it. The second barrier is that it is hard to align the phonetic sequence with the speech; this is for many of the same reasons that Automatic Speech Recognition of such speech is hard (you list some of them: disfluencies, co-articulations, deletions,…).
The hypothesised advantage of using spontaneous speech, over read text, is that the voice would sound more natural.
You put your finger on the core theoretical problem though: without a good model of the variation in spontaneous speech (including a predictive model of that variation given only text input), it is indeed just unwanted noise in the database.