Forum Replies Created
I emailed the Blizzard Challenge support and found the synthetic speech of previous participants here: http://www.cstr.ed.ac.uk/projects/blizzard/data.html
Hi Simon,
Hope you are well!
I am reading the Blizzard Challenge 2019 papers (http://www.festvox.org/blizzard/blizzard2019.html) and would like to know if there is a way we could listen to the synthetic voices submitted by the different teams?
Also, please kindly advise if you would prefer alumni to post questions in other venues (e.g. LinkedIn or email).
Thank you!
Thanks, Simon!
I have a follow-up question on speech intelligibility measures.
I was reviewing the Evaluation videos and noticed that the objective measures (e.g. MCD and RMSE) seem to relate to naturalness rather than intelligibility.
Is there a way to measure intelligibility “objectively”? Thanks!
I did the Blizzard listening test today and have a couple of questions about WER.
How should WER be calculated for Mandarin words that originally come from other languages? In Mandarin, many different characters sound exactly the same.
For example, “Victoria Falls” could be written as any of the following (and many more combinations, at least for the transliteration of “Victoria”):
1. 维多利亚瀑布,
2. 维多莉亚瀑布,
3. 维多莉雅瀑布,
4. 維多利亞瀑布.
Specifically, can we use WER to measure the intelligibility of all (or most) languages?
Thanks!
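On the WER question above: WER is the Levenshtein (edit) distance between reference and hypothesis, divided by the reference length; for Mandarin it is usually computed per character (character error rate, CER). A minimal sketch — note that homophone variants such as 利/莉 still count as errors unless multiple reference spellings are accepted, or scoring is done on pronunciations (e.g. pinyin), which would need a pronunciation lexicon not shown here:

```python
# Sketch: word/character error rate via edit distance.
# WER = (substitutions + deletions + insertions) / reference length.

def edit_distance(ref, hyp):
    """Minimum number of substitutions, deletions and insertions
    needed to turn ref into hyp (single rolling-row DP)."""
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            cur = min(prev + (r != h),  # substitution (or match)
                      d[j] + 1,         # deletion of r
                      d[j - 1] + 1)     # insertion of h
            prev, d[j] = d[j], cur
    return d[-1]

def error_rate(ref_tokens, hyp_tokens):
    return edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)

# English WER: tokenise on whitespace.
print(error_rate("the cat sat".split(), "the cat sat down".split()))
# Mandarin CER: tokenise per character; 利/莉 and 亚/雅 count as
# 2 substitutions out of 6 reference characters.
print(error_rate(list("维多利亚瀑布"), list("维多莉雅瀑布")))
```

So for languages without whitespace word boundaries, character- or syllable-level error rate is the usual substitute for word-level WER.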
In the “Front end” video in Deep Learning for Text-to-Speech Synthesis, using the Merlin toolkit, Mr. Watts mentioned that we need to find a suitable TTS front end for languages that are not supported by Festival, for example Thai, Korean, Japanese, Cantonese and Mandarin.
Is there a recommended or efficient way (or place/platform) to conduct that search, other than googling?
Thanks!
Thanks Simon!
A couple of follow-up questions on the overall WER for a system:
1. When testing the intelligibility of a system, sentences of various lengths will surely be used. Should we use a weighted average (taking sentence length into consideration) to calculate the overall WER for the system?
2. When reporting WER for different sentences within a system, is it a good idea to include the reference sentence in the results?
3. I am calculating WER manually, and wonder if this is why many listening tests recruit fewer than 20 listeners?
Thanks!
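On question 1 above: the standard convention (used by tools such as sclite) is to report total errors divided by total reference words, which is exactly a length-weighted average of the per-sentence WERs. A minimal sketch with illustrative numbers (not real data) showing how the two averages differ:

```python
# Sketch: corpus-level WER. Pooling errors over the whole corpus is
# the same as length-weighting the per-sentence WERs; the unweighted
# mean over-counts short sentences. Numbers below are illustrative.
results = [
    (2, 10),   # (errors, reference length) for sentence 1
    (0, 3),    # sentence 2
    (4, 20),   # sentence 3
]

unweighted = sum(e / n for e, n in results) / len(results)
corpus_wer = sum(e for e, _ in results) / sum(n for _, n in results)

print(f"mean of per-sentence WERs: {unweighted:.3f}")
print(f"corpus (pooled/weighted) WER: {corpus_wer:.3f}")
```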
Hi,
I am curious about how WER is calculated in the Blizzard Challenge. Is it done by a human marker? Or is it done by sclite together with a human marker? Thanks!
Thanks, Simon!
The synthesised utterances are rather short, ranging from 3 to 7 seconds.
I want to add a short silence at the beginning of each sentence for the intelligibility tests, so that participants have perhaps 1 second (or less) of “buffering” time to get ready.
I am not sure if this is overthinking, but that is why I want to test it out.
Is there a way to add a short silence at the beginning and end of a synthesised utterance? I tried adding a colon or a full stop at the beginning of the sentence, but it doesn’t work.
((R:Token.parent.punc in ("?" "." ":"))
((BB))
((R:Token.parent.punc in ("'" "\"" "," ";"))
((B))
Thanks!
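An alternative to coaxing Festival's phrasing rules into producing leading silence is to pad the saved waveform afterwards. A minimal sketch using Python's standard-library wave module, assuming PCM WAV output (as Festival produces with 'riff); the filenames are examples:

```python
import wave

def pad_with_silence(src, dst, lead_s=1.0, trail_s=0.5):
    """Copy a PCM WAV file, writing silence (zero samples) before
    and after the original audio."""
    with wave.open(src, "rb") as win:
        params = win.getparams()
        frames = win.readframes(win.getnframes())
    bytes_per_frame = params.sampwidth * params.nchannels
    lead = b"\x00" * (int(lead_s * params.framerate) * bytes_per_frame)
    trail = b"\x00" * (int(trail_s * params.framerate) * bytes_per_frame)
    with wave.open(dst, "wb") as wout:
        wout.setparams(params)
        wout.writeframes(lead + frames + trail)

# Example (filenames are illustrative):
# pad_with_silence("myutt.wav", "myutt_padded.wav", lead_s=1.0)
```

This keeps the synthesis step untouched, so the same padded stimuli can be regenerated for every system under test.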
Hi,
I know we can save a synthesised waveform with the commands below, but is there a way to save multiple waveforms in a batch (i.e. 20-30 sentences for one system)?
festival> (set! myutt (SayText "Hello world."))
festival> (utt.save.wave myutt "myutt.wav" 'riff)
Thanks!
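For batch synthesis, Festival ships a command-line wrapper, text2wave (one text file in, one WAV out), so you can loop over sentences from a script instead of saving each utterance interactively. A sketch that only builds the commands — the sentences and filenames are illustrative, and actually running them requires Festival to be installed:

```python
# Sketch: batch synthesis by building one `text2wave` command per
# sentence. `text2wave` is Festival's command-line synthesis tool.
import os
import subprocess
import tempfile

sentences = ["Hello world.", "This is sentence two."]  # example inputs

def build_jobs(sentences, outdir):
    """Write each sentence to its own text file and return
    (command, text_file) pairs, one per sentence."""
    jobs = []
    for i, text in enumerate(sentences, 1):
        txt = os.path.join(outdir, f"sent{i:02d}.txt")
        with open(txt, "w") as f:
            f.write(text + "\n")
        wav = os.path.join(outdir, f"sent{i:02d}.wav")
        jobs.append((["text2wave", "-o", wav, txt], txt))
    return jobs

outdir = tempfile.mkdtemp()
for cmd, _ in build_jobs(sentences, outdir):
    print(" ".join(cmd))
    # subprocess.run(cmd, check=True)  # uncomment with Festival installed
```

Alternatively, the same loop can be written directly in Festival's Scheme, wrapping SayText and utt.save.wave in a function applied over a list of sentences.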
J&M 2nd Edition, 9.3.4 (9.11):
Could you please elaborate a bit more on the Hamming window? Where do 0.54 and 0.46 come from? Are they fixed numbers, or could they vary depending on design/preference? Thanks!

J&M 2nd Edition, 9.3.4 (9.14) mentions that the mel frequency m can be computed from the raw acoustic frequency as follows:
mel(f) = 1127 ln(1 + f/700)
Could you please explain what f and ln stand for, respectively? Also, where do 1127 and 700 come from? Thanks!

J&M 2nd Edition, 9.3.3, page 299 mentions that the fast Fourier transform, or FFT, is very efficient but only works for values of N that are powers of 2.
Could you please explain why it only works for values of N that are powers of 2?
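The constants asked about above can be checked numerically. In the Hamming window, w[n] = 0.54 - 0.46 cos(2*pi*n / (N-1)), the 0.54/0.46 pair are fixed design constants (chosen to suppress the window's sidelobes; other windows make other choices, e.g. the Hann window uses 0.5/0.5). In the mel formula, f is the frequency in Hz, ln is the natural logarithm, and 1127 and 700 are constants fitted to perceptual data. A minimal sketch:

```python
import math

def hamming(n, N):
    """Hamming window sample: 0.54 and 0.46 are fixed design
    constants; a Hann window would use 0.5 and 0.5 instead."""
    return 0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1))

def mel(f):
    """Mel scale of a frequency f in Hz; ln is the natural log,
    and 1127 and 700 are perceptually fitted constants."""
    return 1127 * math.log(1 + f / 700)

print(hamming(0, 401))    # edge of the window: 0.54 - 0.46 ≈ 0.08
print(hamming(200, 401))  # centre of the window: 0.54 + 0.46 = 1.0
print(mel(1000))          # ≈ 1000, i.e. 1000 Hz maps to about 1000 mel
```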
In Speech Synthesis and Recognition, page 124 on continuous speech recognition, the caption of figure 8.9 explains that three template sequences are being considered: T1-T3-T1-T3, T1-T3-T3-T1 and T1-T1-T1-T1. However, judging from the illustration, I read the three sequences as T1-T3-T1-T2, T1-T3-T3-T2 and T1-T3-T3-T3. Could you please explain how to read this trace-back chart correctly? Thanks!
Attachments: