Example of Evaluation That Was Surprising
February 4, 2016 at 22:23 #2425
After digesting your lectures on evaluation, plus Taylor’s thoughts and the various papers, and then sitting in on the discussion of the CSTR evaluation of the Voice Conversion Challenge, I remain deeply skeptical of the majority of ‘standard’ listening tests. It’s not that I doubt you could get lay listeners, or so-called experts, to give reasonable responses on MOS-style surveys, or that it’s impossible to control for listener fatigue, stimulus ordering, or other psychoacoustic phenomena (there are many concerns along these lines, but they can be at least somewhat controlled for with good experiment design). My bigger concern is that it seems like a waste of time. Necessary, perhaps, for publishing an academic paper with ‘results’, or for judging some kind of ‘challenge’. But I find it difficult to believe that the results would ever be genuinely surprising – that is, substantially different from what the designer(s) of the system already knew. Can you offer some examples of evaluations that generated truly surprising results, where the system designers were simply astounded to find that listeners were hearing something, or having a reaction (cognitive, emotional, preferential, etc.) that they themselves never would have predicted?
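To make concrete what I mean by ‘good experiment design’: here is a toy sketch in Python (the system and sentence names are invented, purely for illustration) of one basic ordering control. Every listener hears every (system, sentence) pair, but in an independently shuffled order, so fatigue and ordering effects average out across the panel instead of piling up on one system.

    import random

    # Invented names, purely for illustration.
    systems = ["baseline", "proposed"]
    sentences = [f"sent_{i:02d}" for i in range(10)]

    def playlist_for(listener_seed):
        # Every listener rates every (system, sentence) pair...
        trials = [(system, sent) for system in systems for sent in sentences]
        # ...but in a per-listener shuffled order (seeded so the design is
        # reproducible), so no system is systematically heard first, last,
        # or only once the listener is already fatigued.
        rng = random.Random(listener_seed)
        rng.shuffle(trials)
        return trials

    for listener_id in range(3):
        print(listener_id, playlist_for(listener_id)[:3])

This is obviously not a complete design (no Latin squares, no attention checks), but it is the kind of control I am happy to concede can be done; my question is about whether the results teach the designers anything new.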
February 7, 2016 at 19:48 #2560
The usual form of surprising results is that listeners didn’t hear an improvement that the designers thought they had made, or that some other aspect of the synthetic speech masked the possible improvement (e.g., the speech did sound more prosodically natural, but the waveform quality was lower, and so listeners preferred the baseline).
I’m struggling to think of any genuine positive surprises, but will keep thinking…
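For context on the masking example above: the raw outcome of such an evaluation is usually just preference counts from a forced-choice AB test. Purely as an illustration (the counts below are invented, not taken from any real evaluation), this is how one would check that a preference for the baseline is unlikely to be chance, with an exact two-sided binomial test in plain Python:

    from math import comb

    def two_sided_binomial_p(k, n, p=0.5):
        # Exact two-sided binomial test: sum the probability of every
        # outcome no more likely than the observed count k.
        pmf = [comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]
        return sum(q for q in pmf if q <= pmf[k] * (1 + 1e-9))

    # Invented counts: in 200 A/B trials, listeners chose the baseline
    # 118 times over the prosodically 'improved' system.
    n_trials, baseline_wins = 200, 118
    print(f"p = {two_sided_binomial_p(baseline_wins, n_trials):.4f}")

A significant preference for the baseline, even though the prosodic improvement was real, is exactly the masking effect I described.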
February 7, 2016 at 22:32 #2561
Can you post links to papers that showed the kind of surprising results you describe? Audio examples would be even better. I would like to understand the experiment design that was used. For instance, in the second example you cite, at least based on the information you’ve given, I would argue that they should not have been surprised: asking anyone, even a phonetician or musician, to ‘focus only on the prosody’ and ignore a low-quality waveform (especially if the baseline actually had a higher-quality waveform!) is, in my opinion, a flawed experimental approach. In any experiment where the listeners DIDN’T hear an expected improvement, my first question is: why not? Was it the experiment design (as mentioned above), or were the designers so caught up in their technical achievement that they failed to notice the improvement was actually quite subtle from an audio perspective, or even unnoticeable to the ‘average’ listener? Has this exact situation ever happened to you, where you thought you had improved a voice but your listeners couldn’t hear it? And if so, what is your explanation for why that happened?