Forum Replies Created
Could you clarify the question a little? I’m not sure what you mean by “a dimension corresponding to naturalness, and a second principal dimension strongly corresponding to prosodic naturalness”.
If you want to try an objective measure (perhaps to see if it correlates with your listeners’ judgements), here’s a Python implementation of Mel Cepstral Distortion (MCD) by Matt Shannon.
This requires some skill in compiling code (if you do this, please post information here); it is entirely optional and certainly not required for this assignment.
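For reference, the core calculation is quite simple. Here is a minimal sketch of the standard MCD formula (this is not Matt Shannon’s implementation), assuming you already have two time-aligned matrices of mel-cepstral coefficients with the 0th (energy) coefficient excluded; in practice the sequences are usually aligned first, e.g. with dynamic time warping.

```python
import numpy as np

def mel_cepstral_distortion(ref, synth):
    """Frame-averaged MCD in dB between two aligned mel-cepstral sequences.

    ref, synth : arrays of shape (num_frames, num_coeffs), already time-aligned,
                 with the 0th (energy) coefficient excluded.
    """
    assert ref.shape == synth.shape, "sequences must be time-aligned"
    diff = ref - synth
    # Standard MCD formula: (10 / ln 10) * sqrt(2 * sum_d (c_d - c'_d)^2), averaged over frames
    per_frame = (10.0 / np.log(10)) * np.sqrt(2.0 * np.sum(diff ** 2, axis=1))
    return float(np.mean(per_frame))

# Illustrative usage with random stand-in data; in real use you would extract
# mel-cepstra from the natural reference and the synthetic speech, then align them.
ref = np.random.randn(200, 24)
synth = ref + 0.1 * np.random.randn(200, 24)
print(f"MCD: {mel_cepstral_distortion(ref, synth):.2f} dB")
```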
We will look at how to calibrate the materials in the lecture.
We’ll spell this out in the lecture.
We’ll look at various ways of calibrating listeners’ responses in the lecture.
Including a control trial sounds like the right thing to do. In almost all types of evaluation, we need something to compare to:
- We explicitly ask listeners to make a comparison and tell us their judgement
- We compare the responses of listeners across several conditions
The second case applies here. If we cannot control for unknown factors (e.g., working memory) then we have to make sure those factors have the same value across all conditions that we wish to compare. In your example, we would have the same subjects perform the same task, once on the synthetic speech, and once on natural speech, then we would quantify the difference in their responses (e.g., accuracy in a comprehension test).
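As a concrete illustration, here is a minimal sketch of how one might compare per-listener comprehension accuracy across the two conditions with a paired test. The accuracy scores are invented, and scipy is assumed to be available.

```python
import numpy as np
from scipy import stats

# Hypothetical per-listener comprehension accuracy (proportion correct),
# each listener tested once on natural and once on synthetic speech.
natural   = np.array([0.92, 0.88, 0.95, 0.90, 0.85, 0.93, 0.89, 0.91])
synthetic = np.array([0.84, 0.80, 0.90, 0.83, 0.79, 0.88, 0.82, 0.86])

# Paired test, because the same listeners did the same task in both conditions:
# this holds listener-specific factors (e.g., working memory) constant.
t_stat, p_value = stats.ttest_rel(natural, synthetic)
print(f"mean difference = {np.mean(natural - synthetic):.3f}")
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```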
You are correct in saying that it’s very hard to trace a problem in the output speech back to a specific component in the synthesiser, especially when the text processing has the typical pipeline architecture.
Where possible, we will perform component-level testing and, if we are lucky, that can be done objectively (without humans) by comparing output to a gold-standard reference.
Otherwise, what we have to do is make a hypothesis, then create an experiment to test (or refute) that hypothesis. In general, this is going to require us to create two or more systems that differ in a specific way (e.g., the pronunciation dictionary is or is not carefully tuned to the speaker), then compare them in a listening test.
Unit tests (actually, component tests – as discussed in this thread) are used during system development. For example, we might iteratively test and improve the letter-to-sound module in the front end until we can no longer improve its accuracy on a test set of out-of-vocabulary words. This component testing provides more useful information than system testing, because we know precisely which component is causing the errors we observe, so we know where improvements are needed.
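As a sketch of what such a component test could look like in practice, here is a minimal word-accuracy check against a gold-standard lexicon. The predict_pronunciation argument and the tiny test set are hypothetical, standing in for your front end’s letter-to-sound module and a held-out list of out-of-vocabulary words.

```python
def word_accuracy(lexicon, predict_pronunciation):
    """Proportion of test words whose predicted pronunciation exactly
    matches the gold-standard pronunciation.

    lexicon : dict mapping out-of-vocabulary test words to reference
              pronunciations (tuples of phones)
    predict_pronunciation : hypothetical letter-to-sound function,
              word -> sequence of phones
    """
    correct = sum(
        1 for word, gold in lexicon.items()
        if tuple(predict_pronunciation(word)) == tuple(gold)
    )
    return correct / len(lexicon)

# Illustrative usage with a tiny hand-made test set and a deliberately
# naive predictor (both made up, just to show the mechanics)
test_lexicon = {
    "zyxt":  ("z", "ih", "k", "s", "t"),
    "quorn": ("k", "w", "ao", "n"),
}
naive_predictor = lambda word: tuple(word)   # one 'phone' per letter: obviously wrong
print(word_accuracy(test_lexicon, naive_predictor))  # 0.0
# In a real test: word_accuracy(test_lexicon, my_front_end.letter_to_sound)
```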
Once our system is complete, and we think we’ve got decent performance in each component, we then perform end-to-end system testing.
Your insight that testing and improving individual components might not lead to the best end-to-end performance certainly has some truth in it. Some components contribute far more than others to overall system performance. Putting that another way, end-to-end testing might sometimes help us identify (using our engineering intuition) which component needs most improvement: where a given amount of work would have the largest effect.
I think you are also suggesting that components should be optimised jointly. This is another good insight. For example, if errors in one component can be effectively corrected later in the pipeline, then there is no need to improve that earlier component. Unfortunately, the machine-learning techniques used in modules in a typical system (e.g., Festival’s text processing front end) do not lend themselves to this kind of joint optimisation.
If we don’t want to perform a listening test after every minor change to the system, then we need to rely on either
- Our own listening judgement
- Objective measures
We’ll cover objective measures in the lecture.
Objective measures are widely used in statistical parametric synthesis. In fact, the statistical model is essentially trained to minimise a kind of objective error on the training data. We can then measure the same error with respect to some held-out data.
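As a minimal sketch of that last step, this computes a simple objective error (RMSE) between acoustic parameters predicted by a model and parameters extracted from held-out natural speech. The matrices here are random stand-ins, not real data.

```python
import numpy as np

def rmse(predicted, reference):
    """Root-mean-square error between predicted and reference acoustic
    parameters for held-out utterances (both of shape (frames, dims))."""
    assert predicted.shape == reference.shape
    return float(np.sqrt(np.mean((predicted - reference) ** 2)))

# Hypothetical held-out data: parameters predicted by the trained model vs.
# parameters extracted from the corresponding natural recordings.
reference = np.random.randn(500, 40)
predicted = reference + 0.3 * np.random.randn(500, 40)
print(f"held-out RMSE: {rmse(predicted, reference):.3f}")
```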
Multiple rounds of evaluation would be normal when developing a large system over a period of time. For the “Build your own unit selection voice” exercise, that would probably take too much time though.
In general, it’s difficult to evaluate a single system in isolation: this is because most types of evaluation provide a relative judgement compared to one or more other systems or references. Even in the case of intelligibility testing, where evaluating a single system sounds reasonable, we still need to interpret the result: for example, is a Word Error Rate of 15% good or bad? One way to know would be to measure the intelligibility of natural speech under the same conditions.
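For reference, Word Error Rate is normally computed by Levenshtein alignment between what the listener typed and the reference sentence; here is a minimal sketch, with invented example sentences.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length,
    computed by Levenshtein alignment over words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit distance table
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
    return d[len(ref)][len(hyp)] / len(ref)

# Hypothetical listener response to a reference sentence
print(word_error_rate("the chair ate a green idea", "the chair hate green idea"))  # 0.333...
```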
When comparing multiple systems, we would normally use the same speaker and in fact the exact same database (unless we were investigating the effect of database size or content). Trying to compare two systems built from different speakers’ data would not enable us to separate the effects of speaker from those of the system.
February 7, 2016 at 12:13 in reply to: Output quality: common evaluation framework & materials (SUS) #2547
The reason for using SUS is, of course, to avoid a ceiling effect in intelligibility. But you are not the first person(*) to suggest that SUS are highly unnatural and to wonder how much a SUS test actually tells us about real-world intelligibility.
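For anyone who hasn’t met SUS before: each sentence is generated by filling a fixed syntactic frame with words drawn at random from per-category lists, which is exactly what makes it grammatical but impossible to predict from context. A minimal sketch of the idea (the word lists and template below are made up, not the published SUS materials):

```python
import random

# Hypothetical word lists, one per syntactic category
words = {
    "DET":  ["the", "a"],
    "ADJ":  ["green", "loud", "narrow", "sudden"],
    "NOUN": ["table", "river", "window", "thought"],
    "VERB": ["eats", "paints", "follows", "breaks"],
}

# One simple syntactic template: DET ADJ NOUN VERB DET NOUN
template = ["DET", "ADJ", "NOUN", "VERB", "DET", "NOUN"]

def make_sus():
    """Fill the template with randomly chosen words: grammatical, but
    semantically unpredictable, so listeners cannot guess words from context."""
    return " ".join(random.choice(words[cat]) for cat in template)

for _ in range(3):
    print(make_sus())
```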
A slightly more ecologically valid example would be evaluating the intelligibility of synthetic speech in noise, where SUS would be too difficult and ‘more normal’ sentences could be used instead. But such tests are still generally done in the lab, with artificially-added noise. They could hardly be called ecologically valid.
You ask whether “there [are] intelligibility tests using situations that mimic the desired applications?” This would certainly be desirable, and commercial companies might do this as part of usability testing. Unfortunately, mimicking the end application is a lot of work, and so makes the test slow and expensive. Once we start evaluating the synthetic speech as part of a final application, it will get harder to separate out the underlying causes for users’ responses. At this point, we reach the limit of my expertise, and would be better asking an expert, such as Maria Wolters.
* Paul Taylor always told me he was very sceptical of SUS intelligibility testing. He asserted that all commercial systems were already at ceiling intelligibility in real-world conditions, so there was no point measuring it; researchers should focus on naturalness instead. I agree with him as far as listening in quiet conditions is concerned, but synthetic speech is certainly not at ceiling intelligibility when heard in noise.
February 7, 2016 at 12:00 in reply to: Output quality: common evaluation framework & materials (SUS) #2546
In the field of speech coding there are standardised tests and methodologies for evaluating codecs. This standardisation is driven by commercial interests, both from those who invent new codecs and from those who use them (e.g., telecoms or broadcasting).
But in speech synthesis there appears to be no commercial demand for equivalent standardised tests. Commercial producers of speech synthesisers never reveal the evaluation results for their products (the same is true of automatic speech recognition).
There are, however, conventions and accepted methods for evaluation that are widely used in research and development. SUS is one such method and is fairly widely used (although Word Error Rate is usually reported, rather than Sentence Error Rate).
The Blizzard Challenge is the only substantial effort to make fair comparisons across multiple systems. The listening test design in the Blizzard Challenge is straightforward (it includes a section of SUS) and is widely used by others. The materials (speech databases + text of the test sentences) are publicly available and are also quite widely used. This is a kind of de facto standardisation.
There are some examples for the Speech Communication paper.
This is potentially confusing – and we don’t want to get hung up on terminology. I’ve added some clarification.
The reasons for avoiding very long sentences in the prompts for recording a unit selection database are
- they are hard to read out without the speaker making a mistake
- the proportion of phrase-initial and phrase-final diphones is low
Short sentences might be avoided because they have unusual prosody, and so units from short phrases (e.g., “Hi!”) may not be very suitable for synthesising ‘ordinary’ sentences.
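If you are scripting the text selection for your own voice, a simple first pass is just to filter candidate prompts by length before doing any diphone-coverage selection. A minimal sketch, where the length limits are purely illustrative rather than prescribed values:

```python
def filter_prompts(sentences, min_words=5, max_words=18):
    """Drop very short sentences (unusual prosody) and very long ones
    (hard to read aloud without mistakes). Limits are illustrative only."""
    kept = []
    for s in sentences:
        n = len(s.split())
        if min_words <= n <= max_words:
            kept.append(s)
    return kept

candidates = [
    "Hi!",
    "The museum opens at nine on weekdays.",
    "Author Jack Hughes will be signing copies of his latest novel at the "
    "bookshop on the corner of the high street next Saturday afternoon at three.",
]
print(filter_prompts(candidates))  # keeps only the middle sentence
```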