Forum Replies Created
Lab tasks for each week could be clearer / labs could be more structured
We will provide more class-wide instructions during the remaining lab sessions, whilst still leaving plenty of time for individual help.
Positive comments
Number of people mentioning each point is given in parentheses.
Group work / interactive classes (13)
The videos (8) and subtitles / transcripts (3)
Flipped classroom format (7)
Labs (6) and specifically the tutor (2)
Milestones for the assignment (4)
speech.zone in general, including content, navigation (4)
February 8, 2019 at 08:49 in reply to: ASF – translating linguistic features to acoustic representation #9686
Predicting acoustic features from linguistic features is a regression problem. We already have the necessary labelled training data: the speech database that will be used for unit selection.
One way to do the regression would be to train a regression tree (a CART). This is the method used in so-called “HMM-based speech synthesis” that we will cover in the second half of the course. But in HMM synthesis, the predicted acoustic features are used as input to a vocoder to create a waveform, rather than in an ASF target cost function.
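To make that idea concrete, here is a rough sketch of this kind of regression using scikit-learn’s CART implementation. The feature encoding and all the numbers are invented purely for illustration – this is not how Festival or any real system represents its features.

```python
# A minimal sketch of regression from linguistic features to acoustic features.
# The numeric encoding of the linguistic context is invented for illustration.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Each row: an encoded linguistic context for one unit,
# e.g. [phone identity, stress, position in syllable]
X_train = np.array([[3, 1, 0],
                    [3, 0, 1],
                    [7, 1, 2],
                    [7, 0, 0]])

# Each row: acoustic features for that unit, e.g. [duration (s), mean F0 (Hz)]
y_train = np.array([[0.12, 180.0],
                    [0.09, 150.0],
                    [0.20, 210.0],
                    [0.15, 160.0]])

# Fit a regression tree (a CART), then predict acoustic features
# for an unseen linguistic context.
tree = DecisionTreeRegressor(max_depth=2).fit(X_train, y_train)
print(tree.predict(np.array([[3, 1, 1]])))   # predicted [duration, F0]
```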
We might then replace the tree with a better regression model: a neural network. We’ll cover this method after HMM synthesis.
Once we know about HMM and neural network speech synthesis (both using vocoders rather than unit selection + waveform concatenation), we can then come back to the ASF formulation of unit selection. We will find that this is usually called “hybrid speech synthesis” and is covered towards the end of the course.
Your analogy with programming languages is along the right lines. In this context:
“high level” means “further away from the waveform”, “more abstract” and “changing at a slower rate”
“low level” means “closer to the waveform”, “more concrete (e.g., specified more precisely using more parameters)” and “changing more rapidly”
The term “degrading” is somewhat informal; what I mean is that the low-level signal quality is made worse.
This can be contrasted with other types of degradation – for example, those we might get from unit selection: perceptible joins, incorrect co-articulation or bad prosody.
The features (used by the target cost function) of candidates do indeed have to be independent of the features of other candidates. If there was a dependency, this would violate the conditional independence assumption that the search makes (recall that it is equivalent to a Markov model – it is memoryless).
Now, the features of target units will of course depend on the features of other target units. That’s not a problem – we are not searching over different sequences of target units (they are fixed, and depend only on the input text).
Also, the features of candidate units do depend on their neighbours within the sentence that they were extracted from. Again, that is constant and not something that we are searching over.
Adding deltas effectively brings in information from neighbouring frames. You are right that this will still be over a fairly small region of the signal.
The join cost has to be local to the two candidate units being considered for a join. If it depended on the properties of other candidate units, then this would dramatically increase the complexity of the search problem.
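Here is a minimal sketch (in Python, with placeholder cost functions) of the dynamic programming search, just to make the locality point concrete: the join cost only ever looks at the two adjacent candidates, which is exactly what keeps the search tractable.

```python
# Sketch of a Viterbi-style search over a lattice of candidate units.
# target_costs[t][i] is the target cost of candidate i at position t;
# join_cost(a, b) depends only on the two adjacent candidates.
def search(target_costs, candidates, join_cost):
    # best[i] = lowest total cost of any path ending in candidate i at position t
    best = list(target_costs[0])
    back = [[None] * len(c) for c in candidates]
    for t in range(1, len(candidates)):
        new_best = []
        for i, cand in enumerate(candidates[t]):
            # only the immediately preceding candidate matters: Markov / memoryless
            scores = [best[j] + join_cost(prev, cand)
                      for j, prev in enumerate(candidates[t - 1])]
            j = min(range(len(scores)), key=scores.__getitem__)
            back[t][i] = j
            new_best.append(scores[j] + target_costs[t][i])
        best = new_best
    # trace back the lowest-cost path
    i = min(range(len(best)), key=best.__getitem__)
    path = [i]
    for t in range(len(candidates) - 1, 0, -1):
        i = back[t][i]
        path.append(i)
    return list(reversed(path)), min(best)
```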
In section 16.3.4, Taylor is talking about the specific problem of how to set the target cost weights. If unit selection was a generative model (e.g., an HMM), we could use the usual objective of maximising likelihood.
The problem is that there is no explicit “model” as such – the unit selection system actually contains the training data, rather than abstracting away (i.e., generalising) from it by fitting a model.
Because it has “memorised” the training data exactly, it is perfectly fitted to the training data (we would say “over-fitted” if it was a generative model). This means that changing the target cost weights has absolutely no effect (*) on the output when we generate sentences from the training data.
(*) a weighted sum of zero terms is always zero, regardless of the weights
It’s actually the same calendar as Speech Processing, but I’ve now added a subscription link for it on this page.
J&M don’t do a great job of explaining either the language model scaling factor (LMSF) or the word insertion penalty (WIP), so I’ll explain both.
Let’s start with the LMSF. The real reason that we need to scale the language model probability before combining it with the acoustic model likelihood is much simpler than J&M’s explanation:
- the language model probability really is a probability
- the acoustic model likelihood is not a probability because it’s computed by probability density functions
Remember that a Gaussian probability density function cannot assign a probability to an observation, but only a probability density. If we insisted on getting a true probability, this would have to be for an interval of observation values (J&M figure 9.18). We might describe density as being “proportional” to probability – i.e., a scaled probability.
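A quick way to convince yourself of this (using scipy, purely for illustration):

```python
from scipy.stats import norm

# A narrow Gaussian: the density at the mean is far greater than 1,
# so it cannot be a probability.
print(norm.pdf(0.0, loc=0.0, scale=0.01))   # ~39.9

# A true probability requires an interval of observation values:
print(norm.cdf(0.005, loc=0.0, scale=0.01)
      - norm.cdf(-0.005, loc=0.0, scale=0.01))   # ~0.38
```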
So, the language model probability and the acoustic model likelihood are on different scales. Simply multiplying them (in practice, adding them in the log domain) assumes they are on the same scale. So, doing some rescaling is perfectly reasonable, and the convention is to multiply the language model log probability by a constant: the LMSF. This value is chosen empirically – for example, to minimise WER on a development set.
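In the log domain, the combination looks like the following sketch; the numbers are invented, and in a real system these scores come from the acoustic and language models.

```python
import math

# Hypothetical scores for one candidate word sequence (values invented):
log_acoustic_likelihood = -4500.0      # sum of log densities from the acoustic model
log_lm_probability = math.log(1e-8)    # a true (log) probability from the language model

LMSF = 15.0   # language model scale factor, tuned empirically on a development set

combined_score = log_acoustic_likelihood + LMSF * log_lm_probability
```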
Right, on to the Word Insertion Penalty (WIP). J&M attempt a theoretical justification of this, which relies on their explanation of why the LMSF is needed. I’ll go instead for a pragmatic justification:
An automatic speech recognition system makes three types of errors: substitutions, insertions and deletions. All of them affect the WER. We try to minimise substitution errors by training the best acoustic and language models possible. But there is no direct control via either of those models over insertions and deletions. We might find that our system makes a lot of insertion errors, and that will increase WER (potentially above 100%!).
So, we would like to have a control over the insertions and deletions. I’ll explain this control in the Token Passing framework. We subtract a constant amount from a token’s log probability every time it leaves a word end. This amount is the WIP (J&M equation 9.49, except they don’t describe it in the log prob domain). Varying the WIP will trade off between insertions and deletions. You will need to adjust this penalty if you attempt the connected digits part of the digit recogniser exercise because you may find that, without it, your system makes so many insertion errors that WER is indeed greater than 100%.
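As a sketch of that token update (my own variable names, not HTK’s):

```python
WIP = 20.0   # word insertion penalty, tuned empirically on development data

def leave_word_end(token_log_prob, scaled_lm_log_prob):
    # Every time a token leaves a word end, subtract a constant penalty.
    # A larger WIP discourages hypotheses containing many words, so it
    # reduces insertion errors (and, if made too large, causes deletions).
    return token_log_prob + scaled_lm_log_prob - WIP
```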
Finally, to actually answer your question, I think there is a typo in “if the language model probability increases (larger penalty)” where surely they meant “(smaller penalty)”. But to be honest, I find their way of explaining this quite confusing, and it’s not really how ASR system builders think about LMSF or WIP. Rather, these are just a couple of really useful additional system tuning parameters to be found empirically, on development data.
The 1127 and 700 values are determined empirically, to match human hearing.
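For reference, they appear in one common form of the Hz-to-mel conversion (other variants of the formula exist):

```python
import math

def hz_to_mel(f_hz):
    # One common form of the mel scale: 1127 and 700 are empirical constants
    # chosen so that the curve approximates human pitch perception.
    return 1127.0 * math.log(1.0 + f_hz / 700.0)

print(hz_to_mel(1000.0))   # roughly 1000 mels, by construction
```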
There are many possible window shapes available, of which the Hamming window is perhaps the most popular in digital audio signal processing. To understand why there are so many, we need to understand why there is no such thing as the perfect window: they all involve a compromise.
Let’s start with the simplest window: rectangular. This does not taper at the edges, so the signal will have discontinuities which will lead to artefacts – for example, after taking the FFT. On the plus side, the signal inside the window is exactly equal to the original signal from which it was extracted.
A better option is a tapered window that eliminates the discontinuity problem of the rectangular window, by effectively fading the signal in, and then out again. The problem is that this fading (i.e., changing the amplitude) also changes the frequency content of the signal subtly. To see that, consider fading a sine wave in and out. The result is not a sine wave anymore (e.g., J&M Figure 9.11). Therefore, a windowed sine wave is no longer a pure tone: it must have some other frequencies that were introduced by the fading in and out operation.
So, tapered windows introduce additional frequencies into the signal. Exactly what is introduced will depend on the shape of the window, and hence different people prefer different windows for different applications. But, we are not going to get hung up on the details – it doesn’t make much difference for our applications.
For the spectrum of a voiced speech sound, the main artefact of a tapered window is that the harmonics are not perfect vertical lines, but rather peaks with some width. The diagrams on the Wikipedia page for Window Function may help you understand this. That page also correctly explains where the 0.54 value comes from and why it’s not exactly 0.5 (which would be a simple raised cosine, called the Hann window). Again, these details really don’t matter much for our purposes and are well beyond what is examinable.
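If you want to see the trade-off for yourself, here is a small sketch comparing a rectangular and a Hamming window applied to a pure tone (using numpy; the sample rate, tone frequency and window length are arbitrary choices):

```python
import numpy as np

fs = 16000                           # sample rate in Hz (arbitrary)
n = 1024
t = np.arange(n) / fs
tone = np.sin(2 * np.pi * 1007 * t)  # a pure tone that does not line up with the window edges

rect_spectrum = np.abs(np.fft.rfft(tone))                  # rectangular window
hamm_spectrum = np.abs(np.fft.rfft(tone * np.hamming(n)))  # Hamming window

# Plot both (e.g. with matplotlib): the rectangular window spreads energy far
# away from 1007 Hz because of the edge discontinuities, while the Hamming
# window keeps that spurious energy much lower, at the price of a wider peak.
```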
The Fast Fourier Transform (FFT) is an algorithm that efficiently implements the Discrete Fourier Transform (DFT). Because it is so efficient, we pretty much only ever use the FFT and that’s why you hear me say “FFT” in class when I could use the more general term “DFT”.
The FFT is a divide-and-conquer algorithm. It divides the signal into two parts, solves for the FFT of each part, then joins the solutions together. This is recursive, so each part is then itself divided into two, and so on. Therefore, the length of the signal being analysed has to be a power of 2, to make it evenly divide into two parts recursively.
The details of this algorithm are beyond the scope of the course, but it is a beautiful example of how an elegant algorithm can make computation very fast.
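Just to give a flavour of that elegance, a radix-2 version can be written in a few lines (a sketch only, and certainly not production quality):

```python
import cmath

def fft(x):
    # Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of 2.
    n = len(x)
    if n == 1:
        return x
    even = fft(x[0::2])   # solve the two half-length problems...
    odd = fft(x[1::2])
    twiddles = [cmath.exp(-2j * cmath.pi * k / n) * odd[k] for k in range(n // 2)]
    # ...then combine the solutions
    return [even[k] + twiddles[k] for k in range(n // 2)] + \
           [even[k] - twiddles[k] for k in range(n // 2)]
```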
If you really want to learn this algorithm, then refer to the classic textbook:
Alan V. Oppenheim, Ronald W. Schafer, and John R. Buck, Discrete-Time Signal Processing, 2nd edition (Upper Saddle River, NJ: Prentice Hall, 1999)
We would rarely implement an FFT because there are excellent implementations available in standard libraries for all major programming languages. But, you can’t call yourself a proper speech processing engineer until you have implemented the FFT, so add it to your bucket list (after Dynamic Programming and before Expectation-Maximisation)!
You can see my not-fast-enough FFT implementation here, followed by a much faster implementation from someone else (which is less easy to understand).
Let’s separate out several different processes:
The Mel scale is applied to the spectrum (by warping the x axis), before taking the cepstrum. This is effectively just a resampling of the spectrum, providing higher resolution (more parameters) in the lower frequencies and lower resolution at higher frequencies.
Taking the cepstrum of this warped spectrum doesn’t change the cepstral transform’s abilities to separate source and filter. But it does result in using more parameters to specify the shape of the lower, more important, frequency range.
The filterbank does several things all at once, which can be confusing. It is a way of smoothing out the harmonics, leaving only the filter information. By spacing the filter centre frequencies along a Mel scale we can also use it to warp the frequency axis. Finally, it also reduces the dimensionality from the raw FFT (which has 100s or 1000s of dimensions) to maybe 20-30 filters in the filter bank.
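Here is a rough sketch of how a triangular mel filterbank can be built and applied to an FFT power spectrum; the number of filters, FFT length and sample rate are typical but arbitrary choices.

```python
import numpy as np

def hz_to_mel(f):
    # one common form of the mel scale
    return 1127.0 * np.log(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (np.exp(m / 1127.0) - 1.0)

def mel_filterbank(n_filters=24, n_fft=512, fs=16000):
    # Filter centre frequencies are equally spaced on the mel scale,
    # then converted back to Hz and finally to FFT bin indices.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / fs).astype(int)

    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        left, centre, right = bins[i], bins[i + 1], bins[i + 2]
        for k in range(left, centre):      # rising edge of the triangle
            fbank[i, k] = (k - left) / max(centre - left, 1)
        for k in range(centre, right):     # falling edge of the triangle
            fbank[i, k] = (right - k) / max(right - centre, 1)
    return fbank

# Applying it: the power spectrum has n_fft//2 + 1 values from the FFT;
# the filterbank reduces these to n_filters values, smoothing away the
# harmonics and warping the frequency axis at the same time.
# filterbank_energies = mel_filterbank() @ power_spectrum
```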
Note: J&M’s figure 9.14 is taken from Taylor, and they made a mistake in the caption. See this topic for the correction.
The diagram in Taylor is correct.
You can work this out yourself from first principles: taking the log will compress the vertical range of the spectrum, bringing the very low amplitude components up so we can see them, and bringing the high amplitudes (the harmonics, in this case) down.
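A quick numerical check of that (values invented): amplitudes spanning six orders of magnitude become a modest range on a log (dB) scale.

```python
import math

for amplitude in [0.001, 1.0, 1000.0]:
    print(amplitude, 20 * math.log10(amplitude), "dB")
# 0.001 -> -60 dB, 1.0 -> 0 dB, 1000.0 -> +60 dB: a factor of a million
# squeezed into a range you can comfortably plot on one axis.
```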
J&M messed up when they quoted it – a lesson in not quoting something unless you really understand it, perhaps!? Or maybe a printer’s error.