Forum Replies Created
Sebastian Andersson, Kallirroi Georgila, David Traum, Matthew Aylett, and Robert Clark. Prediction and realisation of conversational characteristics by utilising spontaneous speech for unit selection. In Proc. Speech Prosody, Chicago, USA, May 2010.
Sebastian Andersson, Junichi Yamagishi, and Robert A.J. Clark. Synthesis and evaluation of conversational characteristics in HMM-based speech synthesis. Speech Communication, 54(2):175-188, 2012. DOI: 10.1016/j.specom.2011.08.001
Using spontaneous speech as the basis for a speech synthesiser is an attractive idea, but is rather hard in practice, for several reasons. Here are some of them:
Word-level transcription: spontaneous speech is harder to transcribe, even at the word level, than read speech, because it is not entirely made up of words (as found in a lexicon); ASR could be tried, as could hand transcription, but both would have difficulty with this. Remember that commercial ASR is designed for careful, planned speech such as dictation, and will not work very well for unplanned speech.
Phonetic transcription: even harder than word-level transcription, because the pronunciations deviate considerably from those found in the lexicon (due to co-articulation, assimilation, deletion,…)
Phonetic alignment: the idea that speech is a linear string of phones (“beads on a string”) was never quite true even for read speech, but is even more problematic for spontaneous speech.
Here’s an experiment to try:
- record a spontaneous utterance
- transcribe the words
- record a read-text version of that
- compare the spontaneous and read-text versions side by side
- listen
- examine waveforms and spectrograms (a small plotting sketch follows this list)
- try to hand-label word and phone boundaries
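If you want a quick way to do the visual comparison, here is a minimal Python sketch using scipy and matplotlib; the filenames spontaneous.wav and read.wav are just placeholders for your own (mono) recordings.

# Sketch: plot waveform and spectrogram of the spontaneous and read
# versions side by side; filenames are placeholders for your own mono recordings.
import numpy as np
import matplotlib.pyplot as plt
from scipy.io import wavfile

fig, axes = plt.subplots(2, 2, figsize=(12, 6))
for col, name in enumerate(["spontaneous.wav", "read.wav"]):
    rate, samples = wavfile.read(name)
    t = np.arange(len(samples)) / rate
    axes[0, col].plot(t, samples)            # waveform
    axes[0, col].set_title(name)
    axes[1, col].specgram(samples, Fs=rate)  # spectrogram
    axes[1, col].set_xlabel("time (s)")
plt.tight_layout()
plt.show()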
That function is calculating the midpoints, yes. The code you’re showing is used for stripping the join cost coefficients during voice building, but it’s performing the same calculation that is done during synthesis.
In lectures, we did indeed gloss over a couple of special cases:
Diphthongs: the 50% point is a poor choice, since the spectrum may be changing rapidly there, so we make the join 75% of the way through the segment, where the spectrum is generally a little more stable.
Stops: the end of the closure (stored in cl_end) will have been found during forced alignment (how?) and so we use that as the join point; picking the 50% point in a stop (=closure+burst) might sometimes be before the burst, and other times in the middle of the burst, so would be a bad place to make a join (e.g., we might end up with two bursts in the synthetic speech).
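As a rough sketch of that logic in Python (this is not Festival’s actual code, and the phone sets below are placeholders), assuming each segment gives us its start and end times and, for stops, a cl_end value:

# Rough sketch of the join-point logic described above (not Festival's code).
# Times are in seconds; DIPHTHONGS and STOPS are placeholder phone sets.
DIPHTHONGS = {"ai", "au", "oi", "ei", "ou"}
STOPS = {"p", "t", "k", "b", "d", "g"}

def join_point(phone, start, end, cl_end=None):
    """Return the time at which to place the join inside this phone."""
    if phone in STOPS and cl_end is not None:
        return cl_end                         # end of closure, found by forced alignment
    if phone in DIPHTHONGS:
        return start + 0.75 * (end - start)   # 75% point: spectrum a little more stable
    return start + 0.5 * (end - start)        # default: the midpoint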
Diphone boundaries are generally just the midpoint between phone boundaries. So, there is no need to store this information in the .utt files because it’s very fast to compute on the fly (e.g., as the file is loaded).
Likewise, it’s easy to construct an index of all available diphones on the fly, as the .utt files are loaded, and store it in memory.
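For example, something along these lines would do it; the (phone, start, end) segment format and the way utterances are loaded are just assumptions for illustration:

# Sketch: build an in-memory diphone index as the .utt files are loaded.
# Assumes each utterance has already been turned into a list of
# (phone, start, end) tuples; the loader itself is not shown.
from collections import defaultdict

def index_diphones(utterances):
    index = defaultdict(list)   # "a-b" -> list of candidate diphone tokens
    for utt_id, segments in utterances.items():
        for (p1, s1, e1), (p2, s2, e2) in zip(segments, segments[1:]):
            # diphone boundaries: the midpoint of each phone, computed on the fly
            start = s1 + 0.5 * (e1 - s1)
            end = s2 + 0.5 * (e2 - s2)
            index[p1 + "-" + p2].append((utt_id, start, end))
    return index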
We’ll look at this in detail in the lecture.
Yes, this would be pretty straightforward to do. As you say, you could treat it as a special kind of word (presumably pronounced as a special new phone).
In fact, you might find that – at least in unit selection – you will get these in-breaths ‘for free’ because phrase-initial silence diphones will be chosen to synthesise silence in phrase-initial positions.
Building synthetic voices from spontaneous speech is an area of active research.
Although we might be able to gather a lot of spontaneous speech, one barrier is that we then have to manually transcribe it. The second barrier is that it is hard to align the phonetic sequence with the speech; this is for many of the same reasons that Automatic Speech Recognition of such speech is hard (you list some of them: disfluencies, co-articulations, deletions,…).
The hypothesised advantage of using spontaneous speech, over read text, is that the voice would sound more natural.
You put your finger on the core theoretical problem though: without a good model of the variation in spontaneous speech (including a predictive model of that variation given only text input), it is indeed just unwanted noise in the database.
If we want a voice that speaks in a single speaking style (or emotion), then we can simply record data in that style and build the voice as usual. That will work very well, but will not scale to producing many different styles / emotions / etc.
Can you each try to specify more precisely what you mean by ‘expression’ before I continue my answer? Is it a property of whole utterances, or parts of utterances, for example?
In many systems, I think a mixture of explicit and implicit labels is used.
Read what Taylor has to say about intuitive versus analytic labelling (section 17.1.4 to 17.1.6).
When you mention hand-labelling POS tags, I think a better idea is to hand-label some training data, then train a POS tagger, and use that to label your actual database. This is what is done in practice.
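As a toy illustration of that workflow (a most-frequent-tag baseline stands in for a real HMM or neural tagger, and the data format and tags are made up):

# Toy illustration: train a tagger on a small hand-labelled set,
# then use it to tag the rest of the database.
from collections import Counter, defaultdict

def train_tagger(hand_labelled):
    """hand_labelled: list of sentences, each a list of (word, tag) pairs."""
    counts = defaultdict(Counter)
    for sentence in hand_labelled:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    # most-frequent-tag model; a real tagger would also use context
    return {w: c.most_common(1)[0][0] for w, c in counts.items()}

def tag(model, words, default="nn"):
    return [(w, model.get(w.lower(), default)) for w in words]

model = train_tagger([[("the", "dt"), ("cat", "nn"), ("sat", "vbd")]])
print(tag(model, ["the", "dog", "sat"]))   # unseen words fall back to the default tag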
Can you suggest which explicit labels you would like to place on the database?
Labelling prosody on the database is one of those topics that has a long history of research, but no really good solutions: it’s a very hard problem.
We are fairly sure that highly-accurate ToBI labels are helpful, provided that we have them for both the database utterances and test sentences. So, even if we hand-label the database, we still have the hard problem of accurately predicting from text at synthesis time, in a way that is consistent with the database labels.
Yes, many people have looked at simpler systems than ToBI. Festival reduces the number of boundary strength levels, for example. Your suggestion to train a model on a hand-labelled subset of data and use that to label the rest of the database is excellent: this is indeed what people do. But there remains the “predicting from text” problem at synthesis time.
One simpler approach is to think just about prominences and boundaries.
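Purely as an illustration of where you might start with that (the tag set below is a placeholder, and real systems do considerably better): treat content words as prominent, and punctuation as marking a phrase boundary.

# Crude baseline: content words are prominent, punctuation marks a boundary.
FUNCTION_TAGS = {"dt", "in", "cc", "to", "md", "prp"}   # placeholder tag set

def predict_prosody(tagged_words):
    """tagged_words: list of (word, pos) pairs; returns per-word labels."""
    labels = []
    for word, pos in tagged_words:
        prominent = word.isalpha() and pos.lower() not in FUNCTION_TAGS
        boundary = word in {".", ",", "?", "!", ";"}
        labels.append({"word": word, "prominent": prominent, "boundary": boundary})
    return labels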
Perhaps a more promising approach these days is to label the database with more sophisticated linguistic information than plain old POS tags, such as shallow syntactic and semantic structure.
We certainly do need to train models of full and reduced vowels before we can use forced alignment to choose which one is the most likely label to place on a particular segment in the database.
We’ll look at how to train those models in the lecture.
It’s certainly the case in unit selection that there are many versions that will sound as good as the one chosen via the target and join costs. Actually, there will very probably be many that sound better, but were not the lowest cost sequence in the search (why is that?).
It’s easy in principle to generate an n-best list during a Viterbi search (although this is not implemented in Festival).
Here’s an idea for how you might generate variants from your own unit selection voice without modifying any code:
- Synthesise the sentence, and examine the utterance structure to see which prompts from the database were used
- Remove one or more (maybe all) of those prompts from utts.data
- Restart Festival
- Synthesise the sentence again: different units will be chosen
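Step 2 can be scripted; here is a minimal sketch assuming the usual utts.data layout of ( prompt_id "prompt text" ), with some example prompt IDs standing in for the ones you read off the utterance structure:

# Sketch for step 2: write a filtered copy of utts.data, dropping the prompts
# whose units were used for the test sentence.
# Assumes the usual layout: ( prompt_id "prompt text" )
used = {"arctic_a0012", "arctic_b0339"}   # example IDs, read off the utterance structure

with open("utts.data") as src, open("utts.data.filtered", "w") as dst:
    for line in src:
        parts = line.split()
        prompt_id = parts[1] if len(parts) > 1 and parts[0] == "(" else None
        if prompt_id not in used:
            dst.write(line)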
Sure – that would be fine.
In general, I don’t think people use Natural Language Generation (NLG) for this, mainly because NLG systems are typically limited domain, and so will only generate a closed set of sentences (or at least, from a closed vocabulary).
The vast majority of missing diphones will be cross-word (why is that?). So, all you would really need to do is find word pairs that contain the required diphone. However, you would want these to occur in a reasonably natural sentence, so that they can be used in the same way as the other prompts (i.e., recorded and used in their entirety).
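As a sketch of how you might search for candidate word pairs, assuming you have a pronunciation lexicon mapping each word to its phone sequence (the lexicon, phone names and word list below are just placeholders):

# Sketch: find word pairs whose cross-word junction gives a missing diphone.
# The lexicon is a placeholder dict of word -> phone sequence.
lexicon = {
    "catch": ["k", "ae", "ch"],
    "zebras": ["z", "iy", "b", "r", "ax", "z"],
}

def word_pairs_with_diphone(missing, words, lexicon):
    """missing: a diphone such as ("ch", "z"); returns candidate word pairs."""
    pairs = []
    for w1 in words:
        for w2 in words:
            if (lexicon[w1][-1], lexicon[w2][0]) == missing:
                pairs.append((w1, w2))
    return pairs

print(word_pairs_with_diphone(("ch", "z"), ["catch", "zebras"], lexicon))
# -> [('catch', 'zebras')]; now write a natural sentence containing that pair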
You might need to cut a string on a separator, keeping only some parts of it. There are lots of ways to do that. The built-in cut command is one way (and you can pass it files too, in which case it will perform the same operation on every line). The pipe “|” sends the output of one process to the input of the next.
$ # -c cuts using character positions
$ echo some_file.txt | cut -c6-9
file
$ # -d cuts using the delimiter you specify
$ echo some_file.txt | cut -d"_" -f1
some
$ # and -f specifies which field(s) you want to keep
$ echo some_file.txt | cut -d"_" -f2
file.txt
$ echo a_long_file_name.txt | cut -d"_" -f2-4
long_file_name.txt
I’ve clarified my response: removing the question sentences entirely seems to be preferable to keeping them but removing their question marks.