Forum Replies Created

Viewing 15 posts - 361 through 375 (of 1,087 total)

Author

Posts
April 2, 2020 at 09:33 in reply to: Demographics #11083
Simon
Professor
The listener is potentially a significant factor that could affect the results. Language background is one important aspect of this, and for that reason, the work reported in many scientific papers only uses native listeners.

Within speech synthesis evaluation, there is not a lot of work exploring this. One paper worth skimming is

Wester, Valentini-Botinhao and Henter, “Are we using enough listeners? no! — an empirically-supported critique of interspeech 2014 TTS evaluations”, in INTERSPEECH-2015, 3476-3480.

although this is focussed on number of listeners more than their properties.

From the Blizzard Challenge we know that non-native listeners have a systematically higher average word error rate (WER), with a higher standard deviation, than native speakers when transcribing speech. We sometimes also find that people with high exposure to TTS (“Speech Experts”) do not rank systems in exactly the same order as the general population of listeners.

If we think that listener properties could affect results, then there are several possible approaches, including:

1. use a listener pool that is as homogeneous as possible, typically “normal-hearing native speakers” – this is what we do in Edinburgh most of the time

2. use a large listener pool and collect information about, for example, language background or previous exposure to TTS, so that the results can be analysed – this is what Blizzard does

Neither is perfect. Approach 1 limits the available pool of subjects. Approach 2 results in unbalanced sub-groups, which complicates statistical analysis.

For the assignment, I do not recommend attempting to restrict your listeners to only native speakers of English, or to native speakers of your own first language (where that’s not English) – just get as many listeners as possible. So, take approach 2.

Approach 2 involves collecting information about each individual listener. Be very careful to collect only what is essential for testing your hypothesis (e.g., language background) and not to ask for intrusive personal information that you don’t need (e.g., gender, age, ethnicity).

But, for the assignment, investigation of listener factors is optional and it would be fine to omit this and to analyse system properties instead. You choose!
March 29, 2020 at 19:31 in reply to: Festival Encoding #11053
Simon
Professor
Festival is showing its age: it doesn’t support UTF-8. It only supports ASCII.
March 26, 2020 at 14:44 in reply to: Ranking Tasks #11031
Simon
Professor
Terminology in this reply:
- sentence = the text being synthesised
- utterance = the synthetic speech for a sentence
You’re right to worry about listener boredom or fatigue.

But you also need to think about the effect of the sentence, which can be large (or at least unknown): some sentences are just harder to synthesise than others. In general, we therefore prioritise using the same sentences across all systems we are comparing.

If you place utterances side-by-side for direct comparison (e.g., ranking or MUSHRA) then you would always use the same sentence. Listeners would indeed have to listen to the same sentence uttered multiple times.

If you present utterances one at a time (e.g., MOS) then you can (pseudo)randomise the order so that listeners do not get the same sentence several times in a row, although they will still hear multiple utterances saying the same sentence across the test (or that section) as a whole.
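As a rough illustration of that kind of pseudo-randomisation, here is a minimal Python sketch. It assumes each trial is a (system, sentence) pair and simply reshuffles until no two consecutive trials share a sentence. The system labels, sentence IDs and the function name `pseudo_randomise` are all made up for this example.

```python
import random

def pseudo_randomise(trials, max_attempts=1000, seed=None):
    """Shuffle (system, sentence) trials so that no two adjacent trials
    use the same sentence. Simple rejection sampling: reshuffle until
    the constraint holds, or give up after max_attempts."""
    rng = random.Random(seed)
    order = list(trials)
    for _ in range(max_attempts):
        rng.shuffle(order)
        if all(a[1] != b[1] for a, b in zip(order, order[1:])):
            return order
    raise RuntimeError("no valid order found; use more sentences or attempts")

# Hypothetical test design: 3 systems, the same 4 sentences for each system.
systems = ["A", "B", "C"]
sentences = ["s1", "s2", "s3", "s4"]
trials = [(system, sentence) for system in systems for sentence in sentences]

playlist = pseudo_randomise(trials, seed=1)
for system, sentence in playlist:
    print(system, sentence)
```

Every listener still hears each sentence once per system; only the order changes. In a real test you would probably also want a different order for each listener.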
March 24, 2020 at 22:33 in reply to: Working on a remote machine #11029
Simon
Professor
Here’s a way to ssh via a gateway machine (outside the University firewall) to a machine inside the firewall, in a single line. This does not require the VPN:
```
$ ssh -t s1234567@student.ssh.inf.ed.ac.uk -t ssh s1234567@ppls-atl-0020.ppls.ed.ac.uk
Password: 
s1234567@ppls-atl-0020.ppls.ed.ac.uk's password: 
```
The first password request is for student.ssh.inf.ed.ac.uk, the second for ppls-atl-0020.ppls.ed.ac.uk.

Setting up ssh keys appropriately should allow you to do this without passwords, except Informatics don’t allow ssh keys, so you need to use Kerberos – see their support pages.

This part does work though, to avoid needing a password for the lab computer: generate keys on student.ssh.inf.ed.ac.uk and copy to ppls-atl-0020.ppls.ed.ac.uk using ssh-copy-id.

The error “ssh_exchange_identification: read: Connection reset by peer” usually means you had a few failed login attempts in a short period of time. Wait and try later.
March 12, 2020 at 08:49 in reply to: Bulk processing of text in Festival #10715
Simon
Professor
Please include the complete command line you are running, and the full error message, so I can help you.
March 7, 2020 at 17:26 in reply to: Resource for semantically unpredictable sentences? #10709
Simon
Professor
Semantically Unpredictable Sentences (SUS) follow a simple template format, given in the paper along with links to word lists. From these, a simple script can be written to randomly generate SUS.
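Such a script might look something like the following Python sketch. Note that the template and the tiny word lists here are placeholders invented for illustration: they are not the ones from the paper, so substitute the real templates and word lists before using it.

```python
import random

# Placeholder word lists (NOT the lists linked from the paper).
WORDS = {
    "DET": ["the"],
    "ADJ": ["green", "loud", "thin", "early"],
    "NOUN": ["table", "river", "idea", "window"],
    "VERB": ["paints", "hears", "follows", "opens"],
}

# Placeholder template (NOT necessarily one of the templates in the paper).
TEMPLATE = ["DET", "ADJ", "NOUN", "VERB", "DET", "NOUN"]

def generate_sus(n, seed=None):
    """Fill each slot of the template with a randomly chosen word."""
    rng = random.Random(seed)
    sentences = []
    for _ in range(n):
        words = [rng.choice(WORDS[slot]) for slot in TEMPLATE]
        sentences.append(" ".join(words).capitalize() + ".")
    return sentences

sus = generate_sus(5, seed=0)
for sentence in sus:
    print(sentence)
```

Because each word is picked independently of the others, the sentences are grammatical but the words cannot be predicted from context, which is the point of SUS.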

Remember that SUS may not be necessary if you don’t have a ceiling effect on intelligibility – you will want to informally find that out before proceeding with SUS. Using SUS with a very low-intelligibility voice might lead to a floor effect!

Harvard sentences are semantically plausible and (supposedly) phonetically-balanced when used in groups of 10. They are still widely used for intelligibility testing when there is no risk of ceiling effect, such as in noise (or, in the case of this assignment, when the synthetic voice is far from perfect!).
February 28, 2020 at 08:16 in reply to: Materials not covered during strikes #10689
Simon
Professor
I would expect students to continue studying even when there are no classes. Therefore, yes, I would expect you to have worked through all materials according to the originally planned class schedule.

What we actually cover in each remaining class may be adjusted to make best use of the available class time (but without scheduling additional hours to replace cancelled classes).

It’s too early for me to make an announcement about what the effect on the exam might be. I have not yet written the exam.
February 19, 2020 at 10:02 in reply to: Festival inserting extra lines when running bulk processing script #10675
Simon
Professor
There might be some non-ASCII (and non-printing – therefore hard to detect) characters in a few sentences. Here’s one way to remove all non-ASCII characters:
```
iconv -c -t ASCII input.txt > output.txt
```
Or you could simply manually remove those sentences that get split across two lines by Festival.
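If you would rather find the offending characters before deleting anything, a small Python sketch along these lines could help. The function names and the sample sentences are made up for this example; `strip_non_ascii` is intended to behave roughly like the iconv command above, dropping anything outside the ASCII range.

```python
def strip_non_ascii(text):
    """Drop every character outside the ASCII range."""
    return text.encode("ascii", errors="ignore").decode("ascii")

def find_non_ascii(lines):
    """Return (line_number, offending_characters) for each line that
    contains at least one non-ASCII character."""
    report = []
    for number, line in enumerate(lines, start=1):
        bad = [ch for ch in line if ord(ch) > 127]
        if bad:
            report.append((number, bad))
    return report

# Made-up sample: the second line hides a non-breaking space (U+00A0).
sample = ["The cat sat.", "A na\u00efve caf\u00e9\u00a0visit."]

for number, bad in find_non_ascii(sample):
    print(number, [hex(ord(ch)) for ch in bad])

cleaned = [strip_non_ascii(line) for line in sample]
print(cleaned)
```

Printing the code points makes invisible characters, such as non-breaking spaces, easy to spot.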
February 16, 2020 at 14:24 in reply to: Taylor – Chapter 16 #10658
Simon
Professor
F0 is real-valued. Taylor argues that this means there is a very natural way to measure the distance between two F0 values. For example, we could take their difference. I would make this argument on the basis of perception: it is clear that a larger difference in F0 values will generally produce a larger perceived difference in two speech sounds. The relationship is not linear, but at least it is monotonic.

This is in contrast to using multiple high-level features such as stress, accentuation, phrasing and phonetic identity. It is not at all clear what distance metric we should use here, for reasons including:
- they are not real-valued
- we don’t know their relative importance
- we don’t know if/how they are correlated with one another
- the relationship with perception is not as obvious as for F0
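To make the F0 case concrete, here is a small Python sketch of two possible distances between F0 values. The difference in Hz is the one mentioned above. The semitone version is my own addition for illustration (it is not taken from Taylor) and is just one common way to approximate the non-linear relationship with perception.

```python
import math

def f0_distance_hz(f0_a, f0_b):
    """Absolute difference between two F0 values, in Hz."""
    return abs(f0_a - f0_b)

def f0_distance_semitones(f0_a, f0_b):
    """Absolute difference on a log-frequency (semitone) scale.
    An assumption made for illustration, not Taylor's definition."""
    return abs(12.0 * math.log2(f0_a / f0_b))

# The same 10 Hz step is a smaller musical interval at a higher F0 ...
print(f0_distance_hz(100.0, 110.0), f0_distance_semitones(100.0, 110.0))
print(f0_distance_hz(200.0, 210.0), f0_distance_semitones(200.0, 210.0))

# ... but against a fixed reference, both distances grow as F0 moves away.
for f0 in (110.0, 120.0, 150.0, 200.0):
    print(f0, f0_distance_hz(100.0, f0), f0_distance_semitones(100.0, f0))
```

Either way, the distance is a single number computed directly from the two values, with no weights to choose. That is not the case for a set of high-level features.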
February 1, 2020 at 06:22 in reply to: Can't make mfcc list #10639
Simon
Professor
make_mfcc_list uses utts.data as its source of filenames, so perhaps you have modified that?
January 24, 2020 at 07:59 in reply to: dealing with words that are not in the dictionary #10625
Simon
Professor
Weijia W: Adding a word to the script like that is a very good technique – this is exactly the right line of thinking to explore what is happening in each step of voice building.

Bingzi Y: you are right – “prprprfg” wasn’t a good choice of “word” because this would have been classified by Festival as an NSW and expanded into something else (perhaps treated as a LSEQ?). Your “Moschops” is a better choice because this is clearly a possible word in English (in fact, it happens to be a real word in this case).
December 11, 2019 at 09:54 in reply to: Pruning #10553
Simon
Professor
You need to carefully distinguish two very different ways in which we save computation in both DTW and the Viterbi algorithm for HMMs.

Dynamic Programming: this algorithm efficiently evaluates all possible paths (= state sequences for HMMs). All paths are evaluated, and none are disregarded. This algorithm is exact and introduces no errors compared to a naive exhaustive search of the paths one at a time.

Pruning: this involves not exploring some paths (state sequences) at all. In DTW, this means that we will not visit every single point in the grid. In the Viterbi algorithm for HMMs implemented as token passing, it means that not all states will have tokens at all time steps. Pruning introduces errors whenever an unexplored part of the grid would have been on the globally most likely path.

In Dynamic Programming, we talk about “throwing away” all but the locally best path when two or more paths meet. The paths that are “thrown away” have already been evaluated up to that point. Extending those paths further would involve exactly the same computations as extending the best path. So we are effectively still evaluating all paths. We save computation without introducing any error: that’s the magic of Dynamic Programming.

This is not the same as pruning, in which we stop exploring some of the locally best paths, because there is another path (into another point on the DTW grid, or arriving at a different state in the HMM) that is much better.
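Here is a small Python sketch of the difference. It makes several assumptions for illustration: the sequences are toy 1-D values, the local distance is an absolute difference, the path may stay on the same template frame, advance by one, or skip one, and pruning is a simple beam applied after each input frame. None of this is tied to a particular toolkit.

```python
import math

def local_cost(a, b):
    return abs(a - b)

def dtw_exhaustive(x, y):
    """Naive search: walk every complete path separately, keep the cheapest."""
    n, m = len(x), len(y)
    def best_from(i, j):
        cost = local_cost(x[i], y[j])
        if i == n - 1:
            return cost if j == m - 1 else math.inf
        return cost + min(best_from(i + 1, k) for k in (j, j + 1, j + 2) if k < m)
    return best_from(0, 0)

def dtw(x, y, beam=None):
    """Dynamic programming over the grid, one input frame at a time.
    With beam set, grid points whose cost is more than `beam` worse than
    the best point for that frame are pruned before moving on."""
    m = len(y)
    prev = [math.inf] * m
    prev[0] = local_cost(x[0], y[0])
    visited = 1
    for i in range(1, len(x)):
        if beam is not None:
            threshold = min(prev) + beam
            prev = [c if c <= threshold else math.inf for c in prev]
        cur = [math.inf] * m
        for j in range(m):
            # Paths meet here: keep only the locally best one.
            best = min(prev[k] for k in (j, j - 1, j - 2) if k >= 0)
            if best < math.inf:  # otherwise every way in was pruned
                cur[j] = best + local_cost(x[i], y[j])
                visited += 1
        prev = cur
    return prev[m - 1], visited

x = [0, 2, 5, 5, 9]  # toy input
y = [0, 5, 2, 9]     # toy template

exact_cost, exact_visited = dtw(x, y)
pruned_cost, pruned_visited = dtw(x, y, beam=1)
print(dtw_exhaustive(x, y), exact_cost, exact_visited)
print(pruned_cost, pruned_visited)
```

On this toy example, Dynamic Programming gives the same cost as the exhaustive search. The beam visits fewer grid points, but it discards the path that would eventually have been the best one, so it returns a higher cost.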
December 10, 2019 at 18:22 in reply to: Q13- Pruning #10544
Simon
Professor
Your first explanation of search space is correct.
December 10, 2019 at 13:56 in reply to: Token passing #10543
Simon
Professor
Token passing is an algorithm, not a model.

Tokens generate the given observations and, in doing so, we compute the probability of each observation being generated from the state’s pdf. Yes, we just “look up” that probability.
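A minimal Python sketch of that idea follows. It assumes 1-D observations, a single Gaussian pdf per state, a token that starts in the first state and must finish in the last one, and made-up model parameters. All function names are hypothetical.

```python
import math

def gaussian_log_pdf(x, mean, var):
    """Log probability density of x under a 1-D Gaussian: the "look up"."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def log_or_neg_inf(p):
    return math.log(p) if p > 0 else -math.inf

def token_passing(observations, states, log_trans):
    """states: list of (mean, var); log_trans[i][j]: log P(i -> j).
    Returns the log probability of the best token in the final state."""
    n = len(states)
    tokens = [-math.inf] * n
    # The first token starts in state 0 and generates the first observation.
    tokens[0] = gaussian_log_pdf(observations[0], *states[0])
    for obs in observations[1:]:
        new_tokens = [-math.inf] * n
        for j in range(n):
            # Copies of tokens arrive from every state; keep only the best.
            best = max(tokens[i] + log_trans[i][j] for i in range(n))
            if best > -math.inf:
                # The surviving token generates obs from state j's pdf.
                new_tokens[j] = best + gaussian_log_pdf(obs, *states[j])
        tokens = new_tokens
    return tokens[n - 1]

# Made-up 2-state left-to-right model and observations.
states = [(0.0, 1.0), (5.0, 1.0)]
trans = [[0.5, 0.5], [0.0, 1.0]]
log_trans = [[log_or_neg_inf(p) for p in row] for row in trans]
observations = [0.1, 0.2, 4.9, 5.1]

score = token_passing(observations, states, log_trans)
print(score)
```

The model only supplies the numbers (the pdfs and transition probabilities). Token passing is the procedure that moves the tokens around and accumulates their log probabilities.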
December 10, 2019 at 10:57 in reply to: Question 8 — why finite state language module popular #10540
Simon
Professor
You are right – writing a language model by hand for a “very-large-vocabulary” system would be impractical, and it would be impossible to manually set appropriate probabilities on all the transitions.