Forum Replies Created
The Euclidean distance metric is effectively the same as a Gaussian with a constant variance.
Even EM offers no guarantee to find the model parameters that maximise the likelihood of the training data (stated as “the maximum likelihood parameter settings” in the question). It can only find a local maximum, for the reasons explained in this topic. So, b. is untrue.
Yes, iv. is not true – how could it be, when we might not know anything about the test set whilst training the model?
Let’s consider the other options:
i. says that EM will find the best possible model. But we also know that it’s just an iterative “hill climbing” algorithm that stops when it cannot climb any higher (“height” means likelihood of the training data). EM is also sensitive to the starting position: we may get a different final model if we start from a different initial model. These two facts tell us that it cannot guarantee to maximise the likelihood of the training data – the best it can do is find a local maximum.
ii. we’ve already seen that this is true: hill-climbing will never take us downhill
iii. this is true by definition: EM updates all model parameters in each M step
This leads us to the correct answer of c.
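If you want to see this behaviour concretely, here is a minimal sketch (not taken from any toolkit) of EM for a toy 1-D Gaussian mixture with fixed unit variances and equal weights: within a run, the training-data log likelihood never decreases, but different initialisations can end up at different final models.

```python
import numpy as np

def em_1d_gmm(x, means, n_iter=50):
    """Minimal EM for a 1-D Gaussian mixture with fixed unit variances
    and equal weights, just to illustrate the hill-climbing behaviour."""
    means = np.array(means, dtype=float)
    log_likelihoods = []
    for _ in range(n_iter):
        # E step: responsibility of each component for each data point
        d = x[:, None] - means[None, :]
        log_p = -0.5 * d**2 - 0.5 * np.log(2 * np.pi)
        resp = np.exp(log_p - log_p.max(axis=1, keepdims=True))
        resp /= resp.sum(axis=1, keepdims=True)
        # log likelihood of the *current* model (equal component weights)
        ll = np.log(np.exp(log_p).mean(axis=1)).sum()
        # M step: update all means at once
        means = (resp * x[:, None]).sum(axis=0) / resp.sum(axis=0)
        log_likelihoods.append(ll)
    return means, log_likelihoods

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 1, 200), rng.normal(3, 1, 200)])

# Different initialisations may converge to different final models,
# but within each run the training-data likelihood never decreases.
for init in ([-1.0, 1.0], [5.0, 6.0]):
    means, lls = em_1d_gmm(x, init)
    assert all(b >= a - 1e-9 for a, b in zip(lls, lls[1:]))   # monotonic
    print(init, "->", np.round(means, 2), "final log-lik:", round(lls[-1], 1))
```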
You’re correct to rule out c. – this would make dynamic programming inapplicable. The same goes for d., which even requires knowledge of the future state sequence!
If b. were true, then what would happen when two tokens (in Token Passing) meet in a particular state? What if they had different previous states (which of course they always will)?
The correct answer is a. – this is in fact a statement of the Markov property of the model.
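To make this concrete, here is a minimal sketch of Token Passing over a toy left-to-right model. The transition and emission log probabilities are made-up numbers, purely for illustration. When tokens meet in a state, only the best one survives, and that is safe precisely because of the Markov property.

```python
import math

# Made-up emission log probs per state per frame, and transition log probs.
log_emit = {
    0: [-1.0, -2.0, -3.0, -4.0],
    1: [-3.0, -1.0, -1.5, -2.5],
    2: [-4.0, -3.0, -1.0, -0.5],
}
log_trans = {(0, 0): -0.5, (0, 1): -1.0,
             (1, 1): -0.5, (1, 2): -1.0,
             (2, 2): -0.5}
n_frames = 4

# One token per state: the log probability of the best partial path
# ending in that state at the current time.
tokens = {0: log_emit[0][0], 1: -math.inf, 2: -math.inf}

for t in range(1, n_frames):
    new_tokens = {s: -math.inf for s in tokens}
    for (src, dst), ltp in log_trans.items():
        candidate = tokens[src] + ltp + log_emit[dst][t]
        # When tokens from different predecessors meet in `dst`, keep only
        # the best one. The Markov property justifies this: the future
        # depends only on the current state, so the losing token can
        # never catch up later.
        if candidate > new_tokens[dst]:
            new_tokens[dst] = candidate
    tokens = new_tokens

print("best final log probability:", tokens[2])
```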
iii. pseudo-pitch marks
These are pitch marks placed in unvoiced regions, solely for the purpose of signal processing algorithms that operate pitch-synchronously (e.g., TD-PSOLA). They are an internal part of the signal processing algorithm, and so only exist within the waveform generator.
iv. pitch contour
This is a target F0 contour (remember that we use the terms “F0” and “pitch” interchangeably in this field, even though only “F0” is technically correct). It is predicted from the text by the front end (e.g., by a succession of classification and regression trees).
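To make the distinction concrete, here is a minimal sketch that converts a frame-level F0 contour into pitch marks, inserting pseudo-pitch marks in the unvoiced regions. The 10 ms default period used for those pseudo-pitch marks is an arbitrary choice for illustration, not a value from any particular system.

```python
# A minimal sketch, assuming we already have a frame-level F0 contour (Hz)
# and voicing decisions (0 Hz = unvoiced).

frame_shift = 0.005          # seconds between F0 frames
default_period = 0.010       # arbitrary pseudo period for unvoiced regions
f0 = [0, 0, 120, 125, 130, 0, 0, 110, 115, 0]

pitch_marks = []
t = 0.0
end_time = len(f0) * frame_shift
while t < end_time:
    frame = min(int(t / frame_shift), len(f0) - 1)
    if f0[frame] > 0:
        period = 1.0 / f0[frame]        # real pitch mark, one per period
    else:
        period = default_period         # pseudo-pitch mark, unvoiced region
    pitch_marks.append(round(t, 4))
    t += period

print(pitch_marks)
```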
Annable K is correct – phone durations can be part of the linguistic specification because they are predicted from the text (e.g., with a regression tree).
The linguistic specification is what is passed from the front-end (text processor) to the waveform generator. So, you need to decide what is predicted in the front-end, and what is done by the waveform generator.
The linguistic specification can only include things that are predicted in the front end.
Yes, ii. is correct.
The range of possible F0 modification factors is quite limited in TD-PSOLA, before audible artefacts are produced.
TD-PSOLA operates on pitch periods, and so needs pseudo-pitch marks in unvoiced regions. A source-filter model can modify the duration of unvoiced sounds simply by inputting a longer random source signal. It doesn’t need to divide unvoiced speech regions into pseudo-pitch periods.
So, iii. and iv. are both true.
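Here is a minimal sketch of that last point: a source-filter model lengthens an unvoiced sound simply by generating a longer random excitation and passing it through the same filter. The filter coefficients below are made up for illustration; in practice they would be estimated (e.g., by LPC analysis).

```python
import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(0)
fs = 16000
a = [1.0, -0.5, 0.3]        # stand-in all-pole (denominator) coefficients

def unvoiced(duration_s):
    excitation = rng.standard_normal(int(duration_s * fs))  # random source
    return lfilter([1.0], a, excitation)                    # same filter

original = unvoiced(0.10)    # 100 ms fricative-like sound
longer = unvoiced(0.15)      # 150 ms version: no pseudo-pitch periods needed

print(len(original), len(longer))
```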
Kamen D is correct.
Imagine a system that simply deleted all words: it always produces no words in the output (hypothesis). It never makes any insertion or substitution errors. This system should get a WER of 100%.
The formula in (b) would not work: it involves dividing by zero (since there are no words in the hypothesis).
The formula in (d) gives the correct value: for our system, the number of deletions is equal to the number of words in the reference, and so we get a value of 1 (in other words, a WER of 100%).
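A quick sanity check of this reasoning, assuming (d) is WER = (S + D + I) / N with N the number of reference words, and that (b) divides by the number of words in the hypothesis instead:

```python
# The "delete everything" system: no insertions, no substitutions,
# every reference word is deleted.

reference = "the cat sat on the mat".split()
hypothesis = []                       # the system outputs no words
assert len(hypothesis) == 0

substitutions, insertions = 0, 0
deletions = len(reference)            # every reference word is deleted

wer_d = (substitutions + deletions + insertions) / len(reference)
print(wer_d)                          # 1.0, i.e. a WER of 100%

# Formula (b) would divide by len(hypothesis) == 0: a division by zero.
```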
Kamen D is correct about i., ii. and iii.
Here, globally optimal means that the model we learn from data is the best possible model (of that form), given the data.
We cannot guarantee this for a CART, because we use a greedy training algorithm. We do not consider every possible tree (think about how many possible trees there are, for even a small set of questions), and therefore we cannot be sure that there isn’t a better tree than the one we have learned.
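As an illustration of what “greedy” means here, below is a minimal sketch of choosing a single split: we pick the one question with the best immediate impurity (entropy) reduction, and never consider whole trees. The features, questions and data points are made up.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

# Each item: (features, class label of the predictee). All values made up.
data = [({"stress": 1, "vowel": 1}, "long"),
        ({"stress": 1, "vowel": 0}, "short"),
        ({"stress": 0, "vowel": 1}, "short"),
        ({"stress": 0, "vowel": 0}, "short")]

questions = ["stress", "vowel"]

def best_split(items):
    parent = entropy([label for _, label in items])
    best = None
    for q in questions:
        yes = [label for feats, label in items if feats[q] == 1]
        no = [label for feats, label in items if feats[q] == 0]
        gain = parent - (len(yes) * entropy(yes) + len(no) * entropy(no)) / len(items)
        if best is None or gain > best[1]:
            best = (q, gain)
    return best

# Greedy: take the locally best question and never revisit the choice,
# so there is no guarantee the resulting tree is globally optimal.
print(best_split(data))
```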
Kamen D’s answer is correct. Let’s look at the impossible answers:
ii. the number of classes (of the predictee) is a constant, so this cannot be used as a stopping criterion, since it doesn’t change as we grow the tree
iv. again, this doesn’t change as we grow the tree, so cannot be used as a stopping criterion
In DTW, there is no summing at all. We are only interested in the single best path, and so take the minimum path cost at every grid point.
Well, when using a probability distribution, we also need to compute the probability (or likelihood) of our observation under each of many possible density functions (e.g., one per class, or one per HMM state).
The overall complexity of computing a (log) probability and computing a Euclidean distance is essentially the same. We can see this in the formula for a Gaussian, where the key operation is the same (squared distance from the mean).
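To spell that out, here is the log density of a D-dimensional Gaussian, written with a single shared variance σ² (a simplifying assumption, just to make the comparison obvious):

$$\log \mathcal{N}(x;\,\mu,\,\sigma^2 I) \;=\; -\frac{1}{2\sigma^2}\,\lVert x-\mu\rVert^2 \;-\; \frac{D}{2}\log\!\left(2\pi\sigma^2\right)$$

The first term is just the squared Euclidean distance from the mean, scaled by a constant; the second term is a constant. So comparing observations (or paths) by log likelihood under a constant-variance Gaussian costs essentially the same as comparing them by Euclidean distance.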
Join discontinuity means the difference in any signal properties across a join (concatenation point).
Those signal properties include anything that is perceptually relevant, including (but not limited to) the spectrum.
So, spectral discontinuity is a sub-component of join discontinuity.
What other components can you think of?
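As one concrete example, here is a minimal sketch of the spectral sub-component, measured as the Euclidean distance between the spectral (e.g., MFCC) frames either side of the join. The frame values are made-up numbers, purely for illustration.

```python
import numpy as np

# Last spectral frame of the left unit and first frame of the right unit.
left_unit_last_frame = np.array([12.1, -3.4, 5.0, 0.7])
right_unit_first_frame = np.array([11.5, -2.9, 4.2, 1.1])

spectral_discontinuity = np.linalg.norm(left_unit_last_frame - right_unit_first_frame)

# Other sub-components of join discontinuity (e.g., F0 and energy) could be
# measured the same way and combined, for example as a weighted sum.
print(spectral_discontinuity)
```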
Jurafsky & Martin (J&M) include a figure showing the “classical” cepstrum, and this is what is confusing you. As you say, they fail to make a clear connection between this and MFCCs.
To clear this up, we need to distinguish between the classical cepstrum, and what actually happens in creating MFCCs.
Let’s start with the classical cepstrum, as in J&M’s Figure 9.14 (borrowed from Taylor, who gives a better explanation – read that if you can).
WARNING: in J&M’s Figure 9.14, the plots for (a) and (b) need to be swapped in order for the caption to be correct! My explanation below assumes you’ve corrected this figure.
The three subfigures illustrate the key stages of
(a) obtaining the spectrum from the waveform, using an FFT – in this domain, the source and filter are multiplied together
(b) taking the log of the spectrum, which makes the source and filter additive
(c) performing a series expansion (e.g., DCT) which “lays out” the different components of the log spectrum along an axis, such that the source components and filter components are in different places along that axis and can easily be separated. In J&M’s Figure 9.14(c) we can see the fundamental period as a small peak around the middle of the cepstrum.
There is no filterbank in the classical cepstrum, and no Mel-scaling of the frequency axis.
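Here is a minimal sketch of the classical cepstrum for one frame, following stages (a) to (c) above. I’ve used an inverse FFT as the series expansion (a DCT is another common choice), and the toy pulse-train frame is only there so that the fundamental-period peak shows up.

```python
import numpy as np

def classical_cepstrum(frame):
    spectrum = np.abs(np.fft.rfft(frame))        # (a) magnitude spectrum: source x filter (multiplied)
    log_spectrum = np.log(spectrum + 1e-10)      # (b) log: source and filter become additive
    cepstrum = np.fft.irfft(log_spectrum)        # (c) expansion: source and filter components end up
    return cepstrum                              #     in different regions of the quefrency axis

# A toy "voiced" frame: an impulse train, just to have something periodic.
fs = 16000
frame = np.zeros(512)
frame[::160] = 1.0                               # 100 Hz fundamental at 16 kHz
cep = classical_cepstrum(frame)

# The fundamental period shows up as a peak around quefrency = fs / F0 samples.
print(np.argmax(np.abs(cep[50:250])) + 50)       # expect something near 160
```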
Mel Frequency Cepstral Coefficients are inspired by the classical cepstrum and use the same key processing steps, plus one additional stage: a Mel-scaled filterbank. This happens after 9.14(a), so 9.14(b) becomes a smooth spectral envelope (no harmonics) on a Mel scale, and 9.14(c) would no longer have the small peak corresponding to the fundamental period.
The filterbank serves two purposes. First, it’s an easy way to warp the frequency scale from linear (Hertz) to a Mel scale, simply by placing the filter’s centre frequencies evenly apart on a Mel scale. Second, it’s an opportunity to smooth the spectrum and reduce the prominence of the harmonics – in other words, to produce a spectrum that contains less information about the source.
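And here is a corresponding minimal MFCC-style sketch: the same key steps, with the Mel-scaled filterbank inserted after the FFT. The numbers of filters and coefficients are typical choices, not from any particular recipe, and steps such as pre-emphasis, windowing and liftering are omitted; this is a sketch, not the exact processing of any toolkit.

```python
import numpy as np
from scipy.fft import dct

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, fs):
    # Triangular filters with centre frequencies evenly spaced on the Mel scale.
    mel_points = np.linspace(hz_to_mel(0), hz_to_mel(fs / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / fs).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, centre, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, centre):
            fbank[i - 1, k] = (k - left) / max(centre - left, 1)
        for k in range(centre, right):
            fbank[i - 1, k] = (right - k) / max(right - centre, 1)
    return fbank

def mfcc(frame, fs=16000, n_filters=26, n_ceps=13):
    power = np.abs(np.fft.rfft(frame)) ** 2             # (a) power spectrum
    fbank = mel_filterbank(n_filters, len(frame), fs)
    mel_energies = fbank @ power                         # Mel warping + smoothing away the harmonics
    log_mel = np.log(mel_energies + 1e-10)               # (b) log
    return dct(log_mel, type=2, norm="ortho")[:n_ceps]   # (c) DCT, keep the low-order coefficients

frame = np.random.default_rng(0).standard_normal(512)
print(mfcc(frame).shape)                                 # (13,)
```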
To summarise: J&M’s Figure 9.14(c) is the classical cepstrum and is not one of the stages on the way to MFCCs.