Forum Replies Created
Yes, a pulse train is an approximation to the sound produced by vocal fold vibration.
Although it might not seem like a particularly good approximation, it is simple and mathematically convenient. The principal difference between a pulse train and the true signal is that the pulse train has a flat spectral envelope.
That’s not a problem though: we can include the modelling of the actual spectral envelope of the vocal fold signal in the vocal tract filter.
In other respects, the pulse train has the correct properties: specifically, it has energy at every multiple of F0 (a “comb-like” or “line” spectrum).
So, we can say that a source-filter model is really a model of the signal, and not a literal model of the physics of speech production.
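Here is a quick sketch you can run to see this for yourself (not part of the course materials; the sampling rate, F0 and duration are just values chosen for illustration):

# A minimal sketch showing that a pulse train has energy at every multiple of F0,
# with a flat spectral envelope (fs, f0 and duration are assumptions for this example).
import numpy as np

fs = 16000                      # sampling rate in Hz
f0 = 200                        # fundamental frequency in Hz
x = np.zeros(fs)                # 1 second of signal
x[::int(fs / f0)] = 1.0         # one impulse at the start of every pitch period (T0 samples)

spectrum = np.abs(np.fft.rfft(x))
freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
print(freqs[spectrum > 0.5 * spectrum.max()][:5])   # approximately 0, 200, 400, 600, 800 Hz
# every peak has the same height: a flat spectral envelope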
Let’s follow your working:
I label the predictors as their Parts-Of-Speech
Correct – but you should say that you annotate the training data samples with values for each of the three predictors. In this example, we are using the POS of the preceding, current and following word as the predictors 1, 2 and 3 respectively.
use the question Is the label after “BREAK” PUNC?
Let’s word that more carefully. Questions must be about predictors of the current data point. So you should say:
Ask the question Is predictor 3 = PUNC?
Now partition the data accordingly.
everything which is punctuation comes after a BREAK, and everything which isn’t punctuation is a conjunction
This is where you’ve made the mistake. For question Is predictor 3 = PUNC?, 8 data points have the answer “Yes” and all of them have the value “NO-BREAK” for the predictee, which indeed is a distribution with zero entropy. So far, so good.
Now look at the 26 data points for which the answer to Is predictor 3 = PUNC? was “No”. The distribution of predictee values is 4 BREAKs and 22 NO-BREAKs. That distribution does not have zero entropy.
Your reasoning about “everything which isn’t punctuation is a conjunction” is incorrect. You are looking at the distribution of values of a predictor. When measuring entropy, we look only at the value of the predictee. That is the thing we are trying to predict. The reduction in entropy measures how much more predictable the predictee has become after a particular split of the data.
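If you want to check the numbers, here is a minimal sketch using the counts from this example:

# Entropy of the predictee distribution on each side of the split "Is predictor 3 = PUNC?"
import math

def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

yes_branch = [8]        # 8 x NO-BREAK              -> zero entropy
no_branch = [4, 22]     # 4 x BREAK, 22 x NO-BREAK  -> non-zero entropy

print(entropy(yes_branch))   # 0.0 bits
print(entropy(no_branch))    # about 0.62 bits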
I’ve tried in the past setting questions as homework or lecture preparation, but only a few students took part. We can try again, perhaps in the speech recognition part of the Speech Processing course.
Pitch marking means finding the instants of glottal closure. Pitch marks are moments in time. The interval of time between two pitch marks is the pitch period, denoted as T0, which of course is equal to 1 / F0.
You might think that pitch marking would be the best way to find F0. However, it’s actually not, because pitch marking is hard to do accurately and will give a lot of local error in the estimate for F0.
Pitch marking is useful for signal processing, such as TD-PSOLA.
Pitch tracking is a procedure to find the value of F0, as it varies over time.
Pitch tracking is done over longer windows (i.e., multiple pitch periods) to get better accuracy, and can take advantage of the continuity of F0 in order to get a more robust and error-free estimate of its value.
Pitch tracking is useful for visualising F0, analysing intonation, and building models of it.
Exactly how pitch marking and pitch tracking work is beyond the scope of the Speech Processing course, but is covered in the more advanced Speech Synthesis course.
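Purely for illustration (this is beyond what you need for Speech Processing), a very crude autocorrelation-based estimate of F0 for a single analysis window might look like the sketch below. Real pitch trackers are far more sophisticated and also exploit the continuity of F0 across frames; the sampling rate and test signal here are made up.

# Crude single-frame F0 estimate by autocorrelation (illustration only).
import numpy as np

def estimate_f0(frame, fs, f0_min=60.0, f0_max=400.0):
    frame = frame - np.mean(frame)
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]   # autocorrelation, lags >= 0
    lo, hi = int(fs / f0_max), int(fs / f0_min)                     # plausible range of T0 in samples
    lag = lo + np.argmax(ac[lo:hi])                                 # lag of the strongest peak = estimated T0
    return fs / lag                                                 # F0 = 1 / T0

# usage: a synthetic 220 Hz "voiced" frame of 40 ms (i.e., several pitch periods)
fs = 16000
t = np.arange(int(0.04 * fs)) / fs
frame = np.sin(2 * np.pi * 220 * t) + 0.3 * np.sin(2 * np.pi * 440 * t)
print(estimate_f0(frame, fs))                                       # close to 220 Hz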
In the video, the question Does letter n=”r”? is just one of many possible questions that we try when splitting the root node. The question that reduces entropy more than any other is placed in the tree, and the data are permanently partitioned down its “Yes” and “No” branches.
Then we recurse. That means that we simply apply precisely the same procedure separately to each of the two child nodes that we have just created; then we do the same for their child nodes, and so on until we decide to stop.
Why is entropy used to choose the best question?
Think about the goal of the tree: it is to make predictions. In other words, we want to partition the data in a way that makes the value of the predictee less random (i.e., less unpredictable) and more predictable.
Entropy is a measure of how unpredictable a random variable is. The random variable here is the predictee. We partition the data in the way that makes the distribution of the predictee as non-uniform as possible.
Ideally, we want all data points within each partition to have the same value for the predictee. That would mean zero entropy.
If we can’t achieve that, then we choose the split that has the lowest possible entropy.
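To make that concrete, here is a rough sketch of the greedy selection step. The data and question representations are just assumptions for illustration, not Festival’s actual implementation.

# Try every candidate question, compute the weighted entropy of the resulting
# partition of the data, and keep the question that gives the lowest value.
import math

def entropy(values):
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    return -sum((c / len(values)) * math.log2(c / len(values)) for c in counts.values())

def best_question(data, questions):
    # data: list of (predictors, predictee); questions: list of (predictor_index, value)
    best, best_entropy = None, float("inf")
    for index, value in questions:
        yes = [d for d in data if d[0][index] == value]
        no = [d for d in data if d[0][index] != value]
        if not yes or not no:
            continue
        w = len(yes) / len(data)
        total = w * entropy([d[1] for d in yes]) + (1 - w) * entropy([d[1] for d in no])
        if total < best_entropy:
            best, best_entropy = (index, value), total
    return best, best_entropy

A full implementation would then partition the data using the chosen question and call the same procedure recursively on each child node, stopping according to some criterion (for example, a minimum number of data points).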
A weighted sum gives a weight (or “importance”) to each of the items being added together. Items with larger weights have more effect on the result, and vice versa.
In the CART training algorithm, a weighted sum is used to compute the total entropy of a possible partition of the data. The weighting is needed to correct for the fact that each side of the partition (the “Yes” and “No” branches) might have differing numbers of data points, and to make the result comparable to the entropy at the parent node. We set the weights in the weighted sum to reflect the fraction of data points on each side.
Imagine this example:
We have 1000 data points at a particular node in the tree, and the entropy here is 3.4 bits.
We try a question, and the result is that 500 data points go down the “No” branch and 500 data points go down the “Yes” branch.
This question turns out to be pretty useless, because the distribution of predictee values in each branch remains about the same as at the parent node. So, the entropy in each side is going to be about 3.4 bits.
Simply adding these two values (an unweighted sum) would give the wrong answer of 6.8 bits. We need to do a weighted sum:
(0.5 x 3.4) + (0.5 x 3.4) = 3.4 bits
The same argument holds whatever the entropy of the two branches, and whatever proportion of data points goes down each branch.
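The same worked example in code, using the made-up numbers above:

# Weighted sum of branch entropies, weighted by the fraction of data points in each branch.
n_yes, n_no = 500, 500      # data points down each branch
h_yes, h_no = 3.4, 3.4      # entropy (in bits) of the predictee in each branch

total = n_yes + n_no
print((n_yes / total) * h_yes + (n_no / total) * h_no)   # 3.4 bits: no better than the parent node
print(h_yes + h_no)                                      # 6.8 bits: not comparable to the parent's entropy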
This is a little beyond the Speech Processing course, but is covered fully in the more advanced Speech Synthesis course.
The short answer is that most statistical parametric speech synthesisers (whether HMM or DNN) use a source-filter model to generate the waveform. The HMM or DNN predicts the parameters of the source (e.g., F0) and of the filter (e.g., its frequency response).
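Purely as an illustration of that last step (the filter coefficients and frame settings below are made up, and a real vocoder predicts a much richer parameterisation), generating one frame might be sketched like this:

# Reduced sketch of source-filter waveform generation for one frame:
# excite a per-frame all-pole filter with a pulse train (voiced) or noise (unvoiced).
import numpy as np
from scipy.signal import lfilter

fs = 16000
frame_len = 400                            # 25 ms frames (an assumption for the sketch)

def synthesise_frame(f0, a):
    # a: all-pole (denominator) filter coefficients, with a[0] == 1
    if f0 > 0:                             # voiced: impulse train at the predicted F0
        source = np.zeros(frame_len)
        source[::int(fs / f0)] = 1.0
    else:                                  # unvoiced: white noise
        source = 0.1 * np.random.randn(frame_len)
    return lfilter([1.0], a, source)       # vocal tract modelled as an all-pole filter

frame = synthesise_frame(120.0, a=[1.0, -0.9])   # made-up filter: a gentle low-pass
print(frame.shape)                               # (400,)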
Siyu, your understanding is exactly right: CART partitions the feature (predictor) space in a binary fashion. Each node lower down the tree subdivides the partition created by the answer (“Yes” or “No”) of its parent node.
See Figure 1 on this page for a regression tree example.
Enno correctly points out that you can transform the features in any way you wish. But, it’s important to recognise that this would be “feature engineering” and would be done before starting to build the CART. The CART training algorithm can only select amongst the questions provided about the predictors; it cannot invent new predictors, or new questions, or learn feature transforms.
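If you want to experiment, scikit-learn’s decision trees are CART-style and make the binary partitioning easy to see. The data below are made up (this is not the tree from Figure 1), and note that regression trees split on a variance-based criterion rather than entropy, although the partitioning of the predictor space works in the same way.

# Axis-aligned binary partitioning of a two-dimensional predictor space.
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 2))                                   # two predictors
y = np.where(X[:, 0] > 5, 2.0, 0.5) + 0.1 * rng.standard_normal(200)    # predictee

tree = DecisionTreeRegressor(max_depth=2).fit(X, y)
print(export_text(tree, feature_names=["predictor_1", "predictor_2"]))
# Each internal node asks a yes/no question about one predictor, and each child
# node subdivides the partition created by its parent.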
You could just pick one at random. Or, you might have some secondary criterion such as choosing the one with the most balanced split (i.e., the one closest to a 50/50 split) on the grounds that small partitions are a bad thing.
Why is a small partition a bad thing? What consequences might it have when we try to split it?
On the system here in Edinburgh, you can see a dictionary that marks word sense at
/Volumes/ss/festival/festival_mac/festival/lib/dicts/unilex/unilex-edi.out
You can count the number of entries that include a word sense – it’s very small: 342 out of 116740 lexical baseforms. Here’s an extract:
("repress" (vb keep-down) (((t^ i) 0) ((p r e s) 1))) ("repress" (vb press-again) (((t^ ii) 3) ((p r e s) 1))) ("repress" (vbp keep-down) (((t^ i) 0) ((p r e s) 1))) ("repress" (vbp press-again) (((t^ ii) 3) ((p r e s) 1))) ("repressed" (jj keep-down) (((t^ i) 0) ((p r e s t) 1))) ("repressed" (jj press-again) (((t^ ii) 3) ((p r e s t) 1))) ("repressed" (vbd keep-down) (((t^ i) 0) ((p r e s t) 1))) ("repressed" (vbd press-again) (((t^ ii) 3) ((p r e s t) 1))) ("repressed" (vbn keep-down) (((t^ i) 0) ((p r e s t) 1))) ("repressed" (vbn press-again) (((t^ ii) 3) ((p r e s t) 1))) ("represses" (vbz keep-down) (((t^ i) 0) ((p r e s) 1) ((i z) 0))) ("represses" (vbz press-again) (((t^ ii) 3) ((p r e s) 1) ((i z) 0))) ("repressing" (jj keep-down) (((t^ i) 0) ((p r e s) 1) ((i n) 0))) ("repressing" (jj press-again) (((t^ ii) 3) ((p r e s) 1) ((i n) 0))) ("repressing" (nn keep-down) (((t^ i) 0) ((p r e s) 1) ((i n) 0))) ("repressing" (nn press-again) (((t^ ii) 3) ((p r e s) 1) ((i n) 0))) ("repressing" (vbg keep-down) (((t^ i) 0) ((p r e s) 1) ((i n) 0))) ("repressing" (vbg press-again) (((t^ ii) 3) ((p r e s) 1) ((i n) 0)))
To get a better idea of how often this matters in practice, we would need to take a large corpus of text that typifies the type of input text we expect, and count how often one of those 342 words occurs.
To refine that, we should only count the times where its pronunciation would have been incorrect based on POS alone. However, that would be expensive, because we would have to know the correct pronunciation – for example, by manually annotating the text.
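If you have access to the file, a rough count can be scripted. This sketch assumes one entry per line in the format shown above, with the word-sense tag as the second item inside the part-of-speech field:

# Rough count of Unilex entries whose POS field carries a word-sense tag,
# e.g. ("repress" (vb keep-down) ...). The format is assumed from the extract above.
import re

has_sense = re.compile(r'^\("\S+"\s+\([^()\s]+\s+[^()\s]+\)')

total = with_sense = 0
with open("unilex-edi.out", encoding="utf-8", errors="replace") as f:   # use the full path given above
    for line in f:
        if not line.startswith('("'):
            continue
        total += 1
        if has_sense.match(line):
            with_sense += 1

print(with_sense, "of", total, "entries carry a word sense")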
ToBI associates accents with words, but in fact intonation events align with syllables. Accents align with a particular syllable in the word (usually one with lexical stress) and their precise timing (earlier or later) can also matter.
In the accent ratio model, the ≤0.05 is saying “no more than 5%” and is a statistical significance test. It is there so that the model only makes predictions in cases where it is confident (e.g., because it has seen enough examples in the training data).
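As a sketch of the general idea (not necessarily the exact formulation used in the reading), you could use a binomial test to decide whether a word has been seen accented, or not accented, consistently enough to trust its ratio. The counts below are made up.

# Only trust a word's accent ratio (k accented out of n occurrences in the training
# data) if a binomial test against chance (p = 0.5) is significant at the 0.05 level.
from scipy.stats import binomtest

def accent_ratio(k, n, default=None):
    if n == 0 or binomtest(k, n, p=0.5).pvalue > 0.05:
        return default       # not confident enough: fall back to some other predictor
    return k / n             # confident: use the ratio observed in the training data

print(accent_ratio(2, 40))   # rarely accented, plenty of evidence -> 0.05
print(accent_ratio(3, 5))    # too little evidence -> None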
(in future, please can you split each question into a separate post – it makes the forums easier to read and search)
ToBI is a description of the shape of intonation events (i.e., small fragments of F0 contour). We could make a syllable sound more prominent using one of several different shapes of F0 contour; the most obvious choice is a simple rise-fall (H*) but other shapes can also add prominence.
ToBI does also attempt to associate a function with some accent types (e.g., L* for “surprise” in Figure 8.10). But, many people (including me) are sceptical about this functional aspect of ToBI, because there really isn’t a simple mapping between shapes of F0 contours and the underlying meaning.
“IP” means “intonation phrase”, as described in 8.3.1. So an “IP-initial accent” is the first accent in an intonation phrase.
There is a note on the page
http://www.speech.zone/courses/speech-processing/synthesis/front-end/cart/
that says
The videos for this part of the course are incomplete. We’ll cover CART in detail in the lectures. But make sure to watch the video in the related post below.
This material is certainly still part of the syllabus.
In the Entropy: understanding the equation video, I used a few carefully chosen example distributions to help you understand the general formula for entropy.
At 5:30 in the video, you will see that I used this code for sending messages about three values:
Code 1
- green = 0
- blue = 10
- red = 11
but I didn’t go into the precise reason for choosing that code rather than, say
Code 2
- green = 0
- blue = 1
- red = 01
So, let’s clear that detail up now. When we transmit a variable length code, we also have to make it possible for the receiver to know when each item in the message starts and finishes. In other words, for any string of bits, there has to be a single unambiguous decoding of the message.
Consider sending the message “green green blue red”
Using Code 1: 001011
Using Code 2: 00101
At this point, it looks like Code 2 is better – it can send the message with fewer bits. But now let’s try to decode them. Using Code 1, the message is unambiguous:
001011 = green green blue red, and there are no other possible ways to decode it.
But using Code 2 we have more than one possible decoding
00101 = green green blue red
00101 = green red green blue
So, that code is not allowed!
Your code has the same problem:
- EH = 0
- AA = 1
- AO = 01
because the message “EH AA” is coded as 01, and this cannot be decoded unambiguously (it might mean “EH AA” or it might mean “AO”).
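You can check this property directly: a code is instantaneously decodable only if no codeword is a prefix of another codeword. A small sketch:

# The prefix condition: if one codeword is a prefix of another, some bit strings
# have more than one possible decoding.
def is_prefix_free(code):
    words = list(code.values())
    return not any(a != b and b.startswith(a) for a in words for b in words)

code_1 = {"green": "0", "blue": "10", "red": "11"}
code_2 = {"green": "0", "blue": "1", "red": "01"}
your_code = {"EH": "0", "AA": "1", "AO": "01"}

print(is_prefix_free(code_1))     # True  - every message decodes unambiguously
print(is_prefix_free(code_2))     # False - "0" is a prefix of "01"
print(is_prefix_free(your_code))  # False - "0" is a prefix of "01"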
Yes – the log (base 2) is converting to bits.
Log (base 2) comes up quite often in this context, and related ones. For example, the depth of a binary decision tree is of the order of log (base 2) of the number of leaves.
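Two small illustrations (the probabilities in the first are an assumption chosen to make the numbers come out tidily, not necessarily the distribution used in the video):

import math

# Entropy in bits of a distribution over three colours with probabilities 0.5, 0.25, 0.25
p = [0.5, 0.25, 0.25]
print(-sum(pi * math.log2(pi) for pi in p))   # 1.5 bits, which Code 1 achieves on average

# Minimum depth of a binary decision tree with a given number of leaves
leaves = 1000
print(math.ceil(math.log2(leaves)))           # 10 yes/no questions are enough for 1000 leaves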