Forum Replies Created
Correct. Any gradient ascent algorithm such as EM (or backpropagation for training a neural network – out of scope for Speech Processing) can only find a local optimum.
The Baum-Welch algorithm used for training HMMs maximises the likelihood only in the sense that it finds a local maximum: it only guarantees that there is no small change in the model’s parameters that would further increase the likelihood of the training data.
Pseudo pitch marks are simply uniformly spaced at some constant value of T0, such as 0.01 s (which corresponds to an F0 of 100 Hz).
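For example, a minimal sketch in Python of placing pseudo pitch marks at a fixed T0 (the utterance duration here is made up):

```python
import numpy as np

# Minimal sketch: place pseudo pitch marks at a fixed period T0.
# The values are illustrative, not from any particular system.
T0 = 0.01            # pitch period in seconds (i.e. F0 = 1 / T0 = 100 Hz)
duration = 2.0       # length of the utterance in seconds (made up)

# Uniformly spaced mark times: 0.00, 0.01, 0.02, ...
pitch_marks = np.arange(0.0, duration, T0)

print(pitch_marks[:5])   # [0.   0.01 0.02 0.03 0.04]
```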
Declination in F0 is caused by the speaker’s lungs gradually emptying and therefore producing lower pressure and airflow through the glottis. This decreases the rate of vibration of the vocal folds.
EM is a gradient ascent algorithm. It can only find a local maximum.
(Very few models have a training algorithm that guarantees finding the global optimum.)
Read the question carefully and you will see it is asking you to compare diphone and phone units and say why diphones are preferred. It is not asking you simply to say which of i.-iv. are true.
Because iv. is true for both phones and diphones, it cannot be a reason to prefer one over the other.
There are unit inventories for which there is not a unique unit sequence. Some early systems had a mixed inventory of phones, diphones, demi-syllables and larger units.
The transition probabilities are also trained. We didn’t cover how this is done in class, and it’s out of scope for the course, but it’s quite simple:
In Viterbi-style training, simply count how many times each transition is taken (when the model uses the most likely state sequence to generate the training data) and finally normalise to make the probabilities of all transitions out of a state sum to 1.
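If it helps, here is a rough sketch of that counting-and-normalising step in Python. The state sequences are made-up toy data; a real implementation would of course obtain them from the Viterbi alignment of the model against the training data:

```python
from collections import defaultdict

# Sketch of Viterbi-style re-estimation of transition probabilities.
# Toy state sequences stand in for the most likely state sequences
# found by the Viterbi algorithm for each training utterance.
state_sequences = [
    [1, 1, 2, 2, 2, 3],
    [1, 2, 2, 3, 3, 3],
]

# Count how many times each transition i -> j is taken.
counts = defaultdict(lambda: defaultdict(int))
for seq in state_sequences:
    for i, j in zip(seq, seq[1:]):
        counts[i][j] += 1

# Normalise so the probabilities of all transitions out of a state sum to 1.
trans_prob = {
    i: {j: c / sum(outgoing.values()) for j, c in outgoing.items()}
    for i, outgoing in counts.items()
}

print(trans_prob)
# approximately: state 1: {1: 0.33, 2: 0.67}, state 2: {2: 0.6, 3: 0.4}, state 3: {3: 1.0}
```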
This slide is talking about the most basic form of waveform concatenation synthesis, in which we store only one example of each unit type:
inventory = the set of stored waveform units
- using phonemes as the type would require only around 45 stored waveform units
- diphones would require ~2000 stored waveform units
- and so on… larger units generally have more types
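A back-of-the-envelope count (the figures are only approximate):

```python
# Approximate count of unit types (illustrative numbers).
num_phones = 45
num_diphones = num_phones ** 2   # upper bound: every phone-to-phone pair

print(num_phones)     # 45 stored units if the type is the phone
print(num_diphones)   # 2025; ~2000 in practice, since some pairs never occur
```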
But similar arguments apply to unit selection:
inventory = the set of unique unit types
database = the stored waveforms (usually complete natural utterances) from which units are selected; multiple instances of each unit type are available
Unit selection involves searching all possible candidate unit sequences to find the best-sounding sequence. Even using dynamic programming, this will involve significant computation.
As the database of speech to draw candidates from increases in size, the number of available candidates increases in proportion, but the number of possible sequences increases exponentially.
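For anyone curious, here is a minimal sketch of that dynamic-programming search in Python. The target and join cost functions are just placeholders operating on numbers; a real system computes the target cost from linguistic features and the join cost from acoustic mismatch at the concatenation point:

```python
# Minimal sketch of unit-selection search with dynamic programming.
# The cost functions below are placeholders, not those of any real system.

def target_cost(target_spec, candidate):
    return abs(target_spec - candidate)            # placeholder

def join_cost(left_candidate, right_candidate):
    return abs(left_candidate - right_candidate)   # placeholder

def unit_selection(target_specs, candidates_per_target):
    """Find the candidate sequence with minimum total cost.

    candidates_per_target[t] is the list of candidate units for target t.
    Brute force would examine every possible sequence (exponential in the
    number of targets); dynamic programming keeps only the best partial
    cost for each candidate at each step.
    """
    # best[c] = lowest cost of any partial sequence ending in candidate c
    best = [target_cost(target_specs[0], c) for c in candidates_per_target[0]]
    back = []

    for t in range(1, len(target_specs)):
        new_best, pointers = [], []
        for c in candidates_per_target[t]:
            costs = [
                best[p] + join_cost(prev, c)
                for p, prev in enumerate(candidates_per_target[t - 1])
            ]
            p_min = min(range(len(costs)), key=costs.__getitem__)
            new_best.append(costs[p_min] + target_cost(target_specs[t], c))
            pointers.append(p_min)
        best, back = new_best, back + [pointers]

    # Trace back the best path from the cheapest final candidate.
    idx = min(range(len(best)), key=best.__getitem__)
    path = [idx]
    for pointers in reversed(back):
        idx = pointers[idx]
        path.append(idx)
    path.reverse()
    return [candidates_per_target[t][i] for t, i in enumerate(path)], min(best)

# Toy example: 3 targets, a few numeric "candidates" each.
units, cost = unit_selection([1.0, 2.0, 3.0],
                             [[0.9, 1.5], [1.8, 2.6], [2.9, 3.3]])
print(units, cost)
```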
Longer tubes have lower resonant frequencies than shorter tubes.
It’s vocal tract, not “vocal track”.
Vocal tract length is determined mainly by anatomy; speakers can only vary it a little (e.g., by protruding the lips).
Likewise, a speaker’s F0 range is determined by anatomy and can only be varied within that range.
F0 and formants are independent things, so “adjust their F0 so that the speech they produce has the same formants” is incorrect.
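To make the length/resonance point concrete, here is a quick sketch modelling the vocal tract as a uniform tube closed at one end (the lengths and speed of sound are only illustrative):

```python
# Resonant frequencies of a uniform tube closed at one end (a crude model
# of the vocal tract): f_n = (2n - 1) * c / (4 * L).
c = 350.0                        # approximate speed of sound in warm, moist air (m/s)

for L in (0.175, 0.145):         # e.g. a longer vs a shorter vocal tract (m)
    resonances = [(2 * n - 1) * c / (4 * L) for n in (1, 2, 3)]
    print(L, [round(f) for f in resonances])

# 0.175 m -> roughly [500, 1500, 2500] Hz
# 0.145 m -> higher resonances, roughly [603, 1810, 3017] Hz
```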
This was a bug – thanks for spotting it. Now fixed.
J&M 8.3 was only missing from the reading list displayed within the module 4 page.
You are right that this is a typo: “decreases” should read “increases”.
Does this change your answer?
You are right that using a filterbank with suitably wide filters should eliminate all evidence of F0 in the resulting set of filter outputs. That’s the theory…
…however, even with wide filter bandwidths, there will still be some correlation between a filter’s output and F0: the amount of energy in a filter’s output will vary a little up/down as more/fewer harmonics happen to fall within its pass band.
In this case, the motivation for using the cepstrum remains obtaining a decorrelated representation, and we should still truncate the cepstrum to discard the higher cepstral coefficients, which will be those that correlate most with F0 (and to reduce dimensionality).
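As a minimal sketch of that truncation step: random numbers stand in for the log filterbank outputs of one analysis frame, and 26 filters / 13 coefficients are just typical illustrative values:

```python
import numpy as np
from scipy.fft import dct

# Sketch of truncating the cepstrum. Random numbers stand in for the
# log filterbank outputs of one analysis frame (e.g. 26 mel filters).
rng = np.random.default_rng(0)
log_filterbank = rng.normal(size=26)

# A series expansion (the DCT) of the log filterbank outputs gives the
# cepstral coefficients; keeping only the first few (e.g. 13) discards
# the higher coefficients, which correlate most with F0, and also
# reduces dimensionality.
cepstrum = dct(log_filterbank, type=2, norm='ortho')
truncated = cepstrum[:13]

print(truncated.shape)   # (13,)
```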
This topic about using a remote machine is for MSc dissertation projects. For Speech Processing, we do not have a remote machine that you can log in to.
That’s a great question. If we read outdated books like Holmes & Holmes we will find discussion of LPC features for ASR, and other features derived from a source-filter analysis such as PLP (Perceptual Linear Prediction). PLP was popular for quite a while.
LPC analysis makes a strong assumption about the shape of the spectral envelope: that it can be modelled as an all-pole filter. MFCCs use a more general-purpose approach of series expansion that does not make this assumption.
LPC analysis requires solving for the filter coefficients, and there are multiple ways to do that. They all have limitations, and the process can be error-prone, especially when the speech is not clean.
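For illustration, here is a rough sketch of the autocorrelation method, solving the normal equations directly rather than with the Levinson-Durbin recursion. A synthetic frame stands in for windowed speech, and the usual sign conventions and error handling are glossed over:

```python
import numpy as np
from scipy.linalg import solve_toeplitz

# Rough sketch of LPC analysis by the autocorrelation method, solving the
# normal equations R a = r for the predictor coefficients. A synthetic
# signal stands in for one windowed frame of speech.
rng = np.random.default_rng(0)
frame = rng.normal(size=400) * np.hamming(400)

order = 12                                    # number of poles in the all-pole filter
r = np.correlate(frame, frame, mode='full')   # autocorrelation sequence
r = r[len(frame) - 1:]                        # keep non-negative lags r[0], r[1], ...

# Toeplitz system: first column is r[0..order-1], right-hand side is r[1..order].
a = solve_toeplitz(r[:order], r[1:order + 1])

print(a.shape)   # (12,) predictor coefficients
```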