Forum Replies Created
December 10, 2019 at 10:57 in reply to: Question 8 — why finite state language module popular #10540
You are right – writing a language model by hand for a “very-large-vocabulary” system would be impractical, and it would be impossible to manually set appropriate probabilities on all the transitions.
December 10, 2019 at 10:56 in reply to: Question 24, computation efficiency of Euclidian distance vs Gaussian #10539
In the question, we are not being asked about actually doing classification, but just about the computational cost of calculating a distance vs calculating a probability.
Correct – the Euclidean distance measure is not learned from the data.
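To make the cost comparison concrete, here is a rough sketch of the two calculations side by side. It assumes a diagonal-covariance Gaussian (my assumption, the question does not say), and the function names are just for illustration.

```python
import math

def squared_euclidean(x, y):
    """One subtraction, one multiplication and one addition per dimension."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def diag_gaussian_log_pdf(x, mean, var):
    """Log density of a diagonal-covariance Gaussian (assumed form).

    Each dimension needs the same squared difference as above, plus a
    division by the variance and a log term. Unlike the plain distance,
    the mean and variance have to be estimated from data first.
    """
    total = 0.0
    for a, m, v in zip(x, mean, var):
        total += -0.5 * (math.log(2 * math.pi * v) + (a - m) ** 2 / v)
    return total

x = [1.0, 2.0]
mean = [0.0, 0.0]
print(squared_euclidean(x, mean))                    # 5.0
print(diag_gaussian_log_pdf(x, mean, [1.0, 1.0]))
```

With unit variances the log density is just the squared distance scaled by -0.5 plus a constant, which shows that the Gaussian does everything the distance does and a little more.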
Pronunciation model is another name for the word model: a model of words that emits a phoneme sequence.
For Speech Processing, we only cover within-sentence tokenisation using simple methods: manually created rules or regular expressions.
Splitting a longer text into sentences is a different and probably harder problem, since it would involve deciding, for example, whether each period marks the end of a sentence. This problem might be hard enough to require machine learning, such as a CART trained on manually segmented text. This is out of scope for this course.
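For the within-sentence case, a regular-expression tokeniser might look something like the sketch below. The pattern is a made-up example, not the one used in any particular system.

```python
import re

# Hypothetical pattern, tried left to right at each position:
#   1. a number with an optional decimal part
#   2. a word with an optional internal apostrophe
#   3. any single punctuation character
TOKEN_PATTERN = re.compile(r"\d+(?:\.\d+)?|\w+(?:'\w+)?|[^\w\s]")

def tokenise(sentence):
    """Split one sentence into word, number and punctuation tokens."""
    return TOKEN_PATTERN.findall(sentence)

print(tokenise("Dr. Smith's cat cost 3.50 pounds, didn't it?"))
# ['Dr', '.', "Smith's", 'cat', 'cost', '3.50', 'pounds', ',', "didn't", 'it', '?']
```

Notice the period after "Dr": a simple rule cannot tell whether it ends a sentence, which is exactly why sentence splitting is the harder problem.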
Unit selection, by definition, involves searching amongst many unit sequences, regardless of the unit type.
PSOLA can be used on the residual signal during Residual Excited Linear Prediction (RELP). The residual is of course just a waveform, so it is in the time domain. But we usually reserve the term TD-PSOLA for application to speech waveforms only.
PSOLA doesn’t apply in the frequency domain.
Unit selection isn’t mentioned in this question, so the answer should be independent of that.
In all exam questions, read very carefully to see precisely what is being asked (and what is not being asked). A common mistake is to skim a question and assume you know what is being asked.
Specifically, exam questions this year might appear to be similar to past questions that you can remember, but the wording might be changed and so they might be asking a different question (even if the list of available answers is the same).
The term “linear prediction” means a source-filter model using a linear-predictive (i.e. all-pole) filter in combination with a source (normally a pulse train, for voiced speech).
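As a toy illustration of that source-filter idea, the sketch below passes a pulse train through a second-order all-pole filter. The coefficient values, the pulse period and the function names are arbitrary choices of mine, picked only so that the filter is stable.

```python
def pulse_train(n_samples, period):
    """Unit impulses every `period` samples: a crude voiced source."""
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]

def all_pole_filter(source, coeffs):
    """Compute y[n] = x[n] + sum_k coeffs[k-1] * y[n-k] for k = 1..p.

    Each output sample is a weighted sum of previous output samples
    (the linear prediction) plus the current source sample.
    """
    out = []
    for n, x in enumerate(source):
        y = x
        for k, a in enumerate(coeffs, start=1):
            if n - k >= 0:
                y += a * out[n - k]
        out.append(y)
    return out

# Hypothetical values: an 80-sample period is 200 Hz at a 16 kHz sampling rate.
source = pulse_train(400, period=80)
speech = all_pole_filter(source, coeffs=[1.3, -0.8])
print(speech[:3])
```

Changing the pulse period changes F0 without touching the filter, and changing the coefficients changes the spectral envelope without touching the source.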
Option ii. talks about the spectrum, not the spectrogram. A linear predictive filter approximates the vocal tract frequency response, and is responsible for imposing the spectral envelope onto the source spectrum. So one way to estimate the spectral envelope from a speech signal is to fit a linear predictive filter and then plot its frequency response.
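Here is a minimal sketch of that "fit a filter, then look at its frequency response" procedure. It uses the autocorrelation method with the Levinson-Durbin recursion, which is one common way to fit the filter but not the only one. The test signal is synthetic (the impulse response of a known filter with made-up coefficients) so that the estimate can be checked.

```python
import cmath
import math

def autocorrelation(x, max_lag):
    return [sum(x[n] * x[n + k] for n in range(len(x) - k))
            for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Return a_1..a_p such that x[n] is approximated by sum_k a_k * x[n-k]."""
    a = [0.0] * (order + 1)          # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:]

def envelope_db(coeffs, n_points=256):
    """Magnitude (dB) of 1 / (1 - sum_k a_k z^-k) from 0 up to Nyquist; gain omitted."""
    env = []
    for i in range(n_points):
        omega = math.pi * i / n_points
        denom = 1.0 - sum(a * cmath.exp(-1j * omega * k)
                          for k, a in enumerate(coeffs, start=1))
        env.append(-20.0 * math.log10(abs(denom)))
    return env

# Synthetic signal: impulse response of a known 2-pole filter (hypothetical values).
true_coeffs = [1.3, -0.8]
signal = [1.0, 1.3]
for n in range(2, 400):
    signal.append(true_coeffs[0] * signal[n - 1] + true_coeffs[1] * signal[n - 2])

coeffs_est = levinson_durbin(autocorrelation(signal, 2), order=2)
env = envelope_db(coeffs_est)
print(coeffs_est)
print(env.index(max(env)))   # index of the single envelope peak (a "formant")
```

For real speech you would window a short frame first and use a higher filter order. Those details are left out here.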
The statement “voicing is caused by air flowing through the glottis” is always true. There is no other way to generate voicing.
But the statement “air flowing through the glottis causes voicing” would not be correct all the time.
Declination is the gradual decline of F0 over an utterance – see Figure 8.12 in J&M 8.3.
The frequency range of the spectrogram (its minimum and maximum values), as you nearly said, is determined only by the sampling rate of the waveform (which is not specified in the question, but which you assumed to be 16 kHz).
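As a quick worked example: the range runs from 0 Hz up to the Nyquist frequency, which is half the sampling rate. The function names and the FFT size below are just illustrative.

```python
def frequency_range_hz(sampling_rate_hz):
    """Lowest and highest frequency a spectrogram of this signal can show."""
    return 0.0, sampling_rate_hz / 2.0

def bin_frequencies_hz(sampling_rate_hz, fft_size):
    """Frequency of each DFT bin from 0 Hz up to the Nyquist frequency."""
    return [k * sampling_rate_hz / fft_size for k in range(fft_size // 2 + 1)]

print(frequency_range_hz(16000))       # (0.0, 8000.0)
print(bin_frequencies_hz(16000, 8))    # [0.0, 2000.0, 4000.0, 6000.0, 8000.0]
```

The analysis settings (such as the FFT size) change how finely that range is divided up, but not where it starts and ends.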
Speaking rate is indeed “how fast” the speech is: it’s commonly measured in syllables per second.
The question talks about a single period of the waveform. If you took the spectrum of this single period, what would that look like? Would you see formants? Would you see harmonics?
There are many variants on path weightings. We don’t need to get lost in those details – just aim to understand dynamic programming and its application to aligning two sequences of differing lengths.
The reason for the double weighting on diagonal paths is that the alternative path against which they are being compared involves summing the costs of one horizontal path and one vertical path, so without the extra weight the comparison would be “unfair”.
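To see the effect of the weighting, here is a small dynamic time warping sketch with the diagonal weight as a parameter. It is only one of the many variants, and the handling of the first cell and the use of absolute difference as the local cost are my own simplifications.

```python
def dtw(a, b, diagonal_weight=2.0):
    """Total cost of the best alignment of sequences a and b."""
    inf = float("inf")
    n, m = len(a), len(b)
    D = [[inf] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = abs(a[i] - b[j])               # local cost
            if i == 0 and j == 0:
                D[i][j] = d
                continue
            best = inf
            if i > 0:                          # vertical step
                best = min(best, D[i - 1][j] + d)
            if j > 0:                          # horizontal step
                best = min(best, D[i][j - 1] + d)
            if i > 0 and j > 0:                # diagonal step, weighted
                best = min(best, D[i - 1][j - 1] + diagonal_weight * d)
            D[i][j] = best
    return D[n - 1][m - 1]

a, b = [1.0, 2.0], [1.0, 3.0]
print(dtw(a, b, diagonal_weight=1.0))   # 1.0
print(dtw(a, b, diagonal_weight=2.0))   # 2.0
```

With a weight of 1 the diagonal route adds only one local cost where the horizontal-plus-vertical route adds two, so the diagonal always looks cheaper. With a weight of 2 the two routes tie in this example.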
This problem of correctly weighting paths arises because there is no modelling of duration. In an HMM, there is a (very simple) duration model: the transition probabilities.
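To show what that simple duration model looks like: if a state has self-transition probability a, the chance of staying in it for exactly d frames is a^(d-1) * (1 - a), a geometric distribution. The sketch below uses a made-up value of a = 0.8.

```python
def duration_probability(d, self_loop_prob):
    """P(staying exactly d frames): d-1 self-transitions, then one exit."""
    return self_loop_prob ** (d - 1) * (1.0 - self_loop_prob)

def expected_duration(self_loop_prob):
    """Mean of the geometric distribution, in frames."""
    return 1.0 / (1.0 - self_loop_prob)

a = 0.8  # hypothetical self-transition probability
for d in range(1, 6):
    print(d, duration_probability(d, a))
print(expected_duration(a))
```

The most likely duration is always a single frame, which is why this is only a "very simple" model of duration.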
The question isn’t asking you whether “regular expression counts as one type of handcrafted rule”. You just need to decide which of those four techniques are typically used.