Forum Replies Created
I noticed that too.
It seems to be related to how the pruning strategy is designed and implemented, as shown in this PrunedDTW paper.
Yes, almost all speech sounds need some air flow and the air has to pass through the glottis, but that doesn’t make them all voiced sounds.
My logic was: if “regex does count as one type of handcrafted rule”, then ii would be right. If not, then I would not select ii, since I don’t remember us talking about using it for tokenization.
Voicing is when air flowing through the glottis also makes the vocal folds vibrate.
And we call the vibration of the vocal folds voicing.
When making a voiceless fricative such as /s/, the vocal folds are not vibrating, therefore it is unvoiced.
Yes, I’m clear on that definition. I think you are implying that a is the right choice.
I guess I misunderstood the use of “mean value”, and thought there was only one mean value of F0 in an utterance…
Sorry for the confusion, I was just not sure whether my understanding of not including iv. was right.
So should the option “i. is guaranteed to find a solution which maximises the likelihood of the acoustic data given the model for the training set” be excluded then, since the solution can be a local maximum?
Sorry for editing the reply. I suddenly realized the non-zero transitions are updated. And I had even written “they are updated” in the assignment…
I think the answer should be a.
I wish option iii. had been phrased more precisely.
I don’t think a higher F0 modification with TD-PSOLA is IMPOSSIBLE, it’s just undesirable from an engineering point of view.
How are pseudo pitch marks in unvoiced regions decided? Do we use a single value for all unvoiced regions?
Then why does a longer vocal tract result in lower formants instead of higher formants?
Following the reply above, why can’t/don’t they reshape their vocal tract and adjust their F0 so that the speech they produce has the same formants?
Besides, how much lower is “a little lower” as in “the formants in male speech will be a little lower than in female speech”?
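On the vocal tract length question, a rough sketch may help. It assumes the textbook simplification of the vocal tract as a uniform tube closed at the glottis and open at the lips, whose resonances are F_n = (2n - 1)c / 4L. The function name `tube_formants`, the speed of sound of 350 m/s, and the tube lengths of 17.5 cm and 14.5 cm are illustrative assumptions, not values from this thread.

```python
def tube_formants(length_m, n_formants=3, speed_of_sound=350.0):
    """Resonances in Hz of a uniform tube closed at one end.

    Quarter-wavelength model: F_n = (2n - 1) * c / (4 * L).
    All default values are assumptions for illustration only.
    """
    return [(2 * n - 1) * speed_of_sound / (4.0 * length_m)
            for n in range(1, n_formants + 1)]


# Hypothetical tube lengths: a longer and a shorter vocal tract.
longer = tube_formants(0.175)
shorter = tube_formants(0.145)

print("longer tube: ", [round(f) for f in longer])
print("shorter tube:", [round(f) for f in shorter])
```

Because L is in the denominator, a longer tube gives lower resonances. With these assumed lengths the shorter tube's resonances come out roughly 20% higher, though a real vocal tract is not a uniform tube, so measured formants will differ.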
I wonder if this is still useable this year? Can we do our experiments remotely?
Yichao
Hi,
I have a question regarding how the first 12 MFCC coefficients are extracted from a cepstrum. J&M says in Ch.9.3.5 that “we generally just take the first 12 cepstral values”. Does that mean we are taking the y-axis value for the first 12 values on the x-axis, or are we somehow extracting some features from the cepstrum?
Many thanks,
Yichao
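One possible reading of the J&M sentence, as a minimal sketch: the cepstrum of a frame is a sequence of numbers, and "taking the first 12" means keeping the values at the lowest indices. The sketch uses a plain real cepstrum (inverse FFT of the log magnitude spectrum) and leaves out the mel filterbank of a real MFCC front end. The function name `first_cepstral_values`, the random test frame, and the choice to skip index 0 are assumptions for illustration.

```python
import numpy as np


def first_cepstral_values(frame, n_keep=12):
    """Return the n_keep lowest-index cepstral values of one frame.

    Simplified real cepstrum, not a full MFCC pipeline (no mel filterbank).
    """
    windowed = frame * np.hamming(len(frame))
    spectrum = np.abs(np.fft.rfft(windowed))
    log_spectrum = np.log(spectrum + 1e-10)  # small offset avoids log(0)
    cepstrum = np.fft.irfft(log_spectrum)
    # Assumption: skip index 0 and keep indices 1..n_keep.
    return cepstrum[1:n_keep + 1]


# Hypothetical frame: 400 samples, i.e. 25 ms at a 16 kHz sampling rate.
rng = np.random.default_rng(0)
frame = rng.standard_normal(400)

features = first_cepstral_values(frame)
print(features.shape)
```

On this reading, the 12 features are simply the y-axis values at the first 12 x-axis positions of the cepstrum, with no further extraction step. Whether index 0 counts among them varies between toolkits.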