Forum Replies Created

Viewing 13 posts - 1 through 13 (of 13 total)

December 13, 2020 at 20:17 in reply to: Part 2 Q4B #13634
Vishnu M
Student
I would think one reason is that they model the perceptual characteristics of speech by capturing more detail at lower frequencies. Another is that they help remove the excitation signal (pitch) from the features once we perform cepstral smoothing, that is, cutting off the higher-order cepstral coefficients. So do they model the salient perceptual characteristics of phones more precisely?

I am not sure, though, and would like to hear more. Also, I don't really know the benefits of using the FFT coefficients at all. What do you think they are?
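To make the cepstral smoothing idea concrete, here is a minimal sketch in plain Python. It is only an illustration under assumptions: it uses the ordinary real cepstrum (log magnitude spectrum followed by an inverse DFT) rather than a mel filterbank and DCT, and the helper names `real_cepstrum`, `lifter` and `smoothed_log_spectrum` are hypothetical, not taken from any toolkit.

```python
import cmath
import math

def dft(x):
    """Naive O(N^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    """Naive inverse DFT."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def real_cepstrum(frame):
    """Inverse DFT of the log magnitude spectrum of one frame."""
    log_mag = [math.log(abs(v) + 1e-12) for v in dft(frame)]  # small floor avoids log(0)
    return [c.real for c in idft(log_mag)]

def lifter(cepstrum, keep):
    """Zero the high-order coefficients, keeping the low-order ones and their mirror images."""
    n = len(cepstrum)
    return [c if (i < keep or i > n - keep) else 0.0 for i, c in enumerate(cepstrum)]

def smoothed_log_spectrum(frame, keep):
    """Log spectrum rebuilt from the low-order cepstral coefficients only."""
    return [v.real for v in dft(lifter(real_cepstrum(frame), keep))]
```

With a small `keep`, the rebuilt log spectrum retains only the slowly varying envelope, and the fine harmonic detail that comes from the pitch is largely discarded.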
December 13, 2020 at 16:27 in reply to: Baum-welch algorithm #13622
Vishnu M
Student
So Baum-Welch considers all possible state sequences through the model and weights each one by its probability (in effect, softly counting how often each state aligns with each observation), and thus gives an estimate based on the observations being produced by any state sequence the model can generate.
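The soft counting can be sketched with the forward and backward recursions. This is a minimal illustration, assuming a small discrete-observation HMM; the function names and the toy parameters `pi`, `A` and `B` are made up for the example, and it computes only the state occupancy probabilities, not the full re-estimation step.

```python
def forward(pi, A, B, obs):
    """alpha[t][j]: probability of obs[0..t] and being in state j at time t."""
    n = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(n)) * B[j][o]
                      for j in range(n)])
    return alpha

def backward(A, B, obs):
    """beta[t][i]: probability of obs[t+1..] given state i at time t."""
    n = len(A)
    beta = [[1.0] * n]
    for o in reversed(obs[1:]):
        nxt = beta[0]
        beta.insert(0, [sum(A[i][j] * B[j][o] * nxt[j] for j in range(n))
                        for i in range(n)])
    return beta

def state_occupancy(pi, A, B, obs):
    """gamma[t][i]: probability of being in state i at time t, summed over all state sequences."""
    alpha = forward(pi, A, B, obs)
    beta = backward(A, B, obs)
    likelihood = sum(alpha[-1])
    return [[a * b / likelihood for a, b in zip(at, bt)]
            for at, bt in zip(alpha, beta)]

# Toy 2-state model with 2 observation symbols (values are arbitrary).
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
gamma = state_occupancy(pi, A, B, [0, 1, 0])
```

Each row of `gamma` sums to 1, so every observation is shared out across the states rather than assigned to a single one, which is the difference from a single best alignment.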
December 12, 2020 at 23:25 in reply to: Part II Question 2 #13587
Vishnu M
Student
a) A TTS system comprises the following steps: 1. tokenisation and sentence splitting, 2. NSW tagging, 3. POS tagging, 4. NSW expansion, 5. phrase break prediction, 6. syllabification (is this letter-to-sound/G2P?), 7. stress prediction, 8. diphone selection, 9. waveform generation.

Step 1 takes the raw text as input and splits it into sentences, usually with hand-written rules that split on punctuation. It then splits each sentence into tokens, typically on whitespace, and outputs the word tokens.
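As a rough illustration of step 1, here is a minimal Python sketch. The rule (split after ".", "!" or "?" when followed by whitespace) and the function names are assumptions for the example only; a real front end would need more rules, since abbreviations such as "Dr." would be split wrongly here.

```python
import re

def split_sentences(text):
    """Split after sentence-final punctuation that is followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def tokenise(sentence):
    """Split a sentence into tokens on whitespace."""
    return sentence.split()

sentences = split_sentences("Hello there. How are you? Fine!")
tokens = [tokenise(s) for s in sentences]
```

Note that punctuation stays attached to the tokens ("you?"), so a later step would still have to deal with it.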

Step 2.

…Just to check: is this the style of response expected, or is it bad? Could you suggest what would be good, and I will continue?
December 5, 2020 at 22:35 in reply to: Prompt BUG #13452
Vishnu M
Student
It is always the same thing, but I have no clue how to turn it off or why it is happening.

November 14, 2020 at 19:06 in reply to: CW2 Files #13058
Vishnu M
Student
How can I copy files from my virtual machine to my local machine? I am using Lubuntu 18 OS.
November 7, 2020 at 16:40 in reply to: CW2 Files #13010
Vishnu M
Student
Are we supposed to work on our local PC or in VMware? Where do we install the HTK/WaveSurfer applications?
November 7, 2020 at 16:23 in reply to: Note Tips – Slide Making #13005
Vishnu M
Student
Thank you
November 7, 2020 at 16:21 in reply to: CW2 Files #12998
Vishnu M
Student
I found it here for some reason:
/Volumes/Network/courses/sp/digit_recogniser

with the following structure:
```
[atlab@localhost digit_recogniser]$ ls
lab  mfcc  models  rec  resources  scripts  wav
```
Hope this is it.
October 30, 2020 at 09:22 in reply to: Festival's Lexicon #12824
Vishnu M
Student
And what notation is Festival using for transcription, e.g. (k@nten?) etc.? Otherwise I don't know how to check online and compare with the proper pronunciation of a word. Also, I am not sure where to find the proper pronunciation of the transcribed word: I can listen to audio on most dictionary sites, but I can't find one with a transcription in IPA or something like that.
October 23, 2020 at 14:31 in reply to: Explaining DFT Formula #12672
Vishnu M
Student
Is k the number of cycles around the circle we do in a second, so that the frequency in hertz would just be k? Sorry, I’m finding it quite confusing.
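For reference, the usual relationship is that bin k corresponds to k cycles across the whole analysis window, not k cycles per second, so the frequency in hertz also depends on the sampling rate and the window length. A minimal sketch, with `bin_to_hz` as a hypothetical helper name and the numbers chosen only as an example:

```python
def bin_to_hz(k, sample_rate, n_samples):
    """Frequency in Hz of DFT bin k for a window of n_samples at sample_rate Hz."""
    return k * sample_rate / n_samples

# Example: a 400-sample window at 16 kHz (25 ms).
first_bin = bin_to_hz(1, 16000, 400)      # one cycle per 25 ms window
nyquist_bin = bin_to_hz(200, 16000, 400)  # bin N/2 sits at half the sampling rate
```

So k equals the frequency in hertz only in the special case where the window is exactly one second long.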
October 12, 2020 at 19:35 in reply to: Explaining DFT Formula #12358
Vishnu M
Student
I believe that:

k indexes the component frequencies we run the DFT on, to get the coefficients of their correlation with the complex waveform we are trying to decompose. We iterate over k up to K, where K corresponds to the Nyquist frequency.

To plot X[k], we place the frequencies for k up to K on the horizontal axis and label it frequency, then plot the magnitude of X[k] for each frequency on the vertical axis, and we have drawn our frequency spectrum.

I hope I have got it?
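A minimal Python sketch of that procedure, assuming a naive DFT evaluated from bin 0 up to the Nyquist bin; `magnitude_spectrum` is a hypothetical helper name and the test signal is made up for the example:

```python
import cmath
import math

def magnitude_spectrum(x, sample_rate):
    """Return (frequency in Hz, |X[k]|) pairs for k = 0 .. N/2."""
    n = len(x)
    spectrum = []
    for k in range(n // 2 + 1):
        # Correlate the signal with a complex sinusoid doing k cycles per window.
        X_k = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append((k * sample_rate / n, abs(X_k)))
    return spectrum

# A sine doing 2 cycles in 16 samples, sampled at 16 Hz, i.e. a 2 Hz tone.
x = [math.sin(2 * math.pi * 2 * t / 16) for t in range(16)]
spectrum = magnitude_spectrum(x, 16)
peak_frequency, peak_magnitude = max(spectrum, key=lambda pair: pair[1])
```

The first element of each pair goes on the horizontal axis and the second on the vertical axis; for this signal the largest magnitude falls in the 2 Hz bin.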
October 4, 2020 at 11:24 in reply to: Complex Numbers Tutorial #12187
Vishnu M
Student
When using cos(theta) = adj/hyp or tan(theta) = opp/adj and rearranging to get theta, how do we know it gives us the theta we are interested in and not the other angle of the triangle? For example, how do we know it gives us the angle at the origin, measured from the x-axis, and not the one at the top of the triangle, above the right angle?
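One common way to avoid the ambiguity is to compute the angle from the coordinates themselves, so that the signs of both parts are taken into account. A minimal Python sketch, assuming the complex number is given by its real and imaginary parts; `angle_of` is a hypothetical helper name, while `math.atan2` and `math.atan` are standard library functions:

```python
import math

def angle_of(real, imag):
    """Angle in radians from the positive x-axis to the point (real, imag)."""
    return math.atan2(imag, real)

first_quadrant = angle_of(1.0, 1.0)    # pi / 4
third_quadrant = angle_of(-1.0, -1.0)  # -3 * pi / 4
ratio_only = math.atan(-1.0 / -1.0)    # pi / 4: the ratio alone loses the quadrant
```

The point of the comparison is that opp/adj is the same for (1, 1) and (-1, -1), so the ratio on its own cannot tell the two directions apart.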
October 4, 2020 at 10:23 in reply to: Sampling and Nyquist Frequency #12186
Vishnu M
Student
And how do you ensure that the sampling is capturing points at the peaks and troughs so that you can rebuild the waveform correctly? Would this not be hard to do for complex waves, or even for pure sine tones if the sampling didn’t match the phase of the sine wave’s peaks and troughs?
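A small numerical check may help here. It is only a sketch, assuming a pure sinusoid whose frequency falls exactly on a DFT bin and a sampling rate above twice that frequency; `sample_sine` and `dft_magnitude` are hypothetical helper names. It suggests the samples do not have to land on the peaks and troughs, because the measured magnitude is the same whatever the starting phase.

```python
import cmath
import math

def sample_sine(cycles, n_samples, phase):
    """Sample a cosine doing `cycles` whole cycles over n_samples points, offset by `phase` radians."""
    return [math.cos(2 * math.pi * cycles * t / n_samples + phase)
            for t in range(n_samples)]

def dft_magnitude(x, k):
    """Magnitude of DFT bin k of the sampled signal x."""
    n = len(x)
    return abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))

# 3 cycles in 16 samples is just over 5 samples per cycle, above the Nyquist limit of 2.
for phase in (0.0, 0.7, 2.0):
    x = sample_sine(3, 16, phase)
    print(phase, round(dft_magnitude(x, 3), 6))
```

Only the phase of the bin changes with the starting point; the magnitude stays at N/2 = 8 for a unit-amplitude sinusoid in each case.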