Forum Replies Created
-
AuthorPosts
-
I would think that one reason is that they model the perceptual characteristics of speech by capturing more information at lower frequency ranges, and another is that they help to remove the excitation signal (pitch) from the features once we perform cepstral smoothing (cutting off the higher-order cepstral coefficients). So they more precisely model the salient perceptual characteristics of phones?
Although I am not sure and would like to hear more. Also, I don't really know the benefits of using the FFT coefficients at all – what do you think they are?
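To make the cepstral smoothing idea concrete, here is a minimal sketch, assuming the features in question are cepstral coefficients obtained by taking a DCT of a log spectrum. The toy log spectrum, the window size `N` and the cut-off `KEEP` are all made up for illustration: a slow envelope stands in for the vocal tract and a fast ripple stands in for the pitch harmonics.

```python
import math

def dct2(x):
    """Unnormalised DCT-II: c[k] = sum_n x[n] * cos(pi * (n + 0.5) * k / N)."""
    n_points = len(x)
    return [sum(x[n] * math.cos(math.pi * (n + 0.5) * k / n_points)
                for n in range(n_points))
            for k in range(n_points)]

def idct2(c):
    """Inverse of dct2 above."""
    n_points = len(c)
    return [c[0] / n_points
            + (2.0 / n_points) * sum(c[k] * math.cos(math.pi * (n + 0.5) * k / n_points)
                                     for k in range(1, n_points))
            for n in range(n_points)]

N = 64       # number of log spectrum values (assumed)
KEEP = 12    # number of low-order cepstral coefficients kept (assumed)

# Toy log spectrum: slowly varying envelope plus a fast ripple that stands
# in for the harmonics of the excitation (pitch).
envelope = [2.0 * math.cos(math.pi * (n + 0.5) * 1 / N) for n in range(N)]
ripple = [0.5 * math.cos(math.pi * (n + 0.5) * 20 / N) for n in range(N)]
log_spectrum = [e + r for e, r in zip(envelope, ripple)]

cepstrum = dct2(log_spectrum)
truncated = cepstrum[:KEEP] + [0.0] * (N - KEEP)   # drop high-order coefficients
smoothed = idct2(truncated)

max_error = max(abs(s - e) for s, e in zip(smoothed, envelope))
print(f"largest difference between smoothed spectrum and envelope: {max_error:.2e}")
```

In this toy case the ripple lives entirely in coefficient 20, so truncating at 12 removes it and leaves the envelope. Real speech is not this tidy, so treat it as an illustration of the principle only.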
So Baum-Welch considers all possible state sequences through the model and weights each one by its probability (in effect counting, in a soft way, how many times each state aligned with each observation), and thus gives an estimate that reflects the observations being produced by any state sequence the model can generate, rather than only the single most likely one?
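A small sketch of the "soft counting" part may help. It uses the forward and backward probabilities to get the probability of being in each state at each time, summed over all state sequences. The 2-state discrete HMM and all of its numbers are hypothetical, and there are no non-emitting entry or exit states as there would be in HTK.

```python
# Hypothetical 2-state HMM over two discrete symbols; all numbers are made up.
initial = [0.6, 0.4]
transition = [[0.7, 0.3],
              [0.4, 0.6]]
emission = [[0.9, 0.1],     # state 0: P(symbol 0), P(symbol 1)
            [0.2, 0.8]]     # state 1
observations = [0, 1, 1, 0]

n_states = len(initial)
T = len(observations)

# Forward pass: alpha[t][j] = P(o_1..o_t, state at time t is j)
alpha = [[0.0] * n_states for _ in range(T)]
for j in range(n_states):
    alpha[0][j] = initial[j] * emission[j][observations[0]]
for t in range(1, T):
    for j in range(n_states):
        alpha[t][j] = emission[j][observations[t]] * sum(
            alpha[t - 1][i] * transition[i][j] for i in range(n_states))

# Backward pass: beta[t][i] = P(o_{t+1}..o_T | state at time t is i)
beta = [[1.0] * n_states for _ in range(T)]
for t in range(T - 2, -1, -1):
    for i in range(n_states):
        beta[t][i] = sum(transition[i][j] * emission[j][observations[t + 1]]
                         * beta[t + 1][j] for j in range(n_states))

# P(observations), summed over every possible state sequence
likelihood = sum(alpha[T - 1])

# State occupation probabilities: gamma[t][j] = P(state at t is j | observations)
gamma = [[alpha[t][j] * beta[t][j] / likelihood for j in range(n_states)]
         for t in range(T)]

for t, row in enumerate(gamma):
    print(t, [round(p, 3) for p in row])
```

Each row of `gamma` sums to 1, so every observation is shared out between the states in proportion to how likely each alignment is. Baum-Welch then uses these fractional counts to re-estimate the model parameters.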
a) A TTS system comprises the following steps: 1. Tokenisation and sentence splitting, 2. NSW tagging, 3. POS tagging, 4. NSW expansion, 5. Phrase break prediction, 6. Syllabification (is this letter-to-sound/G2P?), 7. Stress prediction, 8. Diphone selection, 9. Waveform generation
Step 1 takes the raw text as input and splits it up into sentences, usually based on hand-written rules that split on punctuation. It then splits each sentence further into tokens – typically by splitting on whitespace. It outputs word tokens from the raw text.
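As a rough illustration of step 1, here is a minimal sketch of rule-based sentence splitting followed by whitespace tokenisation. The function names and the single punctuation rule are my own assumptions, not what Festival actually does.

```python
import re

def split_sentences(text):
    """Split after '.', '!' or '?' when followed by whitespace (a crude hand-written rule)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def tokenise(sentence):
    """Split a sentence into tokens on whitespace."""
    return sentence.split()

raw_text = "Dr. Smith paid $5 in 1999. Was it worth it?"
for sentence in split_sentences(raw_text):
    print(tokenise(sentence))
```

Note that this rule wrongly ends a sentence after "Dr.", which is why real systems need more careful rules, and that tokens such as "$5" and "1999" are left for the later NSW steps to deal with.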
step 2.
…JUST TO CHECK – IS THIS THE STYLE OF RESPONSE EXPECTED, OR IS IT BAD? COULD YOU SUGGEST WHAT WOULD BE GOOD? THEN I WILL CONTINUE
always the same thing but i have no clue how to turn it off or why it is happening
How can I copy files from my virtual machine to my local machine? I am using Lubuntu 18 OS.
Are we supposed to work on our local PC or in the VMware virtual machine – and where do we install the HTK/WaveSurfer applications?
thank-you
i found it here for some reason
/Volumes/Network/courses/sp/digit_recogniser
with the structure as so
[atlab@localhost digit_recogniser]$ ls
lab  mfcc  models  rec  resources  scripts  wav
hope this is it
And what notation is Festival using for transcription, e.g. (k@nten?) etc.? Otherwise I don't know how to check online and compare against the proper pronunciation of a word. Also, I am not sure where to find the proper pronunciation of a transcribed word: I can listen to audio on most dictionary sites, but I can't find one with a transcription in IPA or something like that.
Is k the number of cycles of the circle we do in a second, and so hertz would just be k …? Sorry, I’m finding it quite confusing.
I believe that:
k indexes the component frequencies that the DFT correlates with the complex waveform we are trying to decompose, giving one coefficient per frequency. We iterate from k to K, where K is the Nyquist frequency.
To plot X[k], we place the frequencies from k to K on the horizontal axis and label it frequency, then plot the magnitude of X[k] for each frequency on the vertical axis, and we have drawn our frequency spectrum….. I hope I have got it?
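A small sketch may help pin down what k means. In the DFT, k counts cycles per analysis window rather than per second, so bin k corresponds to k * sample_rate / N hertz. The sample rate, window length and test tone below are assumptions, chosen so the tone lands exactly on a bin.

```python
import cmath
import math

def dft(x):
    """Naive DFT: X[k] = sum_n x[n] * exp(-2j * pi * k * n / N)."""
    n_points = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_points)
                for n in range(n_points))
            for k in range(n_points)]

sample_rate = 8000   # Hz (assumed)
N = 64               # samples in the analysis window (assumed)
tone_hz = 1000       # test tone, chosen to fall exactly on a bin

x = [math.sin(2 * math.pi * tone_hz * n / sample_rate) for n in range(N)]
X = dft(x)

# For a real signal only bins 0..N/2 are needed; bin N/2 is the Nyquist frequency.
half = N // 2
bin_hz = [k * sample_rate / N for k in range(half + 1)]   # horizontal axis
magnitude = [abs(X[k]) for k in range(half + 1)]          # vertical axis

peak_bin = max(range(half + 1), key=lambda k: magnitude[k])
print(peak_bin, bin_hz[peak_bin])
```

Here the tone completes 8 cycles within the 64-sample window, so the peak is in bin 8, and 8 * 8000 / 64 gives 1000 Hz. Plotting `magnitude` against `bin_hz` gives the magnitude spectrum described above.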
When doing cos(theta) = adj/hyp or tan(theta) = opp/adj and rearranging to get theta, how do we know it gives us the theta we are interested in and not the other angle of the triangle? E.g. how do we know it gives us the angle at the centre of the axes, measured from the x-axis, and not the one above the right angle?
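One way to see it: "adjacent" and "opposite" are named relative to the angle you are asking about, so choosing which side counts as adjacent is what picks the angle. A minimal sketch, with a made-up point (3, 4):

```python
import math

x, y = 3.0, 4.0            # a point; the right angle sits at (x, 0)
hyp = math.hypot(x, y)

# Angle at the origin, measured from the x-axis: adjacent = x, opposite = y.
theta = math.acos(x / hyp)
# The other acute angle (at the point itself): adjacent = y, opposite = x.
other = math.acos(y / hyp)

print(math.degrees(theta), math.degrees(other))

# atan2 uses the signs of both coordinates, so it gives the angle from the
# positive x-axis in any quadrant, not just the first.
print(math.degrees(math.atan2(y, x)))
```

The two acute angles always add up to 90 degrees, and `math.atan2(y, x)` agrees with the first one because it measures from the x-axis at the origin.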
And how do you ensure that the sampling rate is capturing points at the peaks and troughs so that you can rebuild the waveform correctly? Would this not be hard to do for complex waves, or even for pure sine tones, if the sampling didn’t line up with the phase of the sine wave’s peaks and troughs…?
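As far as I understand it, the samples do not need to land on the peaks and troughs: as long as the sampling rate is more than twice the highest frequency present, the samples pin down the waveform whatever its phase. A rough sketch with assumed values for the sample rate, window length and tone, using the largest DFT bin as a simple check:

```python
import cmath
import math

def peak_frequency(samples, sample_rate):
    """Frequency (Hz) of the largest DFT bin between 0 and Nyquist."""
    n_points = len(samples)

    def mag(k):
        return abs(sum(samples[n] * cmath.exp(-2j * math.pi * k * n / n_points)
                       for n in range(n_points)))

    best = max(range(n_points // 2 + 1), key=mag)
    return best * sample_rate / n_points

sample_rate = 8000   # Hz (assumed)
N = 64               # number of samples (assumed)
tone_hz = 1000       # well below the Nyquist frequency of 4000 Hz

found = []
for phase in (0.0, 0.7, 1.9, 3.0):   # arbitrary starting phases in radians
    samples = [math.sin(2 * math.pi * tone_hz * n / sample_rate + phase)
               for n in range(N)]
    found.append(peak_frequency(samples, sample_rate))
print(found)

# Edge case: a tone at exactly the Nyquist frequency, sampled at its zero crossings.
at_nyquist = [math.sin(2 * math.pi * (sample_rate / 2) * n / sample_rate)
              for n in range(N)]
print(max(abs(s) for s in at_nyquist))
```

The tone is found at 1000 Hz for every phase. The worry about missing the peaks only really bites at exactly the Nyquist frequency, where the samples can all fall on zero crossings and the tone disappears, which is why the rate has to be strictly greater than twice the highest frequency.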