Forum Replies Created
Now I see. It turns out that I missed the step of building the join cost. Just to clarify: pitch marking is used to find the join locations, which presumably lie at the pitch marks? And when joining two diphone units, we join the two windowed pitch periods at the join point, right? Thanks!
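To check my understanding, here is a rough sketch of the kind of pitch-synchronous join I have in mind (not Festival's actual MultiSyn code; all names here are invented, and a real system would overlap-add windowed pitch periods rather than do a single cross-fade):

```python
import numpy as np

def join_at_pitch_marks(left, right, left_pm, right_pm, period):
    """Cross-fade `left` into `right`, with the overlap aligned on a pitch
    mark in each unit so the periodic structure lines up across the join."""
    n = min(period, len(left) - left_pm, right_pm)    # overlap in samples
    win = np.hanning(2 * n)
    blended = (left[left_pm:left_pm + n] * win[n:]        # fade out left
               + right[right_pm - n:right_pm] * win[:n])  # fade in right
    return np.concatenate([left[:left_pm], blended, right[right_pm:]])

# toy usage: two synthetic 120 Hz "units", joined at sample 8000
sr = 16000
t = np.arange(sr) / sr
a = np.sin(2 * np.pi * 120 * t)
b = np.sin(2 * np.pi * 120 * t)
out = join_at_pitch_marks(a, b, left_pm=8000, right_pm=8000, period=sr // 120)
```

The point, as I understand it, is just that the overlap is anchored on a pitch mark in each unit, so the waveforms stay phase-aligned across the join.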
1. I also measured the number of joins. It turns out that the system with the male pm setting also makes fewer joins (453) than the system with the female pm setting (467). I thought changing the pm setting wouldn't result in such a different choice of units. Is it because the target cost penalizes badly pitch-marked candidates?
2. If I turn off the target cost, all the target costs will be zero when I check the utterance relations, right? Should I then compare the join costs instead?
Thank you!
Can I ask a follow-up question? I calculated the mean target cost and join cost that my system with the male pm setting produces over 30 sentences. It turns out that this system gives much lower target and join costs, yet it produces twice as many pitch-marking errors as my system with the female pm setting, and it doesn't sound any better than my standard one. I'm just wondering whether there is a reason for these odd results?
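For reference, this is roughly how I computed those numbers (just a sketch; it assumes the per-unit target and join costs have already been dumped from each utterance's Unit relation into plain Python lists, and it treats a zero join cost as a join between units that were contiguous in the database):

```python
def summarise(utterances):
    """utterances: one list per sentence of (target_cost, join_cost) pairs,
    one pair per selected unit."""
    t = [tc for utt in utterances for tc, _ in utt]
    j = [jc for utt in utterances for _, jc in utt]
    joins = [jc for jc in j if jc > 0.0]   # zero cost: units were contiguous
    return {"mean_target_cost": sum(t) / len(t),
            "mean_join_cost": sum(joins) / len(joins),
            "num_joins": len(joins)}

# toy example with two short "sentences"
toy = [[(0.0, 0.0), (1.2, 0.8)], [(0.4, 0.0), (0.9, 1.1)]]
print(summarise(toy))
```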
Now I see!
So the reason the deltas are produced from a Gaussian distribution is that we take into account all the frames produced by the state and estimate the mean delta. Right? (I assume that we also pool examples from all the states clustered under this leaf node. Is that correct?)
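Something like this is what I picture (a minimal sketch, not any toolkit's actual code: delta() uses a plain central difference rather than, say, HTK's regression window, and frames_by_state / fit_leaf_gaussian are names I made up):

```python
import numpy as np

def delta(c):
    """First-order dynamics via a central difference over neighbouring
    frames (real toolkits typically use a wider regression window)."""
    d = np.empty_like(c)
    d[1:-1] = (c[2:] - c[:-2]) / 2.0
    d[0], d[-1] = d[1], d[-2]          # copy values at the edges
    return d

def fit_leaf_gaussian(frames_by_state, leaf_states):
    """Pool the delta frames aligned to every state clustered under one
    leaf node, then fit a single diagonal Gaussian (mean, variance)."""
    pooled = np.vstack([frames_by_state[s] for s in leaf_states])
    return pooled.mean(axis=0), pooled.var(axis=0)

# toy example: two states share a leaf; 13-dim cepstral frames
frames_by_state = {"s2_a": delta(np.random.randn(40, 13)),
                   "s2_b": delta(np.random.randn(25, 13))}
mu, var = fit_leaf_gaussian(frames_by_state, ["s2_a", "s2_b"])
```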
Thank you!
Thanks! It is all clear to me now. Just a follow-up question:
Can the model itself find the most appropriate number of states? Or is it predetermined, by convention, to be three states per phoneme?
Intuitively, I guess there is a limit on the number of states in each model, i.e. no larger than the number of frames in the observation sequence generated by the model. Is that correct?
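To illustrate my intuition (a toy check only, assuming a strictly left-to-right topology with no skip transitions; the function name is made up):

```python
def can_generate(num_emitting_states, num_frames):
    # A strictly left-to-right model with no skip transitions must spend at
    # least one frame in each emitting state, so it needs T >= N frames.
    return num_frames >= num_emitting_states

assert can_generate(3, 10)      # a 10-frame phone fits a 3-state model
assert not can_generate(3, 2)   # a 2-frame phone cannot visit 3 states
```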
Thanks!
This is now clear to me. Thank you so much. I can see the maths in IE but not in Chrome or Firefox.
If the training set is already labelled with pronunciations, I would assume that every letter is already aligned with its correct phone in each word, so why do we bother implementing this algorithm to realign each letter with its phone?
Are the words in the training set already hand-labelled with their pronunciations before the algorithm runs?
If not, how can we find a single good alignment for each word in the training set? If we are to use unigram probabilities, say we count all the possible realisations of "c" over its allowable list (/k/, /s/, …) and conclude that P(/k/|"c") is the highest in the list. With that probability alone, how are we able to align "c" with /s/ in the case of "cistern"?
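To make my question concrete, here is a sketch of the kind of alignment I mean (not the actual course algorithm; align, the "_" epsilon symbol, and the probability table p are all invented, and each letter is assumed to emit at most one phone). Because the dynamic programme must consume every phone of the word in order, the globally best path can still map "c" to /s/ even though P(/k/|"c") is higher locally:

```python
import math

def align(letters, phones, p):
    """Best monotonic letter-to-phone alignment by dynamic programming."""
    INF = float("inf")
    n, m = len(letters), len(phones)
    # cost[i][j] = best negative log-probability of aligning
    # letters[:i] with phones[:j]
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == INF:
                continue
            if i < n and j < m:      # letter i emits phone j
                c = cost[i][j] - math.log(p.get((letters[i], phones[j]), 1e-10))
                if c < cost[i + 1][j + 1]:
                    cost[i + 1][j + 1] = c
                    back[i + 1][j + 1] = (i, j, phones[j])
            if i < n:                # letter i emits nothing ("_" = epsilon)
                c = cost[i][j] - math.log(p.get((letters[i], "_"), 1e-10))
                if c < cost[i + 1][j]:
                    cost[i + 1][j] = c
                    back[i + 1][j] = (i, j, "_")
    pairs, i, j = [], n, m
    while (i, j) != (0, 0):          # trace the best path back
        i, j, ph = back[i][j]
        pairs.append((letters[i], ph))
    return list(reversed(pairs))

# made-up unigram probabilities; everything unlisted gets a tiny floor
p = {("c", "k"): 0.7, ("c", "s"): 0.3, ("i", "ih"): 0.9, ("s", "s"): 0.9,
     ("t", "t"): 0.9, ("e", "er"): 0.5, ("e", "_"): 0.3,
     ("r", "er"): 0.5, ("r", "_"): 0.4, ("n", "n"): 0.9}
print(align("cistern", ["s", "ih", "s", "t", "er", "n"], p))
# [('c', 's'), ('i', 'ih'), ('s', 's'), ('t', 't'), ('e', 'er'), ('r', '_'), ('n', 'n')]
```

Here "c"→/k/ is impossible to reconcile with the phone string /s ih s t er n/, so the path through "c"→/s/ wins despite its lower unigram score, which I suspect is the whole point of the realignment step.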