Forum Replies Created
In Taylor’s chapter 16, it’s mentioned that the phone class of different units “should play an important part in any well-designed join cost function” (p.499). Is this at all part of the join cost in Festival?
The Multisyn paper states that differences in voicing between two units incur different penalties, but I haven’t found anything stating that the actual phone class has an effect on the cost. If phone class really isn’t used, why do we distinguish between the closure and burst of a plosive when labelling the database?
Great, that makes a lot more sense, thank you!
So do the different ‘modes’, i.e. the different mixture components, just allow us to model a greater degree of variation? And if so, will increasing the number of components to a large value potentially result in overfitting the models to the data?
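One rough way to think about the overfitting risk is to count how many free parameters each added component brings. This is only a minimal sketch, assuming diagonal-covariance Gaussians and 39-dimensional feature vectors (both are assumptions for illustration, not something stated above); `gmm_param_count` is a hypothetical helper name.

```python
def gmm_param_count(num_components, dim):
    """Free parameters of a diagonal-covariance Gaussian mixture.

    Assumption: each component has `dim` means and `dim` variances,
    and the mixture weights sum to one (so one weight is not free).
    """
    means = num_components * dim
    variances = num_components * dim
    weights = num_components - 1
    return means + variances + weights


if __name__ == "__main__":
    DIM = 39  # assumed feature dimensionality, for illustration only
    for m in (1, 2, 4, 8, 16):
        print(f"{m:2d} components -> {gmm_param_count(m, DIM)} parameters per state")
```

The count grows linearly with the number of components while the amount of training data per state stays fixed, which is why a very large number of components can end up fitting individual training frames rather than general structure.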
Hello,
I modified the do_alignment script to train the model on a subset of my data. I noticed when inspecting the .lab files that not only does the placement of certain labels change, but some of the labels themselves actually change too. For example, a label might appear as ‘@i’ in the initial system, but change to just ‘@’ in the modified system.
I thought that the labels, provided by the front end, were fixed, and expected that the only change I would see by changing the training data would be in the placement (timing) of the labels.
For example, J&M2 (p.275) state: “In forced alignment mode, a speech recogniser is told exactly what the phone sequence is; its job is just to find the exact phone boundaries in the waveform.”
Is the change in labels just to do with e.g. vowel reductions, or something else that I haven’t considered?
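For reference, this is roughly how the two label sequences can be compared while ignoring the timings. It is a minimal sketch: the label lists are made up for illustration, reading them out of the .lab files is left out because the exact file layout is not shown here, and `label_substitutions` is a hypothetical helper name.

```python
from difflib import SequenceMatcher


def label_substitutions(old_labels, new_labels):
    """Return (old, new) label spans that differ between two alignments.

    Timings are ignored; only the sequence of label names is compared.
    """
    matcher = SequenceMatcher(a=old_labels, b=new_labels, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((old_labels[i1:i2], new_labels[j1:j2]))
    return changes


if __name__ == "__main__":
    # Illustrative label sequences only, not taken from real .lab files.
    initial = ["sil", "dh", "@i", "k", "sil"]
    modified = ["sil", "dh", "@", "k", "sil"]
    for old, new in label_substitutions(initial, modified):
        print(f"{old} -> {new}")
```

Listing the differences this way makes it easier to see whether the changes are confined to a few vowel variants or are spread more widely.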
Thank you!
Hello,
I have a similar but more theoretical question regarding the target/join cost weighting. I’ve played around with the settings and calculated the target vs join costs for a couple of different sentences using each of these settings.
I noticed that for every single sentence, the lower the weight of the target cost, the lower both the join and target costs become (although this doesn’t necessarily make the sentences sound better!).
This doesn’t really make sense to me. Surely the target cost should be lowest when its weight is at the higher end of the range (e.g. 1.0, 0.7…), i.e. when higher importance is assigned to it? If there’s something I’m missing, please let me know!
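To make my expectation concrete, here is a minimal sketch of a weighted unit-selection search on a toy lattice. It assumes the total cost is simply `w_target * target + w_join * join` summed along the path; that form, the cost values, and the name `select_units` are all assumptions for illustration, not Festival’s actual implementation.

```python
def select_units(target, join, w_target, w_join):
    """Viterbi search over a toy candidate lattice.

    target[t][j]  : target cost of candidate j at position t
    join[t][i][j] : join cost between candidate i at t and candidate j at t+1
    Returns the best path plus its unweighted target and join cost sums.
    """
    n = len(target)
    best = [[w_target * c for c in target[0]]]  # weighted cost so far
    back = []                                   # back-pointers per position
    for t in range(1, n):
        row, ptr = [], []
        for j, tc in enumerate(target[t]):
            scores = [best[t - 1][i] + w_join * join[t - 1][i][j]
                      for i in range(len(target[t - 1]))]
            i_best = min(range(len(scores)), key=scores.__getitem__)
            row.append(scores[i_best] + w_target * tc)
            ptr.append(i_best)
        best.append(row)
        back.append(ptr)

    # Trace back from the cheapest final candidate.
    j = min(range(len(best[-1])), key=best[-1].__getitem__)
    path = [j]
    for ptr in reversed(back):
        j = ptr[j]
        path.append(j)
    path.reverse()

    raw_target = sum(target[t][c] for t, c in enumerate(path))
    raw_join = sum(join[t][path[t]][path[t + 1]] for t in range(n - 1))
    return path, raw_target, raw_join


# Toy costs (made up): good targets tend to come with bad joins.
TARGET = [[0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
JOIN = [[[0.5, 1.0], [0.0, 0.5]],
        [[0.5, 0.0], [1.0, 0.5]]]

if __name__ == "__main__":
    for w_t, w_j in ((1.0, 0.1), (0.1, 1.0)):
        print(w_t, w_j, select_units(TARGET, JOIN, w_t, w_j))
```

In this toy case the unweighted target cost of the selected path is lowest when the target weight is high, and the unweighted join cost is lowest when the join weight is high, which is the behaviour I expected to see in my own numbers.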
Thanks,
Lucy