GMMs in Forced Alignment
- This topic has 3 replies, 2 voices, and was last updated 4 years, 9 months ago by Simon.
April 16, 2020 at 14:42 #11174
Hello,
When we talk about the number of mixture components in the do_alignment script, what does this actually refer to?
I initially assumed it was the number of Gaussian distributions in the multi-dimensional Gaussian in each state, but as we’re modelling MFCCs I assume we would need 39-dimensional Gaussians to do so, and not just 8, as is standard in the do_alignment script…
Thank you!
April 16, 2020 at 15:19 #11175
The number of “mixture components” is the number of individual Gaussians that are combined in a weighted sum to form the Gaussian Mixture Model in each HMM state. Each such Gaussian is multivariate (as you say, with 39 dimensions).
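In standard HMM notation (the symbols below are the usual textbook ones, not names used in the do_alignment script), the output density of state j with M mixture components is a weighted sum of M Gaussians, each defined over the whole observation vector o_t:

```latex
b_j(\mathbf{o}_t) = \sum_{m=1}^{M} c_{jm}\,\mathcal{N}\!\left(\mathbf{o}_t;\ \boldsymbol{\mu}_{jm},\ \boldsymbol{\Sigma}_{jm}\right),
\qquad c_{jm} \ge 0,\quad \sum_{m=1}^{M} c_{jm} = 1
```

So M (8 by default in the script, according to the question above) counts the components, while each mean vector \mu_{jm} and covariance \Sigma_{jm} has dimension D = 39. The two numbers are independent.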
If you’re finding it hard to separate out the two concepts “number of mixture components” and “multivariate”, then aim to understand things in this order:
1. An individual univariate Gaussian: it emits observations that are vectors with 1 dimension (or just scalars, if you prefer).
2. Extend that to a mixture of univariate Gaussian components: observations are still vectors with 1 dimension, but drawn from a more complex probability density function (with multiple modes).
3. Go back to the individual univariate Gaussian in 1 dimension.
4. Now extend that single univariate Gaussian to a single multivariate Gaussian: it emits observations that are vectors with, say, 39 dimensions, but drawn from a distribution with only one mode.
5. Put 2 and 4 together to get a model that emits observations that are vectors with 39 dimensions (that’s the multivariate part), drawn from a more complex probability density function (because it’s now a mixture distribution).
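The five steps above can be sketched in a few lines of Python. This is a minimal illustration, not the script's or HTK's own code: it assumes diagonal covariance matrices (a common choice for MFCC features), and the function names and the made-up parameters in the usage lines are hypothetical. The same functions cover a univariate mixture (len(x) == 1) and a 39-dimensional mixture (len(x) == 39).

```python
import math
from typing import Sequence


def diag_gaussian_logpdf(x: Sequence[float], mean: Sequence[float],
                         var: Sequence[float]) -> float:
    """Log density of ONE Gaussian with a diagonal covariance matrix.

    With len(x) == 1 this is a univariate Gaussian (step 1);
    with len(x) == 39 it is a single multivariate Gaussian (step 4).
    """
    assert len(x) == len(mean) == len(var)
    total = 0.0
    for xi, mi, vi in zip(x, mean, var):
        # Diagonal covariance: the log density is a sum of per-dimension terms
        total += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return total


def gmm_logpdf(x: Sequence[float], weights: Sequence[float],
               means: Sequence[Sequence[float]],
               variances: Sequence[Sequence[float]]) -> float:
    """Log density of a mixture of diagonal-covariance Gaussians (steps 2 and 5).

    Number of mixture components = len(weights); dimensionality = len(x).
    """
    assert abs(sum(weights) - 1.0) < 1e-9, "mixture weights must sum to 1"
    # One term per component: log c_m + log N(x; mu_m, Sigma_m)
    terms = [math.log(w) + diag_gaussian_logpdf(x, m, v)
             for w, m, v in zip(weights, means, variances)]
    # Log-sum-exp: add the weighted component densities without underflow
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


if __name__ == "__main__":
    D, M = 39, 8                                  # 39-dim vectors, 8 components
    means = [[float(m)] * D for m in range(M)]    # made-up parameters
    variances = [[1.0] * D for _ in range(M)]
    weights = [1.0 / M] * M
    print(gmm_logpdf([0.5] * D, weights, means, variances))
```

Changing D to 1 gives step 2 (a univariate mixture); changing M to 1 gives step 4 (a single multivariate Gaussian).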
April 16, 2020 at 16:03 #11177
Great, that makes a lot more sense, thank you!
So do the different ‘modes’ (i.e., the different mixture components) just allow us to model a greater degree of variation? And if so, could increasing them to a large number result in overfitting the models to the data?
April 16, 2020 at 16:21 #11178
Yes, more components give a more expressive probability density function that can better fit the data. As in all machine learning, having too many model parameters (here, the number of mixture components controls how many means, variances and mixture weights must be estimated) can lead to overfitting. That’s probably not the main concern here, since we are not trying to generalise from training data to test data: in forced alignment, the models are used to align the same data they were trained on.
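To make "number of parameters" concrete, here is a small counting sketch. It assumes 39-dimensional observations and diagonal covariances by default (the script's actual covariance type is not stated in this thread); the function name is hypothetical. Each component needs a mean vector and a covariance, and the weights of one state sum to 1, so only M - 1 of them are free.

```python
def params_per_state(num_components: int, dim: int = 39,
                     diagonal: bool = True) -> int:
    """Count the free parameters of one HMM state's GMM output density."""
    if diagonal:
        cov = dim                       # one variance per dimension
    else:
        cov = dim * (dim + 1) // 2      # full symmetric covariance matrix
    per_component = dim + cov           # mean vector + covariance
    # Mixture weights sum to 1, so only num_components - 1 are free
    return num_components * per_component + (num_components - 1)


for m in (1, 8, 32):
    print(m, params_per_state(m))
# prints:
# 1 78
# 8 631
# 32 2527
```

Every state in every model has its own set of these, so going from 1 to 8 components multiplies the number of parameters to estimate by roughly 8, and they are all learned from the same amount of training data.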
Try limiting the number of components to 1, as a way to get potentially worse alignments. Another way to get worse alignments would be to reduce the amount of data used to train the models, whilst still aligning all the data in the final run of HVite in the script.