Features in Chapter 8-Holmes&Holmes
- This topic has 1 reply, 2 voices, and was last updated 3 years, 9 months ago by Simon.
November 12, 2020 at 10:47 #13038
Hello there,
I am reading about feature vectors and frames in Holmes & Holmes (Ch. 8). At the end of page 110, they say:
“Those features of the acoustic signal that are determined by the phonetic properties should obviously be given more weight in the distance calculation.”
I don’t fully know how to interpret ‘weight’ here.
What I mainly understand is that they want to extract important phonetic properties rather than other less relevant features (e.g. silence, noise, etc). Is that correct?
Also, do we already know how to differentiate the two in the distance calculation, e.g. in dynamic time warping?
Thank you!
-
November 15, 2020 at 09:45 #13066
They are pointing to a problem with simple distance metrics such as the Euclidean distance. This metric assumes all dimensions of the feature vector are equally important and simply sums up the squared differences between corresponding elements in the two vectors being compared.
The result is sensitive to the scale of each element.
Take filterbank energies as our feature vector, and suppose that, in general and on average across all the data, the amount of energy in the 2nd filter is around 10 times larger than that in the 11th filter. (Look at a typical speech magnitude spectrum to see why this could be the case.)
The 2nd element of the feature vector will then contribute far more to the total distance than the 11th element: if the differences between vectors scale with the typical energy, the difference in the 2nd element is about 10 times larger, and about 100 times larger once squared. It is being treated as more important, purely because of its scale.
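To make the scale problem concrete, here is a minimal sketch in plain Python. The two-element "frames" and their values are invented for illustration (they are not real filterbank energies), and the function names are hypothetical. Both elements differ by the same 10% between the two frames, yet the larger-scale element dominates the sum.

```python
import math

def squared_diffs(x, y):
    """Per-dimension squared differences between two equal-length vectors."""
    return [(a - b) ** 2 for a, b in zip(x, y)]

def euclidean(x, y):
    """Plain Euclidean distance: square root of the summed squared differences."""
    return math.sqrt(sum(squared_diffs(x, y)))

# Made-up frames: the first element sits on a scale about 10 times
# larger than the second, and both differ by 10% between the frames.
frame_a = [100.0, 10.0]
frame_b = [110.0, 11.0]

contrib = squared_diffs(frame_a, frame_b)
print(contrib)                      # [100.0, 1.0]
print(euclidean(frame_a, frame_b))  # square root of 101, about 10.05
```

In this toy case the first element accounts for 100 of the 101 units in the summed squared difference, so the second element barely affects the distance.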
One solution to this would be to weight the elements as we sum them up in the Euclidean distance, to balance their contributions according to how important we think they are.
This is precisely what the Gaussian distribution does for us: it is what the standard deviation parameter is for. The standard deviation scales each dimension of the distance calculation according to the amount of variability we see along that dimension for the class we are modelling, so a difference counts for less along a dimension where the class varies a lot.
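As a rough sketch of that scaling, the snippet below divides each difference by a per-dimension standard deviation before squaring and summing. The mean and standard deviation values are made up for illustration, the function name is hypothetical, and the dimensions are assumed to be independent (one standard deviation per dimension, no covariances).

```python
import math

def scaled_euclidean(x, mean, std):
    """Distance from x to mean, with each dimension's difference
    divided by that dimension's standard deviation before squaring."""
    return math.sqrt(sum(((xi - mi) / si) ** 2
                         for xi, mi, si in zip(x, mean, std)))

# Made-up class parameters: the first dimension has a mean and a
# spread about 10 times larger than the second.
mean = [100.0, 10.0]
std = [10.0, 1.0]
x = [110.0, 11.0]   # one standard deviation from the mean in each dimension

plain = math.sqrt(sum((xi - mi) ** 2 for xi, mi in zip(x, mean)))
scaled = scaled_euclidean(x, mean, std)
print(plain)   # dominated by the first dimension
print(scaled)  # both dimensions now contribute equally
```

After scaling, each dimension contributes exactly 1 to the sum here, because the vector is one standard deviation from the mean along both dimensions.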
Out of scope for this course, but something you will see in the literature, is a scaled Euclidean distance called the Mahalanobis distance. It has the same form as the term that appears in the exponent of the Gaussian equation.
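For reference, the connection can be written out as follows. The notation is an assumption made here rather than taken from the book: x is the feature vector, μ the class mean, Σ the class covariance matrix, and D the number of dimensions.

```latex
% Multivariate Gaussian density
p(\mathbf{x}) = \frac{1}{(2\pi)^{D/2}\,|\boldsymbol{\Sigma}|^{1/2}}
  \exp\!\left( -\tfrac{1}{2}\,
  (\mathbf{x}-\boldsymbol{\mu})^{\top} \boldsymbol{\Sigma}^{-1} (\mathbf{x}-\boldsymbol{\mu}) \right)

% Mahalanobis distance between x and the class mean
d_{M}(\mathbf{x}, \boldsymbol{\mu}) =
  \sqrt{ (\mathbf{x}-\boldsymbol{\mu})^{\top} \boldsymbol{\Sigma}^{-1} (\mathbf{x}-\boldsymbol{\mu}) }

% Special case: diagonal covariance with per-dimension standard deviations \sigma_d
d_{M}(\mathbf{x}, \boldsymbol{\mu}) =
  \sqrt{ \sum_{d=1}^{D} \frac{(x_d - \mu_d)^2}{\sigma_d^{2}} }
```

The exponent of the Gaussian is minus one half of the squared Mahalanobis distance. With a diagonal covariance matrix this reduces to a Euclidean distance in which each dimension is divided by its standard deviation.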