Bark Scale vs Mel Scale
- This topic has 3 replies, 3 voices, and was last updated 5 years, 9 months ago by Simon.
October 28, 2018 at 19:50 #9504
I have been curious about this for a long time.
Why is the Mel scale preferred over the Bark scale? Are there key differences between them, or does the choice depend on the application?
October 29, 2018 at 13:28 #9510
Hi Fabian, I don’t know if you happen to take Phonetics and Lab Phonology as well, a separate course to Speech Processing. Last week in class there was a very similar question about the differences between different perceptual scales and why some are preferred over others in certain situations.
Our lecturer Bert gave us a very informative paper. It is an experimental evaluation of the different perceptual scales, illustrating the differences between them and the experimental purposes each might be used for.
I found that really helpful and I thought it might help you too, so I have uploaded it here.
October 29, 2018 at 22:38 #9514
Hi Danielle, I really appreciate your answer. I will read it on the weekend 🙂 If I have any doubts or comments to debate, I will write here again. I find it an interesting topic to discuss.
Thanks again!
October 30, 2018 at 21:50 #9523
Good answer Danielle. But we should note that this paper is specifically about frequency scales for representing pitch (the perceptual correlate of fundamental frequency) rather than the more general spectral envelope information (e.g., formant frequencies) that is important for speech recognition.
Regarding the choice of frequency scale for Automatic Speech Recognition (ASR), the key property we want is a non-linear scale that compresses the higher frequencies more than the lower ones. In other words, the resulting features (e.g., filterbank energies) use more coefficients to describe the most important (i.e., most informative) frequency range for speech, up to around 3 kHz, and fewer coefficients for the higher frequencies that are less important (i.e., contain less information).
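To make the "more coefficients below 3 kHz" point concrete, here is a minimal sketch that spaces filter centre frequencies evenly on the Mel scale and counts how many land at or below 3 kHz. The conversion formula (2595 log10(1 + f/700)) is one common approximation of the Mel scale and is not taken from this thread; the 26 filters and the 0 to 8000 Hz range are likewise illustrative assumptions, and `mel_centres` is a hypothetical helper name.

```python
import math


def hz_to_mel(f_hz):
    # One common Mel approximation (an assumption, not from the thread).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    # Inverse of hz_to_mel above.
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_centres(n_filters, f_min=0.0, f_max=8000.0):
    """Centre frequencies (Hz) of n_filters spaced evenly on the Mel scale."""
    m_min, m_max = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_max - m_min) / (n_filters + 1)
    return [mel_to_hz(m_min + step * (i + 1)) for i in range(n_filters)]


# Illustrative choice: 26 filters covering 0 to 8 kHz.
centres = mel_centres(26)
below_3k = sum(1 for c in centres if c <= 3000.0)
print(f"{below_3k} of {len(centres)} centres lie at or below 3 kHz")
```

The band up to 3 kHz is only 37.5% of a 0 to 8 kHz linear range, yet with these settings well over half of the centres fall inside it, and the gap between neighbouring centres widens as frequency increases.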
All perceptual scales (Mel, Bark, etc.) have this property. They will all work much the same for this application, and the choice is made either through personal preference or empirically by experimentation. The Mel scale is by far the most popular for ASR.
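As a rough illustration of how similarly the two scales warp frequency, the sketch below normalises each scale to the range 0 to 1 over 0 to 8 kHz and prints both side by side. Neither formula comes from this thread: the Mel conversion is the common 2595 log10(1 + f/700) approximation, and the Bark conversion is the Zwicker and Terhardt approximation. Other published variants exist, and the 8 kHz upper limit is an arbitrary assumption.

```python
import math


def hz_to_mel(f_hz):
    # Common Mel approximation (assumed; other variants exist).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def hz_to_bark(f_hz):
    # Zwicker and Terhardt approximation of the Bark scale (assumed).
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def normalised(scale, f_hz, f_max=8000.0):
    """Position of f_hz on the given scale, as a fraction of its value at f_max."""
    return scale(f_hz) / scale(f_max)


for f in (250, 500, 1000, 2000, 3000, 4000, 8000):
    mel = normalised(hz_to_mel, f)
    bark = normalised(hz_to_bark, f)
    print(f"{f:>5} Hz  mel={mel:.2f}  bark={bark:.2f}")
```

On both scales, 3 kHz sits well past the halfway point of the warped axis even though it is only 37.5% of the way along the linear one, which is the shared compression property described above.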