Robustness Definition
I am a bit confused about the definition of robustness for an ASR system. Does it mean that accuracy is high across many different users?
I don’t understand how this would be possible, even with training data from multiple users. Surely the acoustic model would then learn a mean that is an amalgamation of the acoustic features of all the different users? Accuracy would then be low, because no individual user would come close to that mean.
“Robust” means that WER is generally low across a variety of conditions. A standard, effective method for training a robust ASR system is simply to train on diverse data.
You need to answer the second part of your question by conducting experiments of your own design, which presumably will include training at least one model on diverse data and another model on less-diverse data, and testing on a number of test sets of your choosing.
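Your point about the mean is reasonable, and perhaps suggests that a model trained on diverse data will give a higher WER than one trained on less-diverse data, provided the less-diverse-data model matches the test set. But don’t forget the variances: a Gaussian is not just its mean, and one trained on diverse data will also have a larger variance, so it can still assign reasonable probability to speakers who are far from that mean.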
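To make the point about means and variances concrete, here is a toy sketch. Everything in it is an assumption for illustration: two hypothetical speakers whose single 1-D "feature" is drawn from Gaussians centred at 0 and 4, one model fitted to speaker A only and one fitted to the pooled data. A real acoustic model uses multivariate features and mixtures of Gaussians (or a neural network), and likelihood is not WER, so treat this as intuition only.

```python
import math
import random
import statistics


def fit_gaussian(samples):
    """Maximum-likelihood mean and variance of a 1-D Gaussian."""
    return statistics.fmean(samples), statistics.pvariance(samples)


def avg_log_likelihood(samples, mean, var):
    """Average per-sample log density under N(mean, var)."""
    return statistics.fmean(
        -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)
        for x in samples
    )


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical speakers: same spread, different centres (made-up values).
train_a = [rng.gauss(0.0, 1.0) for _ in range(500)]
train_b = [rng.gauss(4.0, 1.0) for _ in range(500)]
test_a = [rng.gauss(0.0, 1.0) for _ in range(500)]
test_b = [rng.gauss(4.0, 1.0) for _ in range(500)]

mean_a, var_a = fit_gaussian(train_a)                  # speaker-A-only model
mean_pool, var_pool = fit_gaussian(train_a + train_b)  # diverse-data model

# Matched condition: the speaker-A model on speaker-A test data.
ll_a_on_a = avg_log_likelihood(test_a, mean_a, var_a)
ll_pool_on_a = avg_log_likelihood(test_a, mean_pool, var_pool)

# Mismatched condition: the speaker-A model on speaker-B test data.
ll_a_on_b = avg_log_likelihood(test_b, mean_a, var_a)
ll_pool_on_b = avg_log_likelihood(test_b, mean_pool, var_pool)

print(f"speaker-A model: mean={mean_a:.2f} var={var_a:.2f}")
print(f"pooled model:    mean={mean_pool:.2f} var={var_pool:.2f}")
print(f"test A: speaker-A model {ll_a_on_a:.2f}, pooled model {ll_pool_on_a:.2f}")
print(f"test B: speaker-A model {ll_a_on_b:.2f}, pooled model {ll_pool_on_b:.2f}")
```

The pooled mean does sit between the two speakers, where neither speaker's data is centred, but the pooled variance is much larger. The pooled model therefore loses a little on the matched speaker and gains a lot on the mismatched one, which is the trade-off in miniature.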
On average, perhaps the diverse-data-trained model will give a lower WER across a range of test sets, whereas the non-diverse-data-trained model gives a high WER on all mismatched test sets and only a low WER on the matched test set? It’s up to you to find out!