HResults: Dynamic Programming
I don’t quite understand the complicated methodology that HResults uses to estimate accuracy/WER. I thought it was just plain accuracy, i.e. the number of predicted labels that match the reference divided by the total number of labels to be predicted.
Could you please explain a little what this attachment is saying?
HResults uses dynamic programming to align the recognition output with the reference transcription. WER is then based on the word-level edit distance between the recognition output and the reference transcription.
There are three possible types of error: substitutions, insertions and deletions. WER is just the sum of those three, divided by the number of words in the reference transcription, and expressed as a percentage.
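To make the alignment idea concrete, here is a minimal sketch of a word-level edit distance that also counts each error type. It is not the HResults implementation: it assumes a unit cost for every error (HResults may weight substitutions, insertions and deletions differently), and the function names `count_errors` and `wer` are made up for illustration.

```python
def count_errors(ref, hyp):
    """Return (substitutions, deletions, insertions) for a minimum-cost
    alignment of the hypothesis words against the reference words.

    Assumption: every error costs 1. This is a sketch, not HResults itself.
    """
    n, m = len(ref), len(hyp)
    # cost[i][j] = (total errors, subs, dels, ins) for ref[:i] versus hyp[:j]
    cost = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = (0, 0, 0, 0)
    for i in range(1, n + 1):
        cost[i][0] = (i, 0, i, 0)  # reference words with no output: deletions
    for j in range(1, m + 1):
        cost[0][j] = (j, 0, 0, j)  # output words with no reference: insertions

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Diagonal move: either a match (free) or a substitution.
            t, s, d, ins = cost[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diagonal = (t, s, d, ins)
            else:
                diagonal = (t + 1, s + 1, d, ins)
            # Move down: a reference word was left out (deletion).
            t, s, d, ins = cost[i - 1][j]
            deletion = (t + 1, s, d + 1, ins)
            # Move right: an extra word was output (insertion).
            t, s, d, ins = cost[i][j - 1]
            insertion = (t + 1, s, d, ins + 1)
            # Tuples compare on total errors first, so this keeps the cheapest path.
            cost[i][j] = min(diagonal, deletion, insertion)

    return cost[n][m][1:]


def wer(ref, hyp):
    """Word error rate as a percentage of the number of reference words."""
    s, d, i = count_errors(ref, hyp)
    return 100.0 * (s + d + i) / len(ref)


ref = "the cat sat on the mat".split()
hyp = "the cat sit on mat now".split()
print(count_errors(ref, hyp))
print(wer(ref, hyp))
```

In this example the alignment has one substitution ("sat" becomes "sit"), one deletion (the second "the") and one insertion ("now"), so the WER is 3 errors over 6 reference words, i.e. 50%.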
For the special case of isolated words, the only possible type of error is a substitution error, and the dynamic programming is not really needed.
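For isolated words, the "plain accuracy" calculation from the question gives the same answer, because each test item has exactly one reference label and one recognised label. A minimal sketch, with a hypothetical function name and made-up labels:

```python
def isolated_word_error_rate(refs, hyps):
    """Percentage of items whose recognised label differs from the reference.

    Assumes one label per item, so every error is a substitution.
    """
    if len(refs) != len(hyps):
        raise ValueError("need exactly one recognised label per reference label")
    errors = sum(r != h for r, h in zip(refs, hyps))
    return 100.0 * errors / len(refs)


refs = ["yes", "no", "yes", "no"]
hyps = ["yes", "yes", "yes", "no"]
print(isolated_word_error_rate(refs, hyps))
```

Here one of the four items is wrong, so the error rate is 25%.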
Note that HResults reports “Accuracy (Acc)”, but you should only use WER (100 – Acc) in your report.
Ignore the value of “Correct (Corr)” reported by HResults: it does not account for insertion errors and is no longer used as a measure.
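For reference, the relationship between these figures can be written out as below, following the definitions in the HTK Book, where N is the number of words in the reference transcription and S, D and I are the numbers of substitutions, deletions and insertions. The numbers in the worked line are a made-up example, not output from a real HResults run.

```latex
\begin{align*}
\text{Corr} &= \frac{N - D - S}{N} \times 100\% \\
\text{Acc}  &= \frac{N - D - S - I}{N} \times 100\% \\
\text{WER}  &= \frac{S + D + I}{N} \times 100\% = 100\% - \text{Acc} \\[1ex]
\text{e.g. } N = 6,\ S = D = I = 1:\quad
\text{Corr} &= \tfrac{4}{6} \approx 66.7\%, \quad
\text{Acc} = \tfrac{3}{6} = 50\%, \quad
\text{WER} = 50\%
\end{align*}
```

Because Corr has no I term, a recogniser that outputs lots of extra words is not penalised by it, which is why it should not be reported.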