Forums › Speech Synthesis › F0 estimation and epoch detection › Accounting for low F0 bias in autocorrelation
Given that cross-correlation is less computationally efficient because more data must be retained in memory, is there some way of normalising the autocorrelation to account for the low F0 bias while keeping its computational simplicity, for example by dividing by the number of samples used (W-T)?
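A minimal sketch of the normalisation the question proposes (the function names and the toy window are illustrative, not from any particular toolkit):

```python
import numpy as np

def acf_raw(x, max_lag):
    # Short-time autocorrelation over a window of W samples:
    #   r(T) = sum_{t=0}^{W-T-1} x[t] * x[t+T]
    # The sum has W - T terms, so r(T) shrinks as the lag T grows,
    # which biases peak-picking towards smaller lags.
    W = len(x)
    return np.array([np.dot(x[:W - T], x[T:]) for T in range(max_lag)])

def acf_normalised(x, max_lag):
    # Divide each lag by its number of terms (W - T), as suggested
    # above, so that longer lags are not penalised merely for
    # summing fewer terms. The extra cost is one division per lag.
    W = len(x)
    return acf_raw(x, max_lag) / (W - np.arange(max_lag))
```

With this normalisation the peak for a periodic signal sits at the true period rather than being pulled towards shorter lags, at essentially no extra computational cost.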
Most algorithms use cross-correlation (also known as modified autocorrelation), even though it needs a little more computation. In speech synthesis, F0 estimation is typically a one-off process that happens during voice building, so we don’t care too much about a little extra computational cost if it gives a better result.
I think when you say “low F0 bias” you mean a bias towards picking peaks at smaller lags. That would be a bias towards picking higher values of F0. For example, we might accidentally pick a peak that corresponds to F1 in some cases.
The YIN pitch tracker (there is also an open access version of the paper) applies a normalisation (look for “Cumulative mean normalized difference function” in the paper) to its difference function, to avoid picking F1.
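For concreteness, here is a sketch of that cumulative mean normalized difference function. This covers only the normalisation step of YIN; the full algorithm also applies an absolute threshold and parabolic interpolation, which are omitted here, and the function names are mine:

```python
import numpy as np

def difference_function(x, max_lag):
    # YIN's difference function:
    #   d(tau) = sum_t (x[t] - x[t + tau])^2
    d = np.zeros(max_lag)
    for tau in range(1, max_lag):
        diff = x[:-tau] - x[tau:]
        d[tau] = np.dot(diff, diff)
    return d

def cmndf(d):
    # Cumulative mean normalized difference function:
    #   d'(0) = 1
    #   d'(tau) = d(tau) / ( (1/tau) * sum_{j=1}^{tau} d(j) )
    # Dividing by the running mean keeps small-lag values near 1,
    # so the dip at the true period stands out instead of a
    # spurious minimum at very small lags.
    out = np.ones_like(d)
    running_sum = np.cumsum(d[1:])
    taus = np.arange(1, len(d))
    out[1:] = d[1:] * taus / np.maximum(running_sum, 1e-12)
    return out
```

The minimum of the normalized function then falls at the lag corresponding to the true F0, without the bias towards small lags that the raw difference function (or raw autocorrelation) exhibits.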