Reconstructing F0 from harmonics
When F0 itself is absent, listeners can still perceive the correct pitch, retrieving F0 arithmetically from the higher harmonics. Would a similar approach not be algorithmically viable for dealing with ‘octave errors’, where the wrong lag returns a higher autocorrelation than the correct pitch period? If multiple F0 candidates forming an arithmetic series (corresponding to harmonics) are identified, the correct F0 is the highest candidate that is a factor of all higher candidates (excluding the topmost candidate, which has no higher candidates and so satisfies this trivially).
For example, to avoid the ‘octave’ error (a lag of two pitch periods, which gives 50 Hz instead of 100 Hz): if 50 Hz, 100 Hz and 200 Hz are suggested as candidates, and there are also candidates at 300 Hz and 400 Hz but not at 350 Hz, then 100 Hz should be the correct F0. (50 Hz and 100 Hz are both factors of all higher candidates, but 100 Hz is higher; 200 Hz is not a factor of 300 Hz; a 350 Hz candidate, if present, would rule out 100 Hz, since 350 is not a multiple of 100.)
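The rule described above can be sketched in Python as follows. This is a minimal illustration that assumes integer-Hz candidates like those in the example. The function name `f0_from_candidates` is hypothetical, and real F0 candidates would need a tolerance rather than exact divisibility.

```python
def f0_from_candidates(candidates):
    """Return the highest candidate that exactly divides every higher candidate.

    Assumes integer-Hz candidates, as in the forum example. The topmost
    candidate is skipped: it has no higher candidates, so it would
    satisfy the rule trivially. Returns None if no candidate qualifies.
    """
    cands = sorted(set(candidates))
    best = None
    for i, f in enumerate(cands[:-1]):
        higher = cands[i + 1:]
        if all(h % f == 0 for h in higher):
            best = f  # candidates are ascending, so a later match is a higher F0
    return best


print(f0_from_candidates([50, 100, 200, 300, 400]))       # 100
print(f0_from_candidates([50, 100, 200, 300, 350, 400]))  # 50
```

With the example candidates it returns 100 Hz. Adding a 350 Hz candidate makes it return 50 Hz instead, because 350 is not a multiple of 100.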
An excellent idea, and one that has indeed been proposed in the literature, specifically for the case where the fundamental is absent (e.g., speech transmitted down old-fashioned telephone lines).
What you propose is to find the greatest common divisor of a set of candidate values for F0. See http://dx.doi.org/10.1121/1.1910902 (the full text is behind a paywall: you’ll need to enter the JASA website via the University library to gain access).
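Real candidate frequencies are never exact integers, so a literal GCD needs a tolerance. One possible sketch, which is not necessarily the method in the cited paper, works in three steps. It scans trial F0 values downwards from the lowest candidate and accepts the first one for which every candidate lies near an integer multiple. It then refines that value by least squares. The function name `approx_common_divisor` and its parameters `tol`, `step` and `f0_min` are hypothetical choices made for illustration.

```python
def approx_common_divisor(freqs, tol=0.05, step=0.5, f0_min=40.0):
    """Approximate the greatest common divisor of positive, real-valued frequencies (Hz).

    Trial F0 values are scanned downwards from the lowest frequency in `step` Hz
    increments. The first trial F0 for which every frequency lies within `tol`
    harmonic numbers of an integer multiple is accepted, and F0 is then refined
    by least squares for those harmonic numbers k_i:
        F0 = sum(f_i * k_i) / sum(k_i ** 2)
    Returns None if no trial F0 down to `f0_min` fits.
    """
    f0 = min(freqs)
    while f0 >= f0_min:
        ks = [round(f / f0) for f in freqs]  # nearest harmonic number for each frequency
        if all(abs(f / f0 - k) <= tol for f, k in zip(freqs, ks)):
            # least-squares F0 given the harmonic numbers found above
            return sum(f * k for f, k in zip(freqs, ks)) / sum(k * k for k in ks)
        f0 -= step
    return None


print(approx_common_divisor([200.0, 300.0, 400.0]))  # 100.0
print(approx_common_divisor([199.0, 302.0, 401.0]))  # about 100.28
```

In the missing-fundamental case, harmonics at 200, 300 and 400 Hz give 100 Hz. Slightly perturbed values such as 199, 302 and 401 Hz give about 100.3 Hz. If the spurious 50 Hz candidate from the question is included, the common divisor becomes 50 Hz, so the choice of candidate set matters.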
Finding the common divisor could be combined with any method of finding candidate values for F0 (e.g., autocorrelation), and we would also expect some post-processing (e.g., dynamic programming across frames) to improve the results further.
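Dynamic-programming post-processing can be illustrated with a sketch of Viterbi-style tracking over per-frame F0 candidates. The sketch assumes each frame supplies (frequency, local cost) pairs, where the local cost might, for example, be derived from autocorrelation peak heights. It penalises jumps by their size in octaves. The function name `track_f0`, the cost model and `jump_weight` are all assumptions for illustration, not a specific published tracker.

```python
import math


def track_f0(frames, jump_weight=1.0):
    """Choose one F0 per frame by dynamic programming (Viterbi-style).

    frames: list of frames, each a list of (f0_hz, local_cost) pairs.
    Transition cost: jump_weight * |log2(f_cur / f_prev)|, so a one-octave
    jump costs jump_weight. Returns the chosen F0 for each frame.
    """
    if not frames:
        return []
    # best[i]: lowest total cost of any path ending at candidate i of the current frame
    best = [cost for _, cost in frames[0]]
    backptrs = []  # backptrs[t-1][i]: best predecessor of candidate i in frame t
    for t in range(1, len(frames)):
        prev, cur = frames[t - 1], frames[t]
        new_best, ptrs = [], []
        for f, local in cur:
            # total cost of arriving at this candidate from each previous candidate
            arrivals = [best[i] + jump_weight * abs(math.log2(f / pf))
                        for i, (pf, _) in enumerate(prev)]
            j = min(range(len(arrivals)), key=arrivals.__getitem__)
            new_best.append(arrivals[j] + local)
            ptrs.append(j)
        best = new_best
        backptrs.append(ptrs)
    # trace back from the cheapest candidate in the final frame
    i = min(range(len(best)), key=best.__getitem__)
    path = [i]
    for ptrs in reversed(backptrs):
        i = ptrs[i]
        path.append(i)
    path.reverse()
    return [frames[t][i][0] for t, i in enumerate(path)]


frames = [
    [(100.0, 0.1), (200.0, 0.3)],
    [(102.0, 0.2), (204.0, 0.1)],  # 204 Hz is locally cheapest: an octave error
    [(101.0, 0.1), (202.0, 0.3)],
]
print(track_f0(frames))  # [100.0, 102.0, 101.0]
```

In the middle frame the locally cheapest candidate is 204 Hz, an octave error. The path through 102 Hz costs less overall, so the tracker returns [100.0, 102.0, 101.0]. Setting `jump_weight=0.0` removes the continuity penalty and reproduces the frame-by-frame choice [100.0, 204.0, 101.0].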