Postprocessing in Epoch Detection
- This topic has 1 reply, 2 voices, and was last updated 3 years ago by Korin Richmond.
April 2, 2021 at 17:35 #13927
Hi,
Why is postprocessing necessary in epoch detection? There are two things that I don’t understand:
1) I thought the main goal of epoch detection is to find a consistent point within each pitch period. Why is it then relevant that this point is on the largest peak?
2) After counting the zero-crossings in the derivative of the waveform: why are the zero-crossings not on the largest peaks, and why do we need to shift our marks? I thought we remove everything besides F0, so each maximum in the waveform (i.e. each largest peak) would correspond to a zero-crossing in the derivative. How can it be that a zero-crossing does not correspond to the largest peak in the waveform?
April 7, 2021 at 16:59 #13932
> 1) I thought the main goal of epoch detection is to find a consistent point within each pitch period. Why is it then relevant that this point is on the largest peak?
I can think of two reasons: i) notionally, we are trying to find the instant of glottal closure (the GCI), which is the point of maximum excitation; ii) we would typically want to centre any window (e.g. a Hamming or Hann window, when doing PSOLA or other pitch-synchronous processing) at the GCI, because that is where the energy is greatest (and the closed phase follows immediately after it).
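To illustrate reason ii), here is a minimal Python sketch of cutting Hann-windowed frames centred on epoch marks. The function names (`hann`, `epoch_centred_frames`), the sinusoidal test signal, and the choice of a symmetric window one pitch period wide on each side are assumptions for illustration only; real PSOLA implementations often use asymmetric windows and differ in other details.

```python
import math

def hann(length):
    """Symmetric Hann window of odd length; the centre sample has weight 1."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (length - 1))
            for i in range(length)]

def epoch_centred_frames(x, epochs):
    """Cut one windowed frame per interior epoch, centred on that epoch.

    Hypothetical helper: each frame extends one pitch period to either side,
    using the shorter of the two neighbouring periods so it stays symmetric.
    """
    frames = []
    for prev, cur, nxt in zip(epochs, epochs[1:], epochs[2:]):
        half = min(cur - prev, nxt - cur)
        window = hann(2 * half + 1)
        segment = x[cur - half: cur + half + 1]
        frames.append([w * s for w, s in zip(window, segment)])
    return frames

# Toy signal: a sinusoid with an 80-sample period, with marks on its maxima.
signal = [math.sin(2 * math.pi * n / 80) for n in range(320)]
frames = epoch_centred_frames(signal, [20, 100, 180, 260])
```

Because the window tapers to zero at its edges and has weight 1 at its centre, whatever sits at the mark is preserved at full amplitude. If the mark sat away from the high-energy part of the period, that part would be attenuated instead.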
> 2) After counting the zero-crossings in the derivative of the waveform: why are the zero-crossings not on the largest peaks, and why do we need to shift our marks? I thought we remove everything besides F0, so each maximum in the waveform (i.e. each largest peak) would correspond to a zero-crossing in the derivative. How can it be that a zero-crossing does not correspond to the largest peak in the waveform?
Because signals can be “messy” and the algorithm imperfect 🙂
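One way to see the mismatch is with a toy example. The Python sketch below is not the algorithm from the course; it assumes a synthetic waveform (a sinusoid at F0 plus one sharp spike per period), uses a simple moving average as a stand-in for the low-pass filter, and all function names are hypothetical. The zero-crossings of the derivative are taken on the smoothed signal, and each mark is then moved to the largest sample of the original waveform within a small search radius.

```python
import math

PERIOD = 80  # samples per pitch period (toy value, an assumption)

def toy_waveform(n_samples):
    """Sinusoid at F0 plus one sharp spike per period, offset from the sinusoid's maximum."""
    return [math.sin(2 * math.pi * n / PERIOD) + (1.5 if n % PERIOD == 26 else 0.0)
            for n in range(n_samples)]

def moving_average(x, width):
    """Crude low-pass filter: centred moving average, truncated at the edges."""
    half = width // 2
    out = []
    for i in range(len(x)):
        lo, hi = max(0, i - half), min(len(x), i + half + 1)
        out.append(sum(x[lo:hi]) / (hi - lo))
    return out

def derivative_zero_crossings(x):
    """Indices where the first difference goes from positive to non-positive,
    i.e. the local maxima of x."""
    return [i for i in range(1, len(x) - 1)
            if x[i] - x[i - 1] > 0 and x[i + 1] - x[i] <= 0]

def shift_to_largest_peak(x, marks, radius):
    """Postprocessing step: move each mark to the largest sample of x
    within +/- radius samples of it."""
    shifted = []
    for m in marks:
        lo, hi = max(0, m - radius), min(len(x), m + radius + 1)
        shifted.append(max(range(lo, hi), key=lambda i: x[i]))
    return shifted

waveform = toy_waveform(4 * PERIOD)
smoothed = moving_average(waveform, 21)
raw_marks = derivative_zero_crossings(smoothed)
epochs = shift_to_largest_peak(waveform, raw_marks, radius=10)
print(raw_marks)
print(epochs)
```

In this toy case the filter smears the spike into a low plateau, so the maxima of the smoothed signal follow the sinusoid and land six samples before the spike in every period. The zero-crossings are consistent from period to period, but they are not on the largest peak of the original waveform, which is why the marks are shifted afterwards. Real speech adds further sources of mismatch, such as filter delay and irregular periods.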