F0 estimation

A key parameter in any parametric representation of speech is the fundamental frequency, F0. Estimating it from speech is not trivial: we need an F0 estimation algorithm, often called a "pitch tracker".
  • Pitch tracking vs pitch marking

    Pitch marks, which locate individual pitch periods in the waveform, are required for pitch-synchronous signal processing. F0 is used in the join cost or for modelling prosody.

  • Autocorrelation

    Most methods for estimating F0 start with the autocorrelation function, which measures waveform self-similarity.

  • Autocorrelation is not enough

    The autocorrelation function has several peaks, each a candidate value for the fundamental period, but how do we reliably select the correct one?

  • Pre- and post-processing

    The solution is to add pre-processing to remove most of the incorrect F0 candidates, then post-processing to select the correct one.

  • Alternatives to autocorrelation

    Autocorrelation and related functions are the most popular starting point, but there are alternatives, such as the cepstrum.

  • Evaluation

    To measure the accuracy of an F0 estimator, we need to compare its output to some ground truth.
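
As a rough illustration of the autocorrelation idea and of why it yields more than one candidate, here is a minimal sketch in Python. It is not a complete pitch tracker: it has no pre-processing, no voicing decision and no post-processing across frames. The function names (autocorrelation, f0_candidates), the 60 to 400 Hz search range and the synthetic test frame are all assumptions made for this example.

```python
import math

def autocorrelation(frame, max_lag):
    """Short-time autocorrelation for lags 0..max_lag, normalised so r[0] == 1."""
    n = len(frame)
    r = [sum(frame[i] * frame[i + lag] for i in range(n - lag))
         for lag in range(max_lag + 1)]
    energy = r[0] if r[0] > 0 else 1.0  # avoid dividing by zero on silence
    return [v / energy for v in r]

def f0_candidates(frame, fs, f0_min=60.0, f0_max=400.0):
    """Return (F0 in Hz, peak height) pairs, strongest peak first.

    Every positive local maximum of the autocorrelation inside the
    allowed lag range is treated as a candidate fundamental period.
    The search range is an assumed default, not a recommended setting.
    """
    min_lag = int(fs / f0_max)
    max_lag = min(int(fs / f0_min), len(frame) - 2)
    r = autocorrelation(frame, max_lag + 1)
    peaks = [(fs / lag, r[lag])
             for lag in range(min_lag, max_lag + 1)
             if r[lag] > r[lag - 1] and r[lag] >= r[lag + 1] and r[lag] > 0]
    return sorted(peaks, key=lambda c: c[1], reverse=True)

# Synthetic "voiced" frame: three harmonics of 125 Hz, 40 ms at 8 kHz.
fs = 8000
true_f0 = 125.0
frame = [sum(math.sin(2 * math.pi * k * true_f0 * n / fs) / k for k in (1, 2, 3))
         for n in range(320)]

candidates = f0_candidates(frame, fs)
best_f0 = candidates[0][0]  # naive choice: the tallest peak
```

On this clean synthetic frame the tallest peak lies at the true period, but the candidate list also contains a peak at twice the period (half the true F0), which is the kind of octave error that pre- and post-processing are meant to prevent. Because the true F0 of the synthetic signal is known, comparing best_f0 with true_f0 is also a toy version of evaluating against ground truth.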