A key parameter in any parametric representation of speech is the fundamental frequency, F0. Estimating it from speech is not trivial: we need an F0 estimation algorithm, often called a "pitch tracker".
Pitch tracking vs pitch marking
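Pitch marking locates the individual pitch periods in the waveform; these pitch marks are required for pitch-synchronous signal processing. Pitch tracking estimates the F0 contour over time, which is used in the join cost or for modelling prosody.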
Autocorrelation
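Most methods for estimating F0 start with the autocorrelation function, which measures how similar a waveform is to a time-shifted copy of itself. For voiced speech, this self-similarity peaks when the shift (the lag) equals the fundamental period.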
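As a rough illustration of the idea, the sketch below computes the autocorrelation of a single frame and takes the lag of the highest peak as the period. The synthetic test signal, the 16 kHz sample rate and the 60 to 400 Hz search range are assumptions chosen for the example; they do not come from any particular pitch tracker.

```python
import numpy as np


def autocorrelation(frame):
    """Autocorrelation of one frame for lags 0 .. len(frame) - 1."""
    frame = frame - np.mean(frame)                   # remove any DC offset
    full = np.correlate(frame, frame, mode="full")   # lags -(N-1) .. +(N-1)
    return full[len(frame) - 1:]                     # keep non-negative lags


# Synthetic "voiced" frame (assumed values): 125 Hz fundamental plus one harmonic.
sample_rate = 16000
frame_length = 640                                   # 40 ms at 16 kHz
t = np.arange(frame_length) / sample_rate
frame = np.sin(2 * np.pi * 125.0 * t) + 0.5 * np.sin(2 * np.pi * 250.0 * t)

acf = autocorrelation(frame)

# Only search lags that correspond to a plausible F0 range (assumed 60-400 Hz).
min_lag = int(sample_rate / 400.0)
max_lag = int(sample_rate / 60.0)
best_lag = min_lag + int(np.argmax(acf[min_lag:max_lag + 1]))
f0_estimate = sample_rate / best_lag

print(f"period = {best_lag} samples, F0 = {f0_estimate:.1f} Hz")
```

For this clean synthetic frame the estimate should be close to the true 125 Hz. Real speech is far less well behaved, which is why the simple "highest peak" rule is not sufficient on its own.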
Autocorrelation is not enough
The autocorrelation function has peaks at several lags, so it provides multiple candidate values for the fundamental period, but how do we reliably select the correct one?
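To make the problem concrete, here is a minimal sketch that lists every sufficiently strong autocorrelation peak in the search range as an F0 candidate. The function name `f0_candidates`, the peak-strength threshold of 0.3 and the test signal are hypothetical choices for illustration only.

```python
import numpy as np


def f0_candidates(frame, sample_rate, f0_min=60.0, f0_max=400.0, threshold=0.3):
    """Return (f0_in_hz, peak_strength) for each strong autocorrelation peak."""
    frame = frame - np.mean(frame)
    acf = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    acf = acf / acf[0]                               # normalise so lag 0 equals 1
    min_lag = max(int(sample_rate / f0_max), 1)
    max_lag = min(int(sample_rate / f0_min), len(acf) - 2)
    candidates = []
    for lag in range(min_lag, max_lag + 1):
        is_peak = acf[lag] > acf[lag - 1] and acf[lag] >= acf[lag + 1]
        if is_peak and acf[lag] > threshold:
            candidates.append((sample_rate / lag, float(acf[lag])))
    return candidates


# Same kind of synthetic frame as before (assumed values): true F0 is 125 Hz.
sr = 16000
n = np.arange(640)
voiced = np.sin(2 * np.pi * 125.0 * n / sr) + 0.5 * np.sin(2 * np.pi * 250.0 * n / sr)

found = f0_candidates(voiced, sr)
for f0, strength in found:
    print(f"candidate F0 = {f0:6.1f} Hz, peak strength = {strength:.2f}")
```

Even for this clean signal there is a candidate at the true period and another at twice the period (half the F0). In real speech the wrong peak is sometimes the strongest one, giving the familiar octave errors.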
Pre- and post-processing
The solution is to add pre-processing to remove most of the incorrect F0 candidates, then post-processing to select the correct one.
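The text does not say which pre- and post-processing techniques are meant, so the following is only a sketch under stated assumptions: centre clipping stands in for the pre-processing, and a small dynamic programming search over per-frame candidates stands in for the post-processing. Both are commonly used for this purpose, but the function names, the clipping ratio and the cost weights here are hypothetical.

```python
import numpy as np


def centre_clip(frame, ratio=0.3):
    """Pre-processing (assumed technique): suppress low-amplitude detail."""
    limit = ratio * np.max(np.abs(frame))
    clipped = np.zeros_like(frame, dtype=float)
    above = frame > limit
    below = frame < -limit
    clipped[above] = frame[above] - limit
    clipped[below] = frame[below] + limit
    return clipped


def select_path(candidates_per_frame, jump_weight=1.0):
    """Post-processing (assumed technique): pick one F0 per frame.

    Each frame has a list of (f0_in_hz, peak_strength) candidates. The local
    cost favours strong peaks; the transition cost penalises large
    frame-to-frame jumps in log F0. Dynamic programming finds the cheapest path.
    """
    costs = [np.array([1.0 - s for _, s in candidates_per_frame[0]])]
    backpointers = []
    for prev, cur in zip(candidates_per_frame[:-1], candidates_per_frame[1:]):
        prev_log_f0 = np.log2([f for f, _ in prev])
        cur_log_f0 = np.log2([f for f, _ in cur])
        local = np.array([1.0 - s for _, s in cur])
        jump = jump_weight * np.abs(cur_log_f0[:, None] - prev_log_f0[None, :])
        total = costs[-1][None, :] + jump            # shape: (current, previous)
        backpointers.append(np.argmin(total, axis=1))
        costs.append(local + np.min(total, axis=1))
    index = int(np.argmin(costs[-1]))
    path = [index]
    for pointers in reversed(backpointers):          # trace back to the first frame
        index = int(pointers[index])
        path.append(index)
    path.reverse()
    return [candidates_per_frame[t][i][0] for t, i in enumerate(path)]


# Hypothetical candidates for three frames; in the middle frame the strongest
# peak is an octave error (63 Hz instead of about 126 Hz).
example = [
    [(125.0, 0.80), (62.5, 0.60)],
    [(126.0, 0.70), (63.0, 0.75)],
    [(127.0, 0.80), (63.5, 0.60)],
]
print(select_path(example))
```

Picking the strongest peak in each frame independently would produce a contour that drops an octave in the middle frame. The continuity penalty makes the smooth path cheaper overall, so the search keeps the candidate near 126 Hz instead.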
Alternatives to autocorrelation
Autocorrelation and related functions are the most popular basis for F0 estimation, but there are alternatives, such as the cepstrum.
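For comparison, here is a minimal sketch of a cepstrum-based estimate: take the log magnitude spectrum of a windowed frame, transform it back, and look for a peak at the quefrency corresponding to the fundamental period. The function name `cepstrum_f0`, the Hamming window, the search range and the impulse-train test signal are assumptions made for this example.

```python
import numpy as np


def cepstrum_f0(frame, sample_rate, f0_min=60.0, f0_max=400.0):
    """Estimate F0 from the peak of the real cepstrum of one frame."""
    windowed = frame * np.hamming(len(frame))
    spectrum = np.fft.rfft(windowed)
    log_magnitude = np.log(np.abs(spectrum) + 1e-10)  # small floor avoids log(0)
    cepstrum = np.fft.irfft(log_magnitude)            # index = quefrency in samples
    min_q = int(sample_rate / f0_max)
    max_q = int(sample_rate / f0_min)
    peak_q = min_q + int(np.argmax(cepstrum[min_q:max_q + 1]))
    return sample_rate / peak_q


# Synthetic harmonic-rich frame (assumed values): one impulse every 100 samples,
# which is a 160 Hz fundamental at a 16 kHz sample rate.
rate = 16000
pulses = np.zeros(1024)
pulses[::100] = 1.0

f0_from_cepstrum = cepstrum_f0(pulses, rate)
print(f"cepstral F0 estimate = {f0_from_cepstrum:.1f} Hz")
```

The harmonics of a voiced sound form a regularly spaced pattern in the log spectrum, and the cepstrum turns that regular spacing into a single peak. As with autocorrelation, weaker peaks also appear at multiples of the period, so candidate selection is still needed in practice.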