Festival's pitch marking vs pitch tracking
- This topic has 7 replies, 4 voices, and was last updated 6 years, 4 months ago by Simon.
-
April 7, 2017 at 09:11 #7056
Even though I broadly understand the difference between pitch marking and pitch tracking, I don’t understand how Festival goes about it. What particularly confuses me is what Festival does during make_pm_wave and make_f0. I tried to find something relevant in the documentation but couldn’t. However, http://www.outsideecho.com/llsti/pubs/Pitch_marking.pdf says that during make_pm_wave, Festival extracts pitch marks using autocorrelation. If this is true, does Festival use the same method (autocorrelation) to track F0 during make_f0?
-
April 7, 2017 at 10:11 #7057
http://www.outsideecho.com/llsti/pubs/Pitch_marking.pdf is wrong in this regard.
make_pm_wave is a script that runs the program pitchmark.
The documentation is here. Note that this method does actually work on speech waveforms, although at the time the manual was written we were still using Laryngograph signals, which later proved to be unnecessary.
pitchmark works as described in the video.
-
April 7, 2017 at 10:38 #7059
Which variant of the autocorrelation function does Festival use (there are many listed in the Talkin paper…)?
-
April 7, 2017 at 10:51 #7060
When building voices for Festival, we could use any pitch tracker we liked. In the practical exercise, a tool from the Edinburgh Speech Tools library called pda (pitch determination algorithm) is used, which implements the “super resolution pitch determination algorithm”. We could just as well have used Talkin’s “RAPT” method, which is available in a program called get_f0.
The small differences between methods are not important for your understanding of the general principles behind pitch tracking.
I recommend only trying to understand RAPT, and section 3.1 of the paper will tell you exactly which version of the autocorrelation function is used in that method.
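To make the general principle concrete: RAPT is built around a normalised cross-correlation function (NCCF), and the snippet below is only a minimal sketch of that idea on a single frame. It is not the get_f0 implementation; the function names, the synthetic 100 Hz sine, and all parameter values are assumptions chosen for illustration.

```python
import math

def nccf(frame, lag, window):
    """Normalised cross-correlation between frame[0:window] and the
    same-length segment starting `lag` samples later."""
    a = frame[:window]
    b = frame[lag:lag + window]
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den if den > 0 else 0.0

def best_lag(frame, sample_rate, f0_min, f0_max, window):
    """Return the lag (in samples) with the highest NCCF value,
    searching only lags that correspond to f0_min..f0_max."""
    shortest = int(sample_rate / f0_max)
    longest = int(sample_rate / f0_min)
    scores = {lag: nccf(frame, lag, window)
              for lag in range(shortest, longest + 1)}
    return max(scores, key=scores.get)

# Illustrative input: a pure 100 Hz sine sampled at 8 kHz.
sample_rate = 8000
signal = [math.sin(2 * math.pi * 100 * n / sample_rate) for n in range(800)]

lag = best_lag(signal, sample_rate, f0_min=60, f0_max=250, window=240)
f0_estimate = sample_rate / lag
print(lag, f0_estimate)
```

A real tracker computes this for every frame, keeps several candidate lags per frame rather than only the best one, and then chooses among them in a later stage.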
-
April 7, 2017 at 10:53 #7061
What pre-processing and post-processing is used in pda? We must specify a male/female pitch range: would that count as post-processing?
-
April 7, 2017 at 11:02 #7062
The male/female flag provided to the make_f0 script just sets some reasonable values for a set of parameters passed to the pda program. Read the make_f0 script to see what these are, and then read the manual for pda to understand which ones refer to pre- or post-processing.
-L means low-pass filtering, which is pre-processing
-d 1 means decimation (downsampling), but the value 1 means that no decimation is actually used – this is pre-processing
-P means peak tracking, in other words dynamic programming – this is post-processing
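As an illustration of what dynamic programming as post-processing can look like, here is a generic sketch that picks one F0 candidate per frame so that the whole track is smooth. This is not pda's actual peak tracking: the function name track_peaks, the log-ratio jump penalty, and the candidate values are all assumptions made up for the example.

```python
import math

def track_peaks(candidates, jump_cost=1.0):
    """candidates: one list per frame of (f0_hz, score) pairs.
    Returns one f0 per frame, maximising the summed scores minus a
    penalty proportional to the log-ratio between consecutive f0 values."""
    best = [[score for _, score in candidates[0]]]
    back = []
    for prev, cur in zip(candidates, candidates[1:]):
        row, ptr = [], []
        for f0, score in cur:
            # Value of reaching this candidate from each previous candidate.
            options = [best[-1][k] - jump_cost * abs(math.log(f0 / prev_f0))
                       for k, (prev_f0, _) in enumerate(prev)]
            k_best = max(range(len(options)), key=options.__getitem__)
            row.append(options[k_best] + score)
            ptr.append(k_best)
        best.append(row)
        back.append(ptr)
    # Trace back from the best final candidate.
    j = max(range(len(best[-1])), key=best[-1].__getitem__)
    path = [j]
    for ptr in reversed(back):
        j = ptr[j]
        path.append(j)
    path.reverse()
    return [candidates[i][j][0] for i, j in enumerate(path)]

# Hypothetical candidates: in the middle frame the octave-up peak scores
# slightly higher, so a frame-by-frame choice would jump to 204 Hz.
frames = [
    [(100.0, 0.9), (200.0, 0.5)],
    [(102.0, 0.6), (204.0, 0.7)],
    [(101.0, 0.9), (202.0, 0.4)],
]
track = track_peaks(frames)
print(track)
```

Because the decision is made over the whole sequence, the locally stronger octave-up candidate in the middle frame is rejected in favour of the smoother track.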
-
April 2, 2018 at 19:01 #9182
In the make_pm_wave script, the default parameters for male and female voices are set like this:
# default male settings 50-180Hz
DEFAULT_PM_ARGS_MALE='-min 0.0055 -max 0.02 -def 0.01 -wave_end -lx_lf 200 -lx_lo 111 -lx_hf 30 -lx_ho 51 -med_o 0'
# default female settings 100-250Hz
DEFAULT_PM_ARGS_FEMALE='-min 0.004 -max 0.01 -def 0.01 -wave_end -lx_lf 300 -lx_lo 111 -lx_hf 140 -lx_ho 51 -med_o 0'
I don’t understand why the lx_hf parameter is set to 140Hz for female voices when we are looking to identify an F0 value between 100 and 250Hz. If we apply this 140-300Hz filter, how can we find F0 values between 100 and 140Hz? This confuses me even more because the lx_hf parameter for males is set to 30Hz, i.e. 20Hz below the threshold of 50Hz.
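The arithmetic behind this question can be checked with a few lines of Python. This is only a sketch under my reading of the arguments quoted above: it assumes -min and -max are pitch periods in seconds (which matches the 50-180Hz and 100-250Hz comments) and that -lx_hf is a high-pass cutoff in Hz. The helper names are made up for illustration.

```python
def period_to_hz(period_s):
    """Convert a pitch period in seconds to a frequency in Hz."""
    return 1.0 / period_s

# Values copied from the default arguments quoted above.
settings = {
    "male":   {"min": 0.0055, "max": 0.02, "lx_lf": 200, "lx_hf": 30},
    "female": {"min": 0.004,  "max": 0.01, "lx_lf": 300, "lx_hf": 140},
}

def f0_range(s):
    """Lowest and highest F0 in Hz implied by the -max and -min periods."""
    return period_to_hz(s["max"]), period_to_hz(s["min"])

def uncovered_band(s):
    """Part of the F0 range that lies below the high-pass cutoff lx_hf,
    or None if the cutoff is at or below the lowest F0."""
    lo, hi = f0_range(s)
    if s["lx_hf"] > lo:
        return (lo, min(s["lx_hf"], hi))
    return None

for name, s in settings.items():
    lo, hi = f0_range(s)
    print(name, round(lo, 1), round(hi, 1), uncovered_band(s))
```

Under these assumptions the male cutoff of 30Hz sits below the whole 50-182Hz range, whereas the female cutoff of 140Hz sits inside the 100-250Hz range, leaving 100-140Hz below the cutoff.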
-
April 3, 2018 at 11:16 #9183
That looks like an error, although it will only have an effect for relatively low-pitched female voices.
-