Festival's pitch marking vs pitch tracking

This topic has 7 replies, 4 voices, and was last updated 6 years, 11 months ago by Simon.

- April 7, 2017 at 09:11 #7056
  Maria
  Student
  Although I broadly understand the difference between pitch marking and pitch tracking, I don’t understand how Festival handles each of them. What confuses me in particular is what Festival does during make_pm_wave and make_f0. I looked for something relevant in the documentation but couldn’t find anything. However, http://www.outsideecho.com/llsti/pubs/Pitch_marking.pdf states that during make_pm_wave, Festival extracts pitch marks using autocorrelation. If this is true, does Festival use the same method (autocorrelation) to track F0 during make_f0?
- April 7, 2017 at 10:11 #7057
  Simon
  Professor
  http://www.outsideecho.com/llsti/pubs/Pitch_marking.pdf is wrong, in this regard.
  
  make_pm_wave is a script that runs the program pitchmark
  
  The documentation is here. Note that this method does actually work on speech waveforms, although at the time the manual was written we were still using Laryngograph signals, which later proved to be unnecessary.
  
  pitchmark works as described in the video
- April 7, 2017 at 10:38 #7059
  Anonymous Student
  Student
  Which variant of the autocorrelation function does Festival use (there are many listed in the Talkin paper…)?
- April 7, 2017 at 10:51 #7060
  Simon
  Professor
  When building voices for Festival, we could use any pitch tracker we liked. In the practical exercise, a tool from the Edinburgh Speech Tools library called pda (pitch determination algorithm) is used, which implements the “super resolution pitch determination algorithm”.
  
  We could just as well have used Talkin’s “RAPT” method, which is available in a program called get_f0.
  
  The small differences between methods are not important for your understanding of the general principles behind pitch tracking.
  
  I recommend only trying to understand RAPT, and section 3.1 of the paper will tell you exactly which version of the autocorrelation function is used in that method.
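
As a rough illustration of the general principle only, here is a minimal sketch of a correlation-based F0 estimate for a single frame. It uses a normalised cross-correlation between a window and a lagged copy of itself, and picks the lag with the highest score. This is not the code of pda or get_f0; the function names, the window length and the 80-250 Hz search range are assumptions made for the example, and a real tracker adds pre- and post-processing around this core.

```python
import math

def nccf(frame, lag, n):
    """Normalised cross-correlation of frame[0:n] with frame[lag:lag+n]."""
    a = frame[:n]
    b = frame[lag:lag + n]
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den if den > 0 else 0.0

def estimate_f0(frame, sample_rate, f0_min, f0_max, n):
    """Return the F0 (Hz) whose lag gives the highest correlation score."""
    lag_min = int(sample_rate / f0_max)  # shortest period to try, in samples
    lag_max = int(sample_rate / f0_min)  # longest period to try, in samples
    best_lag = max(range(lag_min, lag_max + 1),
                   key=lambda lag: nccf(frame, lag, n))
    return sample_rate / best_lag

# Synthetic test signal: a 125 Hz sine sampled at 16 kHz (hypothetical values).
sample_rate = 16000
true_f0 = 125.0
frame = [math.sin(2 * math.pi * true_f0 * t / sample_rate) for t in range(800)]

f0 = estimate_f0(frame, sample_rate, f0_min=80.0, f0_max=250.0, n=320)
print(f0)
```

The search range matters: if it were wide enough to include a lag of two periods, that lag would score just as well as the true period, which is one reason practical trackers need a sensible pitch range and some post-processing.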
- April 7, 2017 at 10:53 #7061
  Anonymous Student
  Student
  What pre-processing and post-processing is used in pda? We must specify male/female pitch range: would that count as post-processing?
- April 7, 2017 at 11:02 #7062
  Simon
  Professor
  The male/female flag provided to the make_f0 script just sets some reasonable values for a set of parameters passed to the pda program. Read the make_f0 script to see what these are, and then read the manual for pda to understand which ones refer to pre- or post-processing.
  
  -L means low-pass filtering, which is pre-processing
  
  -d 1 means decimation (downsampling), but the value 1 means that no decimation is actually used – this is pre-processing
  
  -P means peak tracking, in other words dynamic programming – this is post-processing
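
To make the idea of dynamic programming as post-processing concrete, here is a minimal sketch of choosing one F0 candidate per frame so that the overall contour is both well supported locally and smooth. It is not the algorithm inside pda: the candidate lists, the log-frequency jump penalty and the `jump_weight` value are all assumptions chosen for illustration.

```python
import math

def track_f0(candidates, jump_weight=2.0):
    """Pick one candidate per frame with dynamic programming.

    candidates: list of frames, each a list of (f0_hz, local_score) tuples.
    The path score is the sum of local scores minus a penalty for F0 jumps.
    """
    # best[t][j]: best total score of any path ending at candidate j of frame t
    best = [[score for _, score in candidates[0]]]
    back = []  # back[t-1][j]: index of the best predecessor in frame t-1
    for t in range(1, len(candidates)):
        row, ptr = [], []
        for f0, score in candidates[t]:
            options = [
                best[t - 1][i] - jump_weight * abs(math.log2(f0 / prev_f0))
                for i, (prev_f0, _) in enumerate(candidates[t - 1])
            ]
            i_best = max(range(len(options)), key=options.__getitem__)
            row.append(options[i_best] + score)
            ptr.append(i_best)
        best.append(row)
        back.append(ptr)

    # Trace back from the best final candidate.
    j = max(range(len(best[-1])), key=best[-1].__getitem__)
    path = [j]
    for ptr in reversed(back):
        j = ptr[j]
        path.append(j)
    path.reverse()
    return [candidates[t][j][0] for t, j in enumerate(path)]

# Hypothetical candidates: in the middle frame the octave-up candidate has
# the higher local score, so a frame-by-frame choice would pick 244 Hz.
frames = [
    [(120.0, 0.9), (240.0, 0.5)],
    [(122.0, 0.8), (244.0, 0.85)],
    [(121.0, 0.9), (242.0, 0.4)],
]
smoothed = track_f0(frames)
print(smoothed)
```

Because the penalty for jumping an octave outweighs the small local advantage of the 244 Hz candidate, the tracked contour stays near 120 Hz throughout.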
- April 2, 2018 at 19:01 #9182
  Adele A
  Student
  In the make_pm_wave script, the default parameters for male and female voices are set like this:
  
  # default male settings 50-180Hz
  DEFAULT_PM_ARGS_MALE='-min 0.0055 -max 0.02 -def 0.01 -wave_end -lx_lf 200 -lx_lo 111 -lx_hf 30 -lx_ho 51 -med_o 0'
  
  # default female settings 100-250Hz
  DEFAULT_PM_ARGS_FEMALE='-min 0.004 -max 0.01 -def 0.01 -wave_end -lx_lf 300 -lx_lo 111 -lx_hf 140 -lx_ho 51 -med_o 0'
  
  I don’t understand why the lx_hf parameter is set to 140Hz for female voices when we are looking to identify an F0 value between 100 and 250Hz. If we apply this 140-300Hz filter, how can we find F0 values between 100 and 140Hz? This confuses me more since the lx_hf parameter for males is set to 30Hz, i.e. 20Hz below the threshold of 50Hz.
- April 3, 2018 at 11:16 #9183
  Simon
  Professor
  That looks like an error, although it will only have an effect for relatively low-pitched female voices.
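
The arithmetic behind this exchange can be checked with a short sketch. It assumes, as the question does, that -min and -max are pitch periods in seconds and that -lx_hf and -lx_lf act as high-pass and low-pass cutoffs in Hz; the dictionary layout and function names are made up for the example.

```python
# Values copied from the make_pm_wave defaults quoted in the question.
settings = {
    "male":   {"min_period": 0.0055, "max_period": 0.02, "lx_hf": 30,  "lx_lf": 200},
    "female": {"min_period": 0.004,  "max_period": 0.01, "lx_hf": 140, "lx_lf": 300},
}

def f0_range_hz(s):
    """Convert the allowed period range (seconds) into an F0 range (Hz)."""
    return 1.0 / s["max_period"], 1.0 / s["min_period"]

def uncovered_low_band(s):
    """Return the part of the F0 range lying below the high-pass cutoff, if any."""
    f0_low, _ = f0_range_hz(s)
    cutoff = s["lx_hf"]
    return (f0_low, cutoff) if cutoff > f0_low else None

for name, s in settings.items():
    low, high = f0_range_hz(s)
    print(name, round(low, 1), round(high, 1), uncovered_low_band(s))
```

Under those assumptions, the male high-pass cutoff sits below the lowest allowed F0, while the female cutoff of 140 Hz sits above the lowest allowed F0 of 100 Hz, which is the range where low-pitched female voices would be affected.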