RELP in Festival/Multisyn
This topic has 4 replies, 2 voices, and was last updated 8 years, 8 months ago by Simon.
April 2, 2016 at 23:00 #2926
Did you ever make that RELP video????
My question is about RELP in Festival/Multisyn.
From your SP slides regarding the advantages of RELP:
• Easy modification of pitch and duration
• Can smooth the spectral envelope over the joins
• Optionally, compressed storage of diphones
• Fast to compute
• Festival uses residual excited LPC (RELP) by default
• Near-perfect reconstruction is possible provided F0 and duration are not modified too far from their original values

HOWEVER:
we are not doing any pitch/duration modification – correct?
ARE WE smoothing the spectral envelope over the joins? It doesn’t seem like it.

SO:
1. If we’re not doing either of those things, WHY are we using RELP?
2. Where, exactly, in the pipeline does the actual joining occur? Do we crossfade the residuals and then send them through the LP filter to recover a contiguous waveform (with some kind of interpolation of the filter as well)? Or do we go from the search, to the ‘best list’, to the residual units corresponding to that list, through the LPC filter for each unit to a recreated waveform for each unit, which then gets joined with a single-frame, time-domain crossfade (using TDPSOLA on the waveform itself)?
April 3, 2016 at 09:41 #2929
Why does Festival use Residual Excited LP (RELP)?
The released version of Festival uses RELP for two reasons. The first reason is practical – TDPSOLA is patented:
Method and apparatus for speech synthesis by wave form overlapping and adding EP0363233 (A1)
The patent was filed by the French state via its research centre CNET, which later became France Telecom, known today as Orange.
The second reason is that RELP allows pitch/time/spectral envelope modification, as you mention. In the older diphone engine, RELP is indeed used for time- and pitch-modification. In Multisyn, no modification or join smoothing is performed, although in principle it would be possible to add this to the implementation.
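To make that concrete, here is a heavily simplified sketch of the kind of duration modification RELP makes easy: because the residual is stored pitch-synchronously, whole pitch periods of residual (together with their LPC frames) can be duplicated or dropped before resynthesis. This only illustrates the idea, assuming pitch-synchronous storage; the names and the nearest-period mapping are made up, not the diphone engine’s actual code.

#include <cmath>
#include <cstddef>
#include <vector>

// One pitch period of excitation together with its LPC frame.
struct RelpPeriod {
    std::vector<float> residual;   // pitch-synchronous residual samples
    std::vector<float> lpc;        // a[1..p] for this period
};

// Stretch or compress duration by 'factor' (> 0) by repeating or skipping
// whole periods; each LPC frame travels with its residual, so the spectral
// envelope is untouched.
std::vector<RelpPeriod> modify_duration(const std::vector<RelpPeriod>& in,
                                        float factor)
{
    std::vector<RelpPeriod> out;
    const std::size_t n_out =
        static_cast<std::size_t>(std::lround(in.size() * factor));
    for (std::size_t j = 0; j < n_out; ++j) {
        // Map each output period back to the nearest input period.
        std::size_t i = static_cast<std::size_t>(j / factor);
        if (i >= in.size()) i = in.size() - 1;
        out.push_back(in[i]);
    }
    return out;
}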
April 3, 2016 at 10:52 #2931
What is the pipeline for concatenation and RELP waveform generation?
A complete residual signal (which is just a waveform) for the whole utterance is constructed by concatenating the residuals of the selected candidates. Overlap-and-add (i.e., crossfade) is performed at the joins, over a duration of one pitch period. A corresponding sequence of LPC filter coefficients is also constructed.
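As a minimal sketch of what a one-pitch-period overlap-and-add at a join could look like (a linear fade, with made-up names; not the actual Multisyn/speech_tools code):

#include <cstddef>
#include <vector>

// Join two candidates' residual waveforms with a crossfade of length
// 'fade_len' samples (one pitch period). Assumes fade_len is no longer
// than either residual.
std::vector<float> crossfade_join(const std::vector<float>& left,
                                  const std::vector<float>& right,
                                  std::size_t fade_len)
{
    std::vector<float> out;
    out.reserve(left.size() + right.size() - fade_len);

    // Left residual up to the start of the overlap region.
    out.insert(out.end(), left.begin(), left.end() - fade_len);

    // Overlap-and-add over one pitch period: fade the left residual out
    // while fading the right residual in.
    for (std::size_t i = 0; i < fade_len; ++i) {
        float w = static_cast<float>(i) / static_cast<float>(fade_len);
        out.push_back((1.0f - w) * left[left.size() - fade_len + i] + w * right[i]);
    }

    // The rest of the right residual.
    out.insert(out.end(), right.begin() + fade_len, right.end());
    return out;
}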
The function lpc_filter_fast in .../speech_tools/sigpr/filter.cc then does the waveform generation. The inputs are the utterance-length residual waveform and the sequence of LPC filter co-efficients. I’ve just realised that I wrote that code nearly 20 years ago…

/*************************************************************************/
/* Author : Simon King */
/* Date : October 1996 */
/*-----------------------------------------------------------------------*/
/* Filter functions */
/* */
/*=======================================================================*/
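To make the filtering step concrete, here is a naive sketch of residual-excited LP synthesis for a single frame, assuming the predictor convention s[n] = e[n] + a_1·s[n-1] + … + a_p·s[n-p] (sign conventions for stored LPC coefficients vary). It only illustrates the idea; it is not the lpc_filter_fast implementation.

#include <cstddef>
#include <vector>

// Run one frame of residual through an all-pole LP synthesis filter.
// 'history' holds the last p output samples (history[0] is the most recent)
// and carries the filter state across frames.
void relp_synthesise_frame(const std::vector<float>& residual,  // excitation e[n] for this frame
                           const std::vector<float>& lpc,       // a[1..p] for this frame
                           std::vector<float>& history,         // filter state, size p
                           std::vector<float>& out)             // synthesised speech is appended here
{
    const std::size_t p = lpc.size();
    for (std::size_t n = 0; n < residual.size(); ++n) {
        // Prediction from the previous p output samples, plus the excitation.
        float s = residual[n];
        for (std::size_t k = 0; k < p; ++k)
            s += lpc[k] * history[k];
        // Shift the filter state and append the new sample.
        if (p > 0) {
            for (std::size_t k = p - 1; k > 0; --k)
                history[k] = history[k - 1];
            history[0] = s;
        }
        out.push_back(s);
    }
}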
April 3, 2016 at 11:18 #2932
Great! Thank you. Can you fill in the last gap in this part of the process for me:
How are the filter coefficients crossfaded at that one-pitch-period crossfade area?
I’ve looked at the code you posted (it’s on GitHub), and while it is well commented, it’s beyond my capability to glean the answer to this question just from looking at it. Although I assume the process is in there somewhere…
April 3, 2016 at 12:47 #2934
The filter co-efficients are not cross-faded. Remember that they are specified frame-by-frame (not sample-by-sample, like the residual). We just concatenate the sequences of frames of filter co-efficients for all the candidates – this gives us a complete sequence of filter co-efficients for the full utterance.
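In code terms, this amounts to nothing more than appending the per-candidate sequences of coefficient frames. A trivial sketch with illustrative names (not the Festival source):

#include <vector>

using LpcFrame = std::vector<float>;      // a[1..p] for one analysis frame
using LpcTrack = std::vector<LpcFrame>;   // one candidate's sequence of frames

// Build the utterance-length coefficient sequence: plain concatenation,
// with no interpolation or crossfading of the coefficients themselves.
LpcTrack concatenate_lpc_tracks(const std::vector<LpcTrack>& candidates)
{
    LpcTrack utterance;
    for (const LpcTrack& track : candidates)
        utterance.insert(utterance.end(), track.begin(), track.end());
    return utterance;
}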