Required files at run time
- This topic has 5 replies, 2 voices, and was last updated 8 years, 8 months ago by Simon.
April 2, 2016 at 10:50 #2921
I was checking which files Festival actually needs to load the voice and do synthesis (I’m trying to build an accurate diagram). I started with an empty folder, opened Festival, tried to load the voice, and added each file as Festival complained that it needed it.
The only things it asked for were:
– Coef2
– LPC
– Utts
– utts.data
– utts.pauses
It is already weird that it does not need the F0 (how does it know about the “bad f0” target sub-cost?), but it also does not need the pitch marks! How is that possible? Weren’t they necessary to concatenate the diphones?
I synthesized a sentence with Festival in this folder and also in a folder with all the other files, and the selected units were exactly the same.
-
April 2, 2016 at 13:42 #2922
The “bad f0” penalty is part of the target cost. It obtains F0 at the concatenation points of the unit from the “stripped” join cost features (coef2). In fact, only the voicing status is used, and not the actual value of F0.
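As a rough illustration of "only the voicing status is used", here is a minimal sketch of such a penalty. It is not Festival's actual code: the function name, the penalty value, and the convention that unvoiced frames carry a non-positive F0 are all assumptions made for the example.

```python
def bad_f0_penalty(expect_voiced: bool, f0_at_join: float, penalty: float = 1.0) -> float:
    """Penalise a unit that should be voiced but has no F0 at its join point.

    Hypothetical sketch: only the voicing status matters, not the F0 value.
    Assumption: unvoiced frames are marked by a non-positive F0 value.
    """
    is_voiced = f0_at_join > 0.0
    if expect_voiced and not is_voiced:
        return penalty
    return 0.0


# The actual F0 value makes no difference, only whether it is voiced.
print(bad_f0_penalty(True, 120.0), bad_f0_penalty(True, 240.0), bad_f0_penalty(True, -1.0))
```

Because the check needs nothing more than voiced/unvoiced at the join points, the F0 stored with the join cost features is enough, and the separate F0 files are not needed at run time.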
-
April 2, 2016 at 21:51 #2925
Ok, thanks, that explains why we don’t need the f0…
but what about the pitch marks?
Thank you!
-
April 3, 2016 at 12:45 #2933
You need to do some more detective work to find out how the pitchmarks are found. For example, try omitting the pitchmarking step and see what happens as you build the voice.
-
April 3, 2016 at 17:24 #2944
OK, so I did. When I created the utts files, all of them were flagged as bad_pm.
Then, when I tried to build the LPCs, it complained that there were no pitch marks.
I checked “make_lpc”: at the end, under “Extract the LPC coefficients”, it calls “sig2fv” with the pitch marks as an argument (but it does not pass them to “sigfilter” when building the residuals).
So I checked “sig2fv_main.cc”, and the introductory comments say:
"-pm <ifile> Pitch mark file name. This is used to \n"
" specify the positions of the analysis frames for pitch \n"
" synchronous analysis. Pitchmark files are just standard \n"
" track files, but the channel information is ignored and \n"
" only the time positions are used\n"
Then later, the only place I can see the pitch marks being used is:
// allocate and fill time axis
if (al.present("-pm"))
{
    if (read_track(full, al.val("-pm"), al))
        exit(1);
}
And given a comment at the end with some examples, I see that we are actually doing “Pitch Synchronous linear prediction”.
Sooo… I don’t really understand the detail, but:
1. Apparently the linear prediction analysis is done at time steps given by the pitch marks, using them as the centres of the analysis windows?
2. So we don’t actually need the pitch marks at run time, because the LPC coefficients were already computed with that timing?
3. But then why don’t we have that information for the residuals too if, as you mentioned in another post, the residuals are concatenated using the pitch periods given by the pitch marks?
I hope you can explain this whole process in detail. I’m a little confused about how pitch marks, LPC, residuals and concatenation fit together, so it would be very helpful. Thank you!
-
April 3, 2016 at 19:16 #2947
You’ve got all the essential points. The data needed for RELP synthesis is stored in two parallel sets of files: the LPC filter coefficients and the residual signals. The filter coefficients are a sequence of vectors (one vector = one set of filter coefficients at a certain point in time). These vectors are pitch synchronous, so their time positions implicitly represent the pitch marks (your point 2 is correct). The answer to point 3 is that the information is already there in the filter coefficients, and there is no need to duplicate it in the residuals. Filter coefficients and residuals “belong together”, and for each utterance there is a pair of files.
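To make the "implicitly represent the pitch marks" point concrete, here is a minimal sketch of how pitch-period spans in the residual could be recovered from nothing but the frame times of the coefficient track. It is an illustration under stated assumptions, not Festival's implementation: the function names, the two-period span around each mark, and the example values are all hypothetical.

```python
from typing import List, Tuple


def frame_times_to_samples(frame_times: List[float], sample_rate: int) -> List[int]:
    """Convert pitch-synchronous frame times (seconds) to sample indices."""
    return [round(t * sample_rate) for t in frame_times]


def pitch_period_spans(frame_times: List[float], sample_rate: int,
                       n_samples: int) -> List[Tuple[int, int]]:
    """For each frame, return the residual span from the previous mark to the next.

    Assumption: each frame time in the coefficient track is a pitch mark,
    so no separate pitch mark file is consulted here.
    """
    marks = frame_times_to_samples(frame_times, sample_rate)
    spans = []
    for i in range(len(marks)):
        start = marks[i - 1] if i > 0 else 0
        end = marks[i + 1] if i + 1 < len(marks) else n_samples
        spans.append((max(0, start), min(n_samples, end)))
    return spans


# Hypothetical frame times taken from the time axis of a coefficient track.
spans = pitch_period_spans([0.005, 0.010, 0.016], sample_rate=16000, n_samples=400)
print(spans)  # [(0, 160), (80, 256), (160, 400)]
```

The residual samples inside each span can then be windowed and overlap-added during concatenation, which is why the residual files do not need to carry their own copy of the timing.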