Forum Replies Created
Thank you!
I've attached the script. You need to change the extension to .py, and you run it like this (tested with Python 2):
python txtgrd2lab.py ./txtgrids/ ./labels/
(so the first argument is the folder where you have the TextGrids, and the second one is the folder where the label files will be saved).
I was able to open the output correctly with wavesurfer on my laptop, but please test it with wavesurfer in the lab to check that the labels are correct.
If you find any other errors or problems, let me know. By the way, one of your intervals had a missing label.
Even if the script works properly, it only works with interval-type TextGrids, so if anybody else wants to label with Praat, check your classmate's TextGrid and do it the same way; then you can use this script to convert to wavesurfer labels.
This script has not been properly tested, so you might run into errors or problems when testing it with other data.
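In case the attachment is not accessible, here is a minimal sketch of the kind of conversion the script does (this is my illustration, not the attached txtgrd2lab.py itself): it assumes interval-type TextGrids in the long format, keeps only each interval's xmin, xmax and text, and writes one "start end label" line per interval, which is the plain-text transcription format wavesurfer can read.

#!/usr/bin/env python
# Minimal sketch (not the attached txtgrd2lab.py): convert interval-type
# Praat TextGrids (long format) into plain-text label files, one
# "start end label" line per interval, with times in seconds.
import os
import re
import sys

def textgrid_to_lab(tg_path, lab_path):
    with open(tg_path) as f:
        text = f.read()
    # Each interval in a long-format TextGrid looks like:
    #     xmin = 0.25
    #     xmax = 0.41
    #     text = "ah"
    pattern = re.compile(
        r'xmin = ([\d.]+)\s*\n\s*xmax = ([\d.]+)\s*\n\s*text = "(.*?)"')
    with open(lab_path, 'w') as out:
        for xmin, xmax, label in pattern.findall(text):
            out.write('%s %s %s\n' % (xmin, xmax, label.strip()))

if __name__ == '__main__':
    tg_dir, lab_dir = sys.argv[1], sys.argv[2]
    for name in sorted(os.listdir(tg_dir)):
        if name.endswith('.TextGrid'):
            base = os.path.splitext(name)[0]
            textgrid_to_lab(os.path.join(tg_dir, name),
                            os.path.join(lab_dir, base + '.lab'))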
Hello! I wrote a bit of code to convert the format. Would you send me the wav file to see if it works? If you don't want to/can't post it here you can send it to my e-mail.
Ok, sorted out. Apparently there was something about my paths that SPTK did not like, so it was not working…
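In case it helps anyone else: looking at the traceback below, my guess is that the problem was the spaces in my directory names ("DNN Acoustic models", "DNN try WORLD"). The command for x2x is built as a single shell string, so an unquoted redirection target with spaces gets split and x2x ends up being asked to open a file called "Acoustic", which matches the error message. A minimal sketch of the idea (my own illustration, not the toolkit's run_process; the paths are hypothetical):

# Sketch only: quote any path that goes into a shell command string,
# otherwise a directory name with a space gets split by the shell.
import os
try:
    from shlex import quote   # Python 3
except ImportError:
    from pipes import quote   # Python 2

x2x = '/mnt/courses/ss/dnn/tools/SPTK-3.9/bin/x2x'
gen_dir = '/tmp/DNN Acoustic models'          # hypothetical dir with a space
weights = '1 1 ' + ' '.join(['1.4'] * 58)     # per-stream weights; the count here is illustrative

weight_file = os.path.join(gen_dir, 'weight')
cmd = 'echo %s | %s +af > %s' % (weights, quote(x2x), quote(weight_file))
print(cmd)   # this quoted string is what you would pass to subprocess with shell=True

The simpler fix, of course, is just to avoid spaces in the experiment directory names.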
Thanks Norbert, I tried, but it does not work. I've tried different versions of SPTK, both local and on the network, but I keep getting the same error about the “weight” file.
2016-07-03 11:42:06,644 CRITICAL subprocess: for command: echo 1 1 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 | /mnt/courses/ss/dnn/tools/SPTK-3.9/bin/x2x +af > /mnt/courses.homes/s1520337/Desktop/Dissertation_experiments/DNN Acoustic models/DNN try WORLD/gen/DNN_TANH_TANH_TANH_TANH_TANH_TANH_LINEAR__mgc_lf0_vuv_bap_1_3300_373_199_6_1024_1024/weight
2016-07-03 11:42:06,644 CRITICAL subprocess: stderr: Cannot open file Acoustic!

Hi, I also have a problem with GENWAV. I'm pointing to the right folder for SPTK, and it generated the acoustic parameters in the previous step, but now that I want to generate the waveform it can't, because it does not have a “weight” file. This is the error:
2016-07-02 12:06:53,072 CRITICAL subprocess: OSError for echo 1 1 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 | /mnt/courses/ss/dnn/tools/SPTK-3.9/bin/x2x +af > /mnt/courses.homes/s1520337/Desktop/Dissertation_experiments/DNN Acoustic models/DNN try WORLD/gen/DNN_TANH_TANH_TANH_TANH_TANH_TANH_LINEAR__mgc_lf0_vuv_bap_1_3300_373_199_6_1024_1024/weight
Traceback (most recent call last):
  File "/mnt/courses.homes/s1520337/Documents/dnn_tts/run_lstm.py", line 1089, in <module>
    main_function(cfg)
  File "/mnt/courses.homes/s1520337/Documents/dnn_tts/run_lstm.py", line 928, in main_function
    generate_wav(gen_dir, gen_file_id_list, cfg) # generated speech
  File "/mnt/courses.homes/s1520337/Documents/dnn_tts/utils/generate.py", line 171, in generate_wav
    .format(line=line, x2x=SPTK['X2X'], weight=os.path.join(gen_dir, 'weight')))
  File "/mnt/courses.homes/s1520337/Documents/dnn_tts/utils/generate.py", line 90, in run_process
    raise OSError

So the questions are:
– Where is this file generated? (and why don't I have it?)
– What information does this weight file contain?
– How do I fix the error?

Thank you!
Ok, so I did that; when I created the utt files, all of them were flagged as bad_pm.
Then when I try to build the LPCs, it complains that there are no pitch marks. I checked “make_lpc”, and at the end, where it says “Extract the LPC coefficients” and uses “sig2fv”, it asks for the pitch marks as an argument (but not when building the residuals with “sigfilter”).
So I checked “sig2fv_main.cc”, and it says in the introductory comments:
"-pm <ifile>  Pitch mark file name. This is used to \n"
"  specify the positions of the analysis frames for pitch \n"
"  synchronous analysis. Pitchmark files are just standard \n"
"  track files, but the channel information is ignored and \n"
"  only the time positions are used\n"

Then later, the only place I can see the pitch marks being used is:
// allocate and fill time axis
if (al.present("-pm"))
{
    if (read_track(full, al.val("-pm"), al))
        exit(1);
}

And given a comment at the end with some examples, I see that we are actually doing “Pitch Synchronous linear prediction”.
Sooo… I don't really understand the details, but:
1. Apparently the linear prediction analysis is done at time steps given by the pitch marks, using them as the centres of the analysis windows? (See the sketch after this post.)
2. So we don't actually need the pitch marks at run time, because the LPC coefficients have already been computed on that timing?
3. But then why don't we have that information for the residuals too, if, as you mentioned in another post, the residuals are concatenated using the pitch periods given by the pitch marks?
I hope you can explain this whole process in detail; I'm a little confused about the whole issue of pitch marks, LPC, residuals, concatenation, etc. It would be very helpful… thank you!
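To check my own understanding of point 1, here is a minimal sketch of what I think pitch-synchronous LPC analysis looks like (my own illustration, not what sig2fv does internally): one analysis window is centred on each pitch mark, and the LPC coefficients for that frame are computed from the autocorrelation of the windowed samples.

# Sketch of pitch-synchronous LPC as I understand it (not sig2fv's code):
# one analysis frame centred on each pitch mark, coefficients from the
# autocorrelation method.
import numpy as np
from scipy.linalg import solve_toeplitz

def lpc_from_frame(frame, order=16):
    # LPC polynomial [1, -a1, ..., -ap] for one frame (autocorrelation method).
    frame = frame * np.hanning(len(frame))
    r = np.correlate(frame, frame, mode='full')[len(frame) - 1:]
    a = solve_toeplitz(r[:order], r[1:order + 1])   # solve R a = r
    return np.concatenate(([1.0], -a))

def pitch_synchronous_lpc(signal, pitch_marks, sample_rate, order=16, window_ms=20.0):
    # One LPC frame per pitch mark; pitch mark times are in seconds.
    half = int(window_ms / 1000.0 * sample_rate) // 2
    frames = []
    for t in pitch_marks:
        centre = int(t * sample_rate)
        lo, hi = max(0, centre - half), min(len(signal), centre + half)
        frames.append(lpc_from_frame(signal[lo:hi], order))
    return np.array(frames)

# Toy usage: a synthetic 100 Hz signal (plus a little noise so the
# autocorrelation matrix stays well conditioned), pitch marks every 10 ms.
fs = 16000
t = np.arange(fs) / float(fs)
sig = np.sin(2 * np.pi * 100.0 * t) + 0.01 * np.random.randn(fs)
marks = np.arange(0.05, 0.95, 0.01)
print(pitch_synchronous_lpc(sig, marks, fs).shape)   # (number of marks, order + 1)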
Ok, thanks, that explains why we don’t need the f0…
but what about the pitch marks? Thank you!
Adding to this topic on the basics of NNs…
I don't understand how people choose the number of hidden layers, the number of units per layer, and the activation functions to put in them. Is it just a matter of trial and error? For example, in Zen's reading a footnote says:
“Although the linear activation function is popular in DNN-based regression, our preliminary experiments showed that the DNN with the sigmoid activation function at the output layer consistently outperformed those with the linear one.”
Is there any intuition for choosing your functions based on how you think the net should transform the input to get to the desired output (specifically here, for speech synthesis)? Or do you just try different combinations and, after getting the best result, try to understand why that architecture was better?
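Just to make the footnote concrete for myself, here is a small sketch of the kind of comparison it describes (my own illustration, not code from the paper or from dnn_tts): the same feed-forward net with either a linear or a sigmoid output layer. With a sigmoid output the acoustic targets would have to be normalised to [0, 1]; with a linear output they are usually mean/variance normalised. The sizes match my setup from the other thread (373 inputs, 199 outputs, six tanh hidden layers of 1024).

# Sketch (not from the paper): the same DNN with two different output
# activations, the comparison described in the footnote.
import torch.nn as nn

def make_dnn(input_dim, output_dim, hidden=1024, n_layers=6, sigmoid_output=False):
    layers = []
    prev = input_dim
    for _ in range(n_layers):
        layers += [nn.Linear(prev, hidden), nn.Tanh()]
        prev = hidden
    layers.append(nn.Linear(prev, output_dim))
    if sigmoid_output:
        layers.append(nn.Sigmoid())   # targets would need to be scaled to [0, 1]
    return nn.Sequential(*layers)

net_linear = make_dnn(373, 199)                        # linear output layer
net_sigmoid = make_dnn(373, 199, sigmoid_output=True)  # sigmoid output layer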
What if we want to time a process inside Festival, for example how long it takes to generate the utterances?
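I'm not sure what timing facilities are available inside the Scheme interpreter itself, but one workaround is to time a batch run from the outside; a minimal sketch (the script name is hypothetical):

# Sketch of a workaround: time a batch Festival run from outside.
# "build_utts.scm" is a hypothetical script name.
import subprocess
import time

start = time.time()
subprocess.call(['festival', '-b', 'build_utts.scm'])
print('generating the utterances took %.1f s' % (time.time() - start))

Or equivalently, just prefix the command with the shell's time builtin: time festival -b build_utts.scm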
Is it this function in “strip join cost coef” that calculates the middle point?
def join_point_time(item):
    if item.f_present("cl_end"):
        return item.F("cl_end")
    elif item.f_present("dipth"):
        return (0.75 * item.F("start")) + (0.25 * item.F("end"))
    else:
        return (item.F("start") + item.F("end")) / 2

Apparently it does something different for stops (it returns cl_end, presumably the end of the stop closure) and for diphthongs (a point a quarter of the way in: 0.75*start + 0.25*end); otherwise it just adds the start and the end and divides by 2 to get the midpoint.