Forums › Speech Synthesis › Merlin › Input and Output frames
-
June 8, 2016 at 15:20 #3247
Am I right in understanding that the input and output files for each utterance should have the same number of frames?
I see that during input label normalisation the labels are converted to frame-level features. Is this based on the 5 msec frame shift used for the mfcc creation, and is this frame shift also used to prepare the output features?
I’m currently using the Nick data, but am getting the following error when I replace the .mgc files with my own .mgc files based on a varying frame rate (determined by the positions of glottal pulses):
2016-06-08 14:54:20,808 DEBUG main.train_DNN: Creating validation data provider
2016-06-08 14:54:20,816 CRITICAL ListDataProvider: the number of frames in label and acoustic features are different: 481 vs 257
481
257
2016-06-08 14:54:20,816 CRITICAL main : train_DNN threw an exception
Traceback (most recent call last):
File “/Users/s1112290/Documents/dnn_tts/run_lstm.py”, line 1009, in <module>
main_function(cfg)
File “/Users/s1112290/Documents/dnn_tts/run_lstm.py”, line 779, in main_function
cmp_mean_vector = cmp_mean_vector, cmp_std_vector = cmp_std_vector)
File “/Users/s1112290/Documents/dnn_tts/run_lstm.py”, line 200, in train_DNN
shared_train_set_xy, temp_train_set_x, temp_train_set_y = train_data_reader.load_one_partition()
File “/mnt/courses.homes/s1112290/Documents/dnn_tts/utils/providers.py”, line 155, in load_one_partition
shared_set_xy, temp_set_x, temp_set_y = self.load_next_partition()
File “/mnt/courses.homes/s1112290/Documents/dnn_tts/utils/providers.py”, line 236, in load_next_partition
raise
TypeError: exceptions must be old-style classes or derived from BaseException, not NoneType
Is this the cause of the error, with the knock-on effect of not passing any files to the next function?
If so, am I right in thinking I’ll also have to prepare the input data based on these changing frame sizes?
Thanks
-
June 8, 2016 at 15:45 #3248
Yes, the number of frames for input and output should be the same. A minor difference (< 5) is allowed.
In your case, I suggest using LSTMs with phone-level features (excluding duration features) as input, with the number of frames matching the number of output frames.
-
June 8, 2016 at 15:53 #3249
Hi Andrew,
I think you can solve it by mapping your frame positions to the original 5 ms frames, so that your data matches the original number of frames. In other words, adapt your data to the original frame grid.
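Something along these lines (untested, and assuming you know the analysis time in seconds of each of your pitch-synchronous frames; the function and variable names are just examples) could do the remapping:

import numpy as np

def remap_to_fixed_grid(mgc, frame_times, num_target_frames, shift=0.005):
    # centres of the fixed 5 ms frames
    grid = np.arange(num_target_frames) * shift
    # for each 5 ms frame, pick the nearest pitch-synchronous frame
    nearest = np.abs(frame_times[None, :] - grid[:, None]).argmin(axis=1)
    return mgc[nearest]  # shape: (num_target_frames, dim)

Here mgc is a (num_variable_rate_frames, dim) numpy array, frame_times is a 1-D array of the same length, and num_target_frames should be the frame count of the original 5 ms mgc file.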
Thanks,
Felipe
-
June 10, 2016 at 16:19 #3254
Hi,
Thanks, I’ve remapped my data so my mgcs now contain approximately the same number of frames as the corresponding mgcs from the original data, but I’m still getting the same error as above (when running with MAKECMP, NORMCMP and TRAINDNN all set to True).
I can train the network with the original data, but not my new data.
As I technically only have one feature stream – my mgcs – could something be going wrong during the MAKECMP/NORMCMP process?
-
June 13, 2016 at 11:50 #3259
I’ve managed to get to the training stage(!), but get a huge validation error on the first epoch, after which it breaks:
2016-06-12 17:51:57,380 DEBUG main.train_DNN: calculating validation loss
2016-06-12 17:52:00,726 INFO main.train_DNN: epoch 1, validation error nan, train error nan time spent 78.56
2016-06-12 17:52:00,726 INFO main.train_DNN: overall training time: 1.31m validation error 179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368.000000
(The above error is 301 digits long – and won’t fit on this page!)
I’ve inspected the mgc files and the corresponding normalised cmp files, and I’ve noticed that it’s normalising zeros to nan (the first and last samples in each frame are 0). How is this happening? And is there a way to change this?
Further – for some frames the last 2 values are 0 (the last sample and a binary voicing flag) – but after normalisation the first is nan and the second is 0 – I’m not sure why they are being treated differently?
-
June 13, 2016 at 12:05 #3260
Did you modify the output feature streams in the configuration file, i.e.
output_features : ['mgc', 'lf0', 'vuv', 'bap'] to
output_features : ['mgc']
and did you change the dimensions?
[Outputs]
mgc : 60 -> did you change this number?
dmgc : 180 -> did you change this number?
And did you check that the number of frames in the modified files is the same as in the original data? If yes, did you use the same command as below?
x2x +fa mgc/herald_001.mgc | wc -l
If the answer to all the above questions is yes, please check one of the files in the final directories before training:
x2x +fa nn_no_silence_lab_norm_601/herald_001.lab | wc -l (divide this number by 601)
x2x +fa nn_norm_mgc_{dim}/herald_001.cmp | wc -l (divide this number by the corresponding dim)
If both counts are still the same, please let us know.
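The same checks can also be done directly in Python instead of x2x; a rough sketch (the paths and dimensions are just examples matching the directories above):

import numpy as np

def num_frames(filename, dim):
    # assuming headerless binary float32 files, as written by array_to_binary_file
    data = np.fromfile(filename, dtype=np.float32)
    assert data.size % dim == 0, 'file size is not a multiple of the dimension'
    return data.size // dim

print(num_frames('nn_no_silence_lab_norm_601/herald_001.lab', 601))
print(num_frames('nn_norm_mgc_251/herald_001.cmp', 251))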
-
June 13, 2016 at 12:35 #3261
Yes, I’ve changed output features to only [‘mgc’]
Yes, I’ve changed mgc to match my dimensionality (251) and dmgc to 753
I’ve manually checked my mgcs against the original mgcs, and they have the same number of frames. The same also for the lab and cmp files.
I’ve just checked the type of the numbers in the numpy arrays I’m creating, and they are numpy.float64 – are zeros here treated differently? Should they be float32? (…I’m not sure this would cause the two zeros to be treated differently, though.)
-
June 13, 2016 at 14:31 #3262
Yeah, everything should be in numpy.float32.
I can’t say for certain that this caused the error, but it should still be corrected.
Also, check the norm_info_mgc_251_MVN.dat file. It should contain 502 values, with the first 251 values representing the mean of the data and the next 251 values representing the variance of the data.
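A quick way to inspect that file (a rough sketch, assuming it is headerless binary float32 like the other feature files):

import numpy as np

stats = np.fromfile('norm_info_mgc_251_MVN.dat', dtype=np.float32)
print(stats.size)  # expect 502 for a 251-dimensional stream
mean, variance = stats[:251], stats[251:]
print(np.flatnonzero(variance == 0))  # any zero here will turn into nan after MVN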
-
June 13, 2016 at 14:41 #3263
Hi Andrew,
The function “array_to_binary_file” casts the data to float32 before writing it.
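Roughly, it does something like this (not the exact Merlin code, just the idea):

import numpy as np

def array_to_binary_file(data, output_file_name):
    data = np.asarray(data, dtype=np.float32)  # cast to float32 before writing
    data.tofile(output_file_name)  # headerless binary output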
Thanks,
Felipe
-
June 13, 2016 at 15:04 #3264
Ah, of course! Then my data is already in float32. (I was checking the types within the script before it was saved.)
Also, the .dat file does include the correct number of values.
I’m running the steps from MAKECMP again to be sure, but I’m a bit confused as to what to try next.
Thanks.
-
June 13, 2016 at 16:43 #3265
I’ve rerun everything again and checked all the sizes (which are the same), but I’m getting the same nan error as above. I’m not sure what to try next.
-
June 13, 2016 at 17:27 #3266
Hi Andrew,
With mean-variance normalisation, each dimension of the data is divided by its standard deviation. In your case, as the first and last samples of every frame are always zero, those dimensions have zero variance, so the normalisation divides by zero, which produces the nans.
My suggestion: remove the first and last samples of the frames, since they are producing these problems and they do not contribute any useful information. Your new frame size would then be 249 (251 − 2).
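For example, the trimming could be done before MAKECMP with something like this (a rough sketch; the filename is just an example):

import numpy as np

mgc = np.fromfile('herald_001.mgc', dtype=np.float32).reshape(-1, 251)
trimmed = mgc[:, 1:-1]  # drop the first and last columns -> 249 dims
trimmed.tofile('herald_001_trimmed.mgc')

Remember to change mgc to 249 (and dmgc to 3 × 249 = 747) in the configuration afterwards.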
Thanks,
Felipe
-
June 13, 2016 at 19:14 #3267
Thanks guys, things are finally training!
-
June 29, 2016 at 14:42 #3311
I’ve trained a model successfully with 8 kHz data, but am now having several issues with 16 kHz data.
I’ve changed the parameters in the config file accordingly and have managed to at least start training, but this time I’m getting the above error on the 2nd epoch, after which it breaks…
2016-06-29 14:26:17,158 DEBUG main.train_DNN: calculating validation loss
2016-06-29 14:26:27,741 INFO main.train_DNN: epoch 1, validation error 628964418174434265008261326597378733722224197484642355226122602583441150074860628769722468455296618923236287798542215123891810860238841699789697149276208814735544520672472477079448467306333648648592746968864784384.000000, train error 106279287171980071212391732921729749563599509286910907520861045585590353898646213523071851776878272693510050177262482329438234528095902286440442106534226060885518322284601006242847044870755156682712675438821376.000000 time spent 180.00
2016-06-29 14:29:08,963 DEBUG main.train_DNN: calculating validation loss
2016-06-29 14:29:17,823 INFO main.train_DNN: epoch 2, validation error nan, train error nan time spent 170.08
2016-06-29 14:29:17,823 INFO main.train_DNN: overall training time: 5.83m validation error 628964418174434265008261326597378733722224197484642355226122602583441150074860628769722468455296618923236287798542215123891810860238841699789697149276208814735544520672472477079448467306333648648592746968864784384.000000
Is this because the error is so large? (I’m only running on a training set of 60 and validation and test sets of 15.)
-
June 29, 2016 at 14:59 #3312
Hi Andrew,
It seems that the training is not converging. I think that could be because either:
1. The data is unpredictable. Check your data.
2. The data is corrupted. Check your data.
3. The learning rate is too large. Decrease the learning_rate to a smaller number.
A quick check for (1) and (2) is sketched below.
Felipe
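To rule out (1) and (2) quickly, one could scan the prepared cmp files for nans, infs, or suspiciously large values; a rough sketch (the path and dimension are only examples):

import glob
import numpy as np

for path in sorted(glob.glob('nn_norm_mgc_249/*.cmp')):
    data = np.fromfile(path, dtype=np.float32)
    if not np.all(np.isfinite(data)):
        print(path, 'contains nan/inf')
    elif np.abs(data).max() > 100:  # threshold is arbitrary; normalised features should be roughly unit scale
        print(path, 'has suspiciously large values:', np.abs(data).max())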
-