Forum Replies Created
Especially for this model (8kHz), models trained for longer tend to sound better, as the buzz/stochastic noise decreases to nearly nothing, so the silences between words are preserved. The 0.0001 learning rate for this model does come out with a better validation error, and shows a better trend.
This behaviour is particularly prevalent in the 16kHz models.
Attachments:
Hi,
I spoke to Simon earlier and he suggested saving a model at a certain number of epochs, maybe around 25-30.
I’ve attached one of the plots.
Thanks
Attachments:
Thanks,
Would it be enough to just use the model from the last epoch? Or would it be best to save the model at each epoch, then take the best one from around convergence and use that?
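In case it’s useful to anyone trying the same thing, picking “the best one” afterwards can be done by just reading the validation errors back out of the training log – something like the sketch below (the log path is a placeholder, and the pattern simply matches the “epoch N, validation error X” lines the toolkit prints):

import re

# Pick the epoch with the lowest validation error by parsing the training
# log. The log path is a placeholder; the pattern matches the
# "epoch N, validation error X, train error Y" lines the toolkit prints.
pattern = re.compile(r'epoch (\d+), validation error ([^,]+),')

best_epoch, best_error = None, float('inf')
with open('logs/train.log') as log_file:
    for line in log_file:
        match = pattern.search(line)
        if match is None:
            continue
        epoch, error = int(match.group(1)), float(match.group(2))
        if error < best_error:          # nan never compares lower, so it is skipped
            best_epoch, best_error = epoch, error

print('best epoch: %s (validation error %s)' % (best_epoch, best_error))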
Hi,
Regarding your second point, how problematic is it if you use that fix (commenting out the if condition) and the best model saved is from the 3rd epoch, but training does not fully converge until ~epoch 20? I’ve had better-sounding results from models that are trained for longer, until around convergence.
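For now I’m considering working around it on my side by only picking the “best” model from epochs after some minimum, roughly as in the sketch below (made-up numbers, just to show the idea – not the toolkit’s own selection logic):

def pick_best_epoch(validation_errors, min_epochs=15):
    # Return the 1-based epoch with the lowest validation error, ignoring
    # everything before min_epochs so an early dip (e.g. epoch 3) cannot be
    # chosen before the model has had a chance to converge.
    candidates = [(error, epoch)
                  for epoch, error in enumerate(validation_errors, start=1)
                  if epoch >= min_epochs]
    return min(candidates)[1]

# Made-up numbers: a dip at epoch 3, real convergence around epoch 20.
errors = [9.0, 5.0, 2.0, 6.0, 4.0, 3.5, 3.0, 2.9, 2.8, 2.7,
          2.6, 2.5, 2.4, 2.3, 2.2, 2.1, 2.05, 2.04, 2.03, 2.02]
print(pick_best_epoch(errors, min_epochs=10))   # prints 20, not 3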
Thanks
I’ve trained a model successfully with 8kHz data, but am now having several issues with 16kHz data.
I’ve changed the parameters in the config file accordingly and have managed to at least start training, but this time I’m getting the above error on the 2nd epoch, after which it breaks…
2016-06-29 14:26:17,158 DEBUG main.train_DNN: calculating validation loss
2016-06-29 14:26:27,741 INFO main.train_DNN: epoch 1, validation error 628964418174434265008261326597378733722224197484642355226122602583441150074860628769722468455296618923236287798542215123891810860238841699789697149276208814735544520672472477079448467306333648648592746968864784384.000000, train error 106279287171980071212391732921729749563599509286910907520861045585590353898646213523071851776878272693510050177262482329438234528095902286440442106534226060885518322284601006242847044870755156682712675438821376.000000 time spent 180.00
2016-06-29 14:29:08,963 DEBUG main.train_DNN: calculating validation loss
2016-06-29 14:29:17,823 INFO main.train_DNN: epoch 2, validation error nan, train error nan time spent 170.08
2016-06-29 14:29:17,823 INFO main.train_DNN: overall training time: 5.83m validation error 628964418174434265008261326597378733722224197484642355226122602583441150074860628769722468455296618923236287798542215123891810860238841699789697149276208814735544520672472477079448467306333648648592746968864784384.000000
Is this because the error is so large? (I’m only running on a training set of 60 and validation and test sets of 15).
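In the meantime I’m sanity-checking the normalised input files before training with something like the sketch below (the directory name is a placeholder, and dim should be the total cmp dimensionality from the config – 753 in my 8kHz setup):

import glob
import numpy as np

# Flag any normalised feature file that contains nan/inf or suspiciously
# large values. The directory name is a placeholder for wherever the
# normalised cmp files end up; dim is the total cmp dimensionality from
# the config (753 in my 8kHz setup).
dim = 753

for path in sorted(glob.glob('norm_cmp/*.cmp')):
    data = np.fromfile(path, dtype=np.float32)
    if data.size % dim != 0:
        print('%s: %d values, not a multiple of %d' % (path, data.size, dim))
        continue
    frames = data.reshape(-1, dim)
    if not np.isfinite(frames).all():
        print('%s: contains nan/inf' % path)
    elif np.abs(frames).max() > 100.0:   # normalised features should be fairly small
        print('%s: max abs value %.2f' % (path, np.abs(frames).max()))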
I’ve had issues here too – when you request more disk quota, it might be useful to ask for it to be fixed rather than dynamic.
If it’s dynamic, your fixed quota remains as it was and only increases by 1GB every hour when you’re near your limit (it only checks once an hour). As I was already at my quota of 20GB (and asked for ~50GB), it kept running out and failing during DNN training while the quota was dynamic. If the same happens to you, ask for the maximum fixed quota so it won’t fill up during training.
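If it helps, this is roughly how I keep an eye on how much the experiment directory is using, to compare against the quota (the path is just a placeholder, and 20GB is my own quota):

import os

# Tally how much the experiment directory is using, to compare against the
# quota (20GB in my case) before and during training. The path is a placeholder.
def dir_size_gb(root):
    total_bytes = 0
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            try:
                total_bytes += os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                pass    # file removed while walking; ignore it
    return total_bytes / float(1024 ** 3)

experiment_dir = '/path/to/experiment'
print('%.2f GB used under %s' % (dir_size_gb(experiment_dir), experiment_dir))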
Thanks guys, things are finally training!
I’ve rerun it, checked all the sizes (which are the same), but have come up with the same nan error as above. I’m not sure what to try next.
Ah, of course! Then my data is already in float32. (I was checking the types within the script before it saved.)
Also, the .dat file does include the correct number of values.
I’m running the steps from MAKECMP again to be sure, but I’m a bit confused as to what to try next.
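For reference, the check I’m using on the binary files is roughly the sketch below (the filename is only an example, and 251 is my mgc dimensionality – the same check works on the .dat/cmp files with their own dimensionality):

import os
import numpy as np

# Check that a binary feature file holds a whole number of frames of float32
# values. The filename is just an example; 251 is my mgc dimensionality.
path = 'example_001.mgc'
dim = 251

n_values = os.path.getsize(path) // 4            # float32 = 4 bytes per value
print('%d values -> %.2f frames of dimension %d' % (n_values, n_values / float(dim), dim))

data = np.fromfile(path, dtype=np.float32)
print('loaded dtype: %s, leftover values: %d' % (data.dtype, data.size % dim))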
Thanks.
Yes, I’ve changed output features to only [‘mgc’]
Yes, I’ve changed mgc to match my dimensionality (251) and dmgc to 753
I’ve manually checked my mgcs against the original mgcs, and they have the same number of frames. The same is true for the lab and cmp files.
I’ve just checked the type of the numbers in the numpy arrays I’m creating, and they are numpy.float64 – are zeros treated differently here? Should they be float32? (Though I’m not sure this would cause the two zeros to be treated differently.)
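In case it’s relevant, here’s a minimal sketch of what I’d change in my extraction script if float32 is what’s expected – cast before writing and read it back to check (the shape and filename are made up for illustration):

import numpy as np

# Cast to float32 before writing the binary: a float64 file is twice the
# size and gets misread if it is later loaded as float32. The shape and
# filename here are made up for illustration.
features = np.random.randn(400, 251)             # numpy defaults to float64
features.astype(np.float32).tofile('example_001.mgc')

# Read it back and confirm the dtype and shape survive the round trip.
check = np.fromfile('example_001.mgc', dtype=np.float32)
print('%s %s' % (check.dtype, check.reshape(-1, 251).shape))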
I’ve managed to get to the training stage(!), but get a huge validation error on the first epoch, after which it breaks:
2016-06-12 17:51:57,380 DEBUG main.train_DNN: calculating validation loss
2016-06-12 17:52:00,726 INFO main.train_DNN: epoch 1, validation error nan, train error nan time spent 78.56
2016-06-12 17:52:00,726 INFO main.train_DNN: overall training time: 1.31m validation error 179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368.000000
(The above error is 301 digits long – and won’t fit on this page!)
I’ve inspected the mgc files and the corresponding normalised cmp files, and I’ve noticed that it’s normalising zeros to nan (the first and last samples in each frame are 0). How is this happening? And is there a way to change this?
Further, for some frames the last two values are 0 (the last sample and a binary voicing flag), but after normalisation the first comes out as nan and the second as 0 – I’m not sure why these are treated differently?
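My current guess is that a column which is always exactly 0 has zero variance, so mean/variance normalisation ends up computing 0/0 for it and gives nan (I’m less sure what’s happening with the voicing flag column). Below is a small standalone example of the effect, plus one possible guard – my own sketch, not the toolkit’s normalisation code:

import numpy as np

# A constant column has zero standard deviation, so plain mean/variance
# normalisation computes 0/0 for it and produces nan in every frame.
data = np.zeros((1000, 3), dtype=np.float32)
data[:, 1] = np.random.randn(1000)                # a normal, varying column
data[:, 2] = np.random.rand(1000) > 0.5           # a 0/1 flag column

mean = data.mean(axis=0)
std = data.std(axis=0)

with np.errstate(invalid='ignore'):
    print(((data - mean) / std)[0])               # nan in the constant column

safe_std = np.where(std < 1e-8, 1.0, std)         # possible guard: skip zero-variance columns
print(((data - mean) / safe_std)[0])              # the constant column stays at 0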
Hi,
Thanks, I’ve remapped my data so my mgcs now contain roughly the same number of frames as the corresponding mgcs from the original data, but I’m still getting the same error as above (when running with MAKECMP, NORMCMP and TRAINDNN all set to True).
I can train the network with the original data, but not my new data.
As I technically only have one feature stream – my mgcs – could something be going wrong during the MAKECMP/NORMCMP process?
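To try to narrow it down, I’m planning to check which columns of each normalised cmp actually contain nans, roughly as in the sketch below (the directory name and dimensionality are placeholders for my setup):

import glob
import numpy as np

# Report which feature dimensions (columns) of each normalised cmp file
# contain nan, to see whether it is always the same columns that break.
# The directory name and dim are placeholders for my setup.
dim = 753

for path in sorted(glob.glob('norm_cmp/*.cmp')):
    frames = np.fromfile(path, dtype=np.float32).reshape(-1, dim)
    bad_columns = np.where(np.isnan(frames).any(axis=0))[0]
    if bad_columns.size:
        print('%s: nan in columns %s' % (path, bad_columns.tolist()))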