Forum Replies Created
February 28, 2017 at 00:00 in reply to: Can we absorb the process of alignment into the DNN training process? #6813
Hi,
Yes, there are already some attempts for this. For example:
http://www.isca-speech.org/archive/Interspeech_2016/pdfs/0134.PDF
http://ssw9.net/download/ssw9_proceedings.pdf#page=125
Regards,
Felipe
Yes, that looks much better.
Felipe
OK, do what Simon suggested.
The plot looks strange. In my opinion, you should also try:
– Use the model stored at epoch 3.
– Decrease the learning rate. Try 0.1 or 0.5 times the learning rate that you are currently using.
Thanks,
Felipe
Hi Andrew,
I would say that you should use the last model. However, maybe your training is not converging. Could you post a picture of the errors, or provide the values of the training and validation errors, please?
Thanks,
Felipe
Hi Andrew,
I would use the best model around convergence, but you can compare results and decide. For example, you can modify line 313 to skip saving epoch 3:
if epoch > 3:
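In context, such a guard might look like the following sketch. The function name, the callback, and the file-name pattern are all illustrative, not the actual identifiers in the training script:

```python
def maybe_save_checkpoint(epoch, model, save_model):
    """Save a per-epoch checkpoint only after the first few epochs.

    save_model is a hypothetical callback taking (model, filename);
    the filename pattern below is illustrative only.
    Returns True if a checkpoint was written, False otherwise.
    """
    # Skip the early checkpoints (epochs 1-3), which are saved
    # before the error curve has settled.
    if epoch > 3:
        save_model(model, 'nnets_model_epoch_%d' % epoch)
        return True
    return False
```

This way the early, not-yet-converged models never overwrite or clutter the checkpoint directory, and the comparison is only among later epochs.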
Thanks,
Felipe
Hi Hanzhang,
The vocoder extracts the parameters: spectral envelope, f0 contour, and aperiodicities. Then, you can transform them into MGCs (or MCEP), lf0, and bap, respectively.
Do not confuse MGCs (or MCEP) with MFCCs; they are different features. The forced-alignment process uses MFCCs to recognize the phoneme structure of the data.
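As a small illustration of one of those transforms, here is a pure-Python sketch of converting an f0 contour (in Hz) into an lf0 stream. The large negative constant marking unvoiced frames is a common convention, not a value mandated by any particular toolkit:

```python
import math

UNVOICED = -1.0e10  # conventional marker for unvoiced frames in lf0 streams


def f0_to_lf0(f0_contour):
    """Convert an f0 contour (Hz) to log-f0.

    Voiced frames (f0 > 0) become ln(f0); unvoiced frames (f0 == 0)
    are replaced by a large negative constant, as is conventional
    for lf0 feature streams.
    """
    return [math.log(f0) if f0 > 0 else UNVOICED for f0 in f0_contour]
```

The spectral-envelope-to-MGC and aperiodicity-to-bap conversions are more involved (they go through SPTK-style tools), but they follow the same pattern: vocoder parameters in, compact acoustic features out.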
Thanks,
Felipe
Hi Max,
I have some ideas/questions:
– What is the value of $m when it fails?
– What is inside the directory /hmm0?
– Check the permissions of the directory.
Thanks,
Felipe
Hi Max,
Check in the file “do_alignment_dnn” that your phone_list and train.scp are correct. Also, check that the output of line 42 (HCompV -C resources/CONFIG_for_training -f 0.01 -m -S train.scp -M hmm0 proto/7states) is correct.
Thanks,
Felipe
Hi Andrew,
It seems that the training is not converging. That could be because:
1. The data is unpredictable. Check your data.
2. The data is corrupted. Check your data.
3. The learning rate is too large. Decrease learning_rate to a smaller value.
Felipe
Hi Andrew,
Usually, normalization involves dividing every element of the data by some summation over the whole data. In your case, as the first and last samples are always zero, that division becomes a division by zero for those samples, which produces the NaNs.
My suggestion: remove the first and last samples of the frames, since they are producing these problems and they do not contribute any useful information. So, your new frame size would be 249 (251 - 2).
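A minimal sketch of the idea, assuming frames are equal-length lists of floats and using sum-of-absolute-values per dimension as the illustrative normalization (your actual normalizer may differ):

```python
def normalize_per_dimension(frames):
    """Trim the always-zero edge samples, then normalize per dimension.

    frames: list of equal-length lists of floats (e.g. length 251).
    Each remaining dimension is divided by the sum of its absolute
    values across all frames. Dimensions that are zero in every frame
    would give 0/0 = NaN, which is exactly what trimming avoids.
    """
    trimmed = [f[1:-1] for f in frames]  # e.g. 251 -> 249 samples per frame
    dims = len(trimmed[0])
    totals = [sum(abs(f[d]) for f in trimmed) for d in range(dims)]
    return [[f[d] / totals[d] for d in range(dims)] for f in trimmed]
```

With the zero endpoints still in place, `totals` would contain zeros for those dimensions and the division would produce NaNs; after trimming, every divisor is non-zero.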
Thanks,
Felipe
Hi Andrew,
The function “array_to_binary_file” casts the data to float32 before writing it.
Thanks,
Felipe
Hi Andrew,
I think that you can solve it by mapping your frame positions to the original 5 ms frames, so that your data matches the original number of frames. In other words, adapt your data to the original frame count.
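One simple way to do that mapping, sketched in pure Python (nearest-frame rounding is just one reasonable choice; you could also floor, or interpolate):

```python
def map_to_5ms_frames(times_sec, n_frames, shift_sec=0.005):
    """Map arbitrary time positions onto a 5 ms frame grid.

    times_sec: positions of your frames, in seconds.
    n_frames: number of frames in the original 5 ms analysis.
    Returns one frame index per position, rounded to the nearest
    frame and clipped into range, so your data can be re-indexed
    to match the original number of frames.
    """
    indices = []
    for t in times_sec:
        idx = int(round(t / shift_sec))
        indices.append(min(max(idx, 0), n_frames - 1))
    return indices
```

Once every position of your data has an index on the original grid, you can gather (or duplicate) frames by those indices and the frame counts will agree.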
Thanks,
Felipe
Georgia,
If all of that does not work, you can also try: pip install --user bandmat
Felipe
Hi Georgia,
What is the error thrown if you keep lines 49, 53 and 54 uncommented?
Thanks,
Felipe
Hi Giorgia,
Uncomment line 49 of the file mlpg_fast.py:
#import pyximport; pyximport.install()
… and try again.
Thanks,
Felipe