Forum Replies Created
The dimension of your acoustic feature is not 5. Check which features you are using and their dimensions.
1) Yes, if you use run_dnn.py, it automatically uses 0.5 times the learning rate for the top hidden layers (>4). This is not the case for run_lstm.py.
2) They are not completely random: they are drawn from a Gaussian random generator with zero mean and a variance based on the input size.
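As a sketch of that initialization scheme: zero-mean Gaussian weights whose variance shrinks with the number of inputs (the exact 1/n_in scale factor is an assumption here; Merlin's actual scaling may differ):

```python
import math
import random

def init_weights(n_in, n_out, seed=1234):
    """Zero-mean Gaussian weight matrix, variance scaled by fan-in.
    The 1/n_in scale is an assumption; Merlin's scheme may differ."""
    rnd = random.Random(seed)
    std = math.sqrt(1.0 / n_in)
    return [[rnd.gauss(0.0, std) for _ in range(n_out)] for _ in range(n_in)]
```

Scaling the variance by the input size keeps the pre-activation magnitudes roughly constant across layers of different widths.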
https://www.random.org/gaussian-distributions/ (something like this)
3) Dropout is coded in run_lstm.py with the variable “dropout_rate”, set to 0 by default. You can add this variable to your configuration file under “Architecture” and fine-tune it.
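Assuming the usual Merlin-style configuration layout, the addition would look like the fragment below (the 0.5 value is purely illustrative, not a recommendation):

```
[Architecture]
dropout_rate: 0.5
```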
You can’t use the same configuration file for both the acoustic and duration models.
Please check the file below for the duration model configuration:
https://svn.ecdf.ed.ac.uk/repo/inf/dnn_tts/configuration/duration_configfile.conf
Please check the file below for the acoustic model configuration:
https://svn.ecdf.ed.ac.uk/repo/inf/dnn_tts/configuration/acoustic_configfile.conf
warmup_epoch is usually set to 10. The learning rate remains at the value you have set until warmup_epoch (i.e., epoch 10), and is then halved after every epoch thereafter.
If warmup_epoch is used, the number of epochs can be set to any value; network training then stops after a certain number of epochs if the validation error is not improving.
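The schedule described above can be sketched as follows (the base_lr value is illustrative, not Merlin's default):

```python
def learning_rate(epoch, base_lr=0.002, warmup_epoch=10):
    """Constant learning rate up to and including warmup_epoch,
    then halved at every subsequent epoch."""
    if epoch <= warmup_epoch:
        return base_lr
    return base_lr * (0.5 ** (epoch - warmup_epoch))
```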
Why are you not using warmup_epoch?
At the moment, the graph shows that the training may lead to overfitting if you extend beyond 20 epochs.
Yes, it is somewhat tuned for STRAIGHT (in fact, it started with STRAIGHT and was extended to WORLD), and the parameters differ with respect to the sampling frequency of waveform generation.
We measure MCD before applying the filter; the post-filtering results are stored in a separate file with the extension ‘.p_mgc’.
There is no need to specify the dimension to view the file; just use the command below:
$SPTK/bin/x2x +fa norm_info…MVN.dat > temp.txt
The ‘temp.txt’ file then contains exactly double the number of values as the dimension shown in the filename.
First N values represent means and the next N values represent variances.
In your case, it should have 500 values. To extract the mean and variance of the 250th unit, look at lines 250 and 500 for the mean and variance respectively.
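The layout above can be parsed with a small helper like this (a hypothetical sketch, not part of Merlin; it assumes the `x2x +fa` dump has one value per line):

```python
def read_mean_var(path, dim):
    """Parse an ASCII dump of a norm_info MVN file: the first `dim`
    values are means, the next `dim` values are variances."""
    with open(path) as f:
        values = [float(line) for line in f if line.strip()]
    if len(values) != 2 * dim:
        raise ValueError("expected %d values, got %d" % (2 * dim, len(values)))
    return values[:dim], values[dim:]
```

For a 250-dimensional file, the mean and variance of the 250th unit are `means[249]` and `variances[249]`, i.e. lines 250 and 500 of temp.txt.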
You should use the bottleneck model first to generate bottleneck features, append them to the input, and then use the second model to synthesize speech.
If the number of frames in the input and output files doesn’t match, the same error appears along with another error indicating a mismatch in frame length. So please check for other errors as well, if any appeared.
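One quick way to check the frame counts yourself, assuming the usual headerless float32 feature files (this is a sketch, not a Merlin utility):

```python
import os

def num_frames(path, dim, bytes_per_value=4):
    """Frame count of a headerless binary float feature file;
    `dim` is the per-frame feature dimension."""
    size = os.path.getsize(path)
    if size % (dim * bytes_per_value) != 0:
        raise ValueError("file size is not a multiple of the frame size")
    return size // (dim * bytes_per_value)
```

Comparing `num_frames(input_file, in_dim)` with `num_frames(output_file, out_dim)` reveals the mismatch before training is even started.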
Please attend the upcoming session on Merlin, where I’ll explain the step-by-step procedure to implement/debug duration modelling.
Answer to Q1: run_lstm.py has a well-defined variable for additional input (appended_input_dim), but run_dnn.py used to be hard-coded for the combined input dimension. The output remains the same: mgc with 60 and dmgc with 180.
Answer to Q2: At the moment, the bottleneck code is not completely implemented within Merlin. Yes, we need to have different folders: one for the label input and one for the bottleneck. The stacking of features is done outside Merlin, using independent scripts.
There are some conflicting things here… I’ll explain them in detail during my tutoring session this Friday.
1. “appended_input_dim : 512” is a variable that can be used only with run_lstm.py but not with “run_dnn.py”.
2. run_dnn_bottleneck.py is an old version of the code but works similarly to run_dnn.py (it doesn’t work with LSTM architectures).
As I can see, you haven’t configured the SPTK tool path in the configuration file, as clearly shown in the error:
“/this/path/does/not/exist/x2x +af *”
subprocess: stderr: /bin/sh: /this/path/does/not/exist/x2x: No such file or directory
Please change the path to the SPTK tools in the configuration files and re-run the GENWAV step.
Check for an if condition that says “save the model only after epoch 5”, and comment out that line to enable saving the model at every epoch:
if epoch > 5: ## comment out this line, and use a small learning rate.
Also, could you paste the validation and training errors somewhere like Pastebin and share the link here?
I am not sure about lexicon implementation in Ossian.
But instead of modifying the existing full-context label file, you can exclude any questions you don’t want to be part of the training (remove those lines from the question file) in either HMM or DNN training.
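Removing questions from the question file can be automated with a small filter like this (a minimal sketch, not Merlin code; the substrings you exclude are up to you):

```python
def filter_questions(in_path, out_path, exclude_substrings):
    """Copy a question file, dropping every question line that contains
    one of the given substrings (features to exclude from training)."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            if not any(s in line for s in exclude_substrings):
                dst.write(line)
```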
If you want to run the bottleneck features, you can do so without that import, so please comment out all those imports.
The bottleneck system has to be trained first; then use appended_input_dim as the bottleneck dimension for the second DNN.
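The “stack the features outside Merlin” step mentioned earlier amounts to a frame-wise concatenation, sketched below with plain per-frame vectors (illustrative only; real scripts would read/write the binary feature files):

```python
def append_bottleneck(label_feats, bottleneck_feats):
    """Frame-wise concatenation of linguistic label features with
    bottleneck features; both are lists of per-frame vectors."""
    if len(label_feats) != len(bottleneck_feats):
        raise ValueError("frame counts must match")
    return [l + b for l, b in zip(label_feats, bottleneck_feats)]
```

The resulting per-frame dimension (label dimension plus bottleneck dimension) is what the second network's input dimension must be set to.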