Training details
Hi, I want to double check some details about the implementation and the code:
1) I saw the following in dnn.py; is it actually being used like this?
##top 2 layers use a smaller learning rate
##hard-code now, change it later
if layer_size > 4:
    for i in range(layer_size-4, layer_size):
        lr_list[i] = learning_rate * 0.5
2) Is the initialization of the weights completely random?
3) Are we using any drop-out value?
Thanks!
1) Yes. If you use run_dnn.py, it automatically applies 0.5 times the learning rate to the top hidden layers whenever the network has more than 4 layers. This is not the case for run_lstm.py.
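For illustration, here is a minimal, standalone sketch (my own, not Merlin's code; the function name is made up for this example) of how such a per-layer learning-rate list can be built, with the top four layers trained at half the base rate, mirroring the rule quoted in the question:

def build_lr_list(num_layers, learning_rate):
    # one learning rate per layer, all starting at the base rate
    lr_list = [learning_rate] * num_layers
    # for deep networks, halve the rate of the top four layers
    if num_layers > 4:
        for i in range(num_layers - 4, num_layers):
            lr_list[i] = learning_rate * 0.5
    return lr_list

print(build_lr_list(6, 0.002))
# [0.002, 0.002, 0.001, 0.001, 0.001, 0.001]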
2) They are not completely random; the weights are drawn from a Gaussian random generator with zero mean and a variance scaled by the input size of the layer.
https://www.random.org/gaussian-distributions/ (something like this)
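As a rough illustration of that kind of initialisation (a sketch assuming fan-in scaling with variance 1/n_in, not the exact Merlin code), using NumPy:

import numpy as np

def init_weights(n_in, n_out, seed=1234):
    # zero-mean Gaussian whose standard deviation shrinks as the number of
    # inputs grows, keeping activations at a comparable scale across layers
    rng = np.random.RandomState(seed)
    std = np.sqrt(1.0 / n_in)
    return rng.normal(loc=0.0, scale=std, size=(n_in, n_out)).astype(np.float32)

W = init_weights(512, 1024)
print(W.shape, W.std())  # std is roughly sqrt(1/512), i.e. about 0.044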
3) Dropout is implemented in run_lstm.py through the variable "dropout_rate", which is set to 0 by default. You can add this variable to your configuration file under the "Architecture" section and tune it.
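For example, a line such as dropout_rate: 0.5 under the "Architecture" section should switch it on (check the exact key syntax against your existing configuration file). As a rough illustration of what the value controls, here is a sketch of standard inverted dropout, not necessarily Merlin's exact implementation: each hidden unit is zeroed with that probability during training.

import numpy as np

def apply_dropout(activations, dropout_rate, seed=0):
    # with the default dropout_rate of 0 this is a no-op
    if dropout_rate == 0.0:
        return activations
    rng = np.random.RandomState(seed)
    keep_prob = 1.0 - dropout_rate
    # zero each unit with probability dropout_rate, then rescale the
    # survivors so the expected activation stays the same (inverted dropout)
    mask = rng.binomial(n=1, p=keep_prob, size=activations.shape)
    return activations * mask / keep_prob

h = np.ones((4, 8))
print(apply_dropout(h, 0.5))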