Synthesis using bottlenecks
- This topic has 3 replies, 3 voices, and was last updated 8 years, 6 months ago by Simon.
July 2, 2016 at 19:36 #3315
Hello,
We have now trained our acoustic model using bottleneck features! We were all very happy until we tried to synthesise unseen sentences. The problem is a mismatch in dimensionality between the input vector and the trained weights: the weight dimensions include the bottleneck length, while the input does not. Hence the error below is returned:
File "/afs/inf.ed.ac.uk/user/s15/s1566512/ossian_msc_2016_test/Ossian/scripts/processors/NN.py", line 129, in predict
input = numpy.dot(input, layer['W'])
ValueError: shapes (23,178) and (274,1024) not aligned: 178 (dim 1) != 274 (dim 0)
23 is the number of input vectors, one for each state of the input file
178 is the length of the features (coming from the question file)
274 is the length of the features + bottlenecks
1024 is the size of the hidden layer
How do we solve this? Do the bottleneck features need to be appended at synthesis time as well?
Thanks.
-
July 3, 2016 at 12:20 #3318
You should first use the bottleneck model to generate the bottleneck features, append them to the input, and then use the second model to synthesize speech.
-
July 3, 2016 at 12:38 #3319
Hi Srikanth, yes, I believe this is what we did.
Step 1. Extract bottleneck features
Step 2. Append the bottleneck features to the input and train a new model. The error is lower than for the system without bottlenecks, so it looks good.
Step 3. Try to synthesise using the weight matrices optimised during step 2.
The problem is that the first weight matrix from step 2 expects a longer input than the one we use in step 3. The weight matrix has the length of the input + bottlenecks, but the input at synthesis time has the dimensionality of the input alone.
So the input is now a (23,178) matrix and the weight matrix is (274,1024), and the dot product cannot be computed.
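The mismatch can be reproduced with random arrays of the shapes quoted in the error message. This is only a minimal sketch, not the Ossian code: the variable names are hypothetical, the values are random placeholders, and the bottleneck width of 96 is inferred from 274 - 178 rather than stated anywhere in the thread.

```python
import numpy as np

rng = np.random.default_rng(0)

# Shapes taken from the error message in the first post.
n_states, n_linguistic, n_input_with_bn, n_hidden = 23, 178, 274, 1024

# Inferred, not stated in the thread: 274 - 178 = 96 bottleneck dimensions.
n_bottleneck = n_input_with_bn - n_linguistic

linguistic = rng.standard_normal((n_states, n_linguistic))   # synthesis-time input
W_first = rng.standard_normal((n_input_with_bn, n_hidden))   # first-layer weights from step 2

# Without bottleneck features the dot product fails, as in the traceback.
try:
    np.dot(linguistic, W_first)
    mismatch = False
except ValueError as err:
    mismatch = True
    print(err)

# Appending a (23, 96) block (placeholder values here) makes the shapes line up.
bottleneck = rng.standard_normal((n_states, n_bottleneck))
stacked = np.hstack([linguistic, bottleneck])
hidden = np.dot(stacked, W_first)
print(stacked.shape, hidden.shape)
```

The placeholder block only demonstrates the shapes; at synthesis time it would have to hold real bottleneck features for the same states.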
-
July 10, 2016 at 10:38 #3324
You need to append the bottleneck features at synthesis time too. So, synthesis will also involve a forward pass through the bottleneck network, saving those bottleneck features, appending them to the usual input features, then passing this concatenated vector through the second network.
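The two-stage synthesis described above might look roughly like the sketch below. It is written under several assumptions that the thread does not confirm: each layer is a dict with 'W' and 'b' keys (only 'W' appears in the traceback), hidden layers use tanh and the last layer is linear, the bottleneck is the second layer of the first network, and the layer sizes other than 178, 274 and 1024 are made up. The helpers `forward` and `make_layers` are hypothetical, and the weights are random stand-ins for trained models.

```python
import numpy as np

def forward(x, layers, stop_after=None):
    """Pass x through layers: tanh on hidden layers, linear on the last.

    If stop_after is given, return the activations of that layer
    (0-based index) instead of the network output.
    """
    last = len(layers) - 1
    for i, layer in enumerate(layers):
        x = np.dot(x, layer['W']) + layer['b']
        if i != last:
            x = np.tanh(x)
        if stop_after is not None and i == stop_after:
            return x
    return x

def make_layers(sizes, rng):
    """Random stand-in weights; real use would load the trained models."""
    return [{'W': rng.standard_normal((m, n)) * 0.01, 'b': np.zeros(n)}
            for m, n in zip(sizes[:-1], sizes[1:])]

rng = np.random.default_rng(0)
n_out = 50  # arbitrary acoustic output size, for illustration only

# First network: linguistic features in, 96-unit bottleneck as its second layer.
bn_layers = make_layers([178, 1024, 96, 1024, n_out], rng)
# Second network: trained on linguistic + bottleneck features (178 + 96 = 274).
am_layers = make_layers([274, 1024, 1024, n_out], rng)

linguistic = rng.standard_normal((23, 178))                 # one row per state

bottleneck = forward(linguistic, bn_layers, stop_after=1)   # (23, 96)
stacked = np.hstack([linguistic, bottleneck])               # (23, 274)
acoustic = forward(stacked, am_layers)                      # (23, n_out)
print(bottleneck.shape, stacked.shape, acoustic.shape)
```

Whatever normalisation or frame stacking was applied to the bottleneck features when training the second network would presumably need to be repeated here, so that the synthesis-time input matches what the network saw in training.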