Synthesis using bottlenecks

This topic has 3 replies, 3 voices, and was last updated 8 years, 8 months ago by Simon.

- July 2, 2016 at 19:36 #3315
  Giorgia M
  Student
  Hello,
  
  We have now trained our acoustic model using bottleneck features! We were all very happy until we tried to synthesise unseen sentences. The problem is that the dimensionality of the input vectors does not match that of the trained weights: the weight dimensions include the bottleneck length, while the input doesn’t. Hence the error below is returned:
  
  File "/afs/inf.ed.ac.uk/user/s15/s1566512/ossian_msc_2016_test/Ossian/scripts/processors/NN.py", line 129, in predict
    input = numpy.dot(input, layer['W'])
  ValueError: shapes (23,178) and (274,1024) not aligned: 178 (dim 1) != 274 (dim 0)
  
  23 is the number of input vectors (one per state of the input file)
  178 is the length of the feature vector (coming from the question file)
  
  274 is the length of the features plus the bottleneck features
  1024 is the size of the hidden layer
  
  How do we solve this? Do the bottleneck features need to be appended at synthesis time as well?
  
  Thanks.
- July 3, 2016 at 12:20 #3318
  Srikanth R
  Student
  You should use the bottleneck model first to generate the bottleneck features, append them to the input, and then use the second model to synthesise speech.
- July 3, 2016 at 12:38 #3319
  Giorgia M
  Student
  Hi Srikanth, yes, I believe this is what we did.
  
  Step 1. Extract bottleneck features
  Step 2. Append the features to the input and train a new model. The error is lower than for the system without bottlenecks, so it looks good.
  Step 3. Try to synthesise using the weight matrices optimised during step 2.
  
  The problem is that the first weight matrix from step 2 expects a longer input than the one we use in step 3: its number of rows equals the length of the input plus the bottlenecks, but at synthesis time the input has the dimensionality of the input features alone.
  
  The input is now a (23, 178) matrix while the weight matrix is (274, 1024), so the dot product cannot be computed.
- July 10, 2016 at 10:38 #3324
  Simon
  Professor
  You need to append the bottleneck features at synthesis time too. So, synthesis will also involve a forward pass through the bottleneck network, saving those bottleneck features, appending them to the usual input features, then passing this concatenated vector through the second network.
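Simon's recipe can be sketched in NumPy as below. This is a minimal, self-contained illustration under stated assumptions, not Ossian's actual code: the layer layout (dicts with `'W'` and `'b'` keys), the tanh/linear activations, the position of the bottleneck layer, the output dimension of 60 and the random weights are all hypothetical. Only the sizes 23, 178, 274 and 1024 come from the error message in the thread. The sketch first reproduces the `ValueError` by feeding the 178-dimensional features alone, then runs the two-stage forward pass.

```python
import numpy

# Hypothetical layout, loosely modelled on the `numpy.dot(input, layer['W'])` loop:
# each layer is a dict with 'W' (in_dim, out_dim) and 'b' (out_dim,).
# Hidden layers (including the bottleneck) use tanh; the output layer is linear.


def make_net(sizes, rng):
    """Random weights for illustration only (stand-ins for trained models)."""
    return [{'W': rng.standard_normal((m, n)) * 0.01, 'b': numpy.zeros(n)}
            for m, n in zip(sizes[:-1], sizes[1:])]


def forward(x, layers, stop_after=None):
    """Forward pass, returning the activations after layer `stop_after` (default: last)."""
    last = len(layers) - 1 if stop_after is None else stop_after
    for i, layer in enumerate(layers[:last + 1]):
        x = numpy.dot(x, layer['W']) + layer['b']
        if i < len(layers) - 1:  # every layer except the network's output layer
            x = numpy.tanh(x)
    return x


def synthesise(ling, bottleneck_net, bottleneck_layer, acoustic_net):
    # 1. Forward pass through the bottleneck network; keep the bottleneck activations.
    bn = forward(ling, bottleneck_net, stop_after=bottleneck_layer)
    # 2. Append them to the usual input features, as was done for training.
    x = numpy.hstack([ling, bn])
    expected = acoustic_net[0]['W'].shape[0]
    if x.shape[1] != expected:
        raise ValueError(f"input has {x.shape[1]} columns, network expects {expected}")
    # 3. Pass the concatenated vectors through the second (acoustic) network.
    return forward(x, acoustic_net)


rng = numpy.random.default_rng(0)
N_STATES, N_LING, N_BN, N_HIDDEN = 23, 178, 274 - 178, 1024  # from the error message
N_OUT = 60  # hypothetical acoustic output dimension

# Hypothetical topologies: bottleneck (96 units) is the output of layer index 1.
bottleneck_net = make_net([N_LING, N_HIDDEN, N_BN, N_HIDDEN, N_OUT], rng)
acoustic_net = make_net([N_LING + N_BN, N_HIDDEN, N_OUT], rng)
ling = rng.standard_normal((N_STATES, N_LING))

# Feeding the linguistic features alone reproduces the error from the thread.
try:
    forward(ling, acoustic_net)
    reproduced = False
except ValueError as err:
    reproduced = True
    print("without bottlenecks:", err)

out = synthesise(ling, bottleneck_net, 1, acoustic_net)
print("with bottlenecks:", out.shape)  # (23, 60)
```

The 274 - 178 = 96 extra columns are the bottleneck width the second network was trained with. Any normalisation applied to the bottleneck features before training the second model would also need to be applied before the concatenation at synthesis time.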