ASF – translating linguistic features to acoustic representation
In ASF, do we typically view, for example, cepstral coefficient vectors as the acoustic representation of the linguistic features?
Does this mean that, as described in Taylor Chapter 16.4.1 and 16.4.2, we use these vectors in either the HMM method or decision-tree clustering to learn the distribution of observed feature combinations (e.g. stress and phrase finality)?
Basically, the question I am asking is: how do we first translate the linguistic features into acoustic feature values?
Predicting acoustic features from linguistic features is a regression problem. We already have the necessary labelled training data: the speech database that will be used for unit selection.
One way to do the regression would be to train a regression tree (a CART). This is the method used in so-called “HMM-based speech synthesis” that we will cover in the second half of the course. But in HMM synthesis, the predicted acoustic features are used as input to a vocoder to create a waveform, rather than in an ASF target cost function.
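To make that more concrete, here is a minimal sketch in Python using scikit-learn. It is not code from any real synthesiser: the feature encoding, the data shapes (20 linguistic features, 13 cepstral coefficients) and the tree settings are all placeholder assumptions, and the data is random just so the snippet runs.

# Sketch: a regression tree that maps encoded linguistic features
# (e.g. one-hot phone identity, stress flag, phrase-finality flag)
# to a vector of cepstral coefficients for each unit.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

X_linguistic = np.random.rand(1000, 20)   # placeholder linguistic features
Y_cepstral = np.random.rand(1000, 13)     # placeholder acoustic features

tree = DecisionTreeRegressor(max_depth=8, min_samples_leaf=20)
tree.fit(X_linguistic, Y_cepstral)

# Predict the acoustic representation for one target unit's
# linguistic feature vector.
x_target = np.random.rand(1, 20)
y_predicted = tree.predict(x_target)      # shape (1, 13)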
We might then replace the tree with a better regression model: a neural network. We’ll cover this method after HMM synthesis.
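The same mapping with a neural network in place of the tree might look like this sketch, again with placeholder data and made-up layer sizes rather than any recommended recipe:

# Sketch: the same linguistic-to-acoustic regression with a small
# feed-forward network instead of a tree.
import numpy as np
from sklearn.neural_network import MLPRegressor

X_linguistic = np.random.rand(1000, 20)   # placeholder linguistic features
Y_cepstral = np.random.rand(1000, 13)     # placeholder acoustic features

net = MLPRegressor(hidden_layer_sizes=(256, 256), max_iter=300)
net.fit(X_linguistic, Y_cepstral)
y_predicted = net.predict(np.random.rand(1, 20))   # shape (1, 13)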
Once we know about HMM and neural network speech synthesis (both using vocoders rather than unit selection + waveform concatenation), we can then come back to the ASF formulation of unit selection. We will find that this is usually called “hybrid speech synthesis” and is covered towards the end of the course.
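Looking ahead, here is one hedged illustration of how such predictions could feed an ASF-style target cost: the cost of a candidate unit is its distance, in acoustic space, from the acoustic features predicted for the target. A plain Euclidean distance is used purely for illustration; hybrid systems typically use a probabilistic (likelihood-based) cost instead.

# Sketch: score candidate units by distance from the predicted
# acoustic features of the target position.
import numpy as np

def asf_target_cost(predicted, candidate):
    # Euclidean distance between predicted and candidate acoustic vectors.
    return float(np.linalg.norm(predicted - candidate))

y_predicted = np.random.rand(13)            # placeholder predicted cepstra
candidate_units = np.random.rand(50, 13)    # placeholder candidate cepstra
costs = [asf_target_cost(y_predicted, c) for c in candidate_units]
best = int(np.argmin(costs))                # index of the cheapest candidate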