Thank you for your response. I am struggling to understand exactly what is meant by ‘analysing wavelets’.
Basically, my question is: how do we first translate the linguistic features into acoustic feature values?
Thinking about it, though, surely factoring in deltas still leaves the join cost very local (i.e. it only takes into account the gradients on either side of the join)?
In Chapter 8.2 (Page 294), the book explains a letter-to-phone alignment algorithm. I understand this is run on the training data (as well as the test data) for a probabilistic g2p algorithm. However, the book doesn’t describe this particular algorithm in detail.
How does the algorithm find all alignments between the pronunciation and the spelling that conform to the allowable phones? Could you provide a concrete example of this?
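To make the question concrete, here is a minimal sketch of one way such an enumeration could work. It is not the book's algorithm, which is not spelled out in the text. It assumes each letter aligns either to exactly one phone or to no phone at all (written `_`), and that a hand-written table lists the allowable phones per letter. The function name `all_alignments`, the `ALLOWABLES` table and its entries are all hypothetical, chosen only for illustration.

```python
from typing import Dict, List, Set, Tuple

EPS = "_"  # assumed marker meaning "this letter produces no phone"

def all_alignments(
    letters: List[str],
    phones: List[str],
    allowables: Dict[str, Set[str]],
) -> List[List[Tuple[str, str]]]:
    """Enumerate every letter-to-phone alignment permitted by the table."""
    results: List[List[Tuple[str, str]]] = []

    def extend(i: int, j: int, partial: List[Tuple[str, str]]) -> None:
        if i == len(letters):
            # Keep the alignment only if every phone has been consumed.
            if j == len(phones):
                results.append(list(partial))
            return
        allowed = allowables.get(letters[i], set())
        # Option 1: the letter accounts for the next phone.
        if j < len(phones) and phones[j] in allowed:
            partial.append((letters[i], phones[j]))
            extend(i + 1, j + 1, partial)
            partial.pop()
        # Option 2: the letter is silent, if the table allows that.
        if EPS in allowed:
            partial.append((letters[i], EPS))
            extend(i + 1, j, partial)
            partial.pop()

    extend(0, 0, [])
    return results

# Hypothetical allowables table covering just the letters in "book".
ALLOWABLES = {
    "b": {"b"},
    "o": {"uh", "uw", "ow", EPS},
    "k": {"k", EPS},
}

alignments = all_alignments(list("book"), ["b", "uh", "k"], ALLOWABLES)
for alignment in alignments:
    print(alignment)
```

Under these assumptions, "book" with pronunciation b uh k has two valid alignments: either the first or the second "o" carries the vowel and the other is silent. A probabilistic g2p model would then need some way to weight or choose among such alternatives.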
That is starting to make sense. However, I am struggling to understand what you mean by a base frequency, and why we give this 1/T to the signal.