I think it’s the first one – if the test sentence is in the training data, there’s an exact match: no “new” joins need to be performed, so the total cost is 0. The problem Taylor is pointing out is that while this is a useful metric for some things, the output is just a playback of human speech, so it doesn’t give us a way to compare synthesized speech against recorded speech.
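Here’s a toy sketch (not from Taylor’s book) of why a sentence that already exists verbatim in the unit database costs nothing to “synthesize”: every pair of adjacent units is contiguous in the original recording, so no real joins are made. The unit representation and cost function here are hypothetical, just to make the idea concrete.

```python
def join_cost(unit_a, unit_b):
    """Hypothetical join cost: zero if the two units were recorded
    back-to-back in the same utterance, positive otherwise."""
    same_utterance = unit_a["utt"] == unit_b["utt"]
    contiguous = unit_a["pos"] + 1 == unit_b["pos"]
    return 0.0 if (same_utterance and contiguous) else 1.0

# Tiny "database": one recorded utterance, stored unit by unit.
database = [{"utt": 0, "pos": i, "phone": p} for i, p in enumerate("hello")]

# "Synthesizing" a sentence that is already in the database selects those
# exact contiguous units, so the total cost is 0 -- it is just playback.
selected = database
total = sum(join_cost(a, b) for a, b in zip(selected, selected[1:]))
print(total)  # 0.0
```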
Later in that section Taylor talks about holding out a test set of the input utterances and synthesizing those sentences from the remaining (training) data. With this, you can compare the natural speech against hopefully-close-to-identical synthesized speech, as opposed to comparing the natural speech with… the same natural speech.
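A minimal sketch of that held-out evaluation, assuming you already have a synthesizer and some comparison metric: reserve a fraction of the utterances as a test set, build the voice from the remainder, synthesize the held-out sentences, and score each one against its natural recording. The callables `build_voice`, `synthesize`, and `compare` are placeholders supplied by the caller, not anything Taylor specifies.

```python
import random

def heldout_evaluation(utterances, build_voice, synthesize, compare,
                       test_fraction=0.1, seed=0):
    """utterances: list of (text, natural_audio) pairs."""
    rng = random.Random(seed)
    shuffled = list(utterances)
    rng.shuffle(shuffled)

    # Split off a small test set; train on the remainder only.
    n_test = max(1, int(len(shuffled) * test_fraction))
    test, train = shuffled[:n_test], shuffled[n_test:]

    voice = build_voice(train)  # the test sentences are not in this data

    # Compare each held-out natural recording with its synthesized version.
    scores = []
    for text, natural_audio in test:
        synthesized_audio = synthesize(voice, text)
        scores.append(compare(natural_audio, synthesized_audio))
    return scores
```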