I got the script running all right, and it returned a list of flat representations for the utterances I gave it. I did not ask for the specific diphones, because the “unit” part of utt.relation shows some diphones that are missing from the waveform corpora in their back-up forms instead. Which ones are affected depends on the voice you used to start Festival: AWB, your own voice, or plain Festival.

The output format also differs quite a lot depending on the voice used to start Festival. I used several of them, along with the … command in shell scripts, before getting the flat utterance representation I wanted, as shown in the first picture below. But I still have to text-process this output further to get the diphone form, and even more if I want to preserve the stress information. Does anyone have an idea how to keep the stress tag in the flat representation and include it as part of the diphone? The utterances I passed to it are in the form shown in the second image (arranged sentence by sentence, and filtered to keep only utterances of between 5 and 15 words).
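For the text-processing step, here is a minimal Python sketch of one way to do it. It assumes, hypothetically, that the flat representation has already been parsed into one list of (phone, stress) pairs per utterance, where stress comes from the syllable each segment belongs to. In Festival a segment's syllable stress may be reachable through a feature path such as `R:SylStructure.parent.stress`, but check that against your voice and version. The function names `keep_sentence` and `to_diphones`, and the convention of suffixing "1" to phones in stressed syllables, are made up for illustration.

```python
from typing import List, Tuple

# Hypothetical input format: one (phone, stress) pair per segment, where
# stress is 1 if the segment's syllable is stressed and 0 otherwise.
Segment = Tuple[str, int]


def keep_sentence(sentence: str, min_words: int = 5, max_words: int = 15) -> bool:
    """Keep only sentences whose whitespace-separated word count is in range."""
    n_words = len(sentence.split())
    return min_words <= n_words <= max_words


def to_diphones(segments: List[Segment]) -> List[str]:
    """Turn a phone sequence into diphone labels that carry stress.

    Each phone in a stressed syllable gets a "1" suffix (hypothetical
    convention); pauses ("pau") are never marked as stressed.
    """
    labels = [
        f"{phone}1" if stress and phone != "pau" else phone
        for phone, stress in segments
    ]
    # A diphone spans from one phone into the next, so pair neighbours.
    return [f"{left}-{right}" for left, right in zip(labels, labels[1:])]


# Illustrative input for "hello" with the second syllable stressed.
segs = [("pau", 0), ("hh", 0), ("ax", 0), ("l", 1), ("ow", 1), ("pau", 0)]
print(to_diphones(segs))
# ['pau-hh', 'hh-ax', 'ax-l1', 'l1-ow1', 'ow1-pau']
```

Marking stress on the phone label before pairing means every diphone that touches a stressed syllable carries the tag automatically, so no separate bookkeeping is needed when building the diphone list.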