Bulk processing of text in Festival
This topic has 12 replies, 7 voices, and was last updated 4 years, 5 months ago by Elijah G.
February 7, 2018 at 14:29 #9054
I want to put my text prompts through the Festival front-end, print the utterance relations of interest to me (such as unit and syllable structure), and then process them. But inside Festival everything happens in Scheme. How can I get that output into Python or into the shell? More generally, how can I combine shell and Scheme, or shell and Python, in one script? Do I just call Festival from the terminal and then pass the variable to it? And to export its output, do I wrap the Scheme commands and use “>>” to save the result to a variable or a file?
February 9, 2018 at 13:32 #9055
Hi,
I’ve cobbled together a couple of scripts which will help you by providing a model for performing synthesis for a file of sentences (1 to >100000s…) and then saving various bits of information. Have a look at the zip file attached. It contains:
i) a text file with a few sentences for demo purposes
ii) a scheme file with the necessary code to make festival do this
iii) a bash file which acts as a “driver” script, responsible for selecting, starting and configuring festival, and then instructing it how to process the text file.
Have a look at the three files and see if you can work out what is going on for yourself! You will need to edit at least the doSynthesis.sh bash script to tailor it to your voice build.
Using this model you should be able to do things like:
a) use festival to do front-end analysis for a large amount of text, converting the resulting linguistic data in the utterance structures into a “flat text” representation and printing it; you could then pipe that to a file and write a Python script to process it in order to experiment with text selection algorithms
b) use festival to do either front-end analysis or full synthesis and save utterance structures, selected relations, waveforms etc. to file.
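To give a rough idea of the shape of the driver script, here is a minimal sketch. The file names and the Scheme entry point used in it (bulk_synth.scm, process_file, sentences.txt, out/) are placeholders for illustration; the attached scripts differ in detail, but the pattern is the same: start festival in batch mode, load your Scheme code, then evaluate the expressions that do the work.

#!/bin/bash
# Minimal driver sketch: load a Scheme file of helper functions, select a
# voice, and have festival process a text file of sentences.
# bulk_synth.scm, process_file, sentences.txt and out/ are placeholders.

SCHEMEFILE=bulk_synth.scm   # defines a Scheme function (process_file FILE DIR)
TEXTFILE=sentences.txt      # one sentence per line
OUTDIR=out                  # where utterances/waveforms/flat text end up

mkdir -p "$OUTDIR"

# -b (batch mode) makes festival evaluate its arguments and then exit.
# Plain arguments are loaded as files; arguments in parentheses are
# evaluated as Scheme expressions. Substitute the voice command below
# for whichever voice you are using.
festival -b "$SCHEMEFILE" \
    '(voice_localdir_multisyn-gam)' \
    "(process_file \"$TEXTFILE\" \"$OUTDIR\")" \
    > "$OUTDIR/flat_representation.txt"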
I hope that helps!
Korin
Attachments:
February 22, 2018 at 22:27 #9069
I get the following error when running the code, and I can’t figure out why:
SIOD ERROR: wrong type of argument to get_c_val
What am I doing wrong?
February 26, 2018 at 11:17 #9077
Can you provide more detail about where exactly this error comes up, please? Can you get a backtrace, for example? Without more detail, it’s impossible to guess where and why you get this error.
Thanks! Korin
February 27, 2018 at 10:58 #9079
How do I get a backtrace? There are no signs of an error in festival’s output; it just stops at the point where this error pops up.
February 27, 2018 at 14:20 #9085
Does the script run at all? Is there a particular sentence it breaks on? Or perhaps you can provide a minimal example of code that exhibits the problem?
“SIOD ERROR: wrong type of argument to get_c_val” is a rather generic error that could be cropping up in any number of ways; it just means that some function is receiving an argument of a different type from the one it expects. So it’s impossible to tell what’s going wrong without more information.
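One crude but effective way to narrow it down is to feed festival the sentences one at a time and watch where the error appears. Here is a sketch, reusing the placeholder names from the driver sketch earlier in the thread:

# Echo each sentence before festival processes it; the last 'trying' line
# printed before the SIOD error identifies the offending sentence.
while IFS= read -r line; do
    echo "=== trying: $line" >&2
    echo "$line" > one_sentence.txt
    festival -b bulk_synth.scm "(process_file \"one_sentence.txt\" \"out\")"
done < sentences.txt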
Thanks + regards,
Korin
March 6, 2018 at 16:44 #9135
I got the script running and it returned a flat representation for each of the utterances I gave it. I did not ask for the specific diphones, since the representation of diphones in the “unit” part of the utterance relation shows some diphones, in their back-up forms, that are not found in the waveform corpora (depending on the voice used to start Festival: AWB, your own voice, or just plain Festival). The output also looks quite different depending on which voice you start Festival with. I used several of them, along with the grep command in shell scripts, before obtaining the flat utterance representation I wanted, as shown in the first picture below. But I still have to text-process this form further to obtain the diphone form, and more still if I want to preserve the stress information (does anyone have an idea how to keep the stress tag in the flat representation and include it as part of the diphone?). The utterances I passed in are in the form shown in the second image (one sentence per line, filtered to keep only utterances of between 5 and 15 words).
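As an aside, for the 5-to-15-word filtering a one-liner like the following does the job, assuming one sentence per line (sentences.txt is just a placeholder name here):

# Keep only lines containing between 5 and 15 whitespace-separated words.
awk 'NF >= 5 && NF <= 15' sentences.txt > sentences_filtered.txt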
Attachments:
March 8, 2018 at 11:46 #9139
I can’t open those files; they show up as damaged. Can you upload them again?
March 8, 2018 at 12:42 #9140
This will probably be because of Apple’s over-strict security settings. The files are not damaged. Try downloading them in a browser other than Safari, or on a different computer (not in the lab).
February 22, 2020 at 15:18 #10681
I got the “SIOD ERROR: wrong type of argument to get_c_val” with no backtrace when the script got to lines of my data that started with ! or ). Removing those characters from the start of lines allowed me to run the script.
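In case it helps anyone else, the workaround can be applied to a whole file in one go; data.txt is a placeholder name, and this simply strips the leading characters rather than fixing whatever festival objects to:

# Remove any leading '!' or ')' characters from every line.
sed 's/^[!)]*//' data.txt > data_clean.txt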
March 10, 2020 at 14:46 #10714
Hi,
Is it possible to use the doSynthesis.sh script with my_lexicon.scm so that the front-end processes the sentences with the correct pronunciations for OOV words? I tried running the command “festival -b $MBDIR… ./my_lexicon.scm ‘(list (voice_localdir_multisyn-gam)…” but it gave me an error after processing a portion of the sentences. I made sure to source setup.sh before running it.
Thanks.
March 12, 2020 at 08:49 #10715
Please include the complete command line you are running, and the full error message, so I can help you.
March 12, 2020 at 16:23 #10718
Thanks for your reply. The issue’s now been resolved in the lab.