Forum Replies Created
That calculation is hardcoded in the C++ (because the time taken to calculate join costs pretty much dominates everything else in terms of runtime).
It wouldn’t actually be that hard to change it – the code responsible is in the file
$FESTIVALDIR/src/modules/MultiSyn/EST_JoinCost.h
But you would of course have to recompile festival to make your own modified executable containing any changes you make – *that* could be more or less time-consuming, depending on your prior experience with C++, compiling festival, etc.
What was the secret to getting it working in the end? 😀
One other thing to add – some have suggested you need to use https:// links to the wave files in your Qualtrics test (wherever you put them) rather than http:// links.
I’d be interested to hear whether + how you get this working!
I’m not a Praat user at all, so I cannot offer any advice on that I’m afraid.
With Wavesurfer it is pretty easy though. Once you have converted your pitchmarks to the label file format (e.g. make_pmlab_pm pm/*.pm), just open Wavesurfer and load the wave file you want to view. You can either:
1) Just choose “Transcription” as the configuration option when initially opening the wave file. Wavesurfer will show the spectrogram and a transcription pane. To load the labels, right click in the transcription pane and select “load labels” and navigate to the correct label file.
2) Add labelling to any open view by right clicking to create another pane – choose Transcription. Then right click in that new pane to load the labels as above.
Tips: 1) if the “.lab” file is in the same directory as the corresponding .wav file, Wavesurfer will just load the labels automatically; 2) right click on the transcription pane and choose “properties” – you can then select to “Extend boundaries into waveform and spectrogram panes”, which can make label viewing better.
Did you remember to rsync the extra files (~1GB) from the AT Lab machines to your virtual machine? The manifest.txt for that includes the tools and data for the Speech Synthesis course.
For the full details see: https://speech.zone/courses/speech-processing/module-3-speech-synthesis-front-end-1/tutorial-b/
Are you encountering this problem when using the ATLab virtual machine image, or remote desktop access to the AT lab machines?
If it is the latter, I have also noticed some filesystem glitches in the past few days (I was unable to run festival) – I reported it to is.helpline@ed.ac.uk and they seemed to fix it.
Anyway, taking ch_wave for example, you should be able to find it here:
[korin@PPLS_ATL_0011 ~]$ which ch_wave
/Volumes/Network/courses/ss/festival/festival_linux/speech_tools/bin/ch_wave

Finding which part of that path is missing will indicate exactly what the problem is.
“Found data” is anything you can get your hands on which was created for another purpose. For example, using YouTube videos for training a speech synthesis model would be using “found data”. It contrasts with data that has been purpose-designed and recorded specifically for building a speech synthesis voice.
Each voice has a Scheme file that contains code defining how to set up the voice (e.g. which lexicon to use, where its data files are, what data to load, etc.).
In the case of the Speech Synthesis assignment, we’ve created a voice definition file which makes a voice out of data found in the current working directory (so you have to be in that directory to run any particular voice you build).
However, the extra step that’s needed is to register any voice with festival, so it knows that voice is available. You can do that for example by putting the voice definition in a standard place in the $FESTIVAL/lib directory. Alternatively, you can use the function “voice-location-multisyn” to register a voice that is found in a non-standard place. (see $FESTIVAL/lib/voices.scm for details on that)
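For illustration, here’s a minimal sketch of what that might look like (the voice name and directory here are invented, and you should check $FESTIVAL/lib/voices.scm for the exact arguments rather than trusting my memory!):

;; Hypothetical example – register a multisyn voice that lives
;; outside the standard location (the name and path are made up):
(voice-location-multisyn 'my_multisyn_voice
                         "/home/student/ss/my_voice/"
                         "My unit selection voice for the assignment")

Once registered, (voice.list) should show the voice as available.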
1. For the ASF features – pitch, power and duration are mentioned (“…each target phoneme has a target pitch, power and duration”)
2. Yes, place of articulation is indeed a linguistic (i.e. articulatory phonetic) feature – it’s a property of a *phone*, which is a linguistic *concept* rather than a physically measurable signal. In contrast, the f0, power or duration used as ASF features are things you can directly observe and measure in the acoustic signal (or derive from it).
3. Yes, when combining different subcosts (e.g. ASF and/or IFF ones) we would typically want to weight them, so we can balance their influence in the overall cost.
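To make the weighting idea concrete, here’s a toy Scheme sketch (purely illustrative – this is not Festival’s actual implementation, and the weight values are invented):

;; Toy illustration of weighting two subcosts – not Festival's real code.
(define (combined-cost asf-cost iff-cost)
  (let ((w-asf 0.7)   ;; invented weight for the ASF subcost
        (w-iff 0.3))  ;; invented weight for the IFF subcost
    (+ (* w-asf asf-cost) (* w-iff iff-cost))))

;; e.g. (combined-cost 0.2 0.8) => 0.38

Tuning weights like these (by hand, or with some optimisation procedure) is how you balance the influence of each subcost.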
To change the default behaviour of inserting silence when a missing diphone occurs across a word boundary, you would need to edit the C++ code and recompile festival (i.e. IIRC, this behaviour is hard-coded in C++ and not accessible in Scheme).
Yes, just 1 frame either side.
(The code is the ultimate documentation, and the code definitely says 1 frame!)
Yes, Simon’s right, this indicates you’ve done something like use the wrong lexicon at some stage.
Incidentally, you probably won’t find *_cl phones in the my_lexicon.scm file.
Actually, FYI, the *_cl symbols are only used in the forced alignment process. Stops (e.g. p, b, t, k…) are broken into two parts, one for the closure portion (e.g. p_cl, b_cl, t_cl, k_cl…) and one for the release. When you build utterances from the final MLF file, these are merged back together again in Festival’s Segment relation, but the boundary between them is used to record the diphone join point…
(I’m not sure which lexicon could have a Q though! Unless it stands for glottal stop…)
Does the script run at all? Is there a particular sentence it breaks on? Or perhaps, can you provide a minimal example of code that exhibits the problem?
“SIOD ERROR: wrong type of argument to get_c_val” is a rather generic error – it could be cropping up in a large number of ways – it just means some function is receiving an argument that is different from what it expects. So it’s impossible to tell what’s going wrong without more information.
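For example (an illustrative guess, not necessarily what your script is doing), passing a plain string to a function that expects an utterance object tends to produce this kind of complaint:

festival> (utt.synth "hello")

utt.synth expects an Utterance object rather than a string, so SIOD will complain about the argument type (though the exact wording of the error can vary). If you can narrow your problem down to one line of your script, it will be much easier to diagnose.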
Thanks + regards,
Korin
Don’t worry, you are only changing the backoff rules data structure in the memory of the currently running festival process – nothing is permanently changed. And you can easily reinstate the rules for the current festival session by running the above command, or to be specific:
(du_voice.setDiphoneBackoff currentMultiSynVoice (append '((ii @)) unilex-edi-backoff_rules))
unilex-edi-backoff_rules
is just a list variable – you can examine its contents by putting the variable name at the command prompt:
festival> unilex-edi-backoff_rules
then hit return.
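Since it’s an ordinary Scheme list, the usual list functions work on it too, for example:

festival> (length unilex-edi-backoff_rules)
festival> (car unilex-edi-backoff_rules)

(length tells you how many rules there are, and car shows just the first one.)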
Three thoughts come to mind:
i) yes, it seems the default backoff rules don’t include ii -> @
I don’t expect that’s for any good reason – probably just an oversight. You can try adding this to the list of rules your currently loaded Multisyn voice is using by entering this at the festival command prompt:
(du_voice.setDiphoneBackoff currentMultiSynVoice (append '((ii @)) unilex-edi-backoff_rules))
Does that make the phone-based backoff work more as you want it?
ii) the backoff phone list is just one way that Multisyn offers for dealing with missing diphones. It will also make phone joins instead if desired. For example, if you simply delete the list of phone backoff rules by doing this:
(du_voice.setDiphoneBackoff currentMultiSynVoice nil)
then Festival won’t be able to back off any phones, and so will do a phone join instead. Arguably, in the majority of cases this could be a better strategy than the default backoff rule, which is to back off to silence (and that seems like it could be a terrible idea!). I’d be interested in how that works in your case – can you try it and post back please? 🙂
(Or you could even explore that further in your write-up?…)

iii) actually, it seems odd that you don’t have a diphone to handle “the” before a vowel. “The” is obviously a really common word! Is it really the case that you don’t have a single “the” before a vowel-initial word anywhere in your voice dataset? That may be indicative of a problem. Or, of course, you may have selected a “peculiar” set of domain-specific data which doesn’t include one…?