SABLE Markup tags
- This topic has 16 replies, 2 voices, and was last updated 8 years, 11 months ago by Joseph M.
-
February 9, 2016 at 20:58 #2565
The manual discusses the SABLE markup language, and mentions a variety of possible tags one could use. I am particularly interested in this tag (from the manual):
SPEAKER
Select a voice. Accepts a parameter NAME which takes values male1, male2, female1, etc. There is currently no definition about what happens when a voice is selected which the synthesizer doesn’t support. An example is
<SPEAKER name="male1"> … </SPEAKER>
Can this be used to change voices (between custom voices we have built), in the middle of a text? If so, how would that work? Would it be fast enough to be real time?
-
February 10, 2016 at 12:02 #2566
Yes, this will change between voices. The format of the name of a voice is the same that you would use within Festival, minus the “voice_” prefix. Try creating a file called test.sable (make sure the suffix is .sable and that your editor doesn’t add another suffix) with these contents:
Changes of speaker may appear in the text. Using one speaker Eventually returning to the original default speaker.
and run it through Festival like this
bash$ festival --tts test.sable
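For reference, a SABLE file that switches voice for one sentence could be generated along these lines. This is only a sketch under some assumptions: the header follows the style of the SABLE example in the Festival manual (check it against your copy), the function name write_sable_example is made up for illustration, and the voice name you pass in has to be one of your own voices, given without the “voice_” prefix.

```shell
#!/bin/sh
# Sketch: write a small SABLE file that changes voice for one sentence.
# $1 = output file (should end in .sable)
# $2 = voice name WITHOUT the "voice_" prefix (hypothetical; use your own)
write_sable_example() {
    out=$1
    voice=$2
    # Unquoted EOF so that $voice is expanded inside the here-document.
    cat > "$out" <<EOF
<?xml version="1.0"?>
<!DOCTYPE SABLE PUBLIC "-//SABLE//DTD SABLE speech mark up//EN"
      "Sable.v0_2.dtd"
[]>
<SABLE>
Changes of speaker may appear in the text.
<SPEAKER NAME="$voice">
Using one speaker.
</SPEAKER>
Eventually returning to the original default speaker.
</SABLE>
EOF
}

# Example use (needs Festival and a voice of that name installed):
#   write_sable_example test.sable myvoice_multisyn-gam
#   festival --tts test.sable
```

Generating the file from a script, rather than pasting it, avoids picking up any typographic characters from a web page.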
Note that SABLE was a putative standard developed a long time ago by us in Edinburgh with a few companies. It has been superseded. See also the earlier standard SSML and the related standard for interactive systems, VoiceXML.
-
February 10, 2016 at 13:07 #2568
So, if VoiceXML is the standard for interactive systems, is there another standard for purely TTS systems? You say SABLE has been ‘superseded’. By what? And will that standard work on Festival?
In general, this seems like a HIGHLY relevant area to our course of study. Will this be covered at all during the SS course? If not, I’d like to put in a request for either one of your amazing ‘extra’ videos, or one of your amazing ‘extra’ lectures!!
-
February 10, 2016 at 14:24 #2574
It was superseded in the sense that it never made it as far as a formalised standard (e.g., via the W3C) and instead we have various vendor-specific approaches (e.g., Microsoft’s SAPI5).
SAPI 5 synthesis markup format is similar to the format published by the SABLE Consortium. However, this format and SABLE version 1.0 are not interoperable. At this time, it’s not determined if they will become partially interoperable in the future. (SAPI 5.3 documentation, Microsoft)
-
February 18, 2016 at 08:16 #2603
Here’s another example of a proprietary markup language: Neospeech’s VTML.
-
February 10, 2016 at 13:17 #2569
I tried your example from above, and here’s what the terminal returned:
ppls-atlab-017:ss s1567647$ festival --tts test.sable
Error: Expected name, but got <space> after <
in unnamed entity at line 1 char 2 of file:/var/folders/0h/9r06nc9x49q8b8rlczy9nhk401jk1t/T//est_01054_00000
festival: text modes, caught error and tidying up
Suggestions?
-
February 10, 2016 at 14:03 #2571
This might be because you did a cut-and-paste from this webpage, which picked up HTML versions of some characters?
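One way to check for this is to look for non-ASCII characters in the file, such as the curly quotes a web page substitutes for plain ones. A rough sketch follows, assuming a grep that accepts -a (GNU and BSD grep both do). The function names are made up for illustration, only a few common quote substitutions are handled, and this may not be the cause of this particular error.

```shell
#!/bin/sh
# Sketch: list lines of a file that contain bytes outside plain ASCII.
# Prints "line-number:line" for each hit; exit status 0 means some were found.
find_non_ascii() {
    # In the C locale, [:print:] and [:space:] cover only ASCII characters,
    # so any other byte (e.g. part of a UTF-8 curly quote) is matched.
    # -a makes grep treat the file as text even if it looks binary.
    LC_ALL=C grep -a -n '[^[:print:][:space:]]' "$1"
}

# Sketch: print the file with typographic quotes replaced by plain ones.
# Writes to standard output and leaves the original file untouched.
straighten_punctuation() {
    LC_ALL=C sed \
        -e 's/“/"/g' -e 's/”/"/g' -e 's/″/"/g' \
        -e "s/‘/'/g" -e "s/’/'/g" \
        "$1"
}

# Example use:
#   find_non_ascii test.sable
#   straighten_punctuation test.sable > test_clean.sable
```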
-
February 10, 2016 at 13:40 #2570
OK, I actually got it to work by using the example straight out of the manual (not sure exactly what is different about that from your script). A couple things:
1. Not all of the tags seem to work.
2. The voice switching DOES work – hooray!! Now, on to the next problem: If I’m using ‘localdir’ to identify my voice, how could I switch that to one of my other voices, which is in another directory? (I’ve put each voice in its own directory – seems like the only way to go, especially since they don’t share the same wav files). Suggestions?
-
February 10, 2016 at 14:04 #2572
1. Some tags need to be supported by the voice.
Why will unit selection voices generally not support tags that modify pitch, duration, emphasis, etc.?
-
February 10, 2016 at 14:15 #2573
2. a little fiddly, but possible
A voice in Festival is defined by a set of files, including some Scheme that defines the locations of the various files needed (LPCs, utts, etc). On the system here in Edinburgh, the definition of voice_localdir_multisyn-gam is here:
/Volumes/Network/courses/ss/festival/lib.incomplete/voices-multisyn/english/localdir_multisyn-gam
This is a little different to normal voices, in that it looks in the current (local) directory for the voice files, and not in Festival’s own library.
You will need to make your own copy of that directory and its contents, and modify the file festvox/localdir_multisyn-gam.scm to change the name of the voice (you can’t have two voices with the same name), and the paths used to find the voice data.
See how far you get, then come back for more help if you need it.
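The copy-and-rename step could be sketched as a shell function like the one below. Everything here is illustrative: the function name is made up, and the blanket search-and-replace assumes the old voice name appears literally inside the .scm file, which you should confirm by reading it. The paths to the voice data inside the file still need editing by hand.

```shell
#!/bin/sh
# Sketch: copy a voice definition directory under a new voice name.
# $1 = existing voice directory (its last path component is the old name)
# $2 = parent directory for the copy (in your own filespace)
# $3 = new voice name (assumed to contain no '/' or '&', which sed treats specially)
clone_voice_definition() {
    src=$1
    dest_parent=$2
    new_name=$3
    old_name=$(basename "$src")
    dest="$dest_parent/$new_name"

    # Two voices cannot share a name, and we must not overwrite a copy.
    [ "$new_name" != "$old_name" ] || return 1
    [ ! -e "$dest" ] || return 1

    mkdir -p "$dest_parent" || return 1
    cp -R "$src" "$dest" || return 1

    # Rename the Scheme definition and replace the old name inside it.
    old_scm="$dest/festvox/$old_name.scm"
    new_scm="$dest/festvox/$new_name.scm"
    sed "s/$old_name/$new_name/g" "$old_scm" > "$new_scm" || return 1
    rm "$old_scm"
}

# Example use (destination path is hypothetical):
#   clone_voice_definition \
#     /Volumes/Network/courses/ss/festival/lib.incomplete/voices-multisyn/english/localdir_multisyn-gam \
#     "$HOME/voices-multisyn/english" voice1_multisyn-gam
```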
-
February 10, 2016 at 16:21 #2575
OK, getting a bit stuck. I’ve located and copied localdir_multisyn-gam.scm, opened it and edited it to change the data paths, and renamed it to voice1_multisyn-gam.scm. The simple thing to do would be to put it back in the festvox directory that it came from, but of course I don’t have permission to do that. You say “make your own copy of that directory”. Fine, I can do that, but then how do I tell Festival to find that new directory? Doesn’t Festival know where to find everything it needs based on this script: festival>(voice_localdir_multisyn-gam)? So wouldn’t I need to edit THAT script to point it to the new directory I would make? That was my reasoning, which sent me looking for the voice_ script, and I can’t find it anywhere on the network.
Help?
-
February 10, 2016 at 19:08 #2577
Something to do with this, presumably:
The variable voice-path contains a list of directories where voices will be automatically searched for. If this is not set it is set automatically by appending `/voices/' to all paths in festival load-path. You may add new directories explicitly to this variable in your `sitevars.scm' file or your own `.festivalrc' as you wish.
I tried this:
(voice-location NAME DIR DOCSTRING)
Record the location of a voice. Called for each voice found on voice-path. Can be called in site-init or .festivalrc for additional voices which exist elsewhere.
but that didn’t work. Clearly, I need to tell Festival where my voice(s) are. The manual implies that this is a simple process, but I can’t get it to work.
-
February 11, 2016 at 20:25 #2582
For unit selection voices that use the multisyn engine, you need to set voice-path-multisyn and not voice-path.
-
February 10, 2016 at 19:13 #2578
I can get SABLE to switch between voices that are not in the same directory, so clearly it can be done. I just need to know how to “declare the new voice to Festival”, or add its directory to the voice-path.
-
February 11, 2016 at 20:22 #2580
Try this command
festival> (set! voice-path-multisyn "/Path/To/Your/Voice/Directory/")
and Festival will look in the location you specify, instead of the system location /Volumes/Network/courses/ss/festival/lib.incomplete/voices-multisyn/
So, whatever directory (in your own filespace) you set voice-path-multisyn to, it should have the same structure as the system directory above.
If that works, you could create a file called .festivalrc in your home directory (or edit the existing file, if you have one) that contains the command, so it is executed every time you start Festival:
(set! voice-path-multisyn "/Path/To/Your/Voice/Directory/")
Hint: the Finder on a Mac will hide files whose name starts with a period, by default. Use the Terminal to see all files, where you need to use ls -a rather than just ls.
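Adding the command to .festivalrc can also be done from the Terminal. The sketch below appends the line only if it is not already there, so running it twice is harmless. The function name is made up for illustration, and the directory you pass in is whatever you chose for your own voices.

```shell
#!/bin/sh
# Sketch: add a voice-path-multisyn setting to a Festival init file, once only.
# $1 = directory containing your multisyn voices
# $2 = init file to edit (optional; defaults to ~/.festivalrc)
add_multisyn_path() {
    voice_dir=$1
    rc_file=${2:-"$HOME/.festivalrc"}
    line="(set! voice-path-multisyn \"$voice_dir\")"

    # -F fixed string, -x whole-line match, -q quiet: skip if already present.
    if [ -f "$rc_file" ] && grep -qxF -- "$line" "$rc_file"; then
        return 0
    fi
    printf '%s\n' "$line" >> "$rc_file"
}

# Example use:
#   add_multisyn_path "/Path/To/Your/Voice/Directory/"
#   ls -a "$HOME"    # .festivalrc is hidden from a plain ls
```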
-
February 12, 2016 at 10:33 #2583
OK, here’s what should be a much simpler way. Create versions of localdir_multisyn-gam.scm for each of your voices. Start Festival, and manually load each such voice definition
festival> (load ".../mylocaldirvoice1_multisyn-gam.scm")
nil
festival> (load ".../mylocaldirvoice2_multisyn-gam.scm")
nil
where you should replace “…” with the absolute path to where you have placed those files. Now you can use those voices
festival> (voice_mylocaldirvoice1_multisyn-gam)
Please wait: Initialising multisyn voice.
...etc.
festival> (SayText "This is the first voice")
...etc.
festival> (voice_mylocaldirvoice2_multisyn-gam)
Please wait: Initialising multisyn voice.
...etc.
festival> (SayText "This is the second voice")
And to do the whole thing from the beginning and then synthesise from some text with SABLE markup, start Festival and do:
festival> (load ".../mylocaldirvoice1_multisyn-gam.scm")
nil
festival> (load ".../mylocaldirvoice2_multisyn-gam.scm")
nil
festival> (tts "somefile.sable" nil)
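To avoid retyping these commands, they could be written to a small Scheme file from the shell and handed to Festival at start-up. This is a sketch only: the function name is made up, and the paths in the example are placeholders for wherever you put your own voice definitions.

```shell
#!/bin/sh
# Sketch: write a Scheme file that loads voice definitions, then speaks a SABLE file.
# $1 = Scheme file to write
# $2 = SABLE file to synthesise
# remaining arguments = absolute paths to your voice definition .scm files
write_tts_script() {
    out=$1
    sable=$2
    shift 2
    : > "$out"    # start from an empty file
    for def in "$@"; do
        printf '(load "%s")\n' "$def" >> "$out"
    done
    printf '(tts "%s" nil)\n' "$sable" >> "$out"
}

# Example use (placeholder paths; Festival loads files named on its command line):
#   write_tts_script run_voices.scm somefile.sable \
#       /absolute/path/mylocaldirvoice1_multisyn-gam.scm \
#       /absolute/path/mylocaldirvoice2_multisyn-gam.scm
#   festival run_voices.scm
```

Started this way, Festival should load the file and then leave you at the usual prompt, so you can carry on selecting voices by hand.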
-
February 25, 2016 at 12:07 #2636
This is from the IBM/Watson current website, where they are offering Watson’s TTS and ASR capabilities as a cloud service:
https://text-to-speech-demo.mybluemix.net/
They have a demo that lets you experiment with ‘Expressive SSML’. For example, this paragraph:
<speak>I have been assigned to handle your order status request.<express-as type="Apology"> I am sorry to inform you that the items you requested are back-ordered. We apologize for the inconvenience.</express-as><express-as type="Uncertainty"> We don't know when those items will become available. Maybe next week but we are not sure at this time.</express-as><express-as type="GoodNews">Because we want you to be a happy customer, management has decided to give you a 50% discount! </express-as></speak>
There is a lot of documentation about SSML on their site, covering which tags it does and doesn’t support, etc. What they don’t divulge, and hence my question, is this: in the example above, where they are switching voice ‘type’ (and it does sound somewhat convincing), are they doing what I think they are doing, and what I have been experimenting with in the lab? That is, are they switching to an entirely different voice built from a different recorded database (same speaker, same script), recorded with these different ‘expressive’ qualities? Or is this an example of a parametric or hybrid system, in which the expressive qualities of each voice ‘type’ are imposed on the synthesis parametrically?
Also: earlier in this thread you implied that SSML was an older markup language, yet IBM still seems to be using it. Thoughts on why they’ve stuck with it? Is SSML open source? Do they perhaps have another proprietary XML that they keep for their own high-end products, like Watson when he appears on game shows, etc?