SABLE Markup tags

This topic has 16 replies, 2 voices, and was last updated 9 years, 3 months ago by Joseph M.

- February 9, 2016 at 20:58 #2565
  Joseph M
  Student
  The manual discusses the SABLE markup language, and mentions a variety of possible tags one could use. I am particularly interested in this tag (from the manual):
  
  SPEAKER
  Select a voice. Accepts a parameter NAME which takes values male1, male2, female1, etc. There is currently no definition about what happens when a voice is selected which the synthesizer doesn’t support. An example is
  <SPEAKER name="male1"> … </SPEAKER>
  
  Can this be used to change voices (between custom voices we have built), in the middle of a text? If so, how would that work? Would it be fast enough to be real time?
- February 10, 2016 at 12:02 #2566
  Simon
  Professor
  Yes, this will change between voices. The name of a voice takes the same form as the one you would use within Festival, minus the “voice_” prefix. Try creating a file called test.sable (make sure the suffix is .sable and that your editor doesn’t add another suffix) with these contents:
```
Changes of speaker may appear in the text.
Using one speaker
Eventually returning to the original default speaker.
```
  and run it through Festival like this
```
bash$ festival --tts test.sable
```
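  The SABLE tags in the file contents shown above seem to have been lost when the post was rendered, so only the sentence text remains. As a rough sketch of what such a file might look like, the snippet below writes a test.sable with SPEAKER tags around the same sentences. The voice names kal_diphone and rab_diphone are assumptions (substitute voices installed on your system, without the voice_ prefix), and the DOCTYPE header is modelled on the example in the Festival manual and may need adjusting.

```shell
# Write a small SABLE file; the quoted 'EOF' stops the shell expanding anything.
# kal_diphone and rab_diphone are assumed voice names, not taken from the post.
cat > test.sable <<'EOF'
<?xml version="1.0"?>
<!DOCTYPE SABLE PUBLIC "-//SABLE//DTD SABLE speech mark up//EN"
      "Sable.v0_2.dtd"
[]>
<SABLE>
<SPEAKER NAME="kal_diphone">
Changes of speaker may appear in the text.
</SPEAKER>
<SPEAKER NAME="rab_diphone">
Using one speaker
</SPEAKER>
Eventually returning to the original default speaker.
</SABLE>
EOF
```

  Writing the file from the terminal like this also avoids picking up any web-page formatting.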
  Note that SABLE was a putative standard developed a long time ago by us in Edinburgh with a few companies. It has been superseded. See also the earlier standard SSML and the related standard for interactive systems, VoiceXML.
- February 10, 2016 at 13:07 #2568
  Joseph M
  Student
  So, if VoiceXML is the standard for interactive systems, is there another standard for purely TTS systems? You say SABLE has been ‘superseded’. By what? And will that standard work on Festival?
  
  In general, this seems like a HIGHLY relevant area to our course of study. Will this be covered at all during the SS course? If not, I’d like to put in a request for either one of your amazing ‘extra’ videos, or one of your amazing ‘extra’ lectures!!
  - February 10, 2016 at 14:24 #2574
    Simon
    Professor
    It was superseded in the sense that it never made it as far as a formalised standard (e.g., via the W3C) and instead we have various vendor-specific approaches (e.g., Microsoft’s SAPI5).
    
    SAPI 5 synthesis markup format is similar to the format published by the SABLE Consortium. However, this format and SABLE version 1.0 are not interoperable. At this time, it’s not determined if they will become partially interoperable in the future. (SAPI 5.3 documentation, Microsoft)
    - February 18, 2016 at 08:16 #2603
      Simon
      Professor
      
      Here’s another example of a proprietary markup language: Neospeech’s VTML.
- February 10, 2016 at 13:17 #2569
  Joseph M
  Student
  I tried your example from above, and here’s what the terminal returned:
  
  ppls-atlab-017:ss s1567647$ festival --tts test.sable
  Error: Expected name, but got <space> after <
  in unnamed entity at line 1 char 2 of file:/var/folders/0h/9r06nc9x49q8b8rlczy9nhk401jk1t/T//est_01054_00000
  festival: text modes, caught error and tidying up
  
  Suggestions?
  - February 10, 2016 at 14:03 #2571
    Simon
    Professor
    This might be because you did a cut-and-paste from this webpage, which picked up HTML versions of some characters?
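    If that is the cause, the usual culprits are typographic ("curly") quotes that the page substituted for plain ASCII ones. A minimal sketch of a clean-up filter, assuming the file is UTF-8 and that only these few quote characters were affected (the function name fix_pasted_quotes is made up for illustration):

```shell
# Replace typographic quote characters with their ASCII equivalents.
# Each character is listed explicitly so the result does not depend on locale settings.
fix_pasted_quotes() {
  sed -e 's/“/"/g' -e 's/”/"/g' -e 's/″/"/g' \
      -e "s/‘/'/g" -e "s/’/'/g"
}
```

    For example: fix_pasted_quotes < pasted.sable > test.sable (pasted.sable being whatever you saved from the browser).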
- February 10, 2016 at 13:40 #2570
  Joseph M
  Student
  OK, I actually got it to work by using the example straight out of the manual (not sure exactly what is different about that from your script). A couple things:
  
  1. Not all of the tags seem to work.
  
  2. The voice switching DOES work – hooray!! Now, on to the next problem: If I’m using ‘localdir’ to identify my voice, how could I switch that to one of my other voices, which is in another directory? (I’ve put each voice in its own directory – seems like the only way to go, especially since they don’t share the same wav files). Suggestions?
- February 10, 2016 at 14:04 #2572
  Simon
  Professor
  1. Some tags need to be supported by the voice.
  
  Why will unit selection voices generally not support tags that modify pitch, duration, emphasis, etc.?
- February 10, 2016 at 14:15 #2573
  Simon
  Professor
  2. A little fiddly, but possible.
  
  A voice in Festival is defined by a set of files, including some Scheme that defines the locations of the various files needed (LPCs, utts, etc). On the system here in Edinburgh, the definition of voice_localdir_multisyn-gam is here:
  
  /Volumes/Network/courses/ss/festival/lib.incomplete/voices-multisyn/english/localdir_multisyn-gam
  
  This is a little different to normal voices, in that it looks in the current (local) directory for the voice files, and not in Festival’s own library.
  
  You will need to make your own copy of that directory and its contents, and modify the file festvox/localdir_multisyn-gam.scm to change the name of the voice (you can’t have two voices with the same name) and the paths used to find the voice data.
  
  See how far you get, then come back for more help if you need it.
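  As a starting point, the copy-and-rename step might be scripted roughly as follows. This is only a sketch: the function name clone_voice is made up, and it assumes that every occurrence of the old voice name inside the .scm file can simply be replaced by the new one. The paths to your voice data still have to be edited by hand afterwards.

```shell
# clone_voice SRC_DIR DEST_PARENT NEW_NAME
# Copies a voice definition directory and renames the voice inside it.
clone_voice() {
  src=$1; dest_parent=$2; new=$3
  old=$(basename "$src")                      # e.g. localdir_multisyn-gam
  [ "$old" != "$new" ] || return 1            # the new voice needs a different name
  [ ! -e "$dest_parent/$new" ] || return 1    # refuse to overwrite an existing copy
  mkdir -p "$dest_parent" || return 1
  cp -R "$src" "$dest_parent/$new" || return 1
  # Write a renamed Scheme file with the old voice name replaced throughout,
  # then remove the copy that still carries the old name.
  sed "s/$old/$new/g" "$dest_parent/$new/festvox/$old.scm" \
      > "$dest_parent/$new/festvox/$new.scm" || return 1
  rm "$dest_parent/$new/festvox/$old.scm"
}
```

  It would be called with the system directory given above as SRC_DIR, a directory in your own filespace as DEST_PARENT, and a new voice name of your choosing.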
- February 10, 2016 at 16:21 #2575
  Joseph M
  Student
  OK, getting a bit stuck. I’ve located and copied localdir_multisyn-gam.scm, edited it to change the data paths, and renamed it voice1_multisyn-gam.scm. The simple thing would be to put it back in the festvox directory it came from, but of course I don’t have permission to do that. You say “make your own copy of that directory”, and I can do that, but then how do I tell Festival to find the new directory? Doesn’t Festival know where to find everything it needs based on this call: festival>(voice_localdir_multisyn-gam)? So wouldn’t I need to edit THAT script to point it to the new directory I would make? That was my reasoning, which sent me looking for the voice_ script… and I can’t find it anywhere on the network.
  Help?
- February 10, 2016 at 19:08 #2577
  Joseph M
  Student
  Something to do with this, presumably:
  The variable voice-path contains a list of directories where voices will be automatically searched for. If this is not set it is set automatically by appending '/voices/' to all paths in festival load-path. You may add new directories explicitly to this variable in your 'sitevars.scm' file or your own '.festivalrc' as you wish.
  
  I tried this:
  (voice-location NAME DIR DOCSTRING)
  Record the location of a voice. Called for each voice found on voice-path. Can be called in site-init or .festivalrc for additional voices which exist elsewhere.
  
  but that didn’t work. Clearly, I need to tell Festival where my voice(s) are. The manual implies that this is a simple process, but I can’t get it to work.
  - February 11, 2016 at 20:25 #2582
    Simon
    Professor
    For unit selection voices that use the multisyn engine, you need to set voice-path-multisyn and not voice-path.
- February 10, 2016 at 19:13 #2578
  Joseph M
  Student
  I can get SABLE to switch between voices that are not in the same directory, so clearly it can be done. I just need to know how to “declare the new voice to Festival”, or add its directory to the voice-path….
- February 11, 2016 at 20:22 #2580
  Simon
  Professor
  Try this command
```
festival> (set! voice-path-multisyn "/Path/To/Your/Voice/Directory/")
```
  and Festival will look in the location you specify, instead of the system location /Volumes/Network/courses/ss/festival/lib.incomplete/voices-multisyn/
  
  So, whatever directory (in your own filespace) you set voice-path-multisyn to, it should have the same structure as the system directory above.
  
  If that works, you could create a file called .festivalrc in your home directory (or edit the existing file, if you have one) that contains the command, so it is executed every time you start Festival:
```
(set! voice-path-multisyn "/Path/To/Your/Voice/Directory/")
```
  Hint: by default, the Finder on a Mac hides files whose names start with a period. Use the Terminal to see all files; there you need ls -a rather than just ls.
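  Since the file is hidden anyway, it may be easier to add the line from the Terminal. A minimal sketch, in which the function name remember_voice_path is made up for illustration and the optional second argument exists only so that the function can be tried on a scratch file first:

```shell
# remember_voice_path DIR [RCFILE]
# Appends the set! command to RCFILE (default: ~/.festivalrc) unless it is already there.
remember_voice_path() {
  dir=$1
  rc=${2:-$HOME/.festivalrc}
  line="(set! voice-path-multisyn \"$dir\")"
  # -F: fixed string, -x: the whole line must match, -q: no output
  grep -qxF -- "$line" "$rc" 2>/dev/null || printf '%s\n' "$line" >> "$rc"
}
```

  Running it twice with the same directory leaves a single line in the file, so it is safe to repeat.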
- February 12, 2016 at 10:33 #2583
  Simon
  Professor
  OK, here’s what should be a much simpler way. Create a version of localdir_multisyn-gam.scm for each of your voices. Start Festival, and manually load each such voice definition:
```
festival> (load ".../mylocaldirvoice1_multisyn-gam.scm")
nil
festival> (load ".../mylocaldirvoice2_multisyn-gam.scm")
nil
```
  where you should replace "..." with the absolute path to where you have placed those files. Now you can use those voices:
```
festival> (voice_mylocaldirvoice1_multisyn-gam)
Please wait: Initialising multisyn voice.
...etc.
festival> (SayText "This is the first voice")
...etc.
festival> (voice_mylocaldirvoice2_multisyn-gam)
Please wait: Initialising multisyn voice.
...etc.
festival> (SayText "This is the second voice")
```
  And to do the whole thing from the beginning and then synthesise from some text with SABLE markup, start Festival and do:
```
festival> (load ".../mylocaldirvoice1_multisyn-gam.scm")
nil
festival> (load ".../mylocaldirvoice2_multisyn-gam.scm")
nil
festival> (tts "somefile.sable" nil)
```
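  To avoid retyping those commands in every session, they could be generated into a file and handed to Festival in one go. The sketch below only prints the Scheme commands; the function name make_tts_script is made up, and the suggestion to run the result with festival -b (batch mode) is an assumption about your installation that is worth checking against festival --help.

```shell
# make_tts_script SABLE_FILE VOICE_SCM...
# Prints Scheme commands that load each voice definition and then speak the SABLE file.
make_tts_script() {
  sable=$1; shift
  for scm in "$@"; do
    printf '(load "%s")\n' "$scm"
  done
  printf '(tts "%s" nil)\n' "$sable"
}
```

  For example, redirect its output to a file such as run.scm, passing somefile.sable followed by the absolute paths of your voice definition files, and then try festival -b run.scm.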
- February 25, 2016 at 12:07 #2636
  Joseph M
  Student
  This is from the IBM/Watson current website, where they are offering Watson’s TTS and ASR capabilities as a cloud service:
  
  https://text-to-speech-demo.mybluemix.net/
  
  They have a demo that lets you experiment with ‘Expressive SSML’. For example, this paragraph:
  
  <speak>I have been assigned to handle your order status request.<express-as type="Apology"> I am sorry to inform you that the items you requested are back-ordered. We apologize for the inconvenience.</express-as><express-as type="Uncertainty"> We don’t know when those items will become available. Maybe next week but we are not sure at this time.</express-as><express-as type="GoodNews">Because we want you to be a happy customer, management has decided to give you a 50% discount! </express-as></speak>
  
  There is a lot of documentation about SSML on their site, regarding which tags it does and doesn’t support, etc. What they don’t divulge, and hence my question, is this: in the example above, where they switch voice ‘type’ (and it does sound somewhat convincing), are they doing what I have been experimenting with in the lab, namely switching to an entirely different voice built from a different recorded database (same speaker, same script) recorded with these different ‘expressive’ qualities? Or is this an example of a parametric or hybrid system, in which these ‘expressive’ qualities are imposed on the synthesis parametrically?
  
  Also: earlier in this thread you implied that SSML was an older markup language, yet IBM still seems to be using it. Thoughts on why they’ve stuck with it? Is SSML open source? Do they perhaps have another proprietary XML that they keep for their own high-end products, like Watson when he appears on game shows, etc?