Forum Replies Created
You might want to load the list of values from a file:
for X in `cat myfile.scp`
do
  echo The value of X is ${X}
done
where myfile.scp is a plain text file with one value per line. This is a good way to loop around a list of files, for example.
To loop around a range of numerical values, you can use
for X in {1..10}
do
  echo The value of X is ${X}
done
A more flexible way is to use the seq command, which allows you to control the increment step size, use non-integer values, and control the format in which the number is printed:
for X in $(seq 1 10)
do
  echo The value of X is ${X}
done
or
for X in $(seq -w 1 0.5 6)
do
  echo The value of X is ${X}
done
and so on. Type ‘man seq’ at a bash prompt to read the manual for the seq command.
The basic loop around a fixed set of values looks like this:
for X in 1 2 3
do
  echo The value of X is ${X}
done
where the values are actually strings, so we can also have
for X in 34 b purple c 99 a
do
  echo The value of X is ${X}
done
or
for FRUIT in apples oranges pears
do
  echo The current fruit is ${FRUIT}
done
This is an internal feature and you don’t need to understand what it means. The key phrase-level feature in the example above is “NB”, meaning “no break”.
OK, so my hypothesis about non-ASCII characters is probably wrong here. You seem to have found a pretty bad error in the part of the pipeline that detects/classifies/expands non-standard words. Can you speculate on exactly where this might have happened, and maybe even propose where a change would have to be made to fix this problem?
The unknown / blank item in the Word relation is probably the place where the pound sign used to be just after tokenisation, but has been deleted after completion of the non-standard word processing step (because we don’t want “pounds three billion”).
This sounds like a unit selection error. The most likely explanation is that there is a unit in the speech database that is labelled as the vowel in “red” but actually sounds like the vowel in “reed”.
It’s easy to see how that might happen: there was a front-end error during the labelling of the database (e.g., the database utterance contained the word “read” pronounced as “reed”, but the front end predicted the phone sequence for the pronunciation “red” and so aligned that phone label with the speech). Automatic labelling works well, but may not always be able to detect that type of error.
The unit selection algorithm is susceptible to mislabelling errors and has only limited ways of detecting them at synthesis time.
Every time a new Item is created, it just gets assigned the next numerical id in sequence. So, the ids do not carry any human-friendly information and are best ignored. Sometimes an Item may be deleted, so not every numerical id will necessarily be present.
If you wanted to see all Items currently present in an Utterance, then you could save the utterance to a file using the utt.save command, and then open that file in a text editor such as Aquamacs.
festival> (utt.save myutt "my_file_name.utt")
The file my_file_name.utt will be saved in whatever directory you were in when you started Festival. Note that the waveform is not saved within the utterance file.
Festival can handle utf-8 and utf-16 characters, but not via the interactive command-line interface. This is a limitation of the input method. You would need to input such text from a file.
Hmm – that’s a good point, and not something I’d spotted before. It is almost certainly a typo.
Anyway, for the purposes of understanding, it’s fine to assume that Festival performs POS tagging using precisely the method described in Jurafsky & Martin.
The weights are simply the fractions of data points in each side of the split. So, we compute entropy as usual for each side (“yes” vs “no”) and then when we sum these two values, we weight each of them by the fraction of the data that went down that branch (e.g., if 1/3 of the data points had “yes” as the answer to the question under consideration, then we would weight the “yes” side’s entropy by 1/3 and the “no” side’s by 2/3, then add them together).
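That weighted-entropy calculation can be written out as a short Python sketch (the function names here are my own, chosen for illustration):

```python
import math

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    probs = [labels.count(c) / n for c in set(labels)]
    return -sum(p * math.log2(p) for p in probs)

def weighted_split_entropy(yes_labels, no_labels):
    """Entropy after a binary split: compute the entropy of each
    side as usual, then weight each by the fraction of the data
    points that went down that branch, and sum."""
    n = len(yes_labels) + len(no_labels)
    return (len(yes_labels) / n) * entropy(yes_labels) \
         + (len(no_labels) / n) * entropy(no_labels)

# 1/3 of the data points answered "yes" (all one class, entropy 0);
# 2/3 answered "no" (an even two-class mix, entropy 1 bit):
print(weighted_split_entropy(["A", "A"], ["A", "A", "B", "B"]))
# (1/3) * 0 + (2/3) * 1 ≈ 0.667 bits
```

A decision-tree learner would evaluate this quantity for every candidate question and pick the one giving the lowest weighted entropy (i.e., the largest information gain).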
To query a variable in Scheme, just type its name at the Festival prompt, without any parentheses. If you get “unbound variable”, that means the variable is not set, so the method will be the built-in default (in this case, the hand-crafted CART).
It’s tertiary stress, which is marked up in the Unisyn lexicon – see Section 3.4.3 of the Unisyn manual. Tertiary stress is essentially there not to show that a syllable might receive a pitch accent, but to block some post lexical rules, such as vowel reduction.
So, the second syllable in “upset” should never be reduced, in any context. I think Unisyn would regard “upset” as a compound word “up + set”, which is why the tertiary stress is marked up.
This section of the Festival manual gives you some clues about what happens in the Postlex module, including vowel reduction and possessive “s”.
For example, compare the Segments produced for these two sentences:
- Simon’s bike.
- Matt’s bike.
The default is the hand-crafted CART. You can inspect this classification tree thus:
festival> phrase_cart_tree
which should give something like:
((lisp_token_end_punc in ("?" "." ":")) ((BB)) ((lisp_token_end_punc in ("'" "\"" "," ";")) ((B)) ((n.name is 0) ((BB)) ((NB)))))
and if you draw that as a tree, you’ll see that the punctuation symbols
? . :
all lead to a Big Break (BB), and that the symbols
' " , ;
all lead to a Break (B) and otherwise there is No Break (NB) unless we reach the end of the input text, in which case a BB is placed even if there is no sentence-final punctuation.
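The logic of that tree can be paraphrased as a short Python sketch (the function and argument names are mine, not Festival’s; Festival itself walks the s-expression above rather than running code like this):

```python
def predict_break(token_end_punc, next_token_name):
    """Paraphrase of the hand-crafted phrase_cart_tree logic:
    classify the break after a token from its trailing punctuation
    and whether another token follows."""
    if token_end_punc in ("?", ".", ":"):
        return "BB"            # strong punctuation: Big Break
    if token_end_punc in ("'", '"', ",", ";"):
        return "B"             # weaker punctuation: Break
    if next_token_name == 0:   # no following token: end of input
        return "BB"            # BB even without final punctuation
    return "NB"                # otherwise: No Break
```

For example, `predict_break(",", "bike")` returns `"B"`, while a token with no trailing punctuation in the middle of a sentence gets `"NB"`.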
Both types of error will result in a problem that we can hear in the synthetic speech. But yes, they happen at different points in the pipeline.
We can’t really say that they will be “realised similarly” though.
This topic is about finding pronunciation errors.
This topic is about finding waveform generation errors.