The filter coefficients are not cross-faded. Remember that they are specified frame-by-frame (not sample-by-sample, like the residual). We just concatenate the sequences of frames of filter coefficients for all the candidates – this gives us a complete sequence of filter coefficients for the full utterance.
You need to do some more detective work to find out how the pitchmarks are found. For example, try omitting the pitchmarking step and see what happens as you build the voice.
What is the pipeline for concatenation and RELP waveform generation?
A complete residual signal (which is just a waveform) for the whole utterance is constructed by concatenating the residuals of the selected candidates. Overlap-and-add (i.e., crossfade) is performed at the joins, over a duration of one pitch period. A corresponding sequence of LPC filter coefficients is also constructed.
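Here is a rough sketch of that concatenation step in Python – not the actual Multisyn code, and the function and argument names (concatenate_candidates, residuals, coef_frames, pitch_period) are my own:

import numpy as np

def concatenate_candidates(residuals, coef_frames, pitch_period):
    """Join the selected candidates into one utterance-length residual plus coefficient sequence.

    residuals   : list of 1-D numpy arrays, one residual waveform per selected candidate
    coef_frames : list of 2-D numpy arrays (frames x LPC order), one per candidate
    pitch_period: crossfade length in samples (roughly one pitch period at the join)
    """
    out = residuals[0].astype(float)
    for res in residuals[1:]:
        res = res.astype(float)
        n = min(pitch_period, len(out), len(res))
        if n > 0:
            fade = np.linspace(1.0, 0.0, n)
            # overlap-and-add: fade out the end of what we have, fade in the start of the next unit
            out[-n:] = out[-n:] * fade + res[:n] * (1.0 - fade)
        out = np.concatenate([out, res[n:]])

    # the filter coefficients are NOT crossfaded: the frames are simply concatenated
    coefs = np.vstack(coef_frames)
    return out, coefs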
The function lpc_filter_fast in .../speech_tools/sigpr/filter.cc then does the waveform generation. The inputs are the utterance-length residual waveform and the sequence of LPC filter coefficients. I’ve just realised that I wrote that code nearly 20 years ago…

/*************************************************************************/
/* Author : Simon King                                                   */
/* Date   : October 1996                                                 */
/*-----------------------------------------------------------------------*/
/* Filter functions                                                      */
/*                                                                       */
/*=======================================================================*/
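To make the idea concrete, here is a very rough Python sketch of what an LPC synthesis filter driven by a residual does: the residual is the excitation, and an all-pole filter (whose coefficients are switched frame-by-frame) reconstructs the speech. This is not a transcription of lpc_filter_fast – the names are mine, and the sign convention and frame handling in the real code may differ:

import numpy as np

def lpc_resynthesis(residual, coefs, frame_shift):
    """Reconstruct speech from a residual and per-frame LPC coefficients.

    residual    : 1-D array, utterance-length residual (the excitation)
    coefs       : 2-D array (num_frames x order), LPC predictor coefficients a[1..p]
    frame_shift : number of samples per frame
    """
    order = coefs.shape[1]
    out = np.zeros(len(residual))
    for n in range(len(residual)):
        # pick the coefficient frame covering this sample
        frame = min(n // frame_shift, coefs.shape[0] - 1)
        a = coefs[frame]
        # all-pole filter: predict from past output samples, add the residual as excitation
        pred = 0.0
        for k in range(1, order + 1):
            if n - k >= 0:
                pred += a[k - 1] * out[n - k]
        out[n] = residual[n] + pred
    return out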
Why does Festival use Residual Excited LP (RELP)?
The released version of Festival uses RELP for two reasons. The first reason is practical – TD-PSOLA is patented:
Method and apparatus for speech synthesis by wave form overlapping and adding EP0363233 (A1)
The patent was filed by the French state via its research centre CNET, which later became France Telecom, now known as Orange.
The second reason is that RELP allows pitch/time/spectral envelope modification, as you mention. In the older diphone engine, RELP is indeed used for time- and pitch-modification. In Multisyn, no modification or join smoothing is performed, although in principle it would be possible to add this to the implementation.
Yes, you need to allow for homophones, and I recommend allowing for spelling mistakes as well: you are testing your systems, not testing the listeners.
In the Blizzard Challenge, and almost everywhere else, Word Error Rate (WER) is used. I have rarely, if ever, seen anyone following the recommendation from the original paper to score entire sentences.
With WER, there is no need to normalise for sentence length. Just use the same formula that is used in automatic speech recognition.
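Concretely, WER = (S + D + I) / N, where S, D and I are the substitutions, deletions and insertions in a minimum-edit-distance alignment, and N is the number of words in the reference. A minimal sketch (assuming you have already dealt with case, punctuation and homophones before comparing):

def wer(reference, hypothesis):
    """Word Error Rate = (substitutions + deletions + insertions) / reference length,
    computed from a standard Levenshtein alignment over words."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = minimum edits to turn the first i reference words into the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j - 1] + sub,   # substitution (or match)
                          d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1)         # insertion
    return d[len(ref)][len(hyp)] / len(ref)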
The “bad f0” penalty is part of the target cost. It obtains F0 at the concatenation points of the unit from the “stripped” join cost features (coef2). In fact, only the voicing status is used, and not the actual value of F0.
Normally, we do not calculate a score for an individual listener at all; we just pool all responses. But you are right that the Latin Square design would prevent us from computing per-listener scores.
You seem to be suggesting that scores should be somehow rescaled, or made relative within each listener. That seems reasonable for an intelligibility test, but it is not something that is normally done.
But, this might not be the perfect arrangement, so feel free to try something different.
Your suggestion to include natural speech is good, but don’t expect listeners to transcribe it perfectly: they will still make errors (see the Blizzard Challenge summary papers for typical WERs on naturally-spoken SUS – they are not 0%), so you can’t necessarily use this as a means to exclude listeners who didn’t follow the instructions carefully.
I’m not sure that’s a convincing explanation. You are arguing that units with “bad pitchmarks” won’t get used. Fine, but then an increase in the proportion of units with bad pitchmarks effectively reduces the inventory size, which should lead to worse quality.
Does using the ‘male’ setting actually lead to more ‘bad pitchmarking’ warnings? These warnings relate to units without any pitchmarks at all, and this then results in a penalty.
If the ‘male’ setting just results in fewer pitchmarks overall (say, about half as many), that’s a different situation – think about what the consequences of that would be.
For the domain-specific voice, obviously you will want to try domain-specific test sentences (that’s the whole point). You are right to consider how to control the difficulty of those sentences, when measuring intelligibility. Here are some options:
1. Try ‘normal’ domain-specific sentences and see whether you do indeed have a ceiling effect on intelligibility (just make that judgement informally, although there are formal statistical tests for this kind of thing).
2. If ‘normal’ domain-specific sentences are too easy, then you may decide to try one of your two suggestions: domain-specific SUS, or adding noise.
I’m not entirely sure about domain-specific SUS: although the words will be well-covered by the database, the sequences of words will not, and so there will still be lots of joins. But try it and see!
Adding noise is a good option, but you will need to choose an appropriate Signal-to-Noise Ratio (SNR) carefully. See this reply, and the paper I mention there, for some clues. Don’t aim to replicate that paper (!) but you might do something similar on an informal basis (there is a short sketch of mixing noise at a chosen SNR after this list).
3. Or, simply don’t measure intelligibility, and only measure naturalness, in this experiment.
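If you do try adding noise, mixing at a chosen SNR is simple to do yourself. A minimal sketch (my own helper function, assuming speech and noise are floating-point arrays at the same sample rate, with the noise at least as long as the speech):

import numpy as np

def add_noise_at_snr(speech, noise, snr_db):
    """Mix noise into a speech signal at a chosen global SNR (in dB)."""
    noise = noise[:len(speech)]
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    # scale the noise so that 10*log10(speech_power / scaled_noise_power) == snr_db
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise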
The x-axis is the experimental variable that you control (for you: the difficulty of the text) and the y-axis is the response (for you: Word Error Rate (WER), or Accuracy).
For this assignment there is no need to actually plot out a full psychometric curve: that would consume a lot of listening test time. Instead, you can informally try a few different types of text, and choose a type that gives you a WER in about the right range (let’s say 30-50%).
An example curve is Figure 1 in this paper (there, the x-axis is Signal-to-Noise Ratio, but you could imagine that it is “text difficulty” ranging from “very hard” on the left to “really easy” on the right).
A value of exactly 0 for the beam widths is actually a special value that turns pruning off altogether. If you want to try a lot of pruning, use a small value, but something greater than 0.
For your specific example, don’t try to fix it, but you could report it as part of your qualitative analysis. Try to suggest how it could be fixed though.
For testing intelligibility
Now things get more complicated. You correctly state two things:
1. all systems must synthesise exactly the same set of sentences, in case some are easier or more difficult than others.
2. listeners cannot be presented with the same sentence more than once, because they will probably remember it and so make fewer errors on the second presentation.
So, we need an experimental design that satisfies the above points. The standard way to do that is with a Latin Square. Let’s do a really simple 2×2 design, for comparing two systems (call them A and B) using, say, 10 sentences (1 to 10).
The trick is to split the listeners into groups:
Listener group 1 hears:
System A synthesising sentences 1-5
System B synthesising sentences 6-10

Listener group 2 hears:
System B synthesising sentences 1-5
System A synthesising sentences 6-10

and we pool all responses. What is happening is that a pair of listeners (one from each group) form a “virtual listener” who hears all combinations.
The number of listeners needs to be a multiple of the number of systems (2 in this example) so that each listener group is the same size. Imagine we recruit 20 listeners – we put 10 of them into each group.
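If it helps, here is one way you might generate that assignment programmatically. This is just a sketch – the function name, the placeholder listener IDs and the block-rotation scheme are my own, not part of any standard tool:

def latin_square_assignment(systems, sentences, listeners):
    """Assign each listener to a group; each group hears every sentence exactly once,
    with the system/sentence pairing rotated between groups (a simple Latin Square)."""
    n_groups = len(systems)
    assert len(listeners) % n_groups == 0, "number of listeners must be a multiple of the number of systems"
    # split the sentences into as many blocks as there are systems
    block_size = len(sentences) // n_groups
    blocks = [sentences[i * block_size:(i + 1) * block_size] for i in range(n_groups)]

    plan = {}
    for idx, listener in enumerate(listeners):
        group = idx % n_groups
        plan[listener] = []
        for b, block in enumerate(blocks):
            system = systems[(b + group) % n_groups]   # rotate systems across groups
            plan[listener].extend((system, sent) for sent in block)
    return plan

# e.g. latin_square_assignment(["A", "B"], list(range(1, 11)), ["listener%d" % i for i in range(20)])
# reproduces the 2x2 design above: group 1 hears A on sentences 1-5 and B on 6-10, group 2 the reverse.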
For testing naturalness
In a standard MOS test, listeners rate a single stimulus at a time (one particular system saying one sentence). Across the whole test, it is important that all systems synthesise the same set of sentences. You will probably either pseudo-randomise the order of presentation, or use a fancier design such as a Latin Square.
So, listeners will not be aware of how many different systems there are. They will notice the same sentence occurring more than once, but the pseudo-randomisation (or other method) can be made to ensure that they never hear the same sentence twice in a row.
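One simple way to enforce that last constraint is to reshuffle until no sentence is adjacent to itself – a sketch only, with my own function name:

import random

def shuffle_no_adjacent_repeats(stimuli, max_tries=1000):
    """Shuffle (system, sentence) stimuli so the same sentence never appears twice in a row."""
    order = list(stimuli)
    for _ in range(max_tries):
        random.shuffle(order)
        if all(order[i][1] != order[i + 1][1] for i in range(len(order) - 1)):
            return order
    raise RuntimeError("could not find an order without adjacent repeats")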