Forum Replies Created
Your proposal is to use perceptual data (i.e., from listening tests with human subjects) to define a target cost function. It’s a good idea, and has been tried, but it’s difficult to get enough perceptual data to automatically learn such a function.
In the following paper, we describe a simple target cost function (in the form of a classifier) that is learned from perceptual data. It worked, but did not beat Festival’s standard IFF target cost function. Note that our novel target cost function is still using only linguistic features as input, and doesn’t use acoustic properties of the candidates.
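If you wanted to experiment with the general idea yourself, here is a minimal sketch, not the method from the paper: train a classifier on listener judgements of candidate units and use its predicted probability of "perceptually bad" as the target cost. The features, toy data and use of scikit-learn are all my own illustrative assumptions.

    # Minimal sketch (not the method from the cited paper): learn a target-cost-like
    # function from perceptual judgements. Each training example is a vector of
    # linguistic-feature mismatches between the target specification and a candidate
    # unit (e.g. stress match, phrase-position match, POS match), labelled 1 if
    # listeners judged the result bad, 0 otherwise. Features and data are toy examples.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    X = np.array([
        [0, 0, 0],   # all linguistic features match
        [1, 0, 0],   # stress mismatch only
        [1, 1, 0],
        [1, 1, 1],   # everything mismatches
    ])
    y = np.array([0, 0, 1, 1])   # toy listener judgements: 1 = perceptually bad

    clf = LogisticRegression().fit(X, y)

    def target_cost(mismatch_vector):
        """Use the classifier's probability of 'perceptually bad' as the cost."""
        return clf.predict_proba(np.asarray(mismatch_vector).reshape(1, -1))[0, 1]

    print(target_cost([1, 0, 1]))

A learned cost of this kind still only sees linguistic features, just as in the paper; it does not look at acoustic properties of the candidates.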
http://www.outsideecho.com/llsti/pubs/Pitch_marking.pdf is wrong in this regard.
make_pm_wave is a script that runs the program pitchmark.
The documentation is here. Note that this method does actually work on speech waveforms, although at the time the manual was written we were still using Laryngograph signals, which later proved to be unnecessary.
pitchmark works as described in the video.
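For reference, here is a rough sketch of what a script like make_pm_wave does for each waveform: it just runs the Edinburgh Speech Tools pitchmark program. I have written it in Python for illustration; the flag values are typical Festvox-style settings and are assumptions to be checked against the documentation, in particular the pitch period bounds, which depend on the speaker.

    # Rough sketch of what make_pm_wave does per file: call the Edinburgh Speech
    # Tools 'pitchmark' program on each waveform. Flag values are illustrative
    # (typical Festvox-style settings); check them against the documentation and
    # adjust -min/-max (pitch period bounds in seconds) for your speaker.
    import glob
    import os
    import subprocess

    est_bin = os.path.expandvars("$ESTDIR/bin")   # assumes ESTDIR points to speech_tools

    os.makedirs("pm", exist_ok=True)
    for wav in glob.glob("wav/*.wav"):
        base = os.path.splitext(os.path.basename(wav))[0]
        subprocess.run(
            [os.path.join(est_bin, "pitchmark"), wav,
             "-o", f"pm/{base}.pm", "-otype", "est",
             "-min", "0.005", "-max", "0.012",
             "-fill", "-def", "0.01", "-wave_end"],
            check=True,
        )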
The answer is in the complete IEEE style manual (which I do not expect you to read in its entirety); it gives examples of how to cite a specific part of a work, including:
[3, pp. 5-10]
[3, eq. (2)]
[3, Fig. 1]
[3, Appendix I]
[3, Sec. 4.5]
[3, Ch. 2, pp. 5-10]
[3, Algorithm 5]
You probably do not need an appendix, unless you think that it’s a useful place for some content (e.g., tables of results) that would otherwise interrupt the flow of the main body of the paper. For example, you might have compact summaries as tables or graphs in the main body, with the underlying data in an appendix. This is optional, and most scientific papers don’t go that far.
It’s usually better, where possible, to provide results within the main body, so that the reader has them handy without turning the page. In fact, you should try hard to make tables and figures appear on the same page as the text that first refers to them.
If you do include an appendix, it will contribute to the word count.
It sounds like you are relying on letter-to-sound to provide pronunciations for this word – is that the case? You should manually add the pronunciation for “Skulason” to your lexicon. Avoid using the glottal stop in that pronunciation.
The pronunciation /s k ? l ei z n!/ looks pretty weird to me, and a native speaker of English would have difficulty producing a glottal stop in the context [s k _ l].
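For illustration only: one way to add an entry is with Festival’s lex.add.entry, placed in a Scheme file that your voice loads. The snippet below just writes such a form out; the file name, part-of-speech field and phone symbols are placeholders, not a recommended pronunciation, so substitute symbols from the phone set your lexicon actually uses.

    # Illustration only: append a Festival lex.add.entry form to a Scheme file
    # loaded by your voice. The file name, POS field ('nil') and phone symbols
    # are placeholders; substitute symbols from your own phone set, and choose a
    # pronunciation without a glottal stop.
    entry = '(lex.add.entry \'("skulason" nil (((s k uu) 1) ((l @) 0) ((s o n) 0))))'

    with open("my_lexicon.scm", "a") as f:   # hypothetical addenda file
        f.write(entry + "\n")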
That sounds reasonable, yes.
But what if, for a few individual sentences, the ‘degraded’ voice is actually more natural than the full voice? It’s perfectly possible! What response would you expect from listeners in those cases? Will you need to give them instructions about how to respond in such cases?
“How [does] the decrease of the size of the database degrade the system?”
is not a hypothesis – it’s a research question. That’s OK, but the word “how” is ambiguous: are you talking about “how much” or “in what way” or even “why”?
Degradation Category Rating would be a valid paradigm, but you need to have a non-degraded reference sample against which each test stimulus can be compared by the listener. That makes perfect sense in speech coding, where the codec will always degrade each and every sample.
But for synthetic speech, what will your reference be – natural speech?
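To make the paradigm concrete, here is a minimal sketch of how DCR trials are usually organised: each trial plays a reference sample followed by the corresponding test sample, and the listener rates the degradation on the standard 5-point scale. The file names are invented; deciding what the reference should be is exactly the question above.

    # Minimal sketch of Degradation Category Rating (DCR) trials: each trial
    # presents a reference sample then the corresponding test sample, and the
    # listener rates the degradation on a 5-point scale. File names are invented.
    import random

    SCALE = {
        5: "Degradation is inaudible",
        4: "Degradation is audible but not annoying",
        3: "Degradation is slightly annoying",
        2: "Degradation is annoying",
        1: "Degradation is very annoying",
    }

    sentences = ["utt001", "utt002", "utt003"]
    trials = [
        {"reference": f"{s}_reference.wav",   # the non-degraded reference (to be decided!)
         "test": f"{s}_degraded.wav"}         # e.g. the reduced-database voice
        for s in sentences
    ]
    random.shuffle(trials)   # randomise presentation order for each listener

    for trial in trials:
        # play trial["reference"], then trial["test"], then collect a rating from SCALE
        print(trial["reference"], "->", trial["test"])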
Q is the glottal stop. In some phone sets, the symbol ? is used (since that’s closest to the IPA symbol) – but that character would cause problems in HTK because it is used as part of the pseudo-regular expression language for state clustering.
Treat it just like any other phone as far as coverage is concerned.
Remember that the phone set depends on the dictionary you are using, and there is not necessarily any correspondence between symbols in one phone set and those in another: they are somewhat arbitrary.
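If your dictionary’s phone set does use ? (or anything else that HTK or the shell will mis-parse), one simple safeguard is to map such symbols to safe names before writing any HTK label or question files. Here is a minimal sketch of that idea; the mapping itself is just an example.

    # Minimal sketch: map phone symbols that clash with HTK's pattern syntax
    # (e.g. '?', which HTK treats as a wildcard in state-clustering questions)
    # to safe names before writing label or question files. The mapping is an example.
    SAFE_NAMES = {"?": "Q"}   # glottal stop: use 'Q' rather than the IPA-like '?'

    def htk_safe(phone: str) -> str:
        return SAFE_NAMES.get(phone, phone)

    phones = ["s", "k", "?", "l"]
    print([htk_safe(p) for p in phones])   # ['s', 'k', 'Q', 'l']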
the voice built on carefully selected, domain-specific data performs better overall than… a voice that combines both datasets
This is quite possible. More data is not always better! You should definitely try to understand why this is.
It depends what your hypothesis is. Tell me that, and we can decide what the minimum experiment required to test that hypothesis is.
Always think in terms of hypothesis testing, and not testing of voices.
Things that multiple people thought were good about the course
- Lively lectures with varied activities
- Interesting course content
- Resources are easily accessible
- The videos
- Readings, both content and quantity
- The assignment is interesting / fun / helpful for learning
The structure of the web pages for the assignment is not as good as for course content
You are correct. I may improve it in future, but do not want to change it mid-way through the course.
However, there is a deliberate design to the coursework instructions: they are intended to make you work a bit, to help you learn.
For example, the instructions do not make the dependencies between the steps explicit. You are expected to work this out as part of the learning process: e.g., you have to work out that changing the alignment will change the join positions, and therefore you would need to re-calculate the join cost coefficients.
What software can we use for running our listening tests?
In-class Question-and-Answer is better than using the forums
Yes, you are right. So, please ask more questions in class!
Provide more help on scripting in the labs
I do provide quite a bit of help in-person – just put your hand up more often and ask me for this in any lab session. There is also a forum dedicated to this topic.