Forum Replies Created
Yes, DCT means Discrete Cosine Transform. We will come to that in the later part of Speech Processing, when we consider how to extract useful features from the FFT spectrum for use in Automatic Speech Recognition. We’ll also be looking at the Mel scale. Wait until we get there, then ask the question again.
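To preview what is coming: the DCT in question is the type-II DCT applied to log (mel) filterbank energies to produce cepstral coefficients. Here is a minimal pure-Python sketch of that transform; the toy input values are illustrative only:

```python
import math

def dct_ii(x):
    """Type-II DCT: the transform used to turn log (mel) filterbank
    energies into cepstral coefficients (MFCCs)."""
    N = len(x)
    return [sum(x[n] * math.cos(math.pi * k * (n + 0.5) / N) for n in range(N))
            for k in range(N)]

# Toy example: 8 log filterbank energies -> 8 cepstral coefficients.
log_energies = [2.0, 2.1, 1.9, 1.5, 1.2, 1.0, 0.9, 0.8]
cepstrum = dct_ii(log_energies)
# In practice only the first dozen or so coefficients are kept as features.
```

Note that coefficient 0 is just the sum (overall energy) of the input, and a perfectly flat input produces zeros everywhere else, which is why the DCT compactly summarises the smooth spectral envelope.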
Yes, these are both the spectrum of a voiced speech sound. The upper one appears to be on a linear vertical scale, so we only see the very largest amplitudes and everything else appears to be zero. The lower plot is on a logarithmic vertical scale and therefore we can see both very large and very small magnitudes on the same plot. The lower plot is more informative.
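The effect of the two vertical scales is easy to demonstrate with made-up magnitudes spanning several orders of magnitude, as in a real speech spectrum:

```python
import math

# Hypothetical spectral magnitudes spanning five orders of magnitude.
magnitudes = [1000.0, 50.0, 1.0, 0.01]

# Linear scale (normalised to the maximum): the small values are
# indistinguishable from zero next to the largest peak.
linear = [m / max(magnitudes) for m in magnitudes]   # [1.0, 0.05, 0.001, 1e-05]

# Logarithmic (dB) scale: large and small magnitudes remain visible together.
db = [20 * math.log10(m) for m in magnitudes]        # [60.0, ~34.0, 0.0, -40.0]
```

On the linear scale, everything below a few percent of the peak plots as essentially zero, which is exactly what the upper plot shows.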
In the exercise, see the section on the target cost weight for how to change it.
See the notes about pre-selection and pruning – these are probably why you are not hearing any difference. You cannot disable pre-selection (because this ensures the right thing is said!) but you can disable pruning.
To confirm that different candidates are actually being used, you can examine the Unit relation of the utterance. It’s possible that the selected units are changing but you cannot hear the small difference this makes. Use the commands described here to examine which candidates are selected:
festival> (set! myutt (SayText "Hello world."))
festival> (utt.relation.print myutt 'Unit)
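As an aside, the effect of pruning can be sketched in a few lines of Python. The candidate representation and field names here are illustrative, not Festival’s actual implementation: with a smaller beam, fewer candidate units survive to the join-cost search, which is why disabling pruning can change which units get selected.

```python
def prune(candidates, beam_size):
    """Keep only the beam_size candidates with the lowest target cost,
    discarding the rest before the search over join costs."""
    return sorted(candidates, key=lambda c: c["target_cost"])[:beam_size]

# Three hypothetical candidates for one target diphone:
candidates = [{"unit": "ae_3", "target_cost": 2.5},
              {"unit": "ae_1", "target_cost": 0.4},
              {"unit": "ae_2", "target_cost": 1.1}]

survivors = prune(candidates, 2)   # ae_1 and ae_2 survive; ae_3 is pruned
```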
You should get everything working on a DICE machine, using just the CPU and a small data set, before attempting to use Eddie. Have you done that?
Looks like you have all processes set to “False” which means nothing will be done (other than loading the config files and writing some log output).
You need to post error messages to get help with them.
The voice definition needs to be somewhere in Festival’s path – e.g., put it alongside any voices that were installed when you installed Festival originally.
Look in
/Volumes/Network/courses/ss/festival/lib.incomplete/voices-multisyn/english/localdir_multisyn-gam
on the lab machines. This is a special kind of voice definition that looks for all the voice files in the current working directory (i.e., wherever you start Festival from).
The instructions are only intended to be used on the machines in the lab. Sounds like you might be doing this on your own machine?
The voice definition file is provided for you – just make sure you set up your workspace correctly, and source the setup file in each new shell – this sets paths and so on, so that Festival finds the voice definition file.
The attached files didn’t upload (you need to add a .txt ending to get around the security on the forum). To debug on Eddie, do not submit jobs to the queue, but instead use an interactive session (using qlogin with appropriate flags to get on to a suitable GPU node).
But, talk to classmates first – several people are attempting to use Eddie and you should share knowledge and effort. Also, look at the Informatics MLP cluster, where people have got Merlin working – see
https://www.wiki.ed.ac.uk/display/CSTR/CSTR+computing+resources
There must be at least one pitchmark in every segment, to make pitch-synchronous signal processing possible. (Note: the earlier pitchmarking step inserts evenly spaced fake pitchmarks in unvoiced speech.)
A segment without any pitchmarks is most probably caused by misaligned labels, although very bad pitchmarking is also a potential cause.
See Section 4.2.3 of Multisyn: Open-domain unit selection for the Festival speech synthesis system, for example.
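The filling-in of unvoiced regions can be sketched as follows. The representation of pitchmarks as a list of times, and the 5 ms default period, are illustrative assumptions, not Festival’s actual settings:

```python
def fill_unvoiced_pitchmarks(pitchmarks, region_start, region_end, period=0.005):
    """Insert evenly spaced 'fake' pitchmarks across an unvoiced region,
    as the earlier pitchmarking step does, so that every segment ends up
    containing at least one pitchmark for pitch-synchronous processing."""
    fakes = []
    n = 1
    while region_start + n * period < region_end - 1e-9:
        fakes.append(region_start + n * period)
        n += 1
    return sorted(pitchmarks + fakes)

# An unvoiced gap between 0.10 s and 0.13 s gets fake marks every 5 ms:
pm = fill_unvoiced_pitchmarks([0.08, 0.10, 0.13], 0.10, 0.13)
```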
You are right that ‘bad pitchmarking’ is detected during the build_utts step, whilst transferring timestamps from the forced alignment onto the utterance structures for the database.
Ah – poor wording in the paper. Blame the last author. This is clearer:
“Spectral discontinuity is estimated by calculating the Euclidean distance between a pair of vectors of 12 MFCCs: one from either side of a potential join point.”
So, indeed, there is one frame either side of the join.
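In code form, that spectral discontinuity measure is just a Euclidean distance between two 12-dimensional MFCC vectors. A minimal sketch (the function name and toy vectors are illustrative):

```python
import math

def spectral_join_cost(mfcc_left, mfcc_right):
    """Euclidean distance between two 12-dimensional MFCC vectors,
    one taken from either side of a potential join point."""
    assert len(mfcc_left) == len(mfcc_right) == 12
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(mfcc_left, mfcc_right)))

# Identical frames join for free; dissimilar frames incur a cost:
cost = spectral_join_cost([0.5] * 12, [0.5] * 12)   # 0.0
```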
Can you point me to the exact place in the paper that this is mentioned please?
That looks like an error, although it will only have an effect for relatively low-pitched female voices.
You are welcome to look at the Festival source code (which is now showing its age) but making these deep modifications is far beyond the scope of this exercise. You are not expected to do this.
Restrict yourself to changing things that are described in the instructions, and that can be done easily at the Festival interactive prompt. For example, you can change the relative weight between target cost and join cost.
Both of you have identified interesting things to investigate. But, you would probably need a much larger database (and a lot more time) for such experiments to make sense.
You’re absolutely right about the current (2017-18) layout of the exercise to build your own unit selection voice. It’s a deep (and, even worse, variable-depth) hierarchical structure based too closely on how I personally like to arrange my thoughts, and not actually that helpful for students.
In previous years, the content of my courses was also arranged in this way, but the new versions have a structure with limited nesting. Student feedback suggests the new structure is much easier to navigate.
I plan to change all the exercises to have a similar simple structure, but didn’t want to do this in the middle of an academic year.
Thank you for the constructive feedback. Next year’s students will thank you!