Forum Replies Created
You probably used the wrong voice, or wrong dictionary, to create your labels. For example, you ran some steps with one dictionary, and other steps with a different dictionary.
This will probably be because of Apple’s over-strict security settings. The files are not damaged. Try downloading them in a browser other than Safari, or on a different computer (not in the lab).
March 3, 2018 at 10:57 in reply to: Amount of source text to start with, for my text selection algorithm #9131

If you can only find 550 in-domain sentences, then you’re going to have to record pretty much all of them. So, no point doing text selection, but you should still measure the coverage.
You then propose to experiment with text selection using a bigger set of source data. That’s a good idea – just what I recommend in the previous answer above. You can measure coverage, demonstrate that your algorithm works, and so on – all good material for your report. But you perhaps don’t need to record that dataset, unless you really enjoy recording.
In general, I’d expect you to record Arctic A plus additional data of your own design amounting to about the same size as Arctic A. I think you’re proposing a third set of the same size again. If you’re efficient at recording (i.e., you get almost all sentences right in one ‘take’), and the time taken to get this data ready for voice building (e.g., sanity checking, choosing the best ‘take’) is not too much, then you could do it. But it’s definitely not essential.
March 3, 2018 at 10:09 in reply to: Amount of source text to start with, for my text selection algorithm #9129

Your methodology is good: find as much domain-specific material as possible, and then use an algorithm to select the subset with best coverage.
You suggest that, because you are starting with a small amount of source text, you should select a smaller subset to record.
Actually, I would recommend selecting a subset that is the same size as Arctic A (you decide how to measure ‘size’), because this would enable interesting comparisons to be made.
Starting with only 1100 sentences will limit how much your algorithm will be able to improve coverage, compared to random selection of a subset of the same size. But, it’s still a worthwhile exercise, because you’re doing all the important steps. So, go ahead.
In your report, you can acknowledge the limitations, and you could also show how much your algorithm was able to improve coverage. So, you have lots of ways to demonstrate your understanding and to get a good mark.
If you want to demonstrate that your text selection algorithm would work better given more source text, then you could run it on a much larger set (e.g., 1 million sentences) and measure coverage vs random selection. Don’t bother recording the selected sentences though – the goal is just to show that your algorithm works.
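To make that concrete, here is a minimal sketch (my own illustration, not a prescribed algorithm) of a greedy selection loop plus a random baseline of the same size. The helper units_of(sentence) is hypothetical: it stands for whatever returns the set of diphones (or other units) in a sentence, from your own front-end processing.

import random

def greedy_select(sentences, units_of, n_select):
    """Repeatedly pick the sentence that adds the most new unit types."""
    remaining = list(sentences)
    selected, covered = [], set()
    while remaining and len(selected) < n_select:
        best = max(remaining, key=lambda s: len(units_of(s) - covered))
        if not (units_of(best) - covered):
            break  # no remaining sentence adds anything new
        selected.append(best)
        covered |= units_of(best)
        remaining.remove(best)
    return selected, covered

def random_baseline(sentences, units_of, n_select, seed=0):
    """Random subset of the same size, for comparison."""
    subset = random.Random(seed).sample(list(sentences), n_select)
    covered = set()
    for s in subset:
        covered |= units_of(s)
    return subset, covered

Comparing the sizes of the two covered sets (greedy vs random, same number of sentences) is exactly the kind of evidence you could present in your report.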
With only 500 sentences, you could record them all (assuming this comes to about the same size as Arctic A, so you could make a comparison). So, text selection would not be necessary. However, you should still measure the coverage of your text, and compare that to Arctic A (measured by you in the same way).
You need to decide on what measure(s) of coverage to use, and justify these in your report. Phonemes would be one possibility (a missing phoneme implies a large number of missing diphones – this would be bad); diphones would be another. For these, you can measure the number of types covered, and express this as a percentage of the theoretical total number of types possible. Read the technical report on Arctic for more details.
But you can think of additional measures too, such as the number of questions vs statements (if that’s relevant to your domain), coverage of domain-specific vocabulary, diphones-in-context, prosodic coverage, etc. Be creative! It’s fine to report multiple measures, provided that you justify each one, and perhaps also give a critique of which you think is most useful.
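As one concrete illustration of such a measure, here is a minimal sketch of diphone type coverage expressed as a percentage of the theoretical total. The helper phones_of(sentence) is hypothetical: it stands for whatever front-end you use to turn a sentence into a phone sequence.

from itertools import pairwise  # Python 3.10+; otherwise use zip(phones, phones[1:])

def diphone_coverage(sentences, phone_inventory, phones_of):
    """Percentage of diphone types covered, out of the theoretical total."""
    observed = set()
    for sentence in sentences:
        phones = ["#"] + phones_of(sentence) + ["#"]  # include silence at the edges
        observed.update(pairwise(phones))
    # every ordered pair of phones, counting silence; note that not every pair
    # is phonotactically possible, so 100% is not actually attainable
    n_possible = (len(phone_inventory) + 1) ** 2
    return 100.0 * len(observed) / n_possible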
Note: in general, I’d expect most postgrad students to find a large corpus and then write a text selection algorithm, whilst undergrads can select the text in an informal or manual way.
The Speech Signal Modelling video now has subtitles and a transcript.
Let me know if this helps, and post follow-up questions here.
Correct – the variance would be zero. That’s a serious problem, because such a model will assign zero probability mass everywhere except at the exact positions of the data points. Zero variance is also numerically impossible: we cannot compute with such a model.
But overfitting will probably occur long before we get to the point where there are as many mixture components as there are data points. It will happen as soon as the model starts to assign too much probability mass in the small regions around the observed data points and not enough mass to as-yet-unseen values that may occur in the test set.
The problems of small (including zero) variances in a model can be mitigated by setting a variance floor (e.g., not allowing the variance of any mixture component to go below 1% of the variance of the data as a whole). Using a variance floor is good practice because it avoids the numerical problems of very small (or zero) variances, and offers a partial solution to overfitting.
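As a small sketch of what a variance floor looks like in practice (the variable names here are my own, not from any particular toolkit):

import numpy as np

def apply_variance_floor(component_variances, all_data, floor_fraction=0.01):
    """Do not let any mixture component's variance fall below a fixed
    fraction (here 1%) of the variance of the data as a whole."""
    floor = floor_fraction * np.var(all_data)
    return np.maximum(component_variances, floor)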
For reference, here are the standard backoff rules for the Edinburgh accent. There is no rule to allow ii to back off to schwa (@) – I’m not sure why that is (but Korin will).

(set! unilex-edi-backoff_rules
  '(
    (l! l) (n! n) (eir e) (iii ii)
    (n @) (aa @) (ae @) (i @) (irr @) (iii @) (ei @) (er @) (a @)
    (eir @) (uw @) (@@r @) (e @) (oo @) (our @) (ow @) (o @)
    (uh @) (u @) (urr @) (uuu @) (i@ @) (ur @)
    (hw w) (s z) (_ #)
  ))
If there are as many Gaussian mixture components in a GMM as there are data points, then we would expect each component to model a single data point. The mean of each component would be equal to the value of the corresponding data point.
What will the variance of each mixture component be?
You’re right that Festival varies the pronunciation of “the” in this way. In principle, you can modify the back-off rules so that when dh_ii is not found then dh_@ is selected. Making such modifications is a little beyond the scope of this assignment, but if you really want to try, then look up the function du_voice.setDiphoneBackoff and ask Korin (who wrote this part of Festival) for help 🙂
It’s a reasonable idea: higher sampling rates will sound better. But, unfortunately you are limited to 16kHz for this assignment.
Whenever you see the error message “cannot execute binary file” referring to a file that should never be executed, you need to look for a place where that file is the first thing on a line in the shell script.
In your case, this is within the backticks on line 11. Backticks create a new shell, and their contents are passed to that shell to be executed. Here, the contents will be a list of wav files, so the shell tries to execute the first wav file, which is exactly the error you are seeing.
You need either:
for F in `ls ${RECDIR}/*.wav`
or
for F in ${RECDIR}/*.wav
or the fancier
for F in `find ${RECDIR} -name "*.wav"`
It’s fine to say that a constriction in the vocal tract is a source of sound, in the same way that we say the vocal folds are a source of sound.
Neither of these parts of the anatomy actually creates sound by itself; rather, sound is created by the way they change the airflow. The vocal folds interrupt the airflow periodically. A constriction (if narrow enough) will create turbulent airflow.
If we were being super-strict with the wording, perhaps we might say that these are the locations of sound sources.
Voiced fricatives have two sound sources. The clue is in the name:
voiced = the vocal folds are vibrating.
fricative = there is turbulent airflow caused by a constriction somewhere in the vocal tract.
If we want to synthesise such a sound using a vocoder, we will need what is called “mixed excitation”, in other words, a mixture of periodic and aperiodic sources. Some very simple vocoders cannot do this, because they switch between the two sources and can’t mix them together.
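To make the idea concrete, here is a minimal sketch of mixed excitation (not any particular vocoder’s implementation): a periodic pulse train and white noise are weighted and added, rather than switched between. The voicing weight, F0 and sample rate are illustrative assumptions; a real vocoder would estimate the mixing (often per frequency band) from the speech.

import numpy as np

def mixed_excitation(n_samples, fs=16000, f0=120.0, voicing=0.7, seed=0):
    """Weighted mix of a periodic pulse train and white noise."""
    rng = np.random.default_rng(seed)
    period = int(round(fs / f0))             # samples per pitch period
    pulses = np.zeros(n_samples)
    pulses[::period] = 1.0                   # periodic source (the vocal folds)
    noise = rng.standard_normal(n_samples)   # aperiodic source (the constriction)
    # a simple on/off vocoder would choose one source or the other;
    # mixed excitation adds them together with some weighting instead
    return voicing * pulses + (1.0 - voicing) * noise

excitation = mixed_excitation(16000)  # one second at 16 kHz; pass through a vocal-tract filter to get speech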
Correct. EM does not guarantee “to find the maximum likelihood parameter settings given the training data” – it can only increase (or at least not decrease) the likelihood at each iteration, stopping when it reaches a local maximum.
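If you want to see this property in code, here is a minimal sketch of EM for a 1-dimensional GMM (my own illustration, not a reference implementation from the course). The log-likelihood never decreases from one iteration to the next, and training stops when the improvement becomes negligible: a local maximum, not necessarily the global maximum-likelihood solution.

import numpy as np

def em_gmm_1d(x, n_components=2, n_iter=100, tol=1e-6, seed=0):
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    # crude initialisation: random data points as means, global variance, uniform weights
    means = rng.choice(x, size=n_components, replace=False)
    variances = np.full(n_components, np.var(x))
    weights = np.full(n_components, 1.0 / n_components)

    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: responsibility of each component for each data point
        dens = (weights / np.sqrt(2 * np.pi * variances)
                * np.exp(-0.5 * (x[:, None] - means) ** 2 / variances))
        total = dens.sum(axis=1, keepdims=True)
        resp = dens / total
        ll = np.log(total).sum()

        # the likelihood can only increase (or stay the same) at each iteration;
        # stop when the improvement is negligible -- a local maximum
        if ll - prev_ll < tol:
            break
        prev_ll = ll

        # M-step: re-estimate weights, means and variances from the responsibilities
        nk = resp.sum(axis=0)
        weights = nk / len(x)
        means = (resp * x[:, None]).sum(axis=0) / nk
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk
        # in practice a variance floor would also be applied here

    return weights, means, variances, ll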