Forum Replies Created
The target cost function collapses all levels of stress (1,2,3) into a single level (1 = “stressed”).
Please could you post details of the exact problem and the solution, for future reference.
You can easily check that endpointing has worked by inspecting the endpointed wav files – there should be a small (but non-zero) amount of silence at the start and end of every file. I'm not sure that's the cause of your error, but it's something you should check anyway.
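If you want to check that programmatically rather than by listening, here is a minimal sketch, assuming mono wav files; the filename and silence threshold are illustrative, not part of the standard tools:

```python
import numpy as np
from scipy.io import wavfile

# Minimal sketch: measure the near-silence at the start and end of one
# endpointed wav file. Filename and threshold are assumptions - adjust both.
rate, samples = wavfile.read("endpointed/utt001.wav")
amplitude = np.abs(samples.astype(float))
threshold = 0.01 * amplitude.max()          # crude silence threshold

above = np.where(amplitude > threshold)[0]  # indices of non-silent samples
leading_ms = 1000 * above[0] / rate
trailing_ms = 1000 * (len(samples) - 1 - above[-1]) / rate
print(f"leading silence: {leading_ms:.0f} ms, trailing: {trailing_ms:.0f} ms")
```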
Modifying train.scp reduces the amount of training data for the models, but alignment will still be performed on all the data. You only want to be doing this for an experiment to measure the effect of less well-trained models (and the resulting accuracy of alignment) independently of the amount of data in the unit selection database.
Does the removal of any utterance lead to the error, or only specific ones? If the latter, could it be an utterance containing the only remaining example of a particular phoneme within the utterances listed in train.scp? That would lead to an untrained model for that phoneme, and this model will cause problems during alignment.
In general, you need at least one training example per phoneme, and ideally three. Check for warnings from HERest.
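To check phoneme coverage before training, a minimal sketch along these lines might help. It assumes one HTK-style label file per utterance; the lab/ directory, the .lab extension and the file layout are assumptions about your setup, not something train.scp itself guarantees.

```python
from collections import Counter
from pathlib import Path

# Count how many training examples of each phone remain in the utterances
# listed in train.scp (assumed: one label file per utterance in lab/<utt>.lab,
# HTK format with the phone label as the last field on each line).
counts = Counter()
for utt in Path("train.scp").read_text().split():
    utt_name = Path(utt).stem
    for line in Path("lab", utt_name + ".lab").read_text().splitlines():
        fields = line.split()
        if fields:
            counts[fields[-1]] += 1

# Flag phones with too few training examples (ideally at least three)
for phone, n in sorted(counts.items()):
    if n < 3:
        print(f"WARNING: only {n} example(s) of {phone}")
```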
To close this topic: there is no exam this year. But you should still think about how to integrate the content of modules 6-9 and the state-of-the-art in your coursework report.
Yes, that would be fine. For a higher mark, you could complement that with other forms of evaluation for other hypotheses.
The intention was to respond to this type of question in lab sessions, for each individual student. Since that’s not possible now, I’ll provide a generic answer here.
First, remember that a formal listening test is not the only option for every experiment. There are at least two other options for testing a hypothesis: expert listening by the author, or an objective measure.
Second, remember that not every hypothesis is worth testing formally. For example, if you – the expert listener – cannot discern any difference between two conditions, then there is little point asking whether other listeners can hear one.
Once you have decided that a formal listening test is what you need, then – as you correctly point out – you will have to be selective about which hypotheses are worth testing in this relatively expensive way.
I suggest testing a handful of hypotheses in total, of which maybe just a couple would have a formal listening test.
The target and join cost values reported by Festival have already been multiplied by their respective weights.
A low target cost weight will bias the search towards finding good joins (those with lower join cost), at the expense of selecting candidates that are a poorer match to their target, i.e., candidates with a high target cost. Bear in mind that the reported target cost has already been multiplied by that low weight.
The consequence is that it is only valid to compare absolute values of join and target costs for a fixed setting of the target cost weight (e.g., comparing across different input sentences, or a fixed sentence synthesised with different unit databases). Changing the weight changes the absolute values.
An added complication in inspecting the total join cost across an utterance, as you vary the target cost weight, is that the proportion of zero-cost joins will vary – so you will get sudden ‘jumps’ in the values.
In summary – you are doing the right thing in inspecting values closely for individual sentences, but the absolute values of the costs are not very helpful. Try inspecting the ratio between them instead. If you’re looking for something objective to measure, then the number of zero-cost joins is a good option.
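For example, if you extract the reported (already weighted) costs for the units selected in one utterance, a minimal sketch of the kind of summary worth looking at is below; the values are made up purely for illustration:

```python
# Per-unit costs as reported by Festival, i.e. already multiplied by their
# respective weights (example values, one per selected unit / join).
target_costs = [4.2, 3.1, 0.0, 5.6]
join_costs = [0.0, 2.4, 0.0, 1.8]

total_target = sum(target_costs)
total_join = sum(join_costs)

# The ratio is more informative than absolute values when the weight changes
ratio = total_target / total_join if total_join else float("inf")
print("target/join ratio:", ratio)

# Number of zero-cost joins: a simple objective measure
print("zero-cost joins:", sum(1 for c in join_costs if c == 0.0))
```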
Yes, using a “between subjects” design for naturalness would be fine – it’s what the Blizzard Challenge does. It is not essential though, and a “within subjects” design is acceptable.
The “festival_mac” in the PATH is the clue. It’s a curious bug. See this topic.
Yes, that diagram has many steps! p287 says
“After we generate the Mandarin training sentences for the monolingual English speaker, his HMM based TTS in Mandarin can be trained via the standard HMM training procedure.”
so what they are doing is using trajectory tiling (with the waveform being created using concatenation) to construct a training set in the target language, for a speaker who doesn’t speak that language.
That data is then used to train a conventional HMM-based system that drives a vocoder.
All the synthesisers compared in Fig 12 are conventional HMM-plus-vocoder systems. Trajectory tiling is used to create the training data for TSMT.
In reply to: Talkin, "A robust algorithm for pitch tracking" – autocorrelation equation:
j and m are both indexing the samples in the entire waveform under analysis. Remember that we are doing short-term analysis, which involves analysing short frames taken from that waveform.
m is the first sample in the current analysis frame (the i’th frame)
j is counting through the samples in the current analysis frame
So j = m is the lower limit of the summation (the first sample in the current frame), and j then increments up to j = m+n-k-1, the largest value for which the lagged sample s[j+k] is still within the current frame (at that point j+k = m+n-1, the frame's last sample).
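To make the indexing concrete, here is a minimal Python sketch of a short-term autocorrelation with those summation limits, assuming s is a 1-D array of waveform samples; it illustrates the indexing only, not Talkin's full normalised cross-correlation.

```python
def short_term_autocorrelation(s, m, n, k):
    """Autocorrelation at lag k for the n-sample frame starting at sample m.

    j runs from m (first sample of the frame) up to m + n - k - 1, so the
    lagged sample s[j + k] never goes beyond the last sample of the frame.
    """
    return sum(s[j] * s[j + k] for j in range(m, m + n - k))
```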
Qualtrics might convert them to mp3 silently (certainly some platforms do) – check in a browser by taking your completed test as a subject.
The main problems with using wav files on the web are
- They are larger than mp3 – not a problem here: we care about quality not size
- Some browsers, notably Safari, will not play wav files
The listener is potentially a significant factor that could affect the results. Language background is one important aspect of this, and for that reason, the work reported in many scientific papers only uses native listeners.
Within speech synthesis evaluation, there is not a lot of work exploring this. One paper worth skimming is
Wester, Valentini-Botinhao and Henter, "Are we using enough listeners? No! An empirically-supported critique of Interspeech 2014 TTS evaluations", in Proc. INTERSPEECH 2015, pp. 3476-3480,
although it is focussed on the number of listeners more than on their properties.
From the Blizzard Challenge we know that non-natives have systematically higher average (and higher standard deviation) WER than native speakers when transcribing speech. We sometimes also find that people with high exposure to TTS (“Speech Experts”) do not rank systems in exactly the same order as the general population of listeners.
If we think that listener properties could affect results, then there are several possible approaches, including:
1. use a listener pool that is as homogeneous as possible, typically “normal-hearing native speakers” – this is what we do in Edinburgh most of the time
2. use a large listener pool and collect information about, for example, language background or previous exposure to TTS, so that the results can be analysed – this is what Blizzard does
Neither is perfect: approach 1 limits the available pool of subjects, while approach 2 results in unbalanced sub-groups, which complicates statistical analysis.
For the assignment, I do not recommend attempting to restrict your listeners to only native speakers of English, or to native speakers of your own first language (where that’s not English) – just get as many listeners as possible. So, take approach 2.
Approach 2 involves collecting information about each individual listener. Be very careful to collect only what is essential for testing your hypothesis (e.g., language background) and not to ask for intrusive personal information that you don’t need (e.g., gender, age, ethnicity).
But, for the assignment, investigation of listener factors is optional and it would be fine to omit this and to analyse system properties instead. You choose!
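If you do collect listener metadata (approach 2), a minimal sketch of the kind of sub-group analysis it enables is below. The results table, column names and values are hypothetical, just to show the shape of the analysis:

```python
import pandas as pd

# Hypothetical results: one row per rating, with listener metadata attached
results = pd.DataFrame({
    "system": ["A", "A", "B", "B", "A", "B"],
    "native": [True, False, True, False, True, False],
    "mos":    [4, 3, 2, 3, 5, 2],
})

# Compare mean MOS per system within each listener sub-group; note that the
# sub-groups will usually be unbalanced, which complicates formal testing.
print(results.groupby(["native", "system"])["mos"].agg(["mean", "count"]))
```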
Festival is showing its age: it doesn’t support UTF-8. It only supports ASCII.
Terminology in this reply:
- sentence = the text being synthesised
- utterance = the synthetic speech for a sentence
You’re right to worry about listener boredom or fatigue.
But you also need to think about the effect of the sentence, which can be large (or at least unknown): some sentences are just harder to synthesise than others. In general, we therefore prioritise using the same sentences across all systems we are comparing.
If you place utterances side-by-side for direct comparison (e.g., ranking or MUSHRA) then you would always use the same sentence. Listeners would indeed have to listen to the same sentence uttered multiple times.
If you present utterances one at a time (e.g., MOS) then you can (pseudo)randomise the order so that listeners do not get the same sentence several times in a row, although they will still hear multiple utterances saying the same sentence across the test (or that section) as a whole.
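For the one-at-a-time case, here is a minimal sketch of a constrained (pseudo)randomisation, assuming each stimulus is a (system, sentence) pair; the only constraint imposed is that the same sentence never occurs twice in a row:

```python
import random

def interleaved_order(stimuli, max_tries=1000):
    """Shuffle (system, sentence) stimuli so that the same sentence never
    occurs twice in a row. Sketch: reshuffle until the constraint holds,
    which is fine for typical listening-test sizes."""
    for _ in range(max_tries):
        order = random.sample(stimuli, len(stimuli))
        if all(a[1] != b[1] for a, b in zip(order, order[1:])):
            return order
    raise RuntimeError("could not satisfy the constraint; relax it or retry")

# Example: 3 systems x 4 sentences, each stimulus a (system, sentence) pair
stimuli = [(sys, sent) for sys in "ABC" for sent in range(4)]
print(interleaved_order(stimuli))
```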