We went through this in a few recent classes (Module 6, Module 8, and the first class of the state-of-the-art module), so revise those classes first.
In your experiments, you have seen unit selection put into practice: building new voices from data, and synthesising new sentences using those voices. One thing you should have learned is the one you stated: the sensitivity of unit selection to some of its many design choices.
The marks under “practical implications for current methods” are for discussing the implications of what you have learned for methods such as FastPitch, Tacotron 2, or the latest approaches using language modelling. For example, do some or all current methods have the same design choices as unit selection? If so, would they be more or less sensitive to each choice?
A concrete example: the unit selection voices you have built all require pitch tracking to provide a value of F0. You may have done an experiment to discover what happens when the value of F0 is poorly estimated. FastPitch also requires F0. What do you think would happen if a FastPitch model was trained with poorly-estimated F0 values?
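To make the F0 example concrete, here is a toy sketch (all names are illustrative, not part of any toolkit) of the classic pitch-tracking failure mode: random octave errors, where F0 is halved or doubled on some voiced frames. You could use something like this to corrupt the F0 values fed to training and observe the effect.

```python
import random

def corrupt_f0(f0_track, error_rate=0.1, seed=0):
    """Simulate classic pitch-tracking failures: random octave errors
    (halving or doubling F0) on a fraction of voiced frames.
    Frames with F0 == 0.0 are treated as unvoiced and left alone."""
    rng = random.Random(seed)
    corrupted = []
    for f0 in f0_track:
        if f0 > 0 and rng.random() < error_rate:
            f0 *= rng.choice([0.5, 2.0])  # octave-down or octave-up error
        corrupted.append(f0)
    return corrupted

clean = [0.0, 120.0, 125.0, 130.0, 128.0, 0.0]  # 0.0 = unvoiced frame
noisy = corrupt_f0(clean, error_rate=0.5)
```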
A second concrete example: for unit selection to work correctly, we require at least one recording of every possible diphone type. For it to work well, we require multiple recordings in a variety of contexts. We call this “coverage”. What might the coverage requirements be of current methods? Do they need more or less coverage than unit selection?
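Coverage is easy to measure yourself. A minimal sketch (assuming utterances are already available as lists of phone symbols; the format is illustrative) that counts diphone tokens and flags types with only a single recording:

```python
from collections import Counter

def diphone_coverage(utterances):
    """Count diphone tokens across a list of utterances, where each
    utterance is a list of phone symbols. Pads with 'sil' so that
    utterance-initial and utterance-final diphones are also counted."""
    counts = Counter()
    for phones in utterances:
        padded = ["sil"] + phones + ["sil"]
        for left, right in zip(padded, padded[1:]):
            counts[(left, right)] += 1
    return counts

script = [["h", "ax", "l", "ow"], ["l", "ow", "n", "ow"]]
counts = diphone_coverage(script)
# Types with only one token: covered, but with no variety of contexts.
singletons = [d for d, c in counts.items() if c == 1]
```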
A third concrete example: unit selection, in principle (although not in the voices you have built), can use signal processing to manipulate the speech – for example, to make the joins less perceptible or to impose a desired prosodic pattern. This requires a representation of the speech waveform where properties including F0 can be modified. Is that still applicable for a current method which generates a mel spectrogram? What about an audio codec such as SoundStream?
How you incorporate this into your report is up to you: designing a good structure is part of the assignment.
Connected speech effects, including elision, will of course make forced alignment harder because there is a greater degree of mismatch between the labels and the speech. In your example above, there probably is no good alignment of those labels because there is acoustically little or no [v] in the speech.
This is a fundamental challenge in speech, and not easily solved!
But, if your alignments generally look OK, then you can say that forced alignment has been successful and move on through the subsequent steps of building the voice.
Figuring out why forced alignment fails, and then solving that, is part of the assignment.
The most common cause is too much mismatch between the labels and the speech. That might be as simple as excessively long leading/trailing silences (solution: endpoint), or something more tricky like the voice talent’s pronunciations being too different to those in the dictionary, or letter-to-sound pronunciations which are a poor match to how the voice talent pronounced certain words.
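On the endpointing solution: the idea is simply to trim long leading and trailing silences before alignment. A naive energy-based sketch (real tools are more robust; thresholds and frame sizes here are arbitrary):

```python
def endpoint(samples, frame_len=160, threshold=1e-4, pad_frames=5):
    """Naive energy-based endpointing: trim leading/trailing frames whose
    mean squared amplitude falls below a threshold, keeping a few frames
    of padding around the detected speech."""
    n_frames = len(samples) // frame_len
    energies = [
        sum(s * s for s in samples[i * frame_len:(i + 1) * frame_len]) / frame_len
        for i in range(n_frames)
    ]
    speech = [i for i, e in enumerate(energies) if e >= threshold]
    if not speech:
        return samples  # nothing above threshold: leave untouched
    start = max(0, speech[0] - pad_frames) * frame_len
    end = min(n_frames, speech[-1] + 1 + pad_frames) * frame_len
    return samples[start:end]
```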
Sometimes, the easiest solution is to use additional data (e.g., your own ARCTIC A recordings) to train the models.
Remember that this is not the same as including all of that data in the unit selection database: you could use all your data to train the alignment models, but only use specific subsets in the unit selection database for the voice you are building.
There are two different things going on here:
1. a handful of “bad pitch marking” warnings is acceptable, but not one for every segment. See this post: https://speech.zone/forums/topic/bad-pitch-marking/#post-9237
2. most sp labels will have zero duration, and when you view them in Wavesurfer they will be drawn on top of a correct phone label, thus making it invisible. You need to manually delete all zero-duration sp labels before loading the file in Wavesurfer, as described in the “Find and fix a labelling error” step.

Yes, that’s correct. You can use different data to train the models for alignment than you eventually include in the unit selection database. (But be careful to report this, if it affects any of your experiments.)
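On the zero-duration sp labels: rather than deleting them by hand, you could filter the label file with a short script. A sketch, assuming labels are plain “start end label” lines (adapt this to whatever format your files actually use):

```python
def strip_zero_duration_sp(lines):
    """Drop zero-duration 'sp' entries from label-file lines of the
    assumed form 'start end label'; all other lines pass through."""
    kept = []
    for line in lines:
        parts = line.split()
        if len(parts) == 3:
            start, end, label = parts
            if label == "sp" and float(start) == float(end):
                continue  # zero-duration sp: would hide the label beneath it
        kept.append(line)
    return kept

labels = [
    "0.00 0.25 sil",
    "0.25 0.25 sp",   # zero duration
    "0.25 0.40 ax",
]
cleaned = strip_zero_duration_sp(labels)
```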
For warning 8232, search the forums.
Looks like your forced alignment is very poor. You will find that all the words are there, but that the labels have become collapsed to the start and end of the file.
How much speech data are the models being trained on? If it is only a small amount, you could try adding the ARCTIC A utterances to your
utts.data (just during forced alignment), so that the models are trained on more data and are more likely to work.

Lab sessions are unstructured
This is only partially fair feedback. The lab sessions do generally start with an overview from the lead tutor (usually Korin) about what to focus on that week. We have noticed that many students do not pay attention to this overview (we know this from the questions they ask later in that lab session).
We will continue to provide guidance at the start of each lab session about where you should be in the assignment (e.g., with reference to the milestones) and what to focus on.
The assignment is overwhelming
Including: too open-ended, instructions that are long and hard to follow, unclear expectations, and similar comments.
The open-ended and slightly under-specified nature of the assignment is by design, so that students need to actively think about what they are doing, and why they are doing it, and not merely follow a sequence of instructions.
The goal of the assignment is to consolidate and test your understanding of the course material.
But I agree that this can be overwhelming and we should provide a little more structure and guidance. For this year, I have already added two class elements to address this:
- On 2024-02-13, we went over the whole assignment and made links to the relevant parts of the course material. The key takeaway was that you should develop your understanding of that course material by doing that aspect of the assignment and then demonstrate that understanding in the write-up.
- On 2024-03-12, we will go over the newly-revised structured marking scheme and see how to get a good mark. In other words, we will see how to demonstrate understanding.
Further coursework guidance may be added, depending on your feedback about the above.
Keep doing this
Number of people mentioning each point is given in parentheses.
Videos and flipped classroom format (17)
In-class interactivity (16)
Whiteboard group exercises (9)
Quizzes on speech.zone (6)
Have you inspected the alignment? Load one of your utterances and the corresponding labels into Wavesurfer or Praat and inspect them. Try that for a few different utterances.
This script makes a list of the
.mfcc files on which the forced-alignment HMMs will be trained. After running it, the file train.scp should contain the list of .mfcc filenames, corresponding to the lines in utts.data.

How large is your corpus?
What is the goal of finding the out-of-dictionary words?
If you wish to exclude all sentences that contain such a word, then you’ll have to find them all – you could do this using the provided Festival script (which might be slow for a very large corpus) or some other way (by writing your own code).
But if your aim is to identify all the words you might need to add to the dictionary, then you are not expected to do that for the large source corpus. You might need to rely on letter-to-sound to provide pronunciations during the text selection phase.
You should only manually write pronunciations for a modest number of words appearing in your (much smaller) recording script.
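Finding the out-of-dictionary words in your recording script amounts to a set difference. A minimal sketch, assuming you can load the script words and dictionary headwords into Python lists (the names here are illustrative, not the Festival interface):

```python
def out_of_dictionary(script_words, dictionary):
    """Return the words in the recording script that are missing from
    the pronunciation dictionary (case-folded, deduplicated, sorted)."""
    known = {w.lower() for w in dictionary}
    return sorted({w.lower() for w in script_words} - known)

dictionary = ["the", "cat", "sat"]
script = ["The", "cat", "sat", "on", "the", "mat"]
oov = out_of_dictionary(script, dictionary)  # candidates for hand-written pronunciations
```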
Check your quota like this:
$ quota --hum
If the figure in the
space column is larger than the quota (which is generally 5000 MB = 5 GB), then you need to remove files. Use the du command mentioned earlier in this topic to find what is taking up the most space. (The abrt-cli status warning is probably also caused by a full quota.)

If you are making many voices for the Speech Synthesis assignment, in separate copies of the ss directory, you can share files between them where applicable (e.g., the wav directory for voices that use the same database). See https://speech.zone/forums/topic/symbolic-links/ (if that looks tricky, do it with a tutor in a lab session).

You shouldn’t need that much space to do the assignment. It’s likely that you have a large number of unnecessary files somewhere. Check disk usage like this:
cd
du -sh *
du -sh .?*
cd changes to your home directory. The first du measures the size of all regular files and directories. The second uses a glob .?* that matches all the hidden items (anything whose name starts with a period), including the directory .cache.

It should be safe to delete the contents of .cache, if that is the offending directory, or you can delete just some of its subdirectories if you prefer.