Forum Replies Created
The number of observations in the observation sequence is fixed, and they all have to be generated by the model (i.e., the compiled-together language model and acoustic model).
There are many possible paths through the model that could generate this observation sequence. Some paths will pass through mostly short words, each of which generates a short sequence of observations (because short words tend to have short durations when spoken). Other paths pass through long words, each of which will typically generate a longer sequence of observations.
So, to generate the fixed-length observation sequence, the model might take a path through many short words, or through a few long words, or something in-between.
Paths through many short words are likely to contain insertion errors. Paths through a few long words are likely to contain deletion errors. The path with the lowest WER is likely to be a compromise between the two: we need some way to control that, which is what the WIP provides.
Again, J&M’s explanation of the LMSF is not the best, so don’t get lost in their explanations of the interaction between LMSF and WIP.
In summary:
- The LMSF is required because the language model computes probability mass, whilst the acoustic model computes probability density.
- The WIP enables us to trade off insertion errors against deletion errors.
“fixed” just means that it is a constant value.
The word insertion penalty, which is a log probability, is “logWIP” in J&M equation 9.50. It is summed to the partial path log probability (e.g., the token log probability in a token passing implementation) once for each word in that partial path, which is why it is multiplied by N in the equation.
The HTK manual says
The grammar scale factor is the amount by which the language model probability is scaled before being added to each token as it transits from the end of one word to the start of the next
but of course, they mean “language model log probability”, and when they say
The word insertion penalty is a fixed value added to each token
they mean “added to the log probability of each token” (the same applies to the previous point too).
The HTK term “penalty” is potentially misleading, since in their implementation the value is added, not subtracted. Conceptually there is no difference and it doesn’t really matter: we can just experiment with positive and negative values to find a value that minimises the WER on some held-out data.
The implementation in HTK is consistent with J&M equation 9.50.
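As a concrete illustration (a sketch only – the variable names and the example LMSF/WIP values are made up, not taken from HTK or J&M), this is how the three quantities in equation 9.50 combine in the log domain:

import math

def combined_log_score(acoustic_log_likelihood, lm_log_prob, num_words,
                       lmsf=15.0,   # language model scale factor (found empirically)
                       wip=0.5):    # word insertion penalty, as a probability-like value
    # score = log p(O|W) + LMSF * log P(W) + N * log(WIP)
    return acoustic_log_likelihood + lmsf * lm_log_prob + num_words * math.log(wip)

# e.g. a 7-word hypothesis
print(combined_log_score(-3521.8, -14.2, 7))

With this parametrisation, a WIP below 1 gives a negative log value, penalising paths through many short words (insertions), and a WIP above 1 does the opposite – which is why experimenting with values on both sides is sensible.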
When J&M say
Thus, if (on average) the language model probability decreases…
they are talking about the probability decreasing as the sentence length increases, since more and more word probabilities will be multiplied together.
Their explanation of the LMSF is rather long-winded. There is a much simpler and better explanation for why we need to scale the language model probability when combining it with the acoustic model likelihood. In equation 9.48, P(O|W) implies that the acoustic model calculates a probability mass. It generally does not!
If the acoustic model uses Gaussian probability density functions, it cannot compute probability mass. It can only compute a probability density. Density is proportional to the probability mass in a small region around the observation O. The constant of proportionality is unknown.
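A quick way to see this (an illustrative snippet, not from the original post): a Gaussian density evaluated at a point can be greater than 1, which a probability mass never could be.

import math

def gaussian_pdf(x, mean, stdev):
    # probability *density* of a univariate Gaussian
    return math.exp(-0.5 * ((x - mean) / stdev) ** 2) / (stdev * math.sqrt(2 * math.pi))

print(gaussian_pdf(0.0, 0.0, 0.1))   # about 3.99 – a density, not a probability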
Since we always work in the log probability domain, equation 9.48 involves a sum of two log probabilities.
The acoustic model will compute quantities on a different scale to the language model. We need to account for the unknown constant of proportionality by scaling one or other of them in this sum. The convention is to scale the language model log probability, hence the LMSF. We typically find a good value for the LMSF empirically (e.g., by minimising the Word Error Rate on some held-out data).
The error

Unable to open label file rec/panagiot_test.2.lab

tells us that HResults cannot find the .lab file (which contains the correct label) corresponding to the recognition result stored in the file rec/panagiot_test.2.rec. HResults is looking for that .lab file in the rec directory. But this is because it did not find it within the MLF (Master Label File) panagiot_test.mlf – perhaps this file is missing from that user’s data?

The error

(standard_in) 2: syntax error

occurs later – so debug that next…

Your error is occurring with HResults, not HVite:

FATAL ERROR - Terminating program HResults
I apologise that the Remote Desktop is unreliable for sound playback. Please can all students submit support requests to the IS Helpline about this.
Unfortunately, at the weekend, you cannot access the lab and will have to use the Remote Desktop.
1. refer to the Formatting instructions for what is included and excluded from the word count
2. no, use any reasonable format that you like – keep your reader in mind: what format will be easiest for them to read?
3. your goal is to demonstrate your understanding, so you will very likely need to say something brief about what Text Normalisation is, so that you can explain the error and why it occurred; if you propose a possible solution, you will of course need to say something about how Text Normalisation is performed (what are the sub-steps? how is each done? rules? machine learning?).
4. If you insist, yes, but it will be included in the word count, so I cannot see any good reason to include one. An Appendix is for optional material that the reader does not have to look at unless they wish to – so you would be using up word count on something that may not be read…
5. To get a good mark, yes! Refer to the Bibliography section of Report Write-up.
That’s correct – actually, car is a core Scheme function, which returns the first item from a list.

Moving on to Module 6 and the video Pitch period, we are now looking at how to extract the vocal tract’s impulse response from a natural speech waveform.
If the impulse response actually did decay all the way to zero before the next glottal pulse, this would be easy for the reason stated above: one pitch period of the speech waveform would be exactly the impulse response we want.
Unfortunately, in natural speech, things are not that simple: the impulse responses overlap. So all we can do is deal in terms of pitch periods. We extract overlapping frames from the waveform so that we can reconstruct the waveform later using overlap-and-add. Since the analysis frames overlap, they will contain more than one pitch period. A good choice is an analysis frame capturing two pitch periods.
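To make that concrete, here is a minimal sketch (assuming we already have the waveform as a NumPy array and a list of pitch-mark sample indices – neither is provided in the original post) of extracting two-pitch-period frames, each centred on a pitch mark and tapered so they can later be overlap-added:

import numpy as np

def extract_frames(waveform, pitch_marks):
    frames = []
    for i in range(1, len(pitch_marks) - 1):
        start = pitch_marks[i - 1]
        end = pitch_marks[i + 1]                # previous mark to next mark = two pitch periods
        frame = waveform[start:end].astype(float)
        frame = frame * np.hanning(len(frame))  # taper for smooth overlap-and-add
        frames.append(frame)
    return frames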
A pitch period is the period in between two glottal pulses. Its duration is denoted T0 (measured in seconds). The term ‘pitch period’ is used to refer both to the duration (T0) and to the corresponding stretch of the speech waveform itself.
The term ‘fundamental period’ is another way of saying ‘pitch period’ (and is more technically correct, of course, because ‘fundamental frequency’ is more correct than ‘pitch’ when talking about a speech waveform).
A ‘pitch mark’ is a label we might place on a speech waveform to mark the position of a glottal pulse. Assuming the pitch marks are accurate, then the duration between two consecutive pitch marks is T0, by definition.
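For example (with made-up pitch-mark times, purely to illustrate the definition):

pitch_mark_times = [0.500, 0.508, 0.516, 0.525]   # seconds, hypothetical values

for t_prev, t_next in zip(pitch_mark_times, pitch_mark_times[1:]):
    T0 = t_next - t_prev
    print(f"T0 = {T0 * 1000:.1f} ms, F0 = {1.0 / T0:.1f} Hz")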
If the glottal pulses are sufficiently far apart in time (a large T0), then the impulse response of the vocal tract will decay away to zero before the next glottal pulse. In this case, each pitch period is equal to one impulse response of the vocal tract. This is almost the case in the (synthetic) speech waveform in the video Impulse Response where the waveform has decayed almost to zero before the next period starts.
So, a simple way to understand voiced speech is as a sequence of impulse responses of the vocal tract. This is a useful and helpful simplification for developing our understanding of speech signals. The video Source-filter model also makes this simplifying assumption (and all the speech signals used as examples are synthetic, to make things clearer).
However, in natural speech, the waveform generally does not decay all the way to zero before the next glottal pulse. Therefore, the impulse responses overlap (and we can assume they are simply summed, using our simplified model of the vocal tract).
I’ve been pushing hard for longer access hours, but unfortunately have been refused on the grounds of “Health and Safety”. The instructions I have been given by the PPLS Head of Information Services are that students should use other spaces on campus (which I think includes Informatics rooms in the same building) for group study, and to access the PPLS AT 4.02 lab machines remotely from there.
I’m sorry about this – I have been unable to get a satisfactory explanation of why it’s unsafe to use AT 4.02 at the weekend, whilst it is safe to use other rooms.
How much disk space is available?
$ df -h .
If the Use% column is showing close to 100%, that means the disk is nearly full.
If you are using a disk that is shared with other people (as is the case in the PPLS lab), then the amount of available space is the total for everyone sharing that disk (it doesn’t belong to you individually). The available space reported by df will fluctuate up and down as other users create or delete files.

How much disk am I using? Change to your home directory, then measure the size of all the items there:
$ cd
$ du -sh *
That may take a minute or two to run and may produce a lot of output. It will be more convenient to sort the output by size:
$ du -sh * | sort -h
Now that you know which directory is the largest, you could cd into it and repeat the above, drilling down to find what is using the most space.

Or, get clever and find all directories at once and measure their size, reporting this in a sorted list (this will take some time, so be patient):
$ find . -type d -exec du -sh {} \; | sort -h
One example would be a convolutional layer. This has a very specific pattern of connections that expresses the operation of convolution between the activations output by a layer and a “kernel” (the kernel weights are shared across all positions, i.e., weight sharing).
We might use a convolutional layer when we wish to apply the same operation to all parts of some representation (potentially of varying size). They are very commonly used in image processing, but have their uses in speech processing too. For example, we might use them to create a learnable feature extractor for waveform-input ASR.
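To make the idea of weight sharing concrete, here is a small sketch (illustrative only – the kernel is random here, whereas in a real network it would be learned) of a single 1-D convolution applied to a waveform, producing one feature value per 10 ms frame:

import numpy as np

def conv1d(waveform, kernel, stride=1):
    k = len(kernel)
    out = []
    for start in range(0, len(waveform) - k + 1, stride):
        # the *same* kernel (shared weights) is applied at every position
        out.append(np.dot(waveform[start:start + k], kernel))
    return np.array(out)

waveform = np.random.randn(16000)      # stand-in for 1 second of audio at 16 kHz
kernel = np.random.randn(400) * 0.01   # 25 ms kernel; learned in a real network
features = conv1d(waveform, kernel, stride=160)   # 160 samples = 10 ms hop
print(features.shape)                  # (98,) – one value per frame

A real convolutional layer would have many such kernels (one per output channel), followed by a non-linearity, but the pattern of connections is exactly this.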
bash$ sox recordings/arctic_a0001.wav -b16 -r 16k wav/arctic_a0001.wav remix 1
works as expected for me on your file.
Use soxi to inspect your output file: does it have the expected sampling rate, bit depth and duration?

One explanation for the large size of your output file could be that you accidentally combined multiple files, which would happen if you did this:
bash$ sox recordings/*.wav -b16 -r 16k wav/arctic_a0001.wav remix 1