Forum Replies Created
-
AuthorPosts
-
For warning 8232, search the forums.
Looks like your forced alignment is very poor. You will find that all the words are there, but that the labels have become collapsed to the start and end of the file.
How much speech data are the models being trained on? If it is only a small amount, you could try adding the ARCTIC A utterances to your utts.data (just during forced alignment), so that the models are trained on more data and are more likely to work.
Lab sessions are unstructured
This is only partially fair feedback. The lab sessions do generally start with an overview from the lead tutor (usually Korin) about what to focus on that week. We have noticed that many students do not pay attention to this overview (we know this from the questions they ask later in that lab session).
We will continue to provide guidance at the start of each lab session about where you should be in the assignment (e.g., with reference to the milestones) and what to focus on.
The assignment is overwhelming
Including: too open-ended, instructions long and hard to follow, unclear expectations, and similar comments
The open-ended and slightly under-specified nature of the assignment is by design, so that students need to actively think about what they are doing, and why they are doing it, and not merely follow a sequence of instructions.
The goal of the assignment is to consolidate and test your understanding of the course material.
But I agree that this can be overwhelming and we should provide a little more structure and guidance. For this year, I have already added two class elements to address this:
- On 2024-02-13, we went over the whole assignment and made links to the relevant parts of the course material. The key takeaway was that you should develop your understanding of that course material by doing that aspect of the assignment and then demonstrate that understanding in the write-up.
- On 2024-03-12, we will go over the newly-revised structured marking scheme and see how to get a good mark. In other words, we will see how to demonstrate understanding.
Further coursework guidance may be added, depending on your feedback about the above.
Keep doing this
Number of people mentioning each point is given in parentheses.
Videos and flipped classroom format (17)
In-class interactivity (16)
Whiteboard group exercises (9)
Quizzes on speech.zone (6)
Have you inspected the alignment? Load one of your utterances and the corresponding labels into Wavesurfer or Praat and inspect them. Try that for a few different utterances.
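Before opening every file by hand, a quick scripted check can suggest which utterances to look at first. This is only a minimal sketch: it assumes each label file has already been read into a list of (end time in seconds, label) pairs, and the function name, the 10 ms threshold and the 50% cut-off are illustrative choices, not part of the voice-building recipe.

```python
def looks_collapsed(segments, min_dur=0.010, max_short_fraction=0.5):
    """Return True if most segments are implausibly short.

    segments: list of (end_time_seconds, label) pairs in time order.
              This in-memory form is an assumption for illustration.
    """
    if not segments:
        return True  # an empty label file is also worth inspecting
    short = 0
    start = 0.0
    for end, _label in segments:
        if end - start < min_dur:
            short += 1
        start = end
    return short / len(segments) > max_short_fraction


# Made-up examples: a plausible alignment, and one where the labels
# are squeezed into the start and end of the file.
good = [(0.25, "sil"), (0.33, "h"), (0.45, "@"), (0.60, "l"), (0.82, "ou"), (1.10, "sil")]
bad = [(0.005, "sil"), (0.010, "h"), (0.015, "@"), (0.020, "l"), (1.095, "ou"), (1.10, "sil")]
print(looks_collapsed(good), looks_collapsed(bad))
```

A check like this only flags candidates; looking at the waveform and labels together in Wavesurfer or Praat is still the way to confirm what went wrong.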
This script makes a list of the .mfcc files on which the forced-alignment HMMs will be trained. After running it, the file train.scp should contain the list of .mfcc filenames, corresponding to the lines in utts.data.
How large is your corpus?
What is the goal of finding the out-of-dictionary words?
If you wish to exclude all sentences that contain such a word, then you’ll have to find them all – you could do this using the provided Festival script (which might be slow for a very large corpus) or some other way (by writing your own code).
But if your aim is to identify all the words you might need to add to the dictionary, then you are not expected to do that for the large source corpus. You might need to rely on letter-to-sound to provide pronunciations during the text selection phase.
You should only manually write pronunciations for a modest number of words appearing in your (much smaller) recording script.
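If you do write your own code for the first case (excluding sentences that contain an out-of-dictionary word), the core logic might look like the sketch below. It is an illustration under loud assumptions: the lexicon is a plain set of lowercase words, tokenisation is a crude regular expression rather than Festival's text processing, and find_oov is a hypothetical helper name.

```python
import re


def find_oov(sentences, lexicon):
    """Map each out-of-dictionary word to the indices of the sentences containing it."""
    oov = {}
    for i, sentence in enumerate(sentences):
        # Crude tokenisation (an assumption): lowercase runs of letters and apostrophes.
        for word in re.findall(r"[a-z']+", sentence.lower()):
            if word not in lexicon:
                oov.setdefault(word, set()).add(i)
    return oov


# Toy stand-ins for the real dictionary and source corpus.
lexicon = {"the", "cat", "sat", "on", "mat", "a", "dog"}
sentences = ["The cat sat on the mat.", "A dog sat on the zyzzyva.", "The quokka sat."]

oov = find_oov(sentences, lexicon)
bad = set().union(*oov.values()) if oov else set()
kept = [s for i, s in enumerate(sentences) if i not in bad]
print(sorted(oov), kept)
```

The same dictionary of out-of-dictionary words also serves the second case: run it on the much smaller recording script only, and the keys are the candidates for manually written pronunciations.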
Check your quota like this:
$ quota --hum
If the figure in the space column is larger than the quota (which is generally 5000 MB = 5 GB) then you need to remove files.
Use the du command mentioned earlier in this topic to find what is taking up the most space.
(The abrt-cli status warning is probably also caused by a full quota.)
If you are making many voices for the Speech Synthesis assignment, in separate copies of the ss directory, you can share files between them where applicable (e.g., the wav directory for voices that use the same database). See https://speech.zone/forums/topic/symbolic-links/ (if that looks tricky, do it with a tutor in a lab session).
You shouldn’t need that much space to do the assignment. It’s likely that you have a large number of unnecessary files somewhere. Check disk usage like this:
$ cd
$ du -sh *
$ du -sh .?*
cd changes to your home directory. The first du measures the size of all regular files and directories. The second uses a glob .?* that matches all the hidden items (anything whose name starts with a period) including the directory .cache.
It should be safe to delete the contents of .cache, if that is the offending directory, or you can delete just some of the subdirectories if you prefer.
What you need to do is pass your text through Festival’s front end, to obtain the pronunciation, from which you can easily determine the diphones.
You are already doing that as part of building the voice, in the step where you do forced alignment: look at the "Creating the initial labels" step of "Time-align the labels".
See also this topic for other ways to do this.
It’s important that, whichever method you use, you load the same phone set and pronunciation dictionary that your final voice will use. (e.g., don’t use CMUdict).
Some of the posts in this topic may also be helpful.
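Once you have a phone sequence for each sentence, determining the diphones is just a matter of pairing adjacent phones. The sketch below assumes the phone sequences have already been extracted from the front end's output and include the utterance-initial and utterance-final silences; the phone symbols and the "-" separator are made up for illustration and are not the names used by any particular phone set.

```python
from collections import Counter


def diphones(phones):
    """Return the diphone sequence for one utterance's phone sequence."""
    return [f"{a}-{b}" for a, b in zip(phones, phones[1:])]


# Hypothetical phone sequences, standing in for the front end's output.
utterances = [
    ["sil", "h", "@", "l", "ou", "sil"],
    ["sil", "l", "ou", "sil"],
]

# Count how often each diphone type occurs across the whole script.
counts = Counter(d for u in utterances for d in diphones(u))
print(counts.most_common())
```

Counting types in this way gives a simple view of coverage, for example to compare candidate sentences during text selection.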
The purpose of building a voice from pre-existing ARCTIC recordings (available from festvox.org) was to learn the tools.
Your experiments will use voices built from recordings of yourself, both of the ARCTIC script and your own script, separately and in combination. That is why it’s important to make all your recordings in the same studio.
You are likely to need to build a number of different voices – e.g., from different subsets of the recordings – to answer whatever questions you devise for your experiments.
That page is from 2019-20 and is now outdated. The fileserver fs1.ppls.ed.ac.uk may no longer be available; see instead this page and scroll down to Note [24/10/23], which says to replace fs1.ppls.ed.ac.uk with one of the lab machines.
Does that work?
January 18, 2024 at 18:50 in reply to: Unit Selection exercise – No module named ‘EST_Utterance’ #17436
As noted in the first class, it is non-trivial to build the version of Festival required for this exercise: it requires compilation of Python wrappers for the underlying C++ code using SWIG. Getting this compilation to work correctly involves performing Magic.
We strongly recommend using the lab machines, where everything Just Works.
Anyone really determined to build their own Python wrappers should bring their machine to a lab session and smile very nicely at Korin.