Forum Replies Created
Distance is what we use in pattern matching. A smaller distance is better: it means that patterns are more similar. The Euclidean distance is an example of a distance metric.
Probability is what we use in probabilistic modelling. We use probability density rather than probability mass when modelling a continuous variable with a probability density function. The Gaussian is an example of a probability density function (pdf).
When we use a generative model, such as a Gaussian, to compute the probability of emitting an observation given the model (= conditional on the model) we are calculating a conditional probability density, which we call the likelihood.
A larger probability, probability density, or likelihood is better: it indicates that a model is more likely to have generated the observation.
To do classification, we only ever compare distances or likelihoods between models. We don’t care about the absolute value, just which is smallest (lowest) or largest (highest), respectively.
The log is a monotonically increasing function, so taking logs does not change the outcome of any comparison between values. We take the log only for reasons of numerical precision.
It doesn’t matter that probability densities are not necessarily between 0 and 1; they are always positive, so we can always take the log. A higher probability density leads to a higher log likelihood.
Taking the negative simply inverts the direction for comparisons. We might use negative log likelihoods in specific speech technology applications when it feels more natural to have a measure that behaves like a distance (smaller is better).
In general, for Automatic Speech Recognition, we use log likelihood. This is what HTK prints out, for example. Those numbers will almost always be negative in practice, but a positive log likelihood is possible in theory because a likelihood can be greater than one when it is computed using a probability density.
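As a toy illustration of that comparison (the model names and log likelihood values here are invented, not HTK output): classification reduces to picking the model with the highest log likelihood, which can even be done with standard shell tools:

```shell
# Invented per-model log likelihoods, one "model value" pair per line.
# Sort numerically on the second field (ascending) and take the last line:
# the model with the highest (least negative) log likelihood wins.
printf 'one -1043.2\ntwo -1127.8\nthree -998.4\n' | sort -n -k2,2 | tail -1
```

This prints `three -998.4`, so model "three" would be the recognition result.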
Yes, HMMs can be used for speech synthesis. We would not use MFCCs because they are not invertible, due to the filterbank. We might use something closer to the true cepstrum.
Pre-emphasis is primarily to avoid numerical problems in taking the DFT of a signal with very different amounts of energy at different frequencies. Using too much pre-emphasis might have negative consequences by over-boosting high frequencies, but typically only modest pre-emphasis is applied.
Pre-emphasis will boost any non-speech at higher frequencies, yes. So, if that is noise then pre-emphasis will make the Signal-to-Noise Ratio (SNR) worse. Again, this should not be a problem with the modest amount of pre-emphasis typically used.
Try this (again, I’m using echo just as an example of a program that produces output on stdout – you are probably trying to capture the output of a more interesting program):

$ echo -e "this is 1\nand now 2"
$ echo -e "this is 1\nand now 2" | grep this
$ echo -e "this is 1\nand now 2" | grep this | cut -d" " -f3

cut cuts vertically, and the above arguments say “define the delimiter as the space character, cut into fields using that delimiter, and give me the third field”.

To capture the output of a program, or of a pipeline of programs as we have above, we need to run it inside “backticks”. So, let’s capture the output of that pipeline of commands and store it in a shell variable called MYVAR:

$ MYVAR=`echo -e "this is 1\nand now 2" | grep this | cut -d" " -f3`
$ echo The value is: ${MYVAR}
Unix programs print output on one or both of two output streams called stdout (standard out) and stderr (standard error). The former is meant for actual output and the latter for error messages, although programs are free to use either for any purpose.

Here’s a program that prints to stdout for testing purposes: it’s just the echo command:

$ echo -e "the first line\nthe second line"

which will print this to stdout:

the first line
the second line
Now let’s capture that and do something useful with it. We ‘pipe’ the output to the next program using “|” (pronounced “pipe”):
$ echo -e "the first line\nthe second line" | grep second
which gives
the second line
where grep finds the pattern of interest. Or how about cutting vertically to get a certain character range:

$ echo -e "the first line\nthe second line" | cut -c 5-9

which gives

first
secon
Now combine them:
$ echo -e "the first line\nthe second line" | grep first | cut -c 5-9
to print only
first
At first, yes, make it by hand so you get the format right.
Eventually, you’ll want to make it from a list of USER names. The forum on shell scripting has some tips on how you might do this using a for loop that reads the USER list from one file and creates an HTK script in another file.
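As a sketch of that idea (the USER names, file names, and data directory here are invented for illustration – adjust them to match your own setup): a for loop reads the USER list from one file and writes one MFCC path per line into the HTK script file:

```shell
#!/bin/sh
# Invented example USER list, one name per line
printf 's1764494\ns1766810\ns1770642\n' > users.txt

# Assumed location of the training MFCC files
DATADIR=/Volumes/Network/courses/sp/data/mfcc/train

# Start with an empty script file, then append one full path per USER
> train.scp
for USER in `cat users.txt`; do
    echo "${DATADIR}/${USER}_train.mfcc" >> train.scp
done

cat train.scp
```

Each line of train.scp is then a full path ending in `_train.mfcc`, ready to pass to HTK with the -S flag.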
A script file (e.g., it might be called train.scp) would look something like this – a list of MFCC files with full paths:

/Volumes/Network/courses/sp/data/mfcc/train/s1764494_train.mfcc
/Volumes/Network/courses/sp/data/mfcc/train/s1766810_train.mfcc
/Volumes/Network/courses/sp/data/mfcc/train/s1770642_train.mfcc

You pass this using the -S flag like this:

HRest ...other command line args here... \
  -S train.scp \
  models/hmm0/$WORD
noting that when you use -S you no longer pass any MFCC files as command line arguments.

You are probably running the script from inside the scripts directory. The scripts are very simple and only work from the main directory. You should run them like this:

$ pwd
/home/atlab/Documents/sp/digit_recogniser
$ ./scripts/initialise_models
Your script should be simply

#!/usr/bin/bash
./scripts/initialise_models
./scripts/train_models

You don’t need the $ – those represent the bash prompt in the instructions.

Your line

SCRIPT_PATH="/Documents/sp/digit_recogniser/"

is not needed because you don’t use the shell variable SCRIPT_PATH anywhere later in the script. If you did need this, then the path should most probably be

~/Documents/sp/digit_recogniser/

where ~ is shorthand for your home directory.

Look under the “Class” tab for Module 7. The “Live class” item now has written notes.
Let me know if this is useful and I will add this for other modules.
Yes, that’s a reasonable approach.
In this classic paper on voice conversion they use the cepstrum to represent the spectral envelope rather than LPC coefficients.
In this paper, we do use linear prediction as the parameterisation of the spectral envelope. But we don’t use the coefficients of the difference equation (the LPC parameters) directly – we transform them to another representation called line spectral frequencies (LSFs) for reasons explained in the paper.
Rather than use the source speaker’s residual (which is one option), we predict the residual from the converted spectral envelope.
If you are in Edinburgh, please remember that there are plenty of hardcopies for loan in the main library.
Yes – you can read waveforms!
Yes – this is a very reasonable proposition. Like many good ideas, it has been tried. Here’s a paper from Tokuda (most famous for speech synthesis) et al. on what they call Mel-Generalized Cepstral Analysis – it also shows the relationship between the cepstrum and LPC analysis. (This is well beyond the scope of the Speech Processing course!)
I have sent you a personal message in Teams. Any other students in the same situation should contact me for personal advice.