Forum Replies Created
In a filterbank, there is a set of bandpass filters (perhaps 20 to 30 of them). Each one selects a range (or a “band”) of frequencies from the signal.
The filters in a filterbank are fixed and do not vary. We, as system designers, choose the frequency bands – for example, we might space them evenly on a Mel scale, taking inspiration from human hearing.
The feature vector produced by the filterbank is a vector containing, in each element, the energy captured by the corresponding frequency band.
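Here is a minimal sketch of that idea in Python, using librosa. The file name “speech.wav”, the FFT size, and the choice of 24 filters (within the 20–30 range above) are illustrative assumptions, not values from the course materials:

```python
# A sketch of filterbank feature extraction (illustrative values only).
import numpy as np
import librosa

y, sr = librosa.load("speech.wav", sr=None)  # hypothetical input file

n_fft = 512
n_mels = 24  # e.g., somewhere in the 20-30 range mentioned above

# Power spectrogram: one column per analysis frame
S = np.abs(librosa.stft(y, n_fft=n_fft)) ** 2

# Triangular bandpass filters, spaced evenly on the Mel scale
mel_fb = librosa.filters.mel(sr=sr, n_fft=n_fft, n_mels=n_mels)

# Each element of a feature vector is the energy captured by one filter
features = mel_fb @ S  # shape: (n_mels, num_frames)
```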
The filter in a source-filter model is a more complex filter than the ones in a filterbank, in two ways:
- it’s not just a simple bandpass filter, but has a more complex frequency response, in order to model the vocal tract transfer function
- it varies over time (it can be fitted to an individual frame of speech waveform)
This filter is inspired not by human hearing, but by speech production.
The simplest type of feature vector derived from the filter in a source-filter model would be a vector containing, in each element, one of the filter’s coefficients. Together, the set of filter coefficients captures the vocal tract transfer function (or, more abstractly, the spectral envelope of the speech signal).
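As a sketch of how such a filter can be fitted to a single frame, here is one common technique, linear predictive coding (LPC), in Python with librosa. The file name, the frame position, and the filter order of 16 are assumptions for illustration:

```python
# A sketch of fitting an all-pole filter to one frame of speech using
# LPC. "speech.wav", the frame position, and order=16 are assumptions.
import librosa

y, sr = librosa.load("speech.wav", sr=16000)

frame_length = int(0.025 * sr)           # one 25ms frame (400 samples)
frame = y[8000:8000 + frame_length]      # an arbitrary region of speech

# Fit the filter; its coefficients describe the vocal tract transfer
# function (the spectral envelope) for this frame.
a = librosa.lpc(frame, order=16)

# The feature vector: one element per coefficient (a[0] is always 1).
print(a)
```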
I’m aware of the third edition currently under construction. As with previous editions, Jurafsky & Martin make this freely available until it goes off to the publishers (at which point they will presumably withdraw the draft version). As you say, the speech material is not yet updated, so we are staying with the second edition for now.
In these slides, I am temporarily imagining that the Fourier coefficients (i.e., the magnitude spectrum) would be a good representation for Automatic Speech Recognition. Whilst we could use them, we can do better by performing some feature engineering – this is covered a little later on.
Slide 10: each coefficient is the amount of energy at the corresponding frequency – these are the Fourier coefficients (think of them as weights that multiply each sine wave). If we plot them, we get the spectrum of the signal.
Slide 11: the number of Fourier coefficients depends on the duration of the signal being analysed. But remember that we don’t analyse the whole signal at once: we divide it into short frames and perform the analysis on each frame in turn. The frames all have the same, fixed duration (e.g., 25ms).
The number of frames is equal to the total duration of the speech signal (e.g., an utterance) divided by the frame shift (e.g., 10ms).
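A small Python sketch of this frame-based analysis (the use of librosa and the file name are assumptions; the 25ms frame duration and 10ms frame shift match the figures above):

```python
# A sketch of frame-based spectral analysis with a 25ms frame duration
# and 10ms frame shift. "speech.wav" is a hypothetical input file.
import numpy as np
import librosa

y, sr = librosa.load("speech.wav", sr=None)

frame_length = int(0.025 * sr)  # 25ms frame duration
frame_shift = int(0.010 * sr)   # 10ms frame shift

# One magnitude spectrum per frame; the number of Fourier coefficients
# per frame is fixed by frame_length, not by the utterance duration.
spectra = np.abs(librosa.stft(y, n_fft=frame_length,
                              hop_length=frame_shift,
                              win_length=frame_length))

# Number of frames is roughly total duration / frame shift,
# e.g., a 3-second utterance with a 10ms shift gives about 300 frames.
print(spectra.shape)         # (num_coefficients, num_frames)
print(len(y) // frame_shift)  # the same count, computed directly
```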
Thanks – just a typo in my WordPress code – now fixed.
…and finally, a summary of the good things that were most frequently mentioned:
From a total of 52 responses:
- The website (content, structure, videos, etc.) is good: 27
- The material is interesting: 20
- The classes and/or lecturer are good: 20
- Access to resources is good: 10
We have to take notes separately for each mode of learning (videos, lectures, labs, readings)
That’s one approach, but I suggest that you gradually build and organise a single set of notes that brings together the different modes of learning. The act of constructing this will help you learn better.
The labs are before lectures, but they should be afterwards
The labs are after the corresponding lecture (there happens to be a weekend in between, but that’s just how the timetable works out). The only exception is the first lab of the automatic speech recognition assignment, which comes before any lectures on that topic. This is to get you started on the practical work as far ahead of the assignment deadline as possible.
Provide a reading list for each module
This comment really surprises me, because the readings for each module are in a tab within each module, listed in the order in which you should read them. You can also access lists of all the readings, organised in various ways (including alphabetically), from the readings hub page.
Provide sample (or previous) exam papers
Already available on the University library website. More information about this year’s exam will be provided later in the course.
Provide a speed control for the videos
Classes are too short
1-2 hours per week of lectures plus 2 hours of supervised lab time seems reasonable for either a 10-credit postgraduate course or a 20-credit undergraduate one. The guidelines in my subject area are 18 contact hours (2 per week for 9 weeks) for 10 credits and 27 hours (3 per week) for 20 credits.
The textbooks are not easy to find on “the internet”
They are easy to buy from many online retailers. Perhaps you mean illegal “free” copies? Don’t use these. You are cheating hard-working authors, and breaking the law. The library has physical and/or eBook copies of most readings required for Speech Processing. If you’re serious about your studies, then you need to invest in the key textbooks.
Lectures repeat the videos
Yes, but this should only happen when I feel that the video wasn’t good enough.
Create a FAQ for the lab
This already exists on the speech.zone forums for Assignment 1 and Assignment 2, and you need to both search and browse them. If your question is not already answered there, then post a new one.