Forum Replies Created
-
AuthorPosts
-
This answer really needs to be in the form of a diagram. Try uploading one here, annotated with the terms you use in your written explanation, and I’ll check it for you.
The cepstrum is created after the cosine transform – the cepstral coefficients are the coefficients of that series expansion (the weights on the cosine basis functions).
Any form of source-filter separation has to make some assumption about either the form of the filter, or of the source (or both). Otherwise, the problem is insoluble.
For this discussion, let’s assume “the vocal tract filter’s frequency response” and “the spectral envelope” are the same thing.
We assume the lower cepstral coefficients represent the vocal tract filter’s frequency response because they capture the slower-changing (with respect to frequency) components of the spectrum. We are making the assumption that the vocal tract filter’s frequency response is rather slowly-changing (with respect to frequency).
You are right that the mel-scale filterbank was designed to smooth away F0, so in fact the truncation step might not be needed for that purpose. Nevertheless, we still want to truncate so that we have a small number of features in our final feature vector. Truncation serves multiple purposes, of which “removal of any remaining traces of F0” is just one.
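As a rough illustration of the cosine transform and truncation steps discussed above, here is a minimal sketch in plain Python. The log filterbank energies are made-up numbers, the function name `dct_ii` and the choice of keeping 4 coefficients are assumptions for illustration only, and real front ends normally add scaling and other details that are omitted here.

```python
import math

def dct_ii(values):
    """Unnormalised DCT-II: one weight per cosine basis function."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
        for k in range(n)
    ]

# Hypothetical log filterbank energies for a single frame (made-up values).
log_energies = [2.0, 2.5, 3.0, 2.8, 2.2, 1.9, 1.5, 1.2]

# The cepstral coefficients are the weights on the cosine basis functions.
cepstrum = dct_ii(log_energies)

# Truncation: keep only the lower coefficients, which describe the
# slowly-changing (with respect to frequency) shape of the spectrum.
num_kept = 4  # illustrative choice, not a value taken from the course
features = cepstrum[:num_kept]
```

Coefficient 0 is just the sum of the inputs (its basis function is constant), and a completely flat spectrum would give zero for every higher coefficient, which is one way to see that higher coefficients capture faster variation across frequency.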
You are right that we can only observe the vocal tract filter’s frequency response (which includes the formant peaks) at frequencies where the source has energy. For voiced speech, that means at the harmonics. But the vocal tract filter’s frequency response exists at all frequencies, even between harmonics – it’s just that we can’t see it directly. Think of it as “joining the dots” that are the harmonics.
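The “joining the dots” idea can be sketched as below. This is only an illustration: the F0, the harmonic amplitudes and the function name `envelope_at` are all made up, and drawing straight lines between harmonics is the crudest possible way to join the dots, not the method used when computing cepstral features.

```python
def envelope_at(freq_hz, harmonic_freqs, harmonic_amps):
    """Estimate the envelope at freq_hz by linear interpolation
    between the two neighbouring harmonics."""
    points = list(zip(harmonic_freqs, harmonic_amps))
    for (f_lo, a_lo), (f_hi, a_hi) in zip(points, points[1:]):
        if f_lo <= freq_hz <= f_hi:
            t = (freq_hz - f_lo) / (f_hi - f_lo)
            return a_lo + t * (a_hi - a_lo)
    raise ValueError("frequency lies outside the observed harmonics")

f0_hz = 100.0  # hypothetical fundamental frequency
harmonic_freqs = [f0_hz * k for k in range(1, 6)]  # 100, 200, ..., 500 Hz
harmonic_amps = [10.0, 14.0, 20.0, 12.0, 8.0]      # made-up amplitudes in dB

# 250 Hz lies between harmonics, where the source has no energy,
# but we can still estimate the filter's response there.
estimate = envelope_at(250.0, harmonic_freqs, harmonic_amps)
```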
I’m sorry about this. The publisher of this book has a very unusual pricing policy that means the library is unable to buy more viewing ‘credits’ for this copy, and additional copies are prohibitively expensive (nearly £2000 and still only allowing 100 views for that copy!).
The lengths of both the exemplar (template) and unknown can vary. The dynamic programming algorithm works in all cases.
But, during the process of alignment, for a particular pair of exemplar+unknown, the lengths of both are known and are constant.
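To make this concrete, here is a minimal dynamic time warping sketch. It assumes one-dimensional “feature vectors” (plain numbers), absolute difference as the local distance, and a step set of horizontal, vertical and diagonal moves; these are illustrative choices, and the function name `dtw_distance` is hypothetical.

```python
def dtw_distance(template, unknown):
    """Global distance of the best alignment between two sequences."""
    inf = float("inf")
    rows, cols = len(template), len(unknown)  # known and fixed for this pair
    cost = [[inf] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            local = abs(template[i] - unknown[j])
            if i == 0 and j == 0:
                best_prev = 0.0
            else:
                best_prev = min(
                    cost[i - 1][j] if i > 0 else inf,
                    cost[i][j - 1] if j > 0 else inf,
                    cost[i - 1][j - 1] if i > 0 and j > 0 else inf,
                )
            cost[i][j] = local + best_prev
    return cost[-1][-1]

# The two sequences have different lengths; the algorithm does not care.
template = [1.0, 2.0, 3.0]
unknown = [1.0, 1.0, 2.0, 3.0, 3.0]
global_distance = dtw_distance(template, unknown)
```

The grid size is set once from the two lengths and does not change while the alignment is being computed.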
They are pointing to a problem with simple distance metrics such as the Euclidean distance. This metric assumes all dimensions of the feature vector are equally important and simply sums up the squared differences between corresponding elements in the two vectors being compared.
This is sensitive to the scale of each element.
Take the example of filterbank energies as our feature vector, and suppose that, in general and on average across all the data, the amount of energy in the 2nd filter is around 10 times larger than that in the 11th filter. (Look at a typical speech magnitude spectrum to see why this could be the case.)
The 2nd element of the feature vector will contribute about 10 times as much to the total distance being calculated as the 11th element. It is being treated as more important.
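A small numerical sketch of this effect, using a two-element feature vector with made-up values: index 0 stands in for the 2nd filter and index 1 for the 11th, and both elements differ by the same relative amount (20%) between the two frames. The helper name `squared_contributions` is hypothetical.

```python
def squared_contributions(a, b):
    """Per-dimension squared differences that the Euclidean distance sums up."""
    return [(x - y) ** 2 for x, y in zip(a, b)]

# Made-up energies: the first element is about 10 times larger than the second.
frame_a = [10.0, 1.0]
frame_b = [12.0, 1.2]  # each element is 20% larger than in frame_a

contrib = squared_contributions(frame_a, frame_b)
distance = sum(contrib) ** 0.5

# The larger-scale element dominates the sum, even though both elements
# changed by the same relative amount.
dominant = contrib.index(max(contrib))
```

In this toy case the raw difference in the first element is 10 times larger, and squaring widens the gap further, so the distance is determined almost entirely by the first element.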
One solution to this would be to weight the elements as we sum them up in the Euclidean distance, to balance their contributions according to how important we think they are.
This is precisely what the Gaussian distribution does for us: it is what the standard deviation parameter is for. This scales each dimension of the distance calculation according to the amount of variability we see along that dimension for the class we are modelling.
Out of scope for this course, but something you will see in the literature, is a scaled Euclidean distance called the Mahalanobis distance. That is the same form that appears in the exponent of the Gaussian equation.
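As a sketch of what scaling by the standard deviation does, the code below divides each dimension's difference by that dimension's standard deviation before squaring and summing. This corresponds to the Mahalanobis distance for the special case of a diagonal covariance matrix. The means, standard deviations and the name `scaled_squared_distance` are all made up for illustration.

```python
def scaled_squared_distance(x, mean, std):
    """Squared distance from the mean, with each dimension measured
    in units of that dimension's standard deviation."""
    return sum(((xi - mi) / si) ** 2 for xi, mi, si in zip(x, mean, std))

# Hypothetical class model: the first dimension has 10 times the scale of the second.
mean = [10.0, 1.0]
std = [2.0, 0.2]

x = [12.0, 1.2]  # one standard deviation from the mean in each dimension
d2 = scaled_squared_distance(x, mean, std)

# For a Gaussian with diagonal covariance, the exponent is -0.5 times this value.
gaussian_exponent = -0.5 * d2
```

Both dimensions now contribute equally (about 1.0 each), despite their very different scales.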
You can enable file sharing between VM and host in the VMware settings. We generally recommend working in the VM because everything you need is there.
However, it is possible to work on your own Linux machine if you have the skills to compile HTK from source. You will need to download your own copy of HTK, and agree to the license.
If you copy the data from the VM to your local machine, you must delete it at the end of the course.
You also need to uncomment the line setting the path to all the data:
# DATA=${DATA:-/Volumes/Network/courses/sp/data}
This week’s B tutorial will explain the assignment – you don’t need to start working on it until after that tutorial.
The path you show is perfectly valid, although of course it seems a rather unlikely one and will probably result in a large global distance (indicating that the template and unknown are probably not very similar). The number of local distances summed up to make the global distance can indeed vary depending on the path taken through the grid.
The two sequences of feature vectors can be of differing lengths – this will generally be the case, in fact.
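The sketch below compares two paths through the same small grid to show that the number of local distances in the sum depends on the path. It assumes scalar features, absolute difference as the local distance, and a step set of vertical, horizontal and diagonal moves; the names `ALLOWED_STEPS`, `is_valid_path` and `path_cost` are hypothetical.

```python
ALLOWED_STEPS = {(1, 0), (0, 1), (1, 1)}  # assumed step set

def is_valid_path(path, n_template, n_unknown):
    """A path must start in the first cell, end in the last cell,
    and use only the allowed steps."""
    if path[0] != (0, 0) or path[-1] != (n_template - 1, n_unknown - 1):
        return False
    return all(
        (i1 - i0, j1 - j0) in ALLOWED_STEPS
        for (i0, j0), (i1, j1) in zip(path, path[1:])
    )

def path_cost(path, template, unknown):
    """Global distance along one path: the sum of its local distances."""
    return sum(abs(template[i] - unknown[j]) for i, j in path)

template = [1.0, 2.0, 3.0]
unknown = [1.0, 2.0, 3.0]

diagonal = [(0, 0), (1, 1), (2, 2)]                   # 3 local distances
staircase = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2)]  # 5 local distances

diagonal_cost = path_cost(diagonal, template, unknown)
staircase_cost = path_cost(staircase, template, unknown)
```

Both paths are valid, but the staircase path sums more local distances and ends up with a larger global distance, so it would not be chosen as the best alignment.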
Try forcibly quitting the VMware host program. On a Mac this would be the Apple menu (top left of your screen), then “Force Quit…”, or via the Activity Monitor. On Windows, use the Task Manager.
Work in the virtual machine. HTK is already installed. You do not need Wavesurfer because this year we are going to skip the data collection part.
Don’t start the assignment until you’ve had Module 6 Tutorial B. Just read the instructions through before that tutorial.
You should have created
~/Documents/sp
in the first assignment. If not, create it now.
Do not work in
/Volumes/Network/courses/sp/digit_recogniser
– those are the master copies, which you need to make a copy of.
We will be announcing details of the exam shortly – it will be a take-home timed test in the December exam diet. It was not timetabled by the central unit, and so we will need to schedule this ourselves.
I always prefer making notes on paper, whether in-person or online. This is because it’s faster to write by hand than to type, and easier to draw diagrams than using electronic note-taking software. But, perhaps I am a bit old-school and others will have better advice.
The new (for 2020-21) topic videos are made with the following tools:
Waveform and spectrum plots including the animated versions: my own Python code using matplotlib, scipy, librosa
Slides, including other animations: Apple Keynote
Screen capture and video editing: Screenflow
Audio hardware: DPA headset microphone, dbx 286s mic preamp / processor, Apogee Duet audio interface
Video compression: Handbrake
Can you post the complete steps needed to reproduce this, from starting Festival onwards?
Have you tried clicking the “restart or reconnect to it” link?