Forum Replies Created
Positive comments
(Each of these points was made by at least 5 people)
The tutors are very helpful in labs
Lectures are well-organised and well-prepared
The lecture pace is good
The interactive aspects and in-class demos are helpful
The website has a good layout and accessing the material is easy
The videos are a good resource
The quantity and variety of resources on the website
The flipped classroom
The varied teaching styles / modes
The foundation class + main lecture concept
Assignments and assessment
Start them earlier in the course – this is impossible because we need a couple of weeks to allow for students to be properly enrolled (especially because some Schools only sign students up in week 2) and to lay the foundations.
Set deadlines for submitting a draft, and give feedback on that – I wish there were the time and marking capacity to do something like this, but there isn’t. Instead, I provide you with the feedback lectures from previous years (see next point).
Provide example past student assignments as a guide – I definitely won’t do this, because there is more than one way to do well and examples will appear to be prescriptive. But I do already provide a large number of excerpts from good and bad work for you to learn from – these are the feedback lectures from previous years – see the tips on writing up for the first assignment.
Provide past exam papers – these are already available from the library.
Video and website content
Some videos are quite long – you are correct. I am planning to rebuild all the material from scratch in the future. I have already done this for the Speech Synthesis course.
Provide subtitles (closed captions) – yes, I want to do this, but it is a lot of work (even if manually correcting ASR output) so I will do it in the future after improving the videos. Speech Synthesis already has subtitles and transcripts for the videos.
Different slide packs for videos vs classes, which is messy and confusing – I agree. This is an unfortunate side-effect of the flipped classroom, where the videos are static, but each year’s class is revised afresh in response to students’ needs. I don’t have a good solution at the moment, but welcome suggestions.
Provide a cumulative video time per module so we know how much there is to watch – yes, a great idea that I plan to implement.
Can the calendar automatically scroll to the current date? – I can’t find an easy way to implement this at the moment, so will add this to my long-term website wishlist.
Can the lecture be longer than 2 hours please?
The contact hours for this course are calculated as follows, in order to arrive at the standard number of contact hours for UG and PG respectively:
UG: 27 hours = 18 hours lectures (includes all foundational material) + 9 hours lab time
PG: 18 hours = 9 hours lectures (excludes foundational material) + 9 hours lab time
In reality, you will get more contact time than this because lab sessions are two hours long, not just one hour per week.
So, there isn’t really any space in which to extend the class hours. If you want to maximise contact time, make sure you attend all available lab sessions, including any extra ones, writing clinics, etc.
The course is a lot of work for 10 credits (PG version)
There is always the risk with the flipped classroom of accidentally adding more material, so I try not to do this. Please tell me if I am failing, with specific examples.
The material is structured in a way that tries to make a distinction between what is essential and what is optional. Covering only the essentials is enough to pass the course with a solid mark. Adding some of the recommended material makes a high mark possible.
Please can lectures not be at 9am?
Sorry, no. Timetabling is not negotiable.
Foundation lectures are sometimes redundant. Can you list the topics to be covered in advance?
I’ve tried to make these sessions responsive to current students’ requests, rather than prescribe the topics to cover. But, providing a clearer framework in advance is a good idea, and a list of potential topics would work. I will try this in future.
Can we have an additional TA (Teaching Assistant) hour, on top of the labs?
The labs are intended to be a combination of practical work on the computers, and a time to ask the tutors questions. Both tutors this year have taken Speech Processing in the past (and got very high marks), so they are also able to act as Teaching Assistants. You can ask them theoretical as well as practical questions.
If you find that you are not getting enough time with the tutors, please tell me and I will try to arrange a third person.
Coming to the lab sessions is inconvenient
They are an integral part of the course, so you might also say coming to the lectures is inconvenient. The tutors are there to offer active help, and not just ‘problem solving’. You should make better use of their time: ask them more questions, and you will find the labs more useful.
Deriving optimisation algorithms (e.g., Expectation Maximisation) is also beyond the scope of the course. But, I’d be willing to offer an additional session on this if there is enough interest – please survey your fellow students and let me know how many would like this.
For Speech Processing, we don’t really need much Linear Algebra beyond vectors. There will be some more advanced material in Speech Synthesis, where we will use Linear Algebra operations (affine transforms) to adapt Gaussians to new data.
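As a rough illustration of what "applying an affine transform to a Gaussian" means, here is a minimal sketch. It relies only on the general fact that if x is Gaussian with mean μ and covariance Σ, then Ax + b is Gaussian with mean Aμ + b and covariance AΣAᵀ. The function name `affine_transform_gaussian` and the values of `A` and `b` are made up for illustration; how such a transform would be estimated from adaptation data is not shown here.

```python
def mat_vec(A, x):
    """Multiply matrix A (list of rows) by vector x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def mat_mul(A, B):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*A)]


def affine_transform_gaussian(mean, cov, A, b):
    """Return the mean and covariance of A x + b when x ~ N(mean, cov)."""
    new_mean = [m + bi for m, bi in zip(mat_vec(A, mean), b)]
    new_cov = mat_mul(mat_mul(A, cov), transpose(A))
    return new_mean, new_cov


# Hypothetical 2-dimensional Gaussian and hypothetical transform parameters.
mean = [1.0, 2.0]
cov = [[1.0, 0.0], [0.0, 4.0]]
A = [[2.0, 0.0], [0.0, 0.5]]   # assumed values, not estimated from data
b = [0.5, -1.0]                # assumed values, not estimated from data

new_mean, new_cov = affine_transform_gaussian(mean, cov, A, b)
print(new_mean)  # [2.5, 0.0]
print(new_cov)   # [[4.0, 0.0], [0.0, 1.0]]
```

The only operations involved are matrix-vector and matrix-matrix products, which is the level of Linear Algebra referred to above.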
Information Theory is a powerful tool, but well beyond the scope of Speech Processing. I’d be happy to help you one-on-one or in a small group, if this is something you are trying to understand.
Good answer, Danielle. But we should note that this paper is specifically about frequency scales for representing pitch (the perceptual correlate of fundamental frequency) rather than the more general spectral envelope information (e.g., formant frequencies) that is important for speech recognition.
Regarding the choice of frequency scale for Automatic Speech Recognition (ASR), the key property we want is a non-linear scale that compresses the higher frequencies more than the lower ones. In other words, the resulting features (e.g., filterbank energies) use more coefficients to describe the most important (i.e., most informative) frequency range for speech up to around 3 kHz, and fewer coefficients for the higher frequencies that are less important (i.e., contain less information).
All perceptual scales (Mel, Bark, etc.) have this property. They all work much the same for this application, and the choice is made either through personal preference or empirically by experimentation. The Mel scale is by far the most popular for ASR.
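To make the compression concrete, here is a small sketch that places filterbank centre frequencies at equal steps on the Mel scale and counts how many land below 3 kHz. It assumes one commonly used Mel formula, mel = 2595 log10(1 + f/700); other variants exist. The 26 filters and the 0 to 8000 Hz range are illustrative choices, not course settings, and the function names are hypothetical.

```python
import math


def hz_to_mel(f_hz):
    """Convert Hz to Mel using one common formula (an assumption here)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def filter_centres_hz(n_filters, f_min_hz, f_max_hz):
    """Centre frequencies (in Hz) of filters equally spaced on the Mel scale."""
    m_min, m_max = hz_to_mel(f_min_hz), hz_to_mel(f_max_hz)
    step = (m_max - m_min) / (n_filters + 1)
    return [mel_to_hz(m_min + step * i) for i in range(1, n_filters + 1)]


# Illustrative settings: 26 filters covering 0 to 8000 Hz.
centres = filter_centres_hz(26, 0.0, 8000.0)
below_3k = sum(1 for c in centres if c < 3000.0)
above_3k = len(centres) - below_3k

print([round(c) for c in centres])
print(below_3k, above_3k)
```

With these settings, roughly two thirds of the centres fall below 3 kHz, even though that range is well under half of the linear frequency axis. Swapping in a different perceptual scale would change the exact positions but not this overall pattern.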
Apologies for not updating this information – yes, this year we will again use the built-in microphone on an iMac in the lab (or any other iMac you have access to).
I’ve reported the problem – the submission system is set up by the teaching offices, not me. You can email your submission to them, if Learn is not working.