Forum Replies Created
More structured labs vs. Too many interruptions in the lab
This is always a difficult balance. The intention is to provide structured labs (i.e., more interruptions!) during the first week or two of each assignment, then focus on providing individual help for the last week or two.
More quizzes, including online
I’ll keep using TopHat, in class and offline. You need to check TopHat outside class, to see what questions I have placed there for review (this will include any used in class, plus additional questions).
Use Python for the assignments
That’s not practical, because not all students on this course can program. Of course, you are free to use Python (or any other language) to do parts of the assignments. This makes most sense for the automatic speech recognition assignment: you can re-implement the shell scripts in Python and fully automate all of your experiments. You could also use code to plot your results and create your tables.
Simon speaks too fast
This is a fair criticism. I will keep trying to improve, if you keep giving me feedback. For the videos and recorded lectures, you can control the playback speed (either slower or faster).
The assignment is vague
Perhaps you were expecting a fixed set of problems to solve? The assignments are deliberately somewhat open-ended, in order to give you room to think, to learn, and to do well. It’s possible to get a decent mark by simply doing what is described. But there is also plenty of headroom to get a high mark by going beyond the instructions and demonstrating the full extent of your understanding. Always remember that the primary goal of the coursework is to help you learn. The grading is of secondary importance.
The videos are too long
I’m not sure if this comment refers to individual video clips (which some students have previously said are too short and fragmented), or to the total amount of video to be watched per week. Please let me know which.
Too much preparation is required for classes
With a ‘flipped classroom’, the idea is to shift some of your learning – especially the basic concepts and main readings – to before the class. The class can then be more effective. The total amount of study required for the course should be the same as for a more traditional format.
More help with writing
This included a request for a complete sample assignment in order to see what is expected. I’m not going to do that, for good pedagogical reasons:
- it might suggest that there is only one way to write a good lab report or literature review; there are many ways to do well on the assignments
- it would reduce the amount of thinking that you need to do; that would reduce the amount you learn by the end of the course
There will be further help with writing throughout the course, including feedback on the first assignment and a writing clinic for the second assignment.
Coursework deadlines are close to those of other courses
With a diverse class in which students take many different course combinations, there’s a limit to how much we can do about this.
Remember that deadlines are simply the latest date on which you can submit. You need to plan ahead. Make a calendar for the entire semester, look for the ‘hotspots’, then set yourself earlier deadlines in order to spread them out.
Adjective. See https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html
Note: Festival’s POS tagger makes many mistakes.
You can work this out for yourself, running in “step-by-step mode”. Use a sentence that includes a token needing expansion (e.g., “$3.21”) and see at which step it becomes a sequence of words.
Remember that the individual steps (modules) in Festival may each perform multiple processes, so it’s possible that classification and expansion might happen in the same module, or in separate modules. Again, this is something you can work out for yourself in the lab.
Recordings now capture the complete lecture – the bug with recording duration has been fixed. I’m also recording lab sessions – find these in the same place.
Yes, TD-PSOLA is still used for speech modification – for relatively small changes in duration or F0, it gives very high quality (if implemented carefully).
You’re right that TD-PSOLA looks “crude”. I’d prefer to say that it is deceptively simple and really quite elegant, once you deeply understand what it is doing. It’s actually an implicit source-filter separation, but can only modify the source (i.e., duration and F0). It cannot modify the filter.
Do you understand where the filter is in TD-PSOLA? Why is it not possible to modify it?
Your alternative suggestion is spot on: to obtain the spectral envelope and to “feed it” (we usually say “excite it”) with a new source signal of the desired F0. This is precisely what an explicit source-filter model can do. Linear prediction is a common choice for the filter.
You suggest feeding the filter with a “different F0”. Given that the source-filter model operates in the time domain, what exactly would a “different F0” mean? Can you draw a diagram?
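One way to make the explicit source-filter idea concrete in the time domain is sketched below: the source is an impulse train (one impulse per pitch period) and the filter is an all-pole filter of the kind linear prediction provides. This is only an illustration under stated assumptions: the function names, the 16 kHz sample rate, and the filter coefficients in `A` are made up for the example and are not estimated from real speech.

```python
def impulse_train(f0_hz, duration_s, sample_rate=16000):
    """Source: one unit impulse per pitch period, zeros elsewhere."""
    period = round(sample_rate / f0_hz)      # pitch period in samples
    n_samples = round(duration_s * sample_rate)
    return [1.0 if i % period == 0 else 0.0 for i in range(n_samples)]


def all_pole_filter(source, a):
    """Filter: y[n] = x[n] - sum over k of a[k] * y[n-k], for k = 1..len(a)."""
    out = []
    for n, x in enumerate(source):
        y = x
        for k, coeff in enumerate(a, start=1):
            if n - k >= 0:
                y -= coeff * out[n - k]
        out.append(y)
    return out


# Hypothetical, stable second-order filter (a single resonance).
# In practice the coefficients would come from linear prediction analysis.
A = [-1.6, 0.81]

# Same filter, two different sources: only the impulse spacing changes.
low_f0 = all_pole_filter(impulse_train(100, 0.05), A)
high_f0 = all_pole_filter(impulse_train(200, 0.05), A)
```

In this sketch, changing F0 changes nothing about the filter; it only changes how far apart the impulses in the source are.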
We shouldn’t talk about “words being tokenised” because tokenisation happens before we know anything about words. The input to TTS is a string of characters. Tokenisation splits this long string into small pieces, ready for further processing. The method might be as simple as some rules using whitespace and punctuation. Each small piece might already be a normal word, or it might not: a Non-Standard Word (NSW).
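As a rough illustration of “some rules using whitespace and punctuation”, here is a minimal sketch. The function name `tokenise` and the two punctuation sets are assumptions made for this example; this is not how Festival does it.

```python
# Hypothetical rule set: split on whitespace, then peel punctuation off the
# edges of each piece and keep each punctuation mark as its own token.
LEADING = "\"'(["
TRAILING = "\"')].,;:!?"


def tokenise(text):
    tokens = []
    for piece in text.split():
        core = piece.lstrip(LEADING)
        lead = piece[:len(piece) - len(core)]
        stripped = core.rstrip(TRAILING)
        trail = core[len(stripped):]
        tokens.extend(lead)           # each leading punctuation character
        if stripped:
            tokens.append(stripped)   # the token itself, e.g. "$3.21"
        tokens.extend(trail)          # each trailing punctuation character
    return tokens


print(tokenise("It cost $3.21, said Dr. Smith."))
```

Note that punctuation inside a piece (the “.” in “$3.21”) is left alone, so the token is still not a normal word and needs normalising later.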
The exercise in the lecture was not about tokenisation. It was about normalisation, which is usually done in two stages: 1) classify each token as either a standard word, or an NSW of one of a set of types (e.g., abbreviation, money, percentage, …); 2) expand each NSW into normal words, using a specific technique for each type.
The features needed for the classification step cannot be things like “is it an abbreviation” because that is what the classifier is predicting. We can only use features that can be obtained directly from the character string, such as “Is it all upper case?” or “Does it contain 3 or more consecutive digits?”
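To show the kind of features meant here, the sketch below computes a few of them from the character string alone. The function name `surface_features` and every feature other than the two mentioned above are assumptions chosen for illustration; a real classifier would be trained on features like these.

```python
import re


def surface_features(token):
    """Features that can be read directly off the character string."""
    return {
        "all_upper": token.isupper(),
        "has_3_consecutive_digits": re.search(r"\d{3}", token) is not None,
        "contains_digit": any(ch.isdigit() for ch in token),
        "starts_with_currency": token[:1] in ("$", "£", "€"),
        "ends_with_percent": token.endswith("%"),
        "length": len(token),
    }


for tok in ["NATO", "$3.21", "1984", "hello"]:
    print(tok, surface_features(tok))
```

None of these features names an NSW type; the type is what the classifier predicts from them.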
The expansion step involves a specific technique for each type of NSW. For example:
- ASWD (“as word”) would be downcased and passed to the Letter-to-Sound (LTS) module to be treated like any other Out-of-Vocabulary (OOV) word
- LSEQ (“letter sequence”) would be split into individual letters, each of which becomes a word; the dictionary will contain pronunciations for all individual letters in the language
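The two expansion techniques above can be sketched as follows. The function names and the dispatch table are hypothetical, and dropping non-letter characters in a letter sequence (e.g., the full stops in “U.S.”) is an assumption of this example.

```python
def expand_aswd(token):
    """ASWD: downcase; the result then goes to LTS like any other OOV word."""
    return [token.lower()]


def expand_lseq(token):
    """LSEQ: every letter becomes a word of its own (non-letters dropped)."""
    return [ch for ch in token if ch.isalpha()]


# Hypothetical dispatch table: one expansion technique per NSW type.
EXPANDERS = {"ASWD": expand_aswd, "LSEQ": expand_lseq}


def expand(token, nsw_type):
    if nsw_type not in EXPANDERS:
        raise ValueError("no expander for NSW type: " + nsw_type)
    return EXPANDERS[nsw_type](token)


print(expand("NATO", "ASWD"))
print(expand("BBC", "LSEQ"))
```

Other NSW types (money, percentage, and so on) would each add their own function to the table.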
We didn’t cover expansion in any great detail in class. Details can be found in the readings: Jurafsky & Martin 8.1.