Forum Replies Created

Viewing 15 posts - 571 through 585 (of 1,073 total)

Author

Posts
October 29, 2017 at 17:12 in reply to: Response to Speech Processing feedback of 2017-10-19 #8176
Simon
Professor
More structured labs vs. Too many interruptions in the lab

This is always a difficult balance. The intention is to provide structured labs (i.e., more interruptions!) during the first week or two of each assignment, then focus on providing individual help for the last week or two.
October 29, 2017 at 17:11 in reply to: Response to Speech Processing feedback of 2017-10-19 #8175
Simon
Professor
More quizzes, including online

I’ll keep using TopHat, in class and offline. You need to check TopHat outside class, to see what questions I have placed there for review (this will include any used in class, plus additional questions).
October 29, 2017 at 17:11 in reply to: Response to Speech Processing feedback of 2017-10-19 #8174
Simon
Professor
Use Python for the assignments

That’s not practical, because not all students on this course can program. Of course, you are free to use Python (or any other language) to do parts of the assignments. This makes most sense for the automatic speech recognition assignment: you can re-implement the shell scripts as Python, and fully automate all of your experiments. You could also plot your results and create your tables, using code.
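If you do go down this route, a minimal sketch of the kind of automation I mean might look like the following. Everything here is an assumption for illustration: `run_grid`, `format_table` and the parameter names are hypothetical, and `run_experiment` stands in for whatever function you write to call the assignment's scripts (for example through `run_command`) and extract a score from their output.

```python
import itertools
import subprocess


def run_command(args):
    """Run one external command and return its standard output as text."""
    result = subprocess.run(args, check=True, capture_output=True, text=True)
    return result.stdout


def run_grid(param_grid, run_experiment):
    """Call run_experiment once for every combination of parameter values.

    param_grid maps a parameter name to the list of values to try.
    run_experiment is your own function: it takes a dict of settings and
    returns the number you want to report.
    """
    names = sorted(param_grid)
    rows = []
    for values in itertools.product(*(param_grid[name] for name in names)):
        settings = dict(zip(names, values))
        rows.append({**settings, "result": run_experiment(settings)})
    return rows


def format_table(rows):
    """Format the collected rows as tab-separated text, ready for a report."""
    if not rows:
        return ""
    columns = list(rows[0])
    lines = ["\t".join(columns)]
    for row in rows:
        lines.append("\t".join(str(row[column]) for column in columns))
    return "\n".join(lines)
```

Keeping the experiment itself as a function you pass in means the looping and table code can be tested without running any real experiments.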
October 29, 2017 at 17:10 in reply to: Response to Speech Processing feedback of 2017-10-19 #8173
Simon
Professor
Simon speaks too fast

This is fair criticism. I will keep trying to improve, if you keep giving me feedback. For the videos and recorded lectures, you can control the playback speed (slower or faster).
October 29, 2017 at 17:09 in reply to: Response to Speech Processing feedback of 2017-10-19 #8172
Simon
Professor
The assignment is vague

Perhaps you were expecting a fixed set of problems to solve? The assignments are deliberately somewhat open-ended, in order to give you room to think, to learn, and to do well. It’s possible to get a decent mark by simply doing what is described. But there is also plenty of headroom to get a high mark by going beyond the instructions and demonstrating the full extent of your understanding. Always remember that the primary goal of the coursework is to help you learn. The grading is of secondary importance.
October 29, 2017 at 17:09 in reply to: Response to Speech Processing feedback of 2017-10-19 #8171
Simon
Professor
The videos are too long

I’m not sure if this comment refers to individual video clips (which some students have previously said are too short and fragmented), or to the total amount of video to be watched per week. Please let me know which.
October 29, 2017 at 17:09 in reply to: Response to Speech Processing feedback of 2017-10-19 #8170
Simon
Professor
Too much preparation is required for classes

With a ‘flipped classroom’, the idea is to shift some of your learning – especially the basic concepts and main readings – to before the class. The class can then be more effective. The total amount of study required for the course should be the same as for a more traditional format.
October 29, 2017 at 17:07 in reply to: Response to Speech Processing feedback of 2017-10-19 #8169
Simon
Professor
More help with writing

This included a request for a complete sample assignment in order to see what is expected. I’m not going to do that, for good pedagogical reasons:
- it might suggest that there is only one way to write a good lab report or literature review; there are many ways to do well on the assignments
- it would reduce the amount of thinking that you need to do; that would reduce the amount you learn by the end of the course
There will be further help with writing throughout the course, including feedback on the first assignment and a writing clinic for the second assignment.
October 29, 2017 at 17:07 in reply to: Response to Speech Processing feedback of 2017-10-19 #8168
Simon
Professor
Coursework deadlines are close to those of other courses

With a diverse class in which students take many different course combinations, there’s a limit to how much we can do about this.

Remember that deadlines are simply the latest date on which you can submit. You need to plan ahead. Make a calendar for the entire semester, look for the ‘hotspots’, then set yourself earlier deadlines in order to spread them out.
October 23, 2017 at 19:04 in reply to: "jj" POS tag #8103
Simon
Professor
Adjective. See https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html

Note: Festival’s POS tagger makes many mistakes.
October 17, 2017 at 17:58 in reply to: Tokenisation #7953
Simon
Professor
You can work this out for yourself, by running Festival in “step-by-step mode”. Use a sentence that includes a token needing expansion (e.g., “$3.21”) and see at which step it becomes a sequence of words.

Remember that the individual steps (modules) in Festival may each perform multiple processes, so it’s possible that classification and expansion might happen in the same module, or in separate modules. Again, this is something you can work out for yourself in the lab.
October 17, 2017 at 09:48 in reply to: lecture recordings #7935
Simon
Professor
Recordings now capture the complete lecture – the bug with recording duration has been fixed. I’m also recording lab sessions – find these in the same place.
October 17, 2017 at 09:05 in reply to: Usage of TD-PSOLA #7933
Simon
Professor
Yes, TD-PSOLA is still used for speech modification – for relatively small changes in duration or F0, it gives very high quality (if implemented carefully).

You’re right that TD-PSOLA looks “crude”. I’d prefer to say that it is deceptively simple and really quite elegant, once you deeply understand what it is doing. It’s actually an implicit source-filter separation, but can only modify the source (i.e., duration and F0). It cannot modify the filter.

Do you understand where the filter is in TD-PSOLA? Why is it not possible to modify it?

Your alternative suggestion is spot on: to obtain the spectral envelope and to “feed it” (we usually say “excite it”) with a new source signal of the desired F0. This is precisely what an explicit source-filter model can do. Linear prediction is a common choice for the filter.
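As a rough sketch of an explicit source-filter model in the time domain: the source is an impulse train and the filter is an all-pole filter of the kind linear prediction gives you. The coefficients below are not estimated from speech. They come from a hypothetical single resonance (`resonator_coefficients` is my own illustrative helper), and I am assuming the convention A(z) = 1 + a_1 z^-1 + ... + a_p z^-p.

```python
import math


def impulse_train(f0_hz, sample_rate, n_samples):
    """Source signal: one unit impulse per pitch period."""
    # The period is rounded to a whole number of samples, so F0 is approximate.
    period = round(sample_rate / f0_hz)
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]


def all_pole_filter(excitation, a):
    """Filter: y[n] = x[n] - sum_k a[k] * y[n-k], with a = [a_1, ..., a_p]."""
    y = []
    for n, x in enumerate(excitation):
        acc = x
        for k, a_k in enumerate(a, start=1):
            if n - k >= 0:
                acc -= a_k * y[n - k]
        y.append(acc)
    return y


def resonator_coefficients(freq_hz, radius, sample_rate):
    """Illustrative two-pole filter with one resonance (not real LPC output)."""
    theta = 2 * math.pi * freq_hz / sample_rate
    return [-2 * radius * math.cos(theta), radius ** 2]


SAMPLE_RATE = 16000
a = resonator_coefficients(500, 0.95, SAMPLE_RATE)

# Same filter, two different sources: only the impulse spacing changes.
original = all_pole_filter(impulse_train(100, SAMPLE_RATE, 800), a)
modified = all_pole_filter(impulse_train(150, SAMPLE_RATE, 800), a)
```

Notice that the filter coefficients are identical in both calls. The only thing that differs between `original` and `modified` is the source signal.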

You suggest feeding the filter with a “different F0”. Given that the source-filter model operates in the time domain, what exactly would a “different F0” mean? Can you draw a diagram?
October 13, 2017 at 10:08 in reply to: Tokenisation #7929
Simon
Professor
We shouldn’t talk about “words being tokenised” because tokenisation happens before we know anything about words. The input to TTS is a string of characters. Tokenisation splits this long string into small pieces, ready for further processing. The method might be as simple as some rules using whitespace and punctuation. Each small piece might already be a normal word, or it might not, in which case it is a Non-Standard Word (NSW).
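To make “some rules using whitespace and punctuation” concrete, here is a minimal sketch. It is not how Festival does it: the rule set, the function name `tokenise` and the field names are all my own assumptions for illustration.

```python
import re

# Hypothetical rule: optional opening punctuation, then the token itself,
# then optional closing punctuation. Real systems use more elaborate rules.
EDGE_PUNCT = re.compile(r"^([\"'(\[]*)(.*?)([\"')\].,;:!?]*)$")


def tokenise(text):
    """Split a character string into tokens using whitespace and punctuation."""
    tokens = []
    for piece in text.split():  # rule 1: split on whitespace
        # rule 2: peel punctuation off the edges of each piece
        lead, core, trail = EDGE_PUNCT.match(piece).groups()
        if core:  # skip pieces that were nothing but punctuation
            tokens.append({"token": core, "prepunc": lead, "punc": trail})
    return tokens


for tok in tokenise("He paid $3.21, then left."):
    print(tok)
```

The token “$3.21” survives as a single piece. Nothing at this stage knows that it is money, or how to say it.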

The exercise in the lecture was not about tokenisation. It was about normalisation, which is usually done in two stages: 1) classify each token as either a standard word, or an NSW of one of a set of types (e.g., abbreviation, money, percentage, …); 2) expand each NSW into normal words, using a specific technique for each type.

The features needed for the classification step cannot be things like “is it an abbreviation” because that is what the classifier is predicting. We can only use features that can be obtained directly from the character string, such as “Is it all upper case?” or “Does it contain 3 or more consecutive digits?”
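A minimal sketch of such features, computed from nothing but the characters of the token. The function name and the particular set of features are my own choices for illustration. A real classifier would use more of them, and possibly the neighbouring tokens too.

```python
import re


def token_features(token):
    """Surface features computable from the character string alone."""
    return {
        # "Is it all upper case?"
        "all_upper": token.isalpha() and token.isupper(),
        # "Does it contain 3 or more consecutive digits?"
        "has_3_consecutive_digits": re.search(r"\d{3,}", token) is not None,
        "has_digit": any(ch.isdigit() for ch in token),
        "has_currency_symbol": any(ch in "$£€" for ch in token),
        "ends_with_percent": token.endswith("%"),
        "contains_period": "." in token,
        "length": len(token),
    }


for example in ["NATO", "$3.21", "1984", "hello"]:
    print(example, token_features(example))
```

None of these features asks “is it an abbreviation?” or “is it money?”. Those are the labels that the classifier must predict from the features.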

The expansion step involves a specific technique for each type of NSW. For example:
- ASWD (“as word”) would be downcased and passed to the Letter-to-Sound (LTS) module to be treated like any other Out-of-Vocabulary (OOV) word
- LSEQ (“letter sequence”) would be split into individual letters, each of which becomes a word; the dictionary will contain pronunciations for all individual letters in the language
We didn’t cover expansion in any great detail in class. Details can be found in the readings: Jurafsky & Martin 8.1.
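The two expansion techniques listed above could be sketched like this. The function names and the lookup table are hypothetical, and dropping non-letter characters in a letter sequence is my own simplifying assumption.

```python
def expand_aswd(token):
    """ASWD: downcase, then treat as one ordinary (possibly OOV) word."""
    # Pronunciation is left to the dictionary or the LTS module downstream.
    return [token.lower()]


def expand_lseq(token):
    """LSEQ: each letter becomes a word in its own right."""
    # Assumption: characters that are not letters (e.g. periods) are dropped.
    return [ch.lower() for ch in token if ch.isalpha()]


# One expansion technique per NSW type, chosen by the classifier's label.
EXPANDERS = {"ASWD": expand_aswd, "LSEQ": expand_lseq}


def expand(token, nsw_type=None):
    """Return the list of words for a token; None means a standard word."""
    if nsw_type is None:
        return [token]
    return EXPANDERS[nsw_type](token)


print(expand("NATO", "ASWD"))
print(expand("BBC", "LSEQ"))
```

Other NSW types (money, percentages, dates, and so on) would each add their own function to `EXPANDERS`.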
October 6, 2017 at 14:05 in reply to: Writing to a specified length #7871
Simon
Professor
Here’s another example of a paper, before and after my editing pass. The target length was 4 pages of text and 1 page of references.
