Module 0 – getting started

Start here! Gives an introduction to the course, explains how the course is delivered, and describes the computing environment you will need.

Welcome to the course! This is Module 0. It doesn’t have any course content, but introduces you to the course and tells you what is covered. It explains how the course is delivered and what you need to do to prepare for taking the course.

What the course covers

What is speech processing? Here, we mean the two main applications of speech technology: Speech Synthesis (the generation of speech from text input) and Automatic Speech Recognition (the transcription of speech into text). That’s going to involve processing speech (and sometimes text) in various ways. To do Speech Synthesis, we construct a new speech signal to say out loud any text that is input to the system; that will involve predicting which phonemes are required to pronounce the words in that text. To do Automatic Speech Recognition, we extract salient features from the input speech and use machine learning to classify sequences of these features into a sequence of words, each of which is made of a sequence of phonemes. Already, you can see that we need to bring together phonetics, signal processing and machine learning to solve these problems.

The phonetics modules in this course are intended to complement the speech processing content. This course will not provide a comprehensive background in phonetics, but it will get you started thinking about speech using the terms that phoneticians use. It will provide you with some foundational concepts and tools to relate speech processing to phonetic descriptions and annotations of speech.

The course will teach you the necessary theory: speech as a signal that carries information, how speech is produced by a talker and perceived by a listener, signal processing, and probabilistic modelling. We’ll need to underpin everything with a foundation course in phonetics. Since this is an interdisciplinary area, we’ll need to learn a diverse set of practical skills too, including maths, efficient use of software tools, and communicating our understanding through scientific writing.

How the course is organised and delivered

Modules

Speech Processing has 10 content modules, numbered 1 to 10, each designed to take 1 week of study. Within each module you will find videos, readings, instructions for the labs, and (for most modules) self-test quizzes. You can think of each module as starting with the lecture (Thursday 9:00-10:50) and ending with a related lab (the following Wednesday 9:00-10:50 or 16:10-18:00). Between the lecture and the lab there is a scheduled Q&A session (Tuesdays 14:10-15:00 or 15:10-16:00), as well as videos for you to watch and readings to do.

Finding the course content

Almost everything you need is here on speech.zone and we have tried to make it as self-contained as possible. However, you will need to access some other resources as follows:

Some readings are only available through the library.
Blackboard Learn will be used to send class-wide and tutorial group messages, and for submitting your assignments.
Microsoft Teams will be used for online lectures. You can also use the course Team to post questions for the Q&A session or to discuss course-related topics with your classmates. We may use Zoom as a backup or for certain technical support tasks (e.g., if a tutor needs to control your computer keyboard and mouse to solve a problem for you).
gather.town will be used for online access to the lab sessions.

Navigating the course content

The course hub is the place to start. There you’ll find the 10 content modules along with a weekly schedule, reading lists, milestones to keep you on track with the assignments, and the marking policy.

Each module has a similar structure. On each module page you’ll see a sequence of tabs, which you should work through in order from left to right. Everything for a module is on a single page, but clicking on a part will open it in a new browser tab or window. For videos, that allows you to see the transcript.

Lectures (online)

Lectures will be held online through Microsoft Teams on Thursdays, 9:00-10:50. The link for the lecture will be shared with enrolled students through the course Team, the course email list and Learn. If you want to attend the lecture but aren’t enrolled, please get in touch with the course organizer.

Q&A session (in-person + online)

You will be assigned to one of two in-person Question and Answer/workshop sessions per week. These are an opportunity to ask questions about the previous week’s content. You can post your questions in the Teams chat before the sessions or just bring them to class.

Q&A sessions will be held on Tuesdays:

14:10-15:00 – 40GS_Lecture Theatre A, 40 George Square Lecture Theatres, Central
15:10-16:00 – AT_Lecture Theatre 4, Appleton Tower, Central

Labs (in-person + online)

Labs start in week 2. You can attend either of the sessions on Wednesdays, 9:00-10:50 or 16:10-18:00. We will be trying out a hybrid setup this semester, so you can attend in person or online through gather.town.

In-person location: Hugh Robson Building computer lab
Online location: gather.town (link to be announced)

The labs are self-paced: you should go through the lab materials and ask for help from the tutors as you need it. We’d also encourage you to work through the materials with your classmates. Generally, you should try to watch all the videos and complete the essential readings before attempting the lab exercises, but even if you can’t do this you should still come to your lab session. You can think of it as some time you can devote to this course where help will be around.

The phonetics and signals labs really aren’t about coding (though the signals labs will involve some code for you to run and play with), but rather about getting you to think through the concepts introduced in the lecture videos. For notebooks that include Python code, you don’t need to understand all the workings of the code and you definitely won’t need to reproduce it. However, we do want you to get some exposure to how these abstract concepts get translated into concrete programming (and do something a bit interactive with the computer and each other!).

The TTS and ASR labs will focus on the TTS and ASR assignments. They will potentially have more of a coding component (generally bash shell scripting), but you can still do the assignments even if you don’t have a programming background.

Discussion forums

Accounts on speech.zone: enrolled students will be given an account here on speech.zone, which allows you to access all forums and to post on the forums. It is therefore essential to be formally enrolled for this course as soon as possible because you won’t appear on the student list until you are, and so we cannot create a speech.zone account for you. If you encounter delays in enrollment, email the course organizer (contact details in the DRPS entry for the course).

speech.zone accounts have been made for everyone who was enrolled at the beginning of week 1 (20/9/21). To activate your account, use the ‘forgot password’ link in the login box to set a password. Your username is your University of Edinburgh UUN. Please let the course organizer know if you enrolled after that date.

Please familiarise yourself with the forum structure – there is a lot of useful information there already. A few forums are hidden from public view, so you need to be logged in to find them (and the forum search facility won’t return results from them unless you are logged in).

Try to post in the most appropriate place, but don’t be surprised if we move your post to another forum to keep everything well-organised. We sometimes lightly edit your questions to make them clear and concise, so that they are useful for everyone.

Always use your University login and an approved secure platform for communication: Microsoft Teams, Blackboard Collaborate, Zoom, gather.town, and your University-provided email. Do not use other platforms for completing coursework (e.g., Skype, Slack, Discord, Facebook, WhatsApp, …) and please don’t ask classmates to use these platforms for this course, even if you use them for personal, social communication.

Assessment

There are three items of assessment. Two of them are practical assignments and you will be assessed on those by writing a lab report for each of them. The final item is at the end of the semester and will be a form of exam (details to be announced later).

There are no marks attached to attendance at the timetabled lab or Q&A sessions. But, of course, you’ll get a better mark on each item of assessment if you learn more, and the way to learn is to use the materials and sessions provided. Our experience is that working with other people in the class is beneficial for students. But everyone is different and if you learn better on your own that’s completely fine! What we would ask is that, if you feel stuck, you ask someone for help!

Meet the team delivering Speech Processing – click on each of them to find out more.

Rebekka Puderbaugh – Lecturer and Phonetics Expert
Emelie Van de Vreken – Tutor
Jason Fong – Tutor