Start

Welcome to the course! This is Module 0. It doesn’t have any course content, but introduces you to the course and tells you what is covered. It explains how the course is delivered and what you need to do to prepare for taking the course.

What the course covers

What is speech processing? Here, we mean the two main applications of speech technology: Speech Synthesis (the generation of speech from text input) and Automatic Speech Recognition (the transcription of speech into text). That’s going to involve processing speech (and sometimes text) in various ways. To do Speech Synthesis, we construct a new speech signal to say out loud any text that is input to the system; that will involve predicting which phonemes are required to pronounce the words in that text. To do Automatic Speech Recognition, we extract salient features from the input speech and use machine learning to classify sequences of these features into a sequence of words, each of which is made of a sequence of phonemes. Already, you can see that we need to bring together phonetics, signal processing and machine learning to solve these problems.

The phonetics modules in this course are intended to complement the speech processing content. This course will not provide a comprehensive background in phonetics, but it will get you started thinking about speech using the terms that phoneticians use. It will provide you with some foundational concepts and tools to relate speech processing to phonetic descriptions and annotations of speech.

The course will cover relevant theory: speech as a signal that carries information, how speech is produced by a talker and perceived by a listener, signal processing, and probabilistic modelling. We’ll need to underpin everything with a foundation course in phonetics. Since this is an interdisciplinary area, we’ll need to learn a diverse set of practical skills too, including maths, efficient use of software tools, and communicating our understanding through scientific writing.

How the course is organized and delivered

Modules

Speech Processing has 10 content modules, numbered 1 to 10, each designed to take 1 week of study. Within each module you will find videos, readings, instructions for the labs, and (for most modules) self-test quizzes. The course follows a flipped classroom format. That means you should try to watch the module videos before the Thursday lecture (09.00-10.50). The lab for each module will be on the Wednesday of the following week (09.00-10.50 or 16.10-18.00).

Finding the course content

Almost everything you need is here on speech.zone and we have tried to make it as self-contained as possible. However, you will need to access some other resources as follows:

  • Some readings are only available through the library.
  • Blackboard Learn will be used to send class-wide messages and to find live lecture recordings.
  • Blackboard Learn will be used for online tests and assignment submissions.

Navigating the course content

The course hub is the place to start. There you’ll find the 10 content modules, along with a weekly schedule, reading lists, milestones to keep you on track with the assignments, and the marking policy.

Each module has a similar structure. On each module page you’ll see a sequence of tabs, which you should work through in order from left to right. Everything for a module is on a single page, but clicking the icon will open that part in a new browser tab or window. For videos, this lets you see the transcript.

Lectures

Lectures will be held on Thursdays, 9:00-10:50. In 2022, these will be live, in-person lectures held in 7GS_F.21, 7 George Square.

Labs

Labs start in week 2. You can attend either of the Wednesday sessions, 9:00-10:50 or 16:10-18:00; both will be in AT_4.02, the PPLS Computer Lab, Appleton Tower.

The labs are self-paced: you should go through the lab materials and ask the tutors for help as you need it. We’d also encourage you to work through the materials with your classmates. Generally, you should try to watch all the videos and complete the essential readings before attempting the lab exercises, but even if you can’t, you should still come to your lab session. Think of it as time devoted to this course when help is on hand.

The phonetics and signals labs really aren’t about coding (though the signals labs will involve some code for you to run and play with), but rather about getting you to think through the concepts introduced in the lecture videos. For notebooks that include Python code, you don’t need to understand all the workings of the code, and you definitely won’t need to reproduce it. However, we do want you to get some exposure to how these abstract concepts get translated into concrete programming (and to do something a bit interactive with the computer and each other!).

The TTS and ASR labs will focus on the TTS and ASR assignments. So they may involve more coding (generally bash shell scripting), but you can still do the assignments even if you don’t have a programming background.

Discussion forums

Accounts on speech.zone: enrolled students will be given an account here on speech.zone, which allows you to access and post on all the forums. It is therefore essential to be formally enrolled on this course as soon as possible: you won’t appear on the student list until you are, and until then we cannot create a speech.zone account for you. If you encounter delays in enrollment, email the course organizer (contact details are in the DRPS entry for the course).

speech.zone accounts will be created for everyone who is enrolled at the beginning of week 1 (accounts should be ready by the end of week 1; sorry for the delay!).

To activate your account, use the ‘forgot password’ link in the login box to set a password. Your username is your University of Edinburgh UUN. Please let the course organizer know if you enrolled after the beginning of week 1.

Please familiarise yourself with the forum structure – there is a lot of useful information there already. A few forums are hidden from public view, so you need to be logged in to find them (and the forum search facility won’t return results from them unless you are logged in).

Try to post in the most appropriate place, but don’t be surprised if we move your post to another forum to keep everything well-organised. We sometimes lightly edit your questions to make them clear and concise, so that they are useful for everyone.

Always use your University login and an approved secure platform for communication: Microsoft Teams, Blackboard Collaborate, Zoom, gather.town, and your University-provided email.

Assessment

There are two practical assignments, and you will be assessed on each of them by writing a lab report. We will also have 3 online tests through the semester (instead of one big exam at the end).

Online Quizzes

  • 30% of final grade
  • Multiple-choice questions on Learn: each test is open for 2 days, but once you start you must complete it within 1 hour.
  • Due dates:
    • Phon/Signals (10%): Monday 2023-10-16 12.00 noon – Wednesday 2023-10-18 12.00 noon
    • TTS (10%): Monday 2023-10-30 12.00 noon – Wednesday 2023-11-01 12.00 noon
    • ASR (10%): Monday 2023-11-27 12.00 noon – Wednesday 2023-11-29 12.00 noon

Lab Assignment 1

  • 30% of final grade
  • Word limit: 1500 words
  • Due date: Monday 2023-11-06 12.00 noon

Lab Assignment 2

  • 40% of final grade
  • Word limit: 3000 words
  • Due date: Thursday 2023-12-07 12.00 noon

There is no final exam for this course.

Attendance

There are no marks attached to attendance at the timetabled labs or lectures. But, of course, you’ll get a better mark on each item of assessment if you learn more, and the way to learn is to use the materials and sessions provided. Our experience is that working with other people in the class is beneficial for students. But everyone is different, and if you learn better on your own, that’s completely fine! What we do ask is that if you feel stuck, you ask someone for help!