Module 0 – getting started

Start here! Gives an introduction to the course, explains how the course is delivered, and describes the computing environment you will need.


Welcome to the course! This is Module 0. It doesn’t have any course content, but introduces you to the course and tells you what is covered. It explains how the course is delivered and what you need to do to prepare for taking the course.

What the course covers

What is speech processing? Here, we mean the two main applications of speech technology: Speech Synthesis (the generation of speech from text input) and Automatic Speech Recognition (the transcription of speech into text). That’s going to involve processing speech (and sometimes text) in various ways. To do Speech Synthesis, we construct a new speech signal to say out loud any text that is input to the system; that will involve predicting which phonemes are required to pronounce the words in that text. To do Automatic Speech Recognition, we extract salient features from the input speech and use machine learning to classify sequences of these features into a sequence of words, each of which is made of a sequence of phonemes. Already, you can see that we need to bring together phonetics, signal processing and machine learning to solve these problems.

The phonetics modules in this course are intended to complement the speech processing content. This course will not provide a comprehensive background in phonetics, but it will get you started thinking about speech using the terms that phoneticians use. It will provide you with some foundational concepts and tools to relate speech processing to phonetic descriptions and annotations of speech.

The course will cover relevant theory: speech as a signal that carries information, how speech is produced by a talker and perceived by a listener, signal processing, and probabilistic modelling. We’ll need to underpin everything with a foundation course in phonetics. Since this is an interdisciplinary area, we’ll need to learn a diverse set of practical skills too, including maths, efficient use of software tools, and communicating our understanding through scientific writing.

How the course is organized and delivered

Modules

Speech Processing has 10 content modules, numbered 1 to 10, each designed to take 1 week of study. Within each module you will find videos, readings, instructions for the labs, and self-test quizzes (for most modules). The course follows a flipped classroom format. That means you should try to watch the module videos before the Thursday lecture (09.00-10.50). The lab for each module will be on the Wednesday of the following week (09.00-10.50 or 16.10-18.00).

Finding the course content

Almost everything you need is here on speech.zone and we have tried to make it as self-contained as possible. However, you will need to access some other resources as follows:

Some readings are only available through the library.
Blackboard Learn will be used for class-wide messages and for finding live lecture recordings.
Blackboard Learn will be used for online tests and assignment submissions

Navigating the course content

The course hub is the place to start. There you’ll find the 10 content modules along with a weekly schedule, reading lists, milestones to keep you on track with the assignments, and the marking policy.

Each module has a similar structure. On each module page you’ll see a sequence of tabs, which you should work through in order from left to right. Everything for a module is on a single page, but clicking will open that part in a new browser tab or window. For videos, that allows you to see the transcript.

Lectures

Lectures will be held on Thursdays 9:00-10:50. In 2022, these will be live, in-person lectures held in 7GS_F.21, 7 George Square.

Labs

Labs start in week 2. You can attend either of the Wednesday sessions, 9:00-10:50 or 16:10-18:00, both of which will be in AT_4.02, the PPLS Computer Lab, Appleton Tower.

The labs are self-paced: you should go through the lab materials and ask for help from the tutors as you need it. We’d also encourage you to work through the materials with your classmates. Generally, you should try to watch all the videos and complete the essential readings before attempting the lab exercises, but even if you can’t do this you should still come to your lab session. You can think of it as some time you can devote to this course where help will be around.

The phonetics and signals labs really aren’t about coding (though the signals labs will involve some code for you to run and play with), but rather to get you to think through the concepts introduced in the lecture videos. For notebooks that have python code included, you don’t need to understand all the workings of the code and you definitely won’t need to reproduce it. However, we do want you to get some exposure to how these abstract concepts get translated into concrete programming (and do something a bit interactive with the computer and each other!).

The TTS and ASR labs will be focused on the TTS and ASR assignments. So, they will have more of a potential coding component (generally bash shell scripting), but you can still do the assignments even if you don’t have a programming background.

Discussion forums

Accounts on speech.zone: enrolled students will be given an account here on speech.zone, which allows you to access all forums and to post on the forums. It is therefore essential to be formally enrolled for this course as soon as possible because you won’t appear on the student list until you are, and so we cannot create a speech.zone account for you. If you encounter delays in enrollment, email the course organizer (contact details in the DRPS entry for the course).

Speech zone accounts will be made for everyone who is enrolled at the beginning of week 1 (accounts should be made by the end of week 1 – sorry for the delay!).

To activate your account use the ‘forgot password’ link in the login box to set a password. Your username is your University of Edinburgh UUN. Please let the course organizer know if you enrolled after the beginning of week 1.

Please familiarise yourself with the forum structure – there is a lot of useful information there already. A few forums are hidden from public view, so you need to be logged in to find them (and the forum search facility won’t return results from them unless you are logged in).

Try to post in the most appropriate place, but don’t be surprised if we move your post to another forum to keep everything well-organised. We sometimes lightly edit your questions to make them clear and concise, so that they are useful for everyone.

Always use your University login and an approved secure platform for communication: Microsoft Teams, Blackboard Collaborate, Zoom, gather.town, and your University-provided email.

Assessment

There are two practical assignments, and you will be assessed on each of them by writing a lab report. We will also have 3 online tests through the semester (instead of one big exam at the end).

Online Quizzes

30% of final grade
Multiple choice questions: Test open for 2 days, but once you start you need to complete within 1 hour on Learn.
Due dates:
- Phon/Signals (10%): Monday 2023-10-16 12.00 noon – Wednesday 2023-10-18 12.00 noon
- TTS (10%): Monday 2023-10-30 12.00 noon – Wednesday 2023-11-01 12.00 noon
- ASR (10%): Monday 2023-11-27 12.00 noon – Wednesday 2023-11-29 12.00 noon

Lab Assignment 1

30% of final grade
Word limit: 1500 words
Due date: Monday 2023-11-06 12.00 noon

Lab Assignment 2

40% of final grade
Word limit: 3000 words
Due date: Thursday 2023-12-07 12.00 noon

There is no final exam for this course.
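
The weights above add up to 100%. As a rough illustration of how they combine, the sketch below computes a weighted average from five component marks. It assumes the final grade is a plain weighted sum of marks out of 100, which may not match the marking policy exactly (for example, rounding or penalties are not modelled). The dictionary keys and the example marks are made up for illustration.

```python
# Assessment weights as listed above (percent of the final grade).
WEIGHTS = {
    "quiz_phon_signals": 10,
    "quiz_tts": 10,
    "quiz_asr": 10,
    "lab_assignment_1": 30,
    "lab_assignment_2": 40,
}


def final_grade(marks):
    """Return the weighted average of component marks (each out of 100).

    Assumption: a plain weighted sum, with no rounding or penalties.
    """
    missing = set(WEIGHTS) - set(marks)
    if missing:
        raise ValueError(f"missing marks for: {sorted(missing)}")
    return sum(WEIGHTS[name] * marks[name] for name in WEIGHTS) / 100


# Hypothetical marks, purely for illustration.
example_marks = {
    "quiz_phon_signals": 80,
    "quiz_tts": 70,
    "quiz_asr": 60,
    "lab_assignment_1": 65,
    "lab_assignment_2": 70,
}

print(sum(WEIGHTS.values()))       # the weights cover the whole grade
print(final_grade(example_marks))  # weighted average of the example marks
```

Note that the two lab reports together carry 70% of the grade, so they have far more influence on the result than any single quiz.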

Attendance

There are no marks attached to attendance at the timetabled labs or lectures. But, of course, you’ll get a better mark on each item of assessment if you learn more – and the way to learn is to use the materials and sessions provided. Our experience is that working with other people in the class is beneficial for students. But everyone is different and if you learn better on your own that’s completely fine! What we would ask is that, if you feel stuck, you ask someone for help!

Lectures

Lectures will be in-person this year and will be recorded through the University system (Media Hopper/Echo360). We may also use some polling software in class, e.g. Wooclap.

Labs

From week 2 onwards you will have a timetabled 2-hour lab where you can work through exercises that will consolidate your understanding of the materials from the preceding week. The different modules have different software requirements, which are listed below. Once you have an account on speech.zone, you can use the technical support forums to get help.

Phon labs: Praat

The first few labs will focus on phonetics using Praat. You can download Praat onto your own computer: it should work for MacOS, Windows and Linux.

Signals labs: Jupyter Notebooks

The signal processing labs will use Jupyter notebooks: a combination of Python code and notes that you access using a web browser. Don’t worry if you don’t know any Python – this is not a formal requirement of the course, and you’ll learn what you need simply by doing the exercises. You can run the notebooks on your personal computer or on the Edina Notable service (direct login here)

More details on getting set up with these can be found in the Module 3 materials.

Please note, while the notebooks use Python, the material there is to support your learning, not to test your ability to code! Nothing related to the code specifically is directly assessed (though the concepts in the notebooks marked essential may be). You don’t need to know Python to do this course, though MSc SLP students will need to know Python for many other courses, so getting a bit more practice/exposure definitely doesn’t hurt!
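
To give a flavour of what a code cell in a notebook can look like, here is a small sketch that generates one cycle of a sine wave using only the Python standard library. It is not taken from the course notebooks, and the sample rate, frequency and duration are arbitrary values chosen for illustration. As with the notebooks themselves, you are not expected to reproduce this.

```python
import math

# Illustrative values only; these are assumptions, not course settings.
SAMPLE_RATE = 16000  # samples per second
FREQUENCY = 100.0    # frequency of the sine wave in Hz
DURATION = 0.01      # length of the signal in seconds (one cycle at 100 Hz)


def sine_wave(frequency, duration, sample_rate):
    """Return a list of samples of a sine wave with amplitude 1."""
    n_samples = int(round(duration * sample_rate))
    return [
        math.sin(2 * math.pi * frequency * n / sample_rate)
        for n in range(n_samples)
    ]


samples = sine_wave(FREQUENCY, DURATION, SAMPLE_RATE)
print(len(samples))  # number of samples generated
print(samples[:5])   # the first few sample values
```

In a notebook you would typically run a cell like this, change one of the values (say, the frequency), and run it again to see how the output changes.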

TTS and ASR labs/assignments: PPLS AT lab linux machines/Remote Desktop

The TTS and ASR labs will be focused on the TTS and ASR assignments, respectively. The TTS assignment uses the Festival Speech Synthesis System, while the ASR assignment uses HTK and an assortment of Bash (unix) shell scripts. So, for this part of the course you will need to use a unix command line interface, i.e. the shell.

This year students will have access to the PPLS AT labs, physically and through the remote desktop service.

You can potentially install Festival and HTK onto your own computer, but if your internet connection is good, it may be easier to use the Remote Desktop service (or just go to the lab). This allows you to use the computers in the Appleton Tower PPLS computer lab from your own computer.

Setup Remote Desktop access to the PPLS Computer Lab

Note: We aim to have access set up by the end of week 1

Download and install the RealVNC Viewer software on your machine
- After downloading, you’ll see a message asking you to make a paid account to register etc – you can ignore this! You don’t need an account to use VNC Viewer.
Ensure you are first connected to a University VPN
Go to https://resource.ppls.ed.ac.uk/whoson/atlab.php and follow the instructions at the bottom of that page to connect to the remote desktop
If you run into any issues with messages saying the VNC subscription has expired, let the course organizer know.
Log in with your normal school/EASE login details. If your normal EASE login details do not work, you will need to ask to have your account activated for the PPLS Lab machines. Email is.helpline@ed.ac.uk to request this (e.g. “please can you enable my account to log in to the PPLS Computer Lab machines, my UUN is sXXXXXXX…”).
Make sure you log out and disconnect properly once you have finished, otherwise you may tie the machine up and prevent others from using it!

When you become comfortable using Unix-style command line interfaces, you may want to use the ssh command to connect directly to these servers. It’s a good skill to have and takes up a lot less computing power than running the remote desktop. If you want to try this, you can follow the instructions on remote working here.

For Linux and MacOS users, this means the Terminal app that already exists on your computer (or equivalent). For Windows users, you can (probably) use the ssh command from the Windows command line. Otherwise, you can use a terminal emulator, e.g. Windows PowerShell or Ubuntu WSL. You can also use an ssh client – the classic one of these for Windows is PuTTY. You might notice that pretty much all of the staff use MacOS or Linux – once you get into programming a bit you’ll understand why!

The online LinkedIn Learning course “Learning Linux Command Line” (free with your University login) also gives some instructions on setting up a Linux-style environment on different operating systems (e.g. MacOS, Windows).

If you’re an experienced Linux user, you can try to install Festival and HTK locally. You may be able to find some hints already on the speech zone forums if you wish to attempt this. You can also ask us for help! If you do this, you will still need to access the PPLS servers to download the data and scripts used in the assignments.

If none of these solutions work for you, please get in touch and we will work something else out!

Let’s see if you’ve read everything in the previous tabs carefully and understand how this course works!

Since this is the first time, let’s explain the two types of quiz. The first type involves a set of flashcards. Read the question shown on the card, then write down your answer on paper (don’t just think it in your head – commit it to paper to keep yourself honest!). Then click anywhere on the flashcard to flip it over and reveal the answer. Try it now:

Why should you only use a University-approved platform for communication?

Because they are secure. Other platforms may not protect your information or may require you to reveal personal details such as your mobile phone number.

The second type of quiz is a set of multiple choice questions. Select your answer for each of the questions, then you should just reveal your score. If you didn’t get them all right, hide your score and go back and reconsider your answers. Then you can mark your individual answers as correct or incorrect, and go back and reconsider only those you got wrong. Finally, you can reveal all the answers. To get the most out of the quizzes, don’t reveal all the answers until you’ve tried your best to answer them.

These are simple quizzes that just run in your browser. You don’t need to be logged in. Even if you are logged in, your score is not saved and no-one else will see it. The quizzes are to help you learn – they are not part of the assessment for the course.

Meet the team delivering Speech Processing – click on each of them to find out more including contact details and how to make an appointment.

Catherine Lai
Course Organizer and Lecturer
Rebekka Puderbaugh
Lecturer and Phonetics Expert
Simon King
Lecturer and content creator
Atli Sigurgeirsson
Tutor
Zihang Peng
Tutor

Because Speech Processing is an interdisciplinary area, it’s possible that you already know something about one part of it – perhaps you have already done some phonetics, or have a strong mathematical background, or are a good programmer.

To help you navigate, the course material is divided as follows:

PHON – phonetics and phonology
SIGNALS – signal processing, with a focus on speech signals
TTS – text-to-speech synthesis
ASR – automatic speech recognition
SKILLS – maths, computing, writing

The SKILLS content is a collection of essential skills that you need to master, in order to complete this course and to continue working in this area afterwards. You may want to check out our preparation guides, especially if you haven’t done any maths for a while or are new to programming.

We’ll eventually use the Linux command line to do the TTS and ASR assignments. If you’ve not used this before, and want to make a start on learning, we recommend you do the exercises in the LinkedIn Learning course “Learning Linux Command Line” (free with your University login). This course also gives some instructions on setting up a Linux-style environment on different operating systems (e.g. MacOS, Windows).

All students are strongly recommended to study all the materials, even if you think you know them already. Many of you will come in with more linguistics (particularly phonetics) knowledge, whereas some of you will have more of a computer science background. Our experience tells us that sharing your knowledge with other students is an effective strategy for improving your own learning, so we strongly encourage you to talk to your classmates!