Module 0 – getting started

Start here! Gives an introduction to the course, explains how the course is delivered, and describes the computing environment you will need.

Welcome to the course! This is Module 0. It doesn’t have any course content, but introduces you to the course and tells you what is covered. It explains how the course is delivered and what you need to do to prepare for taking the course.

What the course covers

What is speech processing? Here, we mean the two main applications of speech technology: Speech Synthesis (the generation of speech from text input) and Automatic Speech Recognition (the transcription of speech into text). That’s going to involve processing speech (and sometimes text) in various ways. To do Speech Synthesis, we construct a new speech signal to say out loud any text that is input to the system; that will involve predicting which phonemes are required to pronounce the words in that text. To do Automatic Speech Recognition, we extract salient features from the input speech and use machine learning to classify sequences of these features into a sequence of words, each of which is made of a sequence of phonemes. Already, you can see that we need to bring together phonetics, signal processing and machine learning to solve these problems.

The phonetics modules in this course are intended to complement the speech processing content. This course will not provide a comprehensive background in phonetics, but it will get you started thinking about speech using the terms that phoneticians use. It will provide you with some foundational concepts and tools to relate speech processing to phonetic descriptions and annotations of speech.

The course will cover relevant theory: speech as a signal that carries information, how speech is produced by a talker and perceived by a listener, signal processing, and probabilistic modelling. We’ll need to underpin everything with a foundation course in phonetics. Since this is an interdisciplinary area, we’ll need to learn a diverse set of practical skills too, including maths, efficient use of software tools, and communicating our understanding through scientific writing.

How the course is organized and delivered

Modules

Speech Processing has 10 content modules, numbered 1 to 10 and each designed to take 1 week of study. Within each module you will find videos, readings, instructions for the labs, and self-test quizzes (for most modules).  To get the most out of the course, you should try to watch the module videos before the Thursday lecture (09.00-10.50). The lab for each module will be on the Wednesday in the following week (09.00-10.50 or 16.10-18.00).

Even if you don’t manage to watch the videos or do the readings beforehand, you should still come to class! 

Finding the course content

Almost everything you need is here on speech.zone and we have tried to make it as self-contained as possible. However, you will need to access some other resources as follows:

  • Some readings are only available through the library.
  • The University of Edinburgh Learn site will be used to send class-wide messages and to host live lecture recordings.
  • Learn will also be used for online tests and assignment submissions.

Navigating the course content

The course hub is the place to start. There you’ll find the 10 content modules along with a weekly schedule, reading lists, milestones to keep you on track with the assignments, and the marking policy.

Each module has a similar structure. On each module page you’ll see a sequence of tabs, which you should work through in order from left to right. Everything for a module is on a single page, but clicking   will open that part in a new browser tab or window. For videos, that allows you to see the transcript.

Lectures

Lectures will be held on Thursdays 9:00-10:50. In 2024, these will be live, in-person lectures held in 7GS_F.21, 7 George Square.

Labs

Labs start in week 2. You can attend either of the sessions on Wednesdays, 9:00-10:50 or 16:10-18:00, which will both be in AT_4.02, the PPLS Computer Lab, Appleton Tower.

The labs are self-paced: you should go through the lab materials and ask for help from the tutors as you need it. We’d also encourage you to work through the materials with your classmates. Generally, you should try to watch all the videos and complete the essential readings before attempting the lab exercises, but even if you can’t do this you should still come to your lab session. You can think of it as designated time you can devote to this course where help will be around.

The phonetics and signals labs really aren’t about coding (though the signals labs will involve some code for you to run and play with), but rather to get you to think through the concepts introduced in the lecture videos. For notebooks that have python code included, you don’t need to understand all the workings of the code and you definitely won’t need to reproduce it. However, we do want you to get some exposure to how these abstract concepts get translated into concrete programming (and do something a bit interactive with the computer and each other!).

The TTS and ASR labs will be focused on the TTS and ASR assignments. So, they will have more of a potential coding component (generally bash shell scripting).  You can still do the assignments even if you don’t have a programming background, though some familiarity with coding basics will certainly make life easier.

Discussion forums on speech.zone

Accounts on speech.zone: enrolled students will be given an account here on speech.zone, which allows you to access all forums and to post on the forums. It is therefore essential to be formally enrolled for this course as soon as possible because you won’t appear on the student list until you are, and so we cannot create a speech.zone account for you. If you encounter delays in enrollment, email the course organizer (contact details in the DRPS entry for the course).

speech.zone accounts will be made for everyone who is enrolled by the end of week 1. Please let the course organizer know if you enrolled after that date.

To activate your account, use the ‘forgot password’ link in the login box to set a password. Your username is your University of Edinburgh UUN.

Please familiarise yourself with the forum structure – there is a lot of useful information there already. A few forums are hidden from public view, so you need to be logged in to find them (and the forum search facility won’t return results from them unless you are logged in).

Try to post in the most appropriate place, but don’t be surprised if we move your post to another forum to keep everything well-organised. We sometimes lightly edit your questions to make them clear and concise, so that they are useful for everyone.

Always use your University login and an approved secure platform for communication: Microsoft Teams, Blackboard Collaborate, Zoom, and your University-provided email.

Assessment

There are two practical assignments, and you will be assessed on each of them by writing a lab report. We will also have 3 online tests through the semester (instead of one big exam at the end).

Online Quizzes

  • 30% of final grade
  • Multiple choice questions: Test open for 2 days, but once you start you need to complete within 1 hour on Learn.
  • Due dates:
    • Phon/Signals (10%): Monday 2024-10-14 12.00 noon – Wednesday 2024-10-16 12.00 noon
    • TTS (10%): Monday 2024-10-28 12.00 noon – Wednesday 2024-10-30 12.00 noon
    • ASR (10%): Monday 2024-11-25 12.00 noon – Wednesday 2024-11-27 12.00 noon

Lab Assignment 1

  • 30% of final grade
  • Word limit: 1500 words
  • Due date: Monday 2024-11-04 12.00 noon

Lab Assignment 2

  • 40% of final grade
  • Word limit: 3000 words
  • Due date: Thursday 2024-12-05 12.00 noon

There is no final exam for this course.
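
As a quick check on how the components above combine, here is a minimal sketch in Python. The weights are the ones listed on this page; the assumption that every component is marked on a 0-100 scale and combined by a simple weighted average is ours for illustration (the marking policy on the course hub is the authoritative source), and the example marks are made up.

```python
# Assessment weights as listed on this page (percent of final grade).
WEIGHTS = {
    "quiz_phon_signals": 10,
    "quiz_tts": 10,
    "quiz_asr": 10,
    "lab_assignment_1": 30,
    "lab_assignment_2": 40,
}


def overall_mark(marks):
    """Weighted average of component marks.

    Assumption (not from the course pages): each mark is on a 0-100 scale
    and the final grade is a plain weighted average of the components.
    """
    if set(marks) != set(WEIGHTS):
        raise ValueError("marks must contain exactly the components in WEIGHTS")
    total_weight = sum(WEIGHTS.values())
    return sum(WEIGHTS[name] * marks[name] for name in WEIGHTS) / total_weight


# Made-up marks, purely for illustration.
example_marks = {
    "quiz_phon_signals": 80,
    "quiz_tts": 70,
    "quiz_asr": 60,
    "lab_assignment_1": 65,
    "lab_assignment_2": 70,
}

print(sum(WEIGHTS.values()))        # 100
print(overall_mark(example_marks))  # 68.5
```

With these illustrative marks, the three quizzes contribute 21 points, Lab Assignment 1 contributes 19.5, and Lab Assignment 2 contributes 28.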

Attendance

There are no marks attached to attendance at the timetabled labs or lectures. But, of course, you’ll get a better mark on each item of assessment if you learn more – and the way to learn is to use the materials and sessions provided in class. Our experience is that working with other people in the class is also beneficial for students. But everyone is different and if you learn better on your own that’s completely fine! What we would ask is that, if you feel stuck, you ask someone for help!

Lectures

Lectures will be in-person. Lecture recordings will be made through the University system (Media Hopper/Echo360) and will be available after the class. We may also use some polling software in class, e.g. Wooclap.

Labs

From week 2 onwards you will have a timetabled 2-hour lab where you can work through exercises that will consolidate your understanding of the materials from the preceding week. The different modules have different software requirements, which are listed below. All of the software should be available on the computers in the AT PPLS computing lab. Once you have an account on speech.zone, you can use the technical support forums to get help.

Phon labs: Praat

The first few labs will focus on phonetics using Praat. You can download Praat onto your own computer: it should work for MacOS, Windows and Linux.

Signals labs: Jupyter Notebooks

The signal processing labs will use Jupyter notebooks: a combination of Python code and notes that you access using a web browser. Don’t worry if you don’t know any Python – this is not a formal requirement of the course, and you’ll learn what you need simply by doing the exercises. You can run the notebooks on your personal computer or on the Edina Notable service (direct login through Learn).

More details on getting set up with these can be found in the Module 3 materials.

Please note, while the notebooks use Python, the material there is to support your learning, not to test your ability to code! Nothing related to the code specifically is directly assessed (though the concepts in the notebooks marked essential may be). You don’t need to know Python to do this course, though MSc SLP students will need to know Python for many other courses so getting a bit more practice/exposure definitely doesn’t hurt.
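
To give a flavour of what a notebook code cell can look like, here is a small sketch that generates a sine wave using only the Python standard library. It is not taken from the course notebooks: the function name `sine_wave`, the 16 kHz sample rate, and the 100 Hz frequency are all hypothetical choices made for this illustration.

```python
import math

SAMPLE_RATE = 16000  # samples per second (an illustrative choice, not a course setting)


def sine_wave(frequency_hz, duration_s, sample_rate=SAMPLE_RATE):
    """Return a list of samples of a sine wave with the given frequency and duration."""
    n_samples = int(round(duration_s * sample_rate))
    return [
        math.sin(2 * math.pi * frequency_hz * n / sample_rate)
        for n in range(n_samples)
    ]


# One period of a 100 Hz tone lasts 0.01 s, which is 160 samples at 16 kHz.
wave = sine_wave(frequency_hz=100, duration_s=0.01)
print(len(wave))  # 160
print(wave[0])    # 0.0 (the wave starts at zero)
```

In a notebook you would typically run a cell like this, change a value such as the frequency, and run it again to see what changes.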

TTS and ASR labs/assignments: PPLS AT lab Linux machines/Remote Desktop

The TTS and ASR labs will be focused on the TTS and ASR assignments, respectively. The TTS assignment uses the Festival Speech Synthesis System, while the ASR assignment uses HTK and an assortment of Bash (Unix) shell scripts. So, for this part of the course you will need to use a Unix command line interface, i.e. the shell.

Students will have access to this via the  PPLS AT labs, physically and through the remote desktop service.

You can potentially install Festival and HTK onto your own computer, but if your internet connection is good, it may be easier to use the Remote Desktop service (or just go to the lab). This basically allows you to use the computers in the Appleton Tower PPLS computer lab from your own computer.

Setup Remote Desktop access to the PPLS Computer Lab

  • Ensure you are first connected to a University VPN
  • Download and install the XRDP software on your machine
    • You can find instructions for installing XRDP at the end of this PPLS computing lab page and/or on the Informatics Computing help pages.
    • Note: This is the same software that Informatics uses for DICE remote desktop, but the servers we use for this course are not DICE ones. You need to use the connection instructions here (for PPLS servers) to access the right data!
  • Go to https://resource.ppls.ed.ac.uk/whoson/atlab.php and follow the instructions at the bottom of that page to connect to the remote desktop
  • Log in with your normal school/EASE log in details.
    • If you find your normal EASE log in details do not work, then you will need to ask to have your account activated for the PPLS Lab machines. Mail is.helpline@ed.ac.uk to request this (e.g. “please can you enable my account to log in to the PPLS Computer Lab machines, my UUN is sXXXXXXX…”).
  • Make sure you log out and disconnect properly once you have finished, otherwise you may tie the machine up and prevent others from using it!

When you become comfortable using Unix-style command line interfaces, you may want to use the ssh command to connect directly to these servers. It’s a good skill to have and takes up a lot less computing power than running the remote desktop. If you want to try this, you can follow the instructions on remote working here.

To run ssh you need a terminal. For Linux and MacOS users, this means the Terminal app that already exists on your computer (or equivalent). For Windows users, you may want to try VSCode with git-bash extensions. VSCode is also a good environment for coding in Linux and MacOS. It’s widely used in industry, so if you’re going to go on to do more coding, it’s worth giving it a go (just give yourself a bit of time to get set up – there are a lot of extensions!). Otherwise, for Windows, you can use a terminal emulator, e.g. Windows PowerShell or Ubuntu WSL. You can also use an ssh client – the classic one of these for Windows is PuTTY. You might notice that pretty much all of the staff use MacOS or Linux – once you get into programming a bit you’ll understand why!

The online LinkedIn Learning course “Learning Linux Command Line” (free with your University login) also gives some instructions on setting up a Linux-style environment on different operating systems (e.g. MacOS, Windows).

If you’re an experienced Linux user, you can try to install Festival and HTK locally. You may be able to find some hints already on the speech zone forums if you wish to attempt this. You can also ask us for help! If you do this, you will still need to access the PPLS servers to download the data and scripts used in the assignments.

If none of these solutions work for you, please get in touch and we will work something else out!

Use of Generative AI in the Course

This is our official policy on use of AI based tools:

Academic integrity is an underlying principle of research and academic practice. All submitted work is expected to be your own. AI tools (e.g., ChatGPT, ELM) should not be used to generate written text on your assignments in Speech Processing. However, you are allowed to use these tools to identify ideas, key themes, and plan your assessment. You may also use it to improve the clarity of your writing. If you use AI software, you must acknowledge its use in your submission. 

We will give more details on how you should declare any use of AI in the assignment instructions. From the perspective of researchers actively reviewing work in speech technology and speech science, we can tell you one thing right now: using tools like ChatGPT may make your writing seem more scholarly to someone not in the field, but it often does more harm than good. ChatGPT often provides text that seems reasonable unless you actually understand the field. If you don’t actually have mastery of the material, you may find your newly fluent text is full of subtle errors that make your marker very unhappy. We’ve also been seeing what can only be described as pseudo-academic writing – it’s what someone outside the field might imagine academics like to read but it really isn’t! This also makes your markers unhappy!

Whether or not you use tools to help you in your writing, you always need to critically think about whether what you’re saying makes sense, is backed up by evidence, and forms a coherent argument.

Meet the team delivering Speech Processing – click on each of them to find out more including contact details and how to make an appointment.

Let’s see if you’ve read everything in the previous tabs carefully and understand how this course works!

Since this is the first time, let’s explain the two types of quiz. The first type involves a set of flashcards. Read the question shown on the card, then write down your answer on paper (don’t just think it in your head – commit it to paper to keep yourself honest!). Then click anywhere on the flashcard to flip it over and reveal the answer. Try it now:


The second type of quiz is a set of multiple choice questions. Select your answer for each of the questions, then you should just reveal your score. If you didn’t get them all right, hide your score and go back and reconsider your answers. Then you can mark your individual answers as correct or incorrect, and go back and reconsider only those you got wrong. Finally, you can reveal all the answers. To get the most out of the quizzes, don’t reveal all the answers until you’ve tried your best to answer them.


How many content modules are there in the course?




How many labs will you attend each week?




What is the name of the program we'll use in the phon labs?





These are simple quizzes that just run in your browser. You don’t need to be logged in. Even if you are logged in, your score is not saved and no-one else will see it. The quizzes are to help you learn – they are not part of the assessment for the course.

Because Speech Processing is an interdisciplinary area, it’s possible that you already know something about one part of it – perhaps you have already done some phonetics, or have a strong mathematical background, or are a good programmer.

To help you navigate, the course material is divided as follows:

  • PHON – phonetics and phonology
  • SIGNALS – signal processing, with a focus on speech signals
  • TTS – text-to-speech synthesis
  • ASR – automatic speech recognition
  • SKILLS – maths, computing, writing

The SKILLS content is a collection of essential skills that you need to master, in order to complete this course and to continue working in this area afterwards. You may want to check out our preparation guides, especially if you haven’t done any maths for a while or are new to programming.

We’ll eventually use the Linux command line to do the TTS and ASR assignments. If you’ve not used this before, and want to make a start on learning, we recommend you do the exercises in the LinkedIn Learning course “Learning Linux Command Line” (free with your University login). This course also gives some instructions on setting up a Linux-style environment on different operating systems (e.g. MacOS, Windows). For a more compact intro to the bash terminal commands, check out this nice tutorial on YouTube.

All students are strongly recommended to study all the materials, even if you think you know them already. Many of you will come in with more linguistics (particularly phonetics) knowledge, whereas some of you will have more of a computer science background. Our experience tells us that sharing your knowledge with other students is an effective strategy for improving your own learning, so we strongly encourage you to talk to your classmates!

Any questions about how the course is delivered? Ask on the forums!
