The slides from my (Simon’s) keynote are now online under Courses > One-off events. I’ll try to add a recording and perhaps a bibliography later.
Entropy: understanding the equation
The equation for entropy is very often presented in textbooks without much explanation, other than to say it has the desired properties. Here, I attempt an informal derivation of the equation starting from uniform probability distributions. A good way to think about information is in terms of sending messages. In the video, we send messages […]
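
As a rough sketch of that starting point: with N equally likely messages, each message needs log2(N) bits, and the general entropy formula gives exactly that value for a uniform distribution. The function names here are illustrative and are not taken from the video.

```python
import math

def entropy(probs):
    """Shannon entropy in bits: H = -sum(p * log2(p)), skipping zero-probability outcomes."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def uniform(n):
    """A uniform distribution over n equally likely messages."""
    return [1.0 / n] * n

# For a uniform distribution, the entropy should match log2(n).
for n in (2, 4, 8):
    print(n, entropy(uniform(n)), math.log2(n))
```

A non-uniform distribution over the same number of messages has lower entropy than the uniform one, which is why uniform distributions are a convenient place to begin.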
Autocorrelation for estimating F0
Most methods for estimating F0 start from autocorrelation. The idea is pretty simple: we are just looking for a repeating pattern in the waveform, which corresponds to the periodic vocal fold activity. For some waveforms, it might be possible to do that directly in the time domain, but in general that doesn’t work very well. […]
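
A minimal sketch of the idea, assuming a clean synthetic signal rather than real speech: compare the waveform with a delayed copy of itself at a range of lags, and take the lag with the highest autocorrelation as the period. The search range (50 to 400 Hz) and the test signal are assumptions made for illustration.

```python
import math

def autocorrelation(x, lag):
    """Sum of products of the signal with a copy of itself delayed by `lag` samples."""
    return sum(x[i] * x[i + lag] for i in range(len(x) - lag))

def estimate_f0(x, sample_rate, f0_min=50.0, f0_max=400.0):
    """Pick the lag (within a plausible F0 range) with the largest autocorrelation."""
    min_lag = int(sample_rate / f0_max)  # shortest period considered
    max_lag = int(sample_rate / f0_min)  # longest period considered
    best_lag = max(range(min_lag, max_lag + 1), key=lambda lag: autocorrelation(x, lag))
    return sample_rate / best_lag

# Synthetic periodic signal: a 100 Hz fundamental plus its second harmonic.
sample_rate = 8000
true_f0 = 100.0
signal = [
    math.sin(2 * math.pi * true_f0 * n / sample_rate)
    + 0.5 * math.sin(2 * math.pi * 2 * true_f0 * n / sample_rate)
    for n in range(800)
]
f0 = estimate_f0(signal, sample_rate)
print(f0)
```

Real speech is only approximately periodic, so practical methods add further steps on top of this basic search.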
Interactive unit selection
Just a toy demo, but should give you some idea of how unit selection waveform generation works. Click with your mouse to choose a candidate diphone from each column, then the corresponding synthesised waveform will appear. You can click on the synthesised waveform to hear it again. Try to obtain the most natural-sounding synthesis by […]
Token passing
Token passing is a really nice way to understand (and even to implement) Viterbi search for Hidden Markov Models. Here we see token passing in action, and you can look at the spreadsheet to see the calculations. To keep things simple, we are ignoring transition probabilities in this example. It would be simple to add them […]
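
Here is a small sketch of token passing under the same simplification, with transition probabilities ignored. It assumes a left-to-right model in which a token may either stay in its state or move to the next one. The emission log probabilities are made-up numbers, not the values from the spreadsheet.

```python
import math

def token_passing(emission_logprobs):
    """Viterbi search by token passing.

    emission_logprobs[t][s] is log p(observation at frame t | state s).
    Each state holds one token: (log probability so far, state history).
    Transition probabilities are ignored, i.e. treated as log probability 0.
    """
    n_states = len(emission_logprobs[0])
    # Every path must start in the first state.
    tokens = [(-math.inf, [])] * n_states
    tokens[0] = (emission_logprobs[0][0], [0])
    for frame in emission_logprobs[1:]:
        new_tokens = []
        for s in range(n_states):
            # Tokens can arrive from the same state (self-loop) or the previous state.
            candidates = [tokens[s]]
            if s > 0:
                candidates.append(tokens[s - 1])
            # Only the best arriving token survives.
            best_logprob, best_history = max(candidates, key=lambda tok: tok[0])
            new_tokens.append((best_logprob + frame[s], best_history + [s]))
        tokens = new_tokens
    # The winning token is the one in the final state after the last frame.
    return tokens[-1]

# Hypothetical emission log probabilities: 5 frames, 3 states.
emissions = [
    [-1.0, -5.0, -5.0],
    [-1.0, -4.0, -6.0],
    [-5.0, -1.0, -4.0],
    [-6.0, -2.0, -1.0],
    [-6.0, -5.0, -1.0],
]
logprob, path = token_passing(emissions)
print(logprob, path)
```

Adding transition probabilities would mean adding the relevant log transition probability to each candidate token before choosing the best one.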
Bitrate
The bitrate (or bit rate) of a signal is the number of bits required to store, or transmit, 1 s of that signal. A bit is a binary digit: either 0 or 1. Let’s calculate the bitrate of a digital waveform. First you should revise the concepts of sampling and quantisation from this module of the […]
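
As a sketch of the calculation for an uncompressed digital waveform: the bitrate is the sampling rate multiplied by the number of bits per sample and by the number of channels. The function name and the example settings are chosen here for illustration.

```python
def bitrate(sample_rate_hz, bits_per_sample, channels=1):
    """Bits needed to store or transmit one second of uncompressed audio."""
    return sample_rate_hz * bits_per_sample * channels

# CD audio: 44.1 kHz sampling rate, 16 bits per sample, two channels.
cd_bitrate = bitrate(44100, 16, channels=2)
# Mono speech sampled at 16 kHz with 16 bits per sample.
speech_bitrate = bitrate(16000, 16)
print(cd_bitrate, speech_bitrate)
```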
Pipeline architecture for TTS
Most text-to-speech systems split the problem into two main stages. The first stage is called the front end and contains many separate processes which gradually build up a linguistic specification from the input text. The second stage typically uses language-independent techniques (although they still require a language-specific speech corpus) to generate a waveform. Here we see those two […]
The speed of sound
At the Parque de las Ciencias in Granada, Spain, there is this long tube, open at the end nearest you and closed at the far end. We can calculate the length of this tube just from the audio recording, because we know the speed of sound. Here’s the waveform of part of the recording, showing […]
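
A sketch of one way to do the calculation, assuming the recording contains a sound made at the open end followed by its echo from the closed end. The sound travels the length of the tube twice, so the length is half the distance covered during the delay. The delay used below is a made-up value, not a measurement from the recording, and 343 m/s is the approximate speed of sound in air at about 20 °C.

```python
def tube_length(echo_delay_s, speed_of_sound=343.0):
    """Length in metres of a tube, given the round-trip echo delay in seconds."""
    # The sound travels to the closed end and back, hence the division by two.
    return speed_of_sound * echo_delay_s / 2.0

# Hypothetical delay of 0.1 s between the original sound and its echo.
length = tube_length(0.1)
print(length)
```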
A super-simple speech recogniser
We make what is possibly the world’s simplest speech recognition system. It can only recognise two different words, but will help you understand the basic idea of pattern recognition using template matching. The templates are just pre-recorded words, with known labels. The features extracted are just two formant frequencies in the middle of the word, […]
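
The template matching idea can be sketched as a nearest-neighbour search: extract the two formant frequencies from the unknown word, then choose the label of the closest template. The words, formant values and the use of Euclidean distance are all assumptions made for illustration.

```python
import math

# Hypothetical templates: (label, (F1 in Hz, F2 in Hz)) for two pre-recorded words.
templates = [
    ("heed", (280.0, 2250.0)),
    ("hard", (700.0, 1100.0)),
]

def recognise(features, templates):
    """Return the label of the template closest to the given (F1, F2) features."""
    label, _ = min(templates, key=lambda template: math.dist(features, template[1]))
    return label

# Formants measured (hypothetically) from the middle of an unknown word.
print(recognise((300.0, 2100.0), templates))
```

With only two templates and two features this is very limited, but the same pattern (extract features, measure distance to each template, pick the closest) underlies more capable template-based recognisers.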
Sampling and quantisation
Is digital better than analogue? Here we discover that there are limitations when storing waveforms digitally. We learn that the consequence of sampling at a fixed rate is an upper limit on the frequencies that can be represented, called the Nyquist frequency. In addition to the limitations of sampling, storing each sample of the waveform as a […]
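
Both limitations can be sketched in a few lines, assuming each sample is stored as an integer with a fixed number of bits. The function names and the simple rounding quantiser are illustrative choices.

```python
def nyquist_frequency(sample_rate_hz):
    """Highest frequency that can be represented at the given sampling rate."""
    return sample_rate_hz / 2

def quantise(sample, bits):
    """Map a sample in [-1.0, 1.0] to the nearest of 2**bits integer codes."""
    half = (2 ** bits) // 2
    code = round(sample * half)
    # Clamp to the available range, e.g. -128..127 for 8 bits.
    return max(-half, min(half - 1, code))

print(nyquist_frequency(16000))  # sampling at 16 kHz
print(quantise(0.5, 8), quantise(1.0, 8), quantise(-1.0, 8))
```

Using more bits per sample gives more codes and so a smaller rounding error, at the cost of a higher bitrate.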
My inaugural lecture
I talk about how speech synthesis works, in what I hope is a non-technical and accessible way, and finish off with an application of speech synthesis that gives personalised voices to people who are losing the ability to speak. I also try to mention bicycles as many times as possible. For a more up-to-date, slightly more technical, […]
Classification and regression trees (CART)
A quick introduction to a very simple but widely applicable model that can perform classification (predicting a discrete label) or regression (predicting a continuous value). The tree is learned from labelled data, using supervised learning. Before watching this video, you might want to check that you understand what Entropy is.
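
To show where entropy comes in, here is a sketch of one common way to choose a question when learning a classification tree: try each candidate yes/no question and keep the one whose split leaves the lowest weighted entropy. Whether the video uses exactly this criterion is an assumption, and the toy data and questions are invented for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy in bits of the label distribution in a list of labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def split_entropy(data, question):
    """Weighted average entropy of the two partitions created by a yes/no question."""
    yes = [label for features, label in data if question(features)]
    no = [label for features, label in data if not question(features)]
    return sum(len(part) / len(data) * entropy(part) for part in (yes, no) if part)

def best_question(data, questions):
    """Name of the question giving the lowest weighted entropy after the split."""
    return min(questions, key=lambda name: split_entropy(data, questions[name]))

# Hypothetical labelled data: (features, label).
data = [
    ({"voiced": True, "long": True}, "vowel"),
    ({"voiced": True, "long": False}, "vowel"),
    ({"voiced": False, "long": True}, "consonant"),
    ({"voiced": False, "long": False}, "consonant"),
]
questions = {
    "voiced?": lambda f: f["voiced"],
    "long?": lambda f: f["long"],
}
print(best_question(data, questions))
```

Applying the same selection step recursively to each partition grows the tree one question at a time.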