- Getting started
Examining sounds with Wavesurfer. Inspecting the same signal in both the frequency and time domains. Various types of sound, including speech.
Setting up
Throughout all the exercises, we’ll assume you are using a Linux computer. If you have not used a Linux computer before, please spend some time familiarising yourself with this operating system.
You also need to be comfortable with the bash shell. Again, learn some of the useful keyboard shortcuts (Wikipedia has a list of these).
Some commands in these exercises have to be typed at the shell prompt in a terminal window. The terminal is just the graphical part of the interface (i.e., the window). Inside the terminal is the shell, which is the program that you are interacting with. Things that you should type at the shell prompt are shown like this:
$
You don’t need to type the prompt itself (and note that your prompt may look different to the one above); just type the command. The dollar-sign prompt is used to distinguish shell commands from other commands used later on in other exercises, which are not typed into a shell but into other programs (e.g., Festival).
Download the files
Download the materials in this zip file. Your computer might unzip the file automatically; if not, double-click it to unzip it. It will create a folder called lab1.
We need to make a directory to work in, inside your Documents folder, and let’s do that on the command line. To open a terminal, move your mouse to the top left corner so that a menu bar appears, and select the terminal symbol. In the terminal, type:
$ cd ~/Documents
$ mkdir sp
$ cd sp
$ gio open .
A folder window will open, showing the newly-created sp folder. Copy-paste lab1 into it using either the mouse commands or the keyboard shortcuts ctrl-c and ctrl-v. Notice how we can do the same things on the command line, or using the mouse. Mastering both of these is important in order to work efficiently.
In the Linux terminal you need to use ctrl-shift-c to copy text and ctrl-shift-v to paste text (not just ctrl-c and ctrl-v).
Examine some audio files
Start wavesurfer; one way is to do that from a shell:
$ wavesurfer
and another way is to find it using the application search: move your mouse to the top left corner and type “wavesurfer” in the search box that says “Type to search…”. There is often more than one way to do the same thing! Learn them all, and try to minimise the use of the mouse.
Load the example file sine_100Hz.wav from your sp/lab1 folder using the open command on wavesurfer’s file menu.
Wavesurfer will ask you how you wish to display the file. For the moment just select waveform. If wavesurfer displays a pop-up error message when opening a file, you can ignore it.
You should see two versions of the waveform on the screen: a big version at the top, which is the main display window, and a smaller version below it. This small version shows you which portion of the file is actually displayed in the top part of the window.
You can zoom in and out using the magnifying glass icons at the top of the window, or by selecting a region by dragging the mouse over some of the waveform and selecting zoom to selection from the view menu (note the keyboard shortcuts on the menu for zooming).
To play the current selection, press the space-bar.
To see a spectrogram for a given file: right-click on the main waveform and select create pane and then select spectrogram. You can alter the detail displayed by right-clicking on the spectrogram and bringing up the spectrogram controls...
To see a spectrum, select a portion of the waveform and then right-click and select spectrum section...
For each of the non-speech waveforms:
- What are the differences between sine_100Hz.wav, sine_200Hz.wav and sine_300Hz.wav ?
- measure the time between 2 adjacent peaks in the waveform and calculate their fundamental frequency
- measure the amplitude
- Does the spectrum of a sine wave vary over time?
- What about the square and pulse waveforms? How do they differ from the sine waves?
- What is the relationship between the various waveforms? Are some more complex than others?
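The period-to-frequency calculation asked for above can be sketched in Python. This is only an illustration and not part of the lab materials: it synthesises its own sine waves instead of reading the lab's .wav files, the 16 kHz sample rate is an assumption, and the helper names (synth_sine, peak_indices, f0_from_peaks) are invented for this example.

```python
import math

def synth_sine(freq_hz, sample_rate=16000, duration_s=0.1):
    """Return samples of a sine wave (a stand-in for the lab's sine files)."""
    n = int(sample_rate * duration_s)
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]

def peak_indices(samples):
    """Indices of positive local maxima in the waveform."""
    return [i for i in range(1, len(samples) - 1)
            if samples[i] > 0 and samples[i - 1] < samples[i] >= samples[i + 1]]

def f0_from_peaks(samples, sample_rate):
    """Estimate F0 as 1 / (time between the first two adjacent peaks)."""
    peaks = peak_indices(samples)
    period_s = (peaks[1] - peaks[0]) / sample_rate  # period T in seconds
    return 1.0 / period_s                           # F0 = 1 / T

SAMPLE_RATE = 16000  # assumed; check the actual rate of your files
for freq in (100, 200, 300):
    estimate = f0_from_peaks(synth_sine(freq, SAMPLE_RATE), SAMPLE_RATE)
    print(freq, "Hz sine -> estimated F0:", round(estimate, 1), "Hz")
```

Because a peak can only be located to the nearest sample, the estimate is slightly off whenever the period is not a whole number of samples. Measuring the time across several periods and dividing by the number of periods reduces that error, and the same applies when you measure by hand in Wavesurfer.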
In the spectrum, the vertical axis is a logarithmic scale by default. So, we only need to focus on the peaks in the spectrum – this is where almost all the energy is. In the plot below, those peaks are at around 20dB. Don’t worry about the very low energy parts of the spectrum (between 60dB and 70dB in the plot below) – these are just artefacts of the analysis method.
You can vary the frequency resolution of the spectrum by changing the FFT points setting: this controls how many samples from the waveform are analysed. Analysing more samples provides more frequency resolution.
For the speech waveform:
- Can you segment it into regions; how about using the spectrogram instead of the waveform?
- What patterns are emerging, and how does the waveform shape correspond to what you see on the spectrogram?
- Make spectra of different regions of the speech waveform. Use the snapshot button on the spectrum window to save a reference spectrum for comparison.
- For a single region (choose a vowel), try different analysis window lengths when computing the spectrum. What do you notice about the resolution of the resulting spectrum? Do the same thing in the spectrogram.
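The link between the number of samples analysed and the frequency resolution can be made concrete with a little arithmetic: an N-point FFT of a signal sampled at fs Hz has frequency bins spaced fs / N Hz apart, and covers N / fs seconds of waveform. The sketch below simply tabulates this; the 16 kHz sample rate and the function names are assumptions made for illustration.

```python
def bin_spacing_hz(sample_rate, fft_points):
    """Spacing in Hz between adjacent frequency bins of an N-point FFT."""
    return sample_rate / fft_points

def window_duration_ms(sample_rate, fft_points):
    """Duration in milliseconds of the stretch of waveform being analysed."""
    return 1000.0 * fft_points / sample_rate

SAMPLE_RATE = 16000  # assumed; check the actual rate of your files
for fft_points in (128, 256, 512, 1024, 2048):
    print(fft_points, "points:",
          bin_spacing_hz(SAMPLE_RATE, fft_points), "Hz per bin,",
          window_duration_ms(SAMPLE_RATE, fft_points), "ms of waveform")
```

Finer frequency resolution comes at the cost of a longer stretch of waveform, so the spectrum describes a longer span of time. Keep this trade-off in mind when comparing window lengths in the spectrogram.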
Some parts of speech signals have similar properties to the pulse train signal that you saw earlier. Here’s an example:
See how there is a similar “line structure”, at least in the lower frequencies. This tells us that there is something in common between speech and the simple pulse train. But there are differences of course. In the pulse train, all the lines (which are the harmonics) had the same height (which is the amplitude). In the speech signal, there is a more interesting shape (we call this the spectral envelope).
What does speech have in common with the pulse train? What causes speech to have a different spectral envelope to the pulse train?
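The observation that every harmonic of the pulse train has the same amplitude can be checked numerically. The sketch below is a minimal illustration under stated assumptions: it builds its own ideal pulse train (one sample of value 1 every 160 samples, i.e. 100 Hz at an assumed 16 kHz sample rate) rather than loading the lab file, and uses a slow, direct DFT evaluated only at the frequencies of interest. The function names are invented for this example.

```python
import cmath

def pulse_train(period, n_samples):
    """Ideal pulse train: a single 1.0 every `period` samples, else 0.0."""
    return [1.0 if i % period == 0 else 0.0 for i in range(n_samples)]

def dft_magnitude(samples, k):
    """Magnitude of DFT bin k, computed directly from the DFT definition."""
    n = len(samples)
    return abs(sum(x * cmath.exp(-2j * cmath.pi * k * i / n)
                   for i, x in enumerate(samples)))

SAMPLE_RATE = 16000   # assumed
PERIOD = 160          # samples per pulse, so F0 = 16000 / 160 = 100 Hz
N_SAMPLES = 1600      # ten whole periods, so bins are 10 Hz apart

train = pulse_train(PERIOD, N_SAMPLES)
for harmonic in range(1, 6):
    freq_hz = harmonic * SAMPLE_RATE // PERIOD
    k = freq_hz * N_SAMPLES // SAMPLE_RATE      # bin index for this frequency
    print(freq_hz, "Hz:", round(dft_magnitude(train, k), 3))
# A frequency halfway between two harmonics, for comparison
print(150, "Hz:", round(dft_magnitude(train, 15), 3))
```

The flat set of harmonics is a property of this idealised signal. The lab's pulse file and real speech will differ in detail, which is what the questions above ask you to think about.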
There are some other parts of speech signals that don’t seem to have any harmonics. Here’s one example of this:
When does speech have no harmonics, and what does that tell us about how it was produced?
- Analysing sound in Praat
Praat is more powerful than Wavesurfer, and can perform various useful and interesting manipulations of speech.
Setting up
This lab uses the same files as the previous exercise. Go into the folder you created last time and check they are still there:
$ cd ~/Documents/sp/lab1
$ ls
If you don’t have the files, go back to the instructions for lab 1 and copy the files again.
Today we are going to use another speech analysis program to look at waveforms. This program is called Praat. Praat and Wavesurfer share many of the same functions, but sometimes one does particular things better than the other; Praat does a lot of things that Wavesurfer can’t.
Start Praat
$ praat
Praat will open two windows on the screen, one called praat objects and one called praat picture. You can immediately close the praat picture window: it is not needed for this exercise.
The main Praat window shows a list of loaded objects (currently an empty list), and a list of action buttons that can be used to manipulate objects.
To load a file into Praat click on the Open menu and select read from file. Load in the file kdt_001.wav. You should now have an object in the object list called Sound kdt_001. Click on the play button to play it.
Examine speech
If you want to view the waveform click the View & Edit button. This opens a window containing the speech and an f0 trace.
Work out how to zoom in and out, like you did with Wavesurfer. Notice that you can play selected parts of the waveform by dragging to select and then clicking on the ‘button’ that appears under the selection. You can add a spectrogram by selecting the spectrum menu and clicking show spectrogram.
You can generate a spectrum of a portion of speech by first dragging to select some waveform and then clicking on the spectrum menu and selecting view spectral slice.
Praat lets you do some interesting things with the spectrum. You can select a range of frequencies by dragging the mouse over part of the spectral slice, and then play just those frequencies. Try it by selecting a large portion of the speech file, creating a spectral slice, selecting different frequency ranges and playing them.
Analyse various signals
- Load one of the square waves, select a section from the middle of the waveform about 1 second in length, and generate a spectral slice of this portion.
You should see a spectrum showing the component frequencies for the waveform. Try playing individual peaks or groups of peaks. Compare the pitch and timbre of your selections.
Try the same thing, playing back just a range of frequencies, but this time use the sine or square waveforms:
- can you create a sound like a square wave, starting from a sine wave?
- can you create a sound like a sine wave, starting from a square wave?
- Load the file sweep.wav.
- Examine spectral slices at different points in the file.
- In the praat objects window select the sound sweep object and click on the filter button and select filter one formant. In the box that opens, set a frequency of 2500Hz and a bandwidth of 300Hz, and click OK.
- You should now have a new object in the list called Sound sweep_filt. Compare the waveform and spectrum of this object to the original object.
- Try the above filtering process on the speech waveform, with filter frequencies in the typical range of speech formants, using narrow bandwidths of about 50Hz.
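To get a feel for what a single-formant filter does, here is a minimal sketch of a textbook two-pole resonator with a centre frequency and a bandwidth. It is an assumption that this resembles what Praat does internally; Praat's own implementation may differ in detail, for example in how the gain is normalised. The 16 kHz sample rate, the test frequencies and all function names are chosen for illustration only.

```python
import math

def one_formant_filter(samples, sample_rate, centre_hz, bandwidth_hz):
    """Two-pole resonator: y[n] = a*x[n] + b*y[n-1] + c*y[n-2]."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)   # pole radius
    c = -r * r
    b = 2.0 * r * math.cos(2.0 * math.pi * centre_hz / sample_rate)
    a = 1.0 - b - c            # normalise so the gain at 0 Hz is 1
    y1 = y2 = 0.0              # previous two output samples
    out = []
    for x in samples:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out

def sine(freq_hz, sample_rate, n_samples):
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate)
            for i in range(n_samples)]

def rms(samples):
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def gain_at(freq_hz, sample_rate=16000, centre_hz=2500, bandwidth_hz=300):
    """Output/input RMS ratio for a sine, ignoring the start-up transient."""
    x = sine(freq_hz, sample_rate, 4000)
    y = one_formant_filter(x, sample_rate, centre_hz, bandwidth_hz)
    return rms(y[2000:]) / rms(x[2000:])

for freq in (500, 2500, 5000):
    print(freq, "Hz: gain", round(gain_at(freq), 2))
```

A sine at the centre frequency comes out much larger than sines well away from it, which is the behaviour to look for when you compare Sound sweep_filt with the original sweep. A narrower bandwidth makes the peak sharper.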
- Fun with TD-PSOLA
For diphone synthesis, systems like Festival need to manipulate the fundamental frequency and duration of recorded speech. TD-PSOLA is a popular way to do that.
In diphone speech synthesis, the fundamental frequency and duration of the diphones are manipulated to match the values predicted from text. Praat allows us to experiment with one common technique that is used to manipulate fundamental frequency and duration for speech synthesis, called TD-PSOLA.
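The core idea of time-domain pitch-synchronous overlap-add can be sketched in a few lines, under strong simplifying assumptions. The sketch below is not Praat's implementation: it works on a perfectly periodic toy signal, it is told where the pitch marks are instead of estimating them, it ignores unvoiced speech, and it does not rescale the amplitude after overlap-adding. All names are invented for this example. It cuts out a two-period, Hann-windowed frame around each pitch mark and overlap-adds the frames at a new spacing, which changes the fundamental period while keeping the overall duration.

```python
import math

def toy_voiced(period, n_periods):
    """Perfectly periodic toy signal: one damped cosine per pitch period."""
    one_period = [math.exp(-i / 20.0) * math.cos(2 * math.pi * i / 16.0)
                  for i in range(period)]
    return one_period * n_periods

def hann(length):
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / length)
            for i in range(length)]

def psola_change_pitch(samples, pitch_marks, period, new_period):
    """Overlap-add two-period windowed frames at a new pitch-mark spacing."""
    window = hann(2 * period)
    out = [0.0] * len(samples)          # same length: duration is preserved
    target = pitch_marks[0]             # position of the next output frame
    while target < pitch_marks[-1]:
        # Reuse the frame around the analysis pitch mark nearest the target
        source = min(pitch_marks, key=lambda m: abs(m - target))
        for j in range(2 * period):
            src = source - period + j
            dst = target - period + j
            if 0 <= src < len(samples) and 0 <= dst < len(out):
                out[dst] += window[j] * samples[src]
        target += new_period
    return out

def estimate_period(samples, start=300, length=400, min_lag=40, max_lag=150):
    """Lag (in samples) with the largest autocorrelation over a mid section."""
    def autocorr(lag):
        return sum(samples[i] * samples[i + lag]
                   for i in range(start, start + length))
    return max(range(min_lag, max_lag), key=autocorr)

PERIOD = 100                                  # samples per pitch period
signal = toy_voiced(PERIOD, 12)
marks = list(range(0, len(signal), PERIOD))   # pitch marks given, not estimated
raised = psola_change_pitch(signal, marks, PERIOD, new_period=80)
print("original period:", estimate_period(signal), "samples")
print("modified period:", estimate_period(raised), "samples")
```

Placing the frames closer together raises the pitch and placing them further apart lowers it. Changing duration uses the same frames but repeats or drops some of them. With real speech, errors in the pitch marks and large modification factors are among the things that can produce audible artefacts.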
Start Praat and load in any natural speech waveform. Select the object from the object list and click on Manipulate - then To Manipulation..., and click OK on the pop-up window that appears.
Select the new object and then click Edit. You should get a window showing the waveform and the pitch contour.
What are the vertical blue lines overlaid on the waveform? Zoom in to find out.
In the new window select Pitch / Stylise pitch (2 st). The pitch contour should become a few points joined by lines.
Play the waveform, and then drag the points around and play the waveform again. Repeat until bored…
Try to make the sentence sound like a question. Try to place the emphasis on different words.
If you really want to get clever, add a number of duration points to the duration tier by selecting Dur and then Add duration point at cursor; move the cursor and repeat a few times. Move a few of these points around and play the file.
When you make extreme changes to pitch and duration, can you hear any signal processing artefacts?
- Extras
Some suggestions for going a little bit further with this exercise, if you want to.
- Record your own speech and analyse that (instructions on recording speech can be found in the digit recogniser exercise.)
- record on different devices (your phone, your laptop, …) and compare the signals to see what differs – compare by listening, looking at the waveform, and inspecting the spectrogram. Can you figure out the cause of these differences?
- record close to the microphone, or far away (try it in a reverberant space, such as the bathroom!) – what changes? why?
- Analyse some of the synthetic speech signals in this zip file
- what are the differences between them? use different tools (your ears, the waveform, the spectrogram)
- in what ways are they similar or different to natural speech?
- could you tell they are synthetic just by looking at the spectrogram?
Familiarisation
In these simple exercises, we get our hands on speech and other audio signals, and analyse them in various ways. We use the Wavesurfer and Praat tools.