Forum Replies Created
-
AuthorPosts
-
The form of filter you give only has terms involving x[.] on the right hand side (“RHS” in maths jargon). This is a Finite Impulse Response (FIR) filter, and you can explore that in one of the Module 2 Jupyter notebooks.
The equation operates in the time domain, and the co-efficients are simply weights applied to input (x) samples: the output is nothing more than a weighted sum of input samples.
Importantly, there is no term involving the output (y) on the RHS. This means there is no “feedback”. For any given input, the filter’s output will only continue for a finite time (the “F” in FIR) after the input stops.
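Here is a minimal sketch of that weighted sum in plain Python. The coefficient values and the name `fir_filter` are made up for illustration and are not taken from the course notebooks. Feeding in a single impulse shows that the output stops as soon as the input has passed through all the weights.

```python
def fir_filter(x, b):
    """y[n] = sum over k of b[k] * x[n-k]: only input samples appear on the RHS."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, weight in enumerate(b):
            if n - k >= 0:          # samples before the start are treated as zero
                acc += weight * x[n - k]
        y.append(acc)
    return y

b = [0.5, 0.3, 0.2]              # hypothetical co-efficients (weights)
impulse = [1.0] + [0.0] * 7      # a single non-zero input sample
response = fir_filter(impulse, b)
print(response)                  # the impulse response is just the weights, then zeros
```

The impulse response has the same length as the list of co-efficients, which is why it is finite.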
In contrast, if we start putting some y[.] terms on the RHS, then there will be some feedback, and the filter can potentially produce output for an infinite duration after the input has ceased. This form of filter can exhibit resonance, and so is the form we will use to model the vocal tract filter.
You can explore Infinite Impulse Response (IIR) filters in one of the Module 2 Jupyter notebooks.
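As a rough sketch of what feedback does, here is a two-term recursive filter in plain Python. The function name `iir_filter` and the values of `r` and `theta` are assumptions chosen only to give a decaying oscillation; they are not a model of any particular vocal tract resonance.

```python
import math

def iir_filter(x, a1, a2):
    """y[n] = x[n] + a1*y[n-1] + a2*y[n-2]: past outputs appear on the RHS."""
    y = []
    for n in range(len(x)):
        y1 = y[n - 1] if n >= 1 else 0.0
        y2 = y[n - 2] if n >= 2 else 0.0
        y.append(x[n] + a1 * y1 + a2 * y2)
    return y

# Hypothetical resonator: r < 1 keeps the filter stable, theta sets the
# resonant frequency in radians per sample.
r = 0.95
theta = 2 * math.pi * 0.1
a1 = 2 * r * math.cos(theta)
a2 = -r * r

impulse = [1.0] + [0.0] * 59     # the input stops after one sample
response = iir_filter(impulse, a1, a2)
for n in (0, 10, 20, 40, 59):
    print(n, round(response[n], 4))
```

The output keeps ringing, with decreasing amplitude, long after the input has ceased. That ringing at a particular frequency is the resonance.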
Try the notebooks, then post follow-up questions.
An analogue signal – such as a sound wave propagating through air – may contain frequencies over a very wide range, with no upper limit.
When we need a digital representation of such a signal, we need to choose a sampling rate (which then determines the Nyquist frequency). Our choice of sampling rate will be influenced by:
- What information in the sound we think is important – we might say that only frequencies up to 8 kHz are useful for Automatic Speech Recognition, for example.
- Practical considerations such as the amount of storage the digital waveform will require (higher sampling rate = larger files) or whether we need to transmit it (higher sampling rate = larger bandwidth required).
We will generally choose the lowest possible sampling rate that satisfies the first requirement, related to the application we are building.
We must remove any components of the analogue signal that are above the Nyquist frequency, before sampling it. This is done in the analogue domain using a low-pass filter (an ‘anti-aliasing filter’). There is such a filter in your computer’s audio input, for example.
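To see why this matters, here is a small sketch in plain Python. The sampling rate and the two frequencies are arbitrary choices for illustration. A cosine above the Nyquist frequency produces exactly the same samples as one below it, so once sampled the two cannot be told apart.

```python
import math

fs = 8000                 # assumed sampling rate in Hz
nyquist = fs / 2          # 4000 Hz

def sample_cosine(freq_hz, fs, num_samples):
    """Samples of a unit-amplitude cosine at freq_hz, taken at rate fs."""
    return [math.cos(2 * math.pi * freq_hz * n / fs) for n in range(num_samples)]

above = sample_cosine(5000, fs, 32)   # 1000 Hz above the Nyquist frequency
alias = sample_cosine(3000, fs, 32)   # 1000 Hz below the Nyquist frequency

max_diff = max(abs(a - b) for a, b in zip(above, alias))
print(max_diff)           # effectively zero: the sample sequences match
```

The 5000 Hz component would appear in the digital signal as if it were 3000 Hz, which is aliasing. Removing it before sampling is the only way to avoid this.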
You’re unlikely to ever need to build an analogue-to-digital converter, so you might be wondering why we care about this…
The same thing applies when reducing the sampling rate of an existing digital signal – a process known as downsampling. For example, to halve the sampling rate, we cannot simply take every second sample. We must first pass the digital signal through a low-pass filter (an ‘anti-aliasing filter’ in the digital domain) to remove everything above the new, lower, Nyquist frequency.
Downsampling is quite common when preparing existing speech recordings for use in speech technology. They may have been recorded at a higher sampling rate than we wish to use.
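The following is a sketch of halving the sampling rate, written in plain Python so that every step is visible. The windowed-sinc filter design, the number of taps, and the function names are my own assumptions for illustration; in practice you would normally use a library resampling routine.

```python
import math

def lowpass_taps(cutoff, num_taps=101):
    """Windowed-sinc low-pass FIR filter. cutoff is in cycles per sample (0 to 0.5)."""
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        t = n - mid
        ideal = 2 * cutoff if t == 0 else math.sin(2 * math.pi * cutoff * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))  # Hamming
        taps.append(ideal * window)
    total = sum(taps)
    return [h / total for h in taps]   # normalise to unit gain at 0 Hz

def fir_filter(x, h):
    """Weighted sum of current and past input samples."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]

def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x))

fs = 16000
tone = [math.sin(2 * math.pi * 6000 * n / fs) for n in range(800)]  # 6 kHz tone

# Wrong: take every second sample. 6 kHz is above the new Nyquist frequency
# (4 kHz), so it aliases down to 2 kHz at full strength.
naive = tone[::2]

# Better: low-pass at the new Nyquist frequency (4000/16000 = 0.25 cycles
# per sample) first, and only then take every second sample.
filtered = fir_filter(tone, lowpass_taps(0.25))[::2]

skip = 60   # ignore the filter's start-up transient
print(rms(naive[skip:]), rms(filtered[skip:]))
```

The naive version keeps an aliased tone at almost full amplitude, while the filtered version has removed nearly all of it.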
If you get an error message, include it here. If there is no error message, then something else is wrong. Here’s what it looks like when SayText runs correctly:
festival> (SayText "Hello world.")
#<Utterance 0x7f17db4840f0>
festival>
and here’s what some errors might look like:
festival> SayText "Hello world."
#<CLOSURE (text) (begin "(SayText TEXT) TEXT, a string, is rendered as speech." (utt.play (utt.synth (eval (list (quote Utterance) (quote Text) text)))))>
"Hello world."
festival> (SayText "Hello world."
>
festival> (SayText Hello world.)
SIOD ERROR: unbound variable : Hello
festival> SayText("Hello world.")
#<CLOSURE (text) (begin "(SayText TEXT) TEXT, a string, is rendered as speech." (utt.play (utt.synth (eval (list (quote Utterance) (quote Text) text)))))>
SIOD ERROR: bad function : "Hello world."
Yes, that’s right. In that case, the waveforms will look different, but (in general) we will not hear any difference.
I like your recipe analogy – let’s try using it: If we construct a recipe using the wrong phase, we’ll use the correct ingredients (i.e., sinusoids with the correct magnitudes), but in the wrong relationship to each other.
On the left of the attached picture (you may need to be logged in to see it) is a cake constructed with the correct phases of all the ingredients. On the right, the same ingredients with the wrong phases. Close your eyes and they will taste the same, but they look very different.
Note that the sinusoid basis functions in Fourier analysis can never cancel each other out though – because they are orthogonal.
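Here is a small sketch of the same idea in plain Python. The harmonic numbers, magnitudes and phases are arbitrary values chosen for illustration. Two waveforms are built from the same ingredients with different phases: the waveforms differ sample by sample, but their magnitude spectra are the same.

```python
import cmath
import math

N = 64
harmonics = [1, 2, 3]            # whole cycles per analysis window
magnitudes = [1.0, 0.5, 0.25]    # the "ingredients"

def synthesise(phases):
    """Sum of cosines with fixed magnitudes and the given phases."""
    return [sum(m * math.cos(2 * math.pi * k * n / N + p)
                for k, m, p in zip(harmonics, magnitudes, phases))
            for n in range(N)]

def dft_magnitudes(x):
    """Magnitude of a naive DFT (O(N^2)), fine for short signals."""
    return [abs(sum(x[n] * cmath.exp(-2j * math.pi * k * n / len(x))
                    for n in range(len(x))))
            for k in range(len(x))]

wave_a = synthesise([0.0, 0.0, 0.0])
wave_b = synthesise([0.0, math.pi / 2, math.pi])

waveform_difference = max(abs(a - b) for a, b in zip(wave_a, wave_b))
spectrum_difference = max(abs(a - b) for a, b in
                          zip(dft_magnitudes(wave_a), dft_magnitudes(wave_b)))
print(waveform_difference, spectrum_difference)
```

The waveform difference is large, while the difference between the magnitude spectra is zero apart from floating-point rounding.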
The video Frequency domain will help you understand why phase is less important than magnitude, for human perception, and for speech technology.
The terms ‘pitch period’ and ‘fundamental period’ are used interchangeably in the field. You’re right that this is technically incorrect.
‘Register’ here just means ‘in a different frequency range’.
Don’t worry if you think you are only analysing these sounds in very simple terms: you are – that’s the point here: just get our hands on some audio samples and inspect them. Do the readings as well as the exercises – they will help.
We are usually only concerned with taking the DFT of real-valued signals, such as speech waveforms. So, we will always see this mirroring.
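You can check the mirroring numerically with a short sketch like this one, in plain Python. The signal values are arbitrary and the naive `dft` function is only for illustration. For a real-valued signal of length N, bin k and bin N-k are complex conjugates, so their magnitudes are equal and the magnitude spectrum is symmetric about bin N/2.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform (O(N^2)), fine for short signals."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

# An arbitrary real-valued signal (values chosen only for illustration).
x = [0.0, 0.9, 0.4, -0.7, -1.0, 0.2, 0.8, 0.1,
     -0.5, -0.3, 0.6, 1.0, -0.2, -0.9, 0.3, 0.5]
X = dft(x)
N = len(x)

# Compare each bin below N/2 with its mirror image above N/2.
for k in range(1, N // 2):
    print(k, round(abs(X[k]), 6), N - k, round(abs(X[N - k]), 6))
```

Bin N/2 corresponds to the Nyquist frequency, so the bins above it carry no new information about a real-valued signal.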
Here are some further clues:
1) the mirroring is centred around the Nyquist frequency
2) the signal does not contain any information above the Nyquist frequency
3) aliasing!
Fourier analysis only works for sinusoidal basis functions.
There are other forms of analysis that use different basis functions, but those are far less common (and far beyond the scope of Speech Processing). Even in those cases, the basis functions need to obey certain properties (e.g., arranged in a series like the sinusoids in Fourier analysis, orthogonal, etc).
Your two basis functions are actually identical (except for amplitude), so are definitely not a valid series of basis functions.
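One way to test orthogonality yourself is to multiply two candidate basis functions sample by sample and sum the result (an inner product). This is a minimal sketch in plain Python, with a window length and amplitudes that I have picked only for illustration.

```python
import math

N = 64   # analysis window length in samples

def sinusoid(cycles, amplitude=1.0):
    """A sine wave completing a whole number of cycles in the window."""
    return [amplitude * math.sin(2 * math.pi * cycles * n / N) for n in range(N)]

def inner_product(a, b):
    return sum(p * q for p, q in zip(a, b))

# Two sinusoids at different frequencies: orthogonal, inner product is zero.
different_frequencies = inner_product(sinusoid(1), sinusoid(2))

# Same frequency, different amplitude: not orthogonal at all.
same_frequency = inner_product(sinusoid(1), sinusoid(1, amplitude=3.0))

print(different_frequencies, same_frequency)
```

The first result is zero (apart from rounding) and the second is far from zero, which is why two functions that differ only in amplitude cannot both belong to the same set of basis functions.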
The VPN makes your personal computer become part of the University’s internal network. This enables access to certain parts of the University computing infrastructure that cannot be accessed from the outside world.
In general, you do not need to be connected to the VPN all the time. Just use it when instructed.
For Speech Processing, you’ll need the VPN at the start of the first assignment, when there will be some files to copy from the University filesystem to the Virtual Machine.
sp-m1-3-sampling-sinusoids notebook, section Magnitude and Phase Modifications
The unit circle is always centred at the origin (0,0).
A sinusoid with no phase shift starts on the unit circle at co-ordinates (1,0).
A sinusoid with a phase shift starts somewhere else on the same unit circle. The amount of phase shift specifies how far around the unit circle that starting point is. For example, a phase shift of pi/2 radians would mean moving that far around the unit circle and thus starting at (0,1).
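If it helps, you can compute these starting points yourself. This sketch uses Euler's formula via Python's `cmath` module; the function name `starting_point` is my own and does not come from the notebook.

```python
import cmath
import math

def starting_point(phase):
    """Co-ordinates on the unit circle for a given phase shift in radians."""
    z = cmath.exp(1j * phase)    # Euler's formula: cos(phase) + j*sin(phase)
    return (z.real, z.imag)

for phase in (0.0, math.pi / 2, math.pi):
    x, y = starting_point(phase)
    print(round(phase, 4), round(x, 6), round(y, 6))
```

A phase shift of 0 gives (1,0) and a phase shift of pi/2 gives (0,1), up to floating-point rounding.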
Good – you are reading the tutorial carefully!
This is not significant – we can use any symbol we like in Euler’s formula (so long as we use the same one on the right and left-hand sides of the equation, of course!). theta and phi are popular choices for angles, in geometry.
(The Wikipedia page on Euler’s formula uses x in the text but phi in the diagram, for example).
The Speech Processing course prioritises conceptual understanding over mathematical ability.
If you can draw a diagram of a concept, and explain it in words, then you’ll do well on this course.
The SIGNALS tutorials at the start of this course involve maths that most students will find challenging. You won’t be directly examined on this maths: we’re using it as a tool to explain and understand the concepts.
Keep trying to understand the maths, and keep asking for help with it. But always remember that it’s the concepts that really matter.
You don’t need both, just the VM.
Yes, there are some Hyper-V settings that may need to be different to run WSL vs VMWare, although the latest versions of VMWare on Windows claim to have solved this.
It looks like Fusion Player (which they only made free very recently) only exists in version 12. So your remaining options are:
Upgrade your Mac to 10.15 Catalina – if you were planning to upgrade anyway, now is the time (not mid-semester). Some very old software may no longer work, so check that first if you rely on this.
Use VirtualBox free host software instead of VMWare. It is not as good but should work.
Purchase either VMWare or Parallels Desktop 16 for Mac. Look for education discounts (Parallels is £35 for 1 year, for students). In either case, get a free trial license first to confirm it works on your Mac and with our VM image.