Forum Replies Created
I should follow my own rule: Always label both axes!
The pulse train is just a waveform, so it’s in the time domain. You are correct that the horizontal axis is time. The vertical axis should be labelled “amplitude” (which we can think of as sound pressure).
The units of amplitude are arbitrary, and in this example the scale goes from 0 to 1 (all these pulses are positive). We could just as well have labelled it with the sample value, which ranges from -32768 to +32767 for a 16-bit waveform; each pulse would then have an amplitude of 32767.
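To make the two labellings concrete, here is a minimal sketch that builds a pulse train on the 0-to-1 scale and relabels it as 16-bit sample values. The sampling rate (16 kHz), F0 (100 Hz) and duration are hypothetical values chosen for illustration, and the scaling 1.0 → 32767 is an assumed (common) convention.

```python
import numpy as np

def pulse_train(fs, f0, duration):
    """Impulse train on a 0-to-1 amplitude scale: 1.0 once per period, 0.0 elsewhere."""
    n_samples = int(round(fs * duration))
    period = int(round(fs / f0))       # samples per fundamental period
    x = np.zeros(n_samples)
    x[::period] = 1.0                   # all pulses are positive
    return x

def to_int16(x):
    """Relabel amplitudes in [-1, 1] as 16-bit sample values (1.0 maps to 32767)."""
    return np.round(np.clip(x, -1.0, 1.0) * 32767).astype(np.int16)

x = pulse_train(fs=16000, f0=100, duration=0.05)   # hypothetical parameters
samples = to_int16(x)
print(x.max(), samples.max())   # 1.0 32767
```

The waveform is the same in both cases; only the numbers on the vertical axis change. With this symmetric scaling, -1.0 maps to -32767, so the value -32768 is never used.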
October 7, 2015 at 09:42 in reply to: Simple Synthetic Vowel: how to make it sound more natural #222
Yes, one way would be to use a more complex source than the pulse train. This is what is done in Festival (in diphone and unit selection voices). The source waveform is something called the “residual” and is calculated so that the speech is almost perfectly reconstructed after that source signal is passed through the filter. In other words, the residual compensates for the fact that the filter is an oversimplification of the vocal tract.
We will touch on this at the end of the synthesis section of the course.
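As a rough illustration of the residual idea, the sketch below assumes the filter is an all-pole linear-prediction (LPC) filter: inverse-filtering the signal gives the residual, and passing that residual back through the all-pole filter reconstructs the signal. This is the generic textbook technique, not Festival's actual code (which differs in detail, e.g. in how analysis frames are chosen); the toy “speech” signal, its 500 Hz resonance and the predictor order are all made up for illustration.

```python
import numpy as np

def lpc_coefficients(x, order):
    """Autocorrelation-method linear prediction: x[n] is predicted as sum_k a[k-1] * x[n-k]."""
    r = np.array([np.dot(x[: len(x) - k], x[k:]) for k in range(order + 1)])
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:])

def inverse_filter(x, a):
    """FIR filter A(z): residual e[n] = x[n] - sum_k a[k-1] * x[n-k]."""
    e = x.astype(float).copy()
    for k, ak in enumerate(a, start=1):
        e[k:] -= ak * x[:-k]
    return e

def synthesis_filter(e, a):
    """All-pole filter 1/A(z): y[n] = e[n] + sum_k a[k-1] * y[n-k]."""
    y = np.zeros(len(e))
    for n in range(len(e)):
        past = sum(ak * y[n - k] for k, ak in enumerate(a, start=1) if n - k >= 0)
        y[n] = e[n] + past
    return y

# Hypothetical toy "speech": a 100 Hz pulse train through one 500 Hz resonance, plus a little noise.
fs = 16000
pulses = np.zeros(800)
pulses[::160] = 1.0
theta = 2 * np.pi * 500 / fs
resonator = [2 * 0.95 * np.cos(theta), -(0.95 ** 2)]
rng = np.random.default_rng(0)
speech = synthesis_filter(pulses, resonator) + 0.01 * rng.standard_normal(800)

a = lpc_coefficients(speech, order=8)
residual = inverse_filter(speech, a)          # the source signal
reconstructed = synthesis_filter(residual, a)  # source passed back through the filter

print(np.allclose(reconstructed, speech))           # True
print(np.sum(residual ** 2) < np.sum(speech ** 2))  # True
```

Reconstruction is exact (up to rounding) whatever the residual looks like, which is the sense in which the residual “compensates” for the simplified filter: anything the filter fails to model ends up in the source.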
October 6, 2015 at 16:33 in reply to: The spectrum of a pure tone is not a perfect vertical line #216
That’s a good question, but one with a rather technical answer.
First, it’s worth remembering that we usually view the spectrum on a log (decibel) scale, which exaggerates this effect: small amounts of energy at other frequencies become much more visible.
The short answer is that this is a consequence of analysing a short region of the signal that – in general – will not contain a perfect integer number of complete cycles of the waveform. Therefore, we have to multiply the waveform by a tapered window to avoid discontinuities at the start and end (see my blog post about what happens without a tapered window).
Fading the signal in and out with the tapered window effectively changes its frequency content: for example, our pure sine wave would no longer be a perfectly pure sine wave (i.e., it would now contain some other frequencies, caused by the application of the window function).
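Here is a small numerical sketch of this “leakage”, assuming a 16 kHz sampling rate, a 512-sample analysis frame and NumPy’s Hann window (all choices made for illustration). It reports the level, in dB relative to the peak, of one DFT bin about 50 bins away from the tone, for three cases: a whole number of cycles with no taper, a fractional number of cycles with no taper, and a fractional number of cycles with the Hann taper.

```python
import numpy as np

FS = 16000   # assumed sampling rate (Hz)
N = 512      # assumed frame length (samples); bin spacing is FS / N = 31.25 Hz

def level_db(x, window, k):
    """Magnitude of DFT bin k after windowing, in dB relative to the largest bin."""
    mag = np.abs(np.fft.rfft(x * window))
    return 20 * np.log10(mag[k] / mag.max() + 1e-12)

t = np.arange(N) / FS
rect = np.ones(N)        # no taper
hann = np.hanning(N)     # tapered window

whole = np.sin(2 * np.pi * 1000 * t)    # exactly 32 cycles in the frame
partial = np.sin(2 * np.pi * 1010 * t)  # 32.32 cycles: not a whole number

far_bin = 82  # roughly 50 bins (about 1.6 kHz) away from the tone
print(f"whole cycles, no taper:   {level_db(whole, rect, far_bin):7.1f} dB")
print(f"partial cycles, no taper: {level_db(partial, rect, far_bin):7.1f} dB")
print(f"partial cycles, Hann:     {level_db(partial, hann, far_bin):7.1f} dB")
```

With a whole number of cycles the distant bin is essentially empty; with a fractional number of cycles and no taper it picks up clearly visible energy; the Hann taper does not remove that leakage entirely, but pushes it much further down.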
This article gives a good, longer answer. Scroll down to “Windowing” and Figure 10, then read on to Figure 13. After that, it becomes a “my window function is better than your window function” competition.
The Wikipedia entry “Window function” has a long shopping list of slightly different window functions. Otherwise, I think that article is long but not very illuminating.
The most obvious effect of lowering the sampling rate is that the maximum frequency that can be represented (the Nyquist frequency, which is half the sampling rate) is reduced. In other words, the speech has been low-pass filtered. The attached samples illustrate this.
This should be apparent even at high sampling rates: you should hear a clear difference between the 32 kHz and 16 kHz files. Use good headphones if possible.
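A minimal sketch of why the lower rate implies low-pass filtering. The 10 kHz tone is a hypothetical example: it is below the Nyquist frequency at 32 kHz but above it at 16 kHz, and once sampled at 16 kHz its samples are indistinguishable from those of an (inverted) 6 kHz tone. This is aliasing, which is why resampling to 16 kHz normally removes everything above 8 kHz first (assuming the attached files were made with a standard resampler that does this).

```python
import numpy as np

def nyquist(fs):
    """Highest frequency (Hz) that can be represented at sampling rate fs."""
    return fs / 2

for fs in (32000, 16000):
    print(f"fs = {fs} Hz: Nyquist frequency = {nyquist(fs):.0f} Hz")

# A hypothetical 10 kHz tone sampled at 16 kHz produces exactly the same samples
# as an inverted 6 kHz tone (16000 - 10000 = 6000): the two cannot be told apart.
fs = 16000
n = np.arange(64)
tone_10k = np.sin(2 * np.pi * 10000 * n / fs)
tone_6k = np.sin(2 * np.pi * 6000 * n / fs)
print(np.allclose(tone_10k, -tone_6k))  # True
```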
This reply was modified 9 years, 8 months ago by Simon.
The effect of using fewer bits per sample is best described as a type of distortion: each sample has to be rounded to one of fewer available levels, so the digital waveform is a worse approximation to the original analogue one. The attached samples demonstrate this (use headphones).
Tips:
- use headphones to listen
- you might need to download the files and play them outside your browser (which might not handle the 4-bit version)
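To put a number on that distortion, here is a minimal sketch that quantises a sine wave to 16, 8 and 4 bits and measures the signal-to-noise ratio of the result. It assumes plain rounding to symmetric levels with no dither; the attached files may have been made differently, and the tone frequency and amplitude are arbitrary choices.

```python
import numpy as np

def quantise(x, bits):
    """Round x (in [-1, 1]) to the nearest of the symmetric levels available with `bits` bits."""
    max_int = 2 ** (bits - 1) - 1          # e.g. 32767 for 16 bits, 7 for 4 bits
    return np.round(x * max_int) / max_int

def snr_db(clean, quantised):
    """Signal-to-noise ratio, treating the rounding error as noise."""
    noise = quantised - clean
    return 10 * np.log10(np.sum(clean ** 2) / np.sum(noise ** 2))

fs = 16000
t = np.arange(fs) / fs                        # one second of signal
x = 0.99 * np.sin(2 * np.pi * 440.3 * t)      # arbitrary test tone

snr = {bits: snr_db(x, quantise(x, bits)) for bits in (16, 8, 4)}
for bits, value in snr.items():
    print(f"{bits:2d} bits: SNR = {value:5.1f} dB")
```

Each bit removed costs roughly 6 dB of SNR, so the 4-bit version has far more audible rounding error than the 16-bit one.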