Yes, what we are plotting on a waveform are deviations from the average pressure. These deviations can be positive (compression = air molecules are closer together than average) or negative (rarefaction = air molecules are further apart than average).
I should follow my own rule: Always label both axes!
The pulse train is just a waveform, so it’s in the time domain. You are correct that the horizontal axis is time. The vertical axis should be labelled “amplitude” (which we can think of as sound pressure).
The units of amplitude are arbitrary, and in this example the scale goes from 0 to 1 (all these pulses are positive). We could just as well have labelled it with the sample value (which would range from -32768 to +32767 for a 16-bit waveform, and so the pulses would each have an amplitude of 32767).
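To make the two scales concrete, here is a minimal sketch in Python. The function names `pulse_train` and `float_to_int16`, and the choice of period, are my own for illustration. It builds a pulse train on the 0 to 1 scale and then relabels the same samples as 16-bit values.

```python
def float_to_int16(x):
    """Map a normalised amplitude in [-1.0, 1.0] to a 16-bit sample value.

    Scaling by 32767 is one common convention (assumed here); it means
    -1.0 maps to -32767, so the value -32768 is never produced.
    """
    x = max(-1.0, min(1.0, x))  # clip anything outside the legal range
    return int(round(x * 32767))


def pulse_train(num_samples, period):
    """A pulse of amplitude 1.0 every `period` samples, zero elsewhere."""
    return [1.0 if n % period == 0 else 0.0 for n in range(num_samples)]


pulses = pulse_train(num_samples=20, period=5)
samples_16bit = [float_to_int16(p) for p in pulses]

print(pulses[:6])
print(samples_16bit[:6])
```

The waveform is the same in both cases; only the numbers written on the vertical axis change.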
Yes, one way would be to use a more complex source than the pulse train. This is what is done in Festival (in diphone and unit selection voices). The source waveform is something called the “residual” and is calculated so that the speech is almost perfectly reconstructed after that source signal is passed through the filter. In other words, the residual compensates for the fact that the filter is an oversimplification of the vocal tract.
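The idea of a residual can be sketched with textbook linear prediction. This is an illustration under my own assumptions (a synthetic "speech" signal made from two resonances, a deliberately too-simple order-2 filter, and hypothetical function names), not the code Festival uses. The residual is whatever the inverse filter leaves behind; passing it back through the filter rebuilds the signal, whereas a plain pulse train through the same filter does not.

```python
import math


def autocorrelation(signal, max_lag):
    n = len(signal)
    return [sum(signal[i] * signal[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def lpc(signal, order):
    """Levinson-Durbin recursion (autocorrelation method).

    Returns [1, a1, ..., ap] for the inverse filter A(z) = 1 + sum a_k z^-k.
    """
    r = autocorrelation(signal, order)
    a = [1.0] + [0.0] * order
    error = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / error
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        error *= 1.0 - k * k
    return a


def inverse_filter(signal, a):
    """FIR filter A(z): removes what the filter can predict, leaving the residual."""
    return [sum(a[k] * signal[n - k] for k in range(len(a)) if n - k >= 0)
            for n in range(len(signal))]


def synthesis_filter(source, a):
    """All-pole filter 1/A(z): the 'filter' in the source-filter model."""
    out = []
    for n, e in enumerate(source):
        out.append(e - sum(a[k] * out[n - k]
                           for k in range(1, len(a)) if n - k >= 0))
    return out


def resonator(radius, freq):
    """Second-order section with a pole pair at `freq` cycles per sample."""
    return [1.0, -2 * radius * math.cos(2 * math.pi * freq), radius ** 2]


def max_abs_diff(x, y):
    return max(abs(p - q) for p, q in zip(x, y))


# Synthetic "speech": a pulse train through two resonances (an assumption
# standing in for a real recording).
pulses = [1.0 if n % 50 == 0 else 0.0 for n in range(400)]
speech = synthesis_filter(synthesis_filter(pulses, resonator(0.95, 0.05)),
                          resonator(0.90, 0.20))

a = lpc(speech, order=2)  # too simple on purpose: one resonance, not two
residual = inverse_filter(speech, a)

residual_error = max_abs_diff(speech, synthesis_filter(residual, a))
pulse_error = max_abs_diff(speech, synthesis_filter(pulses, a))

print("error using residual as source:", residual_error)
print("error using pulse train as source:", pulse_error)
```

In this sketch the residual absorbs everything the order-2 filter fails to model, which is why the reconstruction error is at the level of floating-point rounding.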
We will touch on this at the end of the synthesis section of the course.
That’s a good question, but one with a rather technical answer.
First, it’s worth remembering that we usually view the spectrum on a log scale, which exaggerates the effect.
The short answer is that this is a consequence of analysing a short region of the signal that, in general, will not contain a whole number of complete cycles of the waveform. Therefore, we have to multiply the waveform by a tapered window to avoid discontinuities at the start and end (see my blog post about what happens without a tapered window).
Fading the signal in and out with the tapered window effectively changes its frequency content: for example, our pure sine wave is no longer precisely a pure sine wave (i.e., it now contains some other frequencies, introduced by the window function).
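A small experiment makes this visible. The sketch below is my own illustration, with an arbitrary frame length of 64 samples and a Hann window chosen as one example of a tapered window. It compares a sine wave that fits the analysis frame exactly, one that does not, and the same one after tapering, all on a dB (log) scale relative to the peak.

```python
import cmath
import math

N = 64  # analysis frame length in samples (arbitrary choice)


def sine(cycles):
    """A sine wave completing `cycles` cycles within the N-sample frame."""
    return [math.sin(2 * math.pi * cycles * t / N) for t in range(N)]


def hann(n):
    """Hann window: one example of a tapered window."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * t / n) for t in range(n)]


def dft_magnitudes(frame):
    """Magnitude of a direct (slow) DFT, bins 0 to N/2."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2 + 1)]


def to_db(mags):
    """Magnitudes in dB relative to the largest one, floored at -240 dB."""
    peak = max(mags)
    return [20 * math.log10(max(m / peak, 1e-12)) for m in mags]


window = hann(N)

# Exactly 8 cycles in the frame: no discontinuity at the frame edges.
whole_db = to_db(dft_magnitudes(sine(8)))
# 8.5 cycles with no tapering: discontinuity at the frame edges.
rect_db = to_db(dft_magnitudes(sine(8.5)))
# 8.5 cycles faded in and out with the tapered window.
hann_db = to_db(dft_magnitudes([s * w for s, w in zip(sine(8.5), window)]))

far_bin = 25  # a frequency bin well away from the sine wave's own frequency
print("whole cycles, no taper:", round(whole_db[far_bin], 1), "dB")
print("8.5 cycles, no taper:  ", round(rect_db[far_bin], 1), "dB")
print("8.5 cycles, Hann taper:", round(hann_db[far_bin], 1), "dB")
```

Only the first case gives a single clean spectral line. The tapered window does not remove the extra frequencies, but it keeps them much closer to the true frequency than the abrupt cut does.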
This article gives a good, longer answer. Scroll down to “Windowing” and Figure 10, then read onwards to Figure 13. After that, it becomes a “my window function is better than your window function” competition.
The Wikipedia entry “Window function” has a long shopping list of slightly different window functions. Otherwise, I think that article is long but not very illuminating.
The most obvious effect is that the maximum frequency that can be stored (the Nyquist frequency) is reduced. That is, the speech has been low-pass filtered. The attached samples illustrate this.
This should be apparent even at high sampling rates: you should hear a clear difference between the 32 kHz and 16 kHz sample rate files. Use good headphones if possible.
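Here is a rough sketch of what happens when the sampling rate is halved. It is one common approach (low-pass filter, then keep every second sample), not necessarily how the attached files were made, and the tone frequencies, filter length and function names are my own choices. A 1 kHz tone survives the move from 16 kHz to 8 kHz sampling, but a 6 kHz tone lies above the new Nyquist frequency of 4 kHz and has to be filtered out first; otherwise it reappears at the wrong frequency (2 kHz).

```python
import cmath
import math


def lowpass_taps(num_taps, cutoff):
    """Hamming-windowed sinc low-pass filter; cutoff in cycles per sample."""
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        if x == 0:
            ideal = 2 * cutoff
        else:
            ideal = math.sin(2 * math.pi * cutoff * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]  # unit gain at 0 Hz


def convolve_same(signal, taps):
    """Zero-delay FIR filtering; output has the same length as the input."""
    half = len(taps) // 2
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(taps):
            i = n + half - k
            if 0 <= i < len(signal):
                acc += h * signal[i]
        out.append(acc)
    return out


def tone_level(signal, freq_hz, sample_rate_hz):
    """Approximate amplitude of the component at freq_hz."""
    total = sum(x * cmath.exp(-2j * math.pi * freq_hz * t / sample_rate_hz)
                for t, x in enumerate(signal))
    return 2 * abs(total) / len(signal)


FS = 16000  # original sampling rate in Hz, so Nyquist is 8000 Hz
signal = [math.sin(2 * math.pi * 1000 * t / FS)
          + math.sin(2 * math.pi * 6000 * t / FS) for t in range(480)]

# Low-pass at the new Nyquist frequency (4000 Hz = 0.25 cycles per sample).
filtered = convolve_same(signal, lowpass_taps(63, cutoff=0.25))

# Use the middle of the signal to avoid filter edge effects, then keep
# every second sample: the new sampling rate is 8000 Hz.
downsampled = filtered[80:400:2]
naive = signal[80:400:2]  # no filtering first, for comparison

kept_1k = tone_level(downsampled, 1000, FS // 2)
alias_2k = tone_level(downsampled, 2000, FS // 2)
naive_alias_2k = tone_level(naive, 2000, FS // 2)

print("1 kHz tone after downsampling:", round(kept_1k, 3))
print("energy at 2 kHz (filtered first):", round(alias_2k, 3))
print("energy at 2 kHz (no filter):", round(naive_alias_2k, 3))
```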
The effect is best described as a type of distortion. Using fewer bits means that the digital waveform is a worse approximation to the original analogue one. The attached samples demonstrate this (use headphones).
Tips:
use headphones to listen
you might need to download the files and play them outside your browser (which might not handle the 4-bit version)
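To put a number on the distortion caused by using fewer bits, here is a minimal sketch. The test signal is a sine wave rather than speech, and the function names are my own. It rounds each sample to the nearest level available at a given bit depth and measures how far the result is from the original, as a signal-to-noise ratio.

```python
import math


def quantise(x, bits):
    """Round a normalised sample in [-1, 1) to the nearest of 2**bits levels."""
    levels = 2 ** (bits - 1)
    q = round(x * levels)
    q = max(-levels, min(levels - 1, q))  # stay within the representable range
    return q / levels


def snr_db(original, approx):
    """Signal-to-noise ratio in dB, treating the rounding error as noise."""
    signal_power = sum(s * s for s in original)
    noise_power = sum((s - a) ** 2 for s, a in zip(original, approx))
    return 10 * math.log10(signal_power / noise_power)


# A stand-in for the analogue waveform: a sine wave at 90% of full scale.
wave = [0.9 * math.sin(2 * math.pi * 0.01237 * t) for t in range(2000)]

snr_by_bits = {}
for bits in (16, 8, 4):
    approx = [quantise(s, bits) for s in wave]
    snr_by_bits[bits] = snr_db(wave, approx)
    print(bits, "bits:", round(snr_by_bits[bits], 1), "dB")
```

As a rule of thumb, each bit removed costs roughly 6 dB of signal-to-noise ratio, which is why the 4-bit version sounds so much worse than the 8-bit one.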
This reply was modified 8 years, 10 months ago by Simon.