The bitrate (or bit rate) of a signal is the number of bits required to store, or transmit, 1 s of that signal. A bit is a binary digit: either 0 or 1. Let's calculate the bitrate of a digital waveform. First you should revise the concepts of sampling and quantisation from this module of the Speech Processing course and this post.
Bitrate of a waveform
Here we will only consider one type of signal: a digital waveform. Sometimes, to make it clear that this waveform has not been cleverly encoded or compressed (e.g., as MP3), we might see such a waveform described as raw, or as PCM, which stands for Pulse Code Modulation, or (more correctly) LPCM, which stands for Linear Pulse Code Modulation.
We already know that a digital waveform is a sequence of samples spaced equally in time. The number of samples per second is called the sample rate, sampling rate, or sampling frequency, and has units of Hz (which is the same as “per second”). Common values for speech waveforms range from 8 kHz, which is adequate for telephone quality speech, up to 48 kHz for high-quality recordings such as those made in a studio.
Each of those samples is the amplitude of the waveform at a particular moment in time. The amplitude is represented as a binary number with a fixed number of bits. A fixed number of bits means that the waveform has been quantised: the amplitude of each sample is a value from a fixed set of possible amplitudes. The more bits we use, the greater the number of distinct amplitudes that can be represented. The number of bits is called the bit depth. The most common bit depth by far is 16 bits, and we'll rarely need to consider other values. Professional studio equipment might use a higher bit depth, such as 24 bits, but you would need expensive playback equipment and excellent hearing to tell the difference between 16-bit and 24-bit speech waveforms. Lower bit depths, such as 8 bits, might be encountered in applications where a low bitrate is important, such as an old-fashioned telephone landline.
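To make the relationship between bit depth and the number of distinct amplitudes concrete, here is a minimal Python sketch (the three bit depths are just the example values discussed above):

```python
# Number of distinct amplitude values each common bit depth can represent
for bit_depth in (8, 16, 24):
    levels = 2 ** bit_depth
    print(f"{bit_depth}-bit: {levels:,} possible amplitude values")
# 8-bit: 256 possible amplitude values
# 16-bit: 65,536 possible amplitude values
# 24-bit: 16,777,216 possible amplitude values
```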
Since bitrate is the total number of bits that are required for 1 s of waveform, it can be calculated simply by multiplying the sample rate by the bit depth.
Bitrate of some example waveforms
Telephone quality speech might have a sample rate of 8 kHz and a bit depth of 8 bits, making a bitrate of 8 000 × 8 = 64 000 bits per second, which we can write more compactly as 64 kbit/s, 64 kb/s, or 64 kbps.
A typical high-quality commercial recording might use a sample rate of 44.1 kHz and a bit depth of 16 bits, making a bitrate of 705.6 kbps. (If we wanted to store a stereo signal for, say, music, that would double to over 1.4 Mbps.)
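Here is a minimal Python sketch of that calculation; the function name and the optional channels parameter are illustrative, not part of any standard library:

```python
def bitrate(sample_rate_hz, bit_depth_bits, channels=1):
    """Bitrate of an LPCM waveform, in bits per second."""
    return sample_rate_hz * bit_depth_bits * channels

print(bitrate(8_000, 8))                # 64000   = 64 kbit/s (telephone quality)
print(bitrate(44_100, 16))              # 705600  = 705.6 kbit/s (high quality, mono)
print(bitrate(44_100, 16, channels=2))  # 1411200 = about 1.4 Mbit/s (stereo)
```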
Saving space
Obviously, if we want to represent the original analogue sound wave more faithfully, we need a higher sample rate and a higher bit depth. But this will come at a price of needing more bits per second: a higher bitrate.
If we want to reduce the bitrate of an LPCM waveform, we only have two choices: reduce the bit depth or reduce the sample rate.
Reduce the bit depth
Because computers generally store data not as individual binary bits but in groups of 8, known as bytes, it is much more convenient to use a bit depth that is a multiple of 8, so that each sample is a whole number of bytes. That's why 16 bits is the most common bit depth for speech or any other audio. A bit depth of 8 might be used when saving space is more important than quality, and 24 bits when the highest quality is required and storage space is not a concern.
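Since each 16-bit sample is exactly 2 bytes, the storage needed for an uncompressed recording is easy to work out. A minimal sketch, assuming an illustrative 16 kHz, 16-bit mono recording:

```python
sample_rate = 16_000               # Hz (an assumed example value)
bit_depth = 16                     # bits per sample
bytes_per_sample = bit_depth // 8  # 16 bits = 2 whole bytes
seconds = 60

size_bytes = sample_rate * bytes_per_sample * seconds
print(size_bytes)                  # 1920000 bytes: about 1.9 MB per minute
```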
If the bit depth is too low, the waveform will sound as if noise has been added to the signal: this is the audible effect of quantisation error.
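We can simulate this by discarding the least significant bits of each sample. A minimal numpy sketch, assuming a 1 kHz sine wave sampled at 16 kHz as the input:

```python
import numpy as np

def requantise(samples, bit_depth):
    """Requantise 16-bit integer samples to a lower bit depth
    (scaled back to the 16-bit range so it can still be played)."""
    shift = 16 - bit_depth
    return (samples >> shift) << shift  # discard the least significant bits

# 1 s of a 1 kHz sine wave at 16 kHz, as 16-bit samples
t = np.arange(16_000) / 16_000
x = np.round(32767 * np.sin(2 * np.pi * 1000 * t)).astype(np.int16)

x8 = requantise(x, 8)          # the error is audible as added noise
print(np.max(np.abs(x - x8)))  # maximum quantisation error: 255
```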
Reduce the sample rate
Reducing the sample rate will reduce the Nyquist frequency accordingly. For speech, a reduction in sample rate from 48 kHz studio recordings down to 24 kHz will be barely audible to most people, so that is a popular sample rate in modern speech synthesis.
Using a sample rate that is too low will remove higher frequency content from the signal and so it will sound muffled.
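As a minimal sketch of how the sample rate might be reduced in practice, scipy's resample_poly applies an anti-aliasing low-pass filter before decimating (the 48 kHz input here is just random noise standing in for real audio):

```python
import numpy as np
from scipy.signal import resample_poly

x_48k = np.random.randn(48_000)  # stand-in for 1 s of 48 kHz audio
x_24k = resample_poly(x_48k, up=1, down=2)

# The low-pass filter removes content above the new Nyquist frequency
# (12 kHz), so it is discarded cleanly rather than aliased.
print(len(x_24k))                # 24000 samples: still 1 s of audio
```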
Coding
By cleverly encoding the waveform, it is possible to reduce the bitrate without using a low bit depth or low sample rate, and still retain high quality. Rather than simply represent the amplitude less often (reducing the sample rate) or compromise on the precision of each sample (reducing the bit depth), we can reduce the amount of information in some other way. For example, if we could predict that a listener will not hear some component of the signal, we could discard it and save some space. This is the realm of audio coding, which is beyond the scope of this post. There are a vast number of audio codecs because of the many different applications in which we need to compress audio, from transmitting it in real time over the internet (known as streaming) during a Zoom call, to making your music smaller so that more of it will fit on your smartphone.

When audio has been encoded in this way, there is no longer a simple relationship between sample rate, bit depth and bitrate. Most codecs allow the sample rate and bit depth to be kept constant (e.g., at the same values as the original LPCM waveform) whilst varying the bitrate. The perceptual consequences of lowering the bitrate are no longer as simple as when lowering the bit depth or the sample rate.