Music recording at 44.1 kHz and 16 bits
- This topic has 1 reply, 2 voices, and was last updated 4 years, 3 months ago by Simon.
October 4, 2020 at 19:32 #12190
I have always seen audio CDs have music recorded at 44.1 kHz with a bit depth of 16. In my understanding, the sampling frequency of 44.1 kHz is related to humans having an upper hearing limit of about 20 kHz (because the Nyquist frequency is 22.05 kHz).
On the other hand, many lossless audio codecs like FLAC and WAV offer much higher resolution: 24 bits + 96 kHz and sometimes even 24 bits + 192 kHz. I am a bit confused about what advantage increasing the sampling rate has, considering that most humans are limited to 20 kHz. Does the lossless nature of the music really provide any sonic difference?
October 5, 2020 at 09:07 #12192
Let’s clear up the terms ‘lossless’ and ‘codec’ first. In the Speech Processing course, we are only ever talking about raw waveforms. These are ‘lossless’ and there is no ‘codec’ as such: the values of the samples are stored directly. ‘WAV’ is just a file format for storing raw waveforms preceded by a header containing useful information such as sample rate, duration, number of channels, etc.
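To make the "header plus raw samples" idea concrete, here is a minimal sketch using Python's standard `wave` module. The helper names `write_zeros` and `read_header` are my own, chosen for illustration. Note that the header stores the number of bytes of sample data rather than a duration as such; the duration is derived from the frame count and the sample rate.

```python
import io
import wave

def write_zeros(sample_rate=44100, bit_depth=16, channels=1, n_frames=44100):
    """Build an in-memory WAV file holding n_frames of all-zero samples."""
    bytes_per_sample = bit_depth // 8
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(bytes_per_sample)
        w.setframerate(sample_rate)
        # Raw sample values follow the header directly: no codec involved.
        w.writeframes(b"\x00" * (n_frames * channels * bytes_per_sample))
    return buf.getvalue()

def read_header(wav_bytes):
    """Recover the information stored in the WAV header."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as w:
        return {
            "channels": w.getnchannels(),
            "bit_depth": w.getsampwidth() * 8,
            "sample_rate": w.getframerate(),
            # Duration is derived, not stored: frames / frames-per-second.
            "duration_s": w.getnframes() / w.getframerate(),
        }

wav_bytes = write_zeros()
header = read_header(wav_bytes)
print(header)
```

One second of 16-bit mono audio at 44.1 kHz takes 88,200 bytes of sample data, so the file is only slightly larger than that: everything else is header.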
A lossy codec, such as mp3 or AAC, does not store the samples, but encodes them in a way that loses some unimportant information (determined using a model of human hearing, for general-purpose codecs). We don’t need to understand these codecs for the Speech Processing course. Speech-specific codecs, such as that used on your mobile phone, typically use the source-filter model, rather than a model of hearing.
Now on to the value of different sampling rates and bit depths. For consumer audio, there is little or no benefit in using a sampling rate higher than 44.1 kHz or a bit depth greater than 16. We more often see 48 kHz in professional audio, simply because it divides by 2 or 3 more sensibly.
In professional audio, such as a music recording studio, we may well use a higher sampling rate and greater bit depth. This is because the signal will undergo all sorts of processing as part of the production process (e.g. time and pitch modification). This processing will introduce artefacts, and having a very high Nyquist frequency will place those artefacts up beyond the range of human hearing. A greater bit depth simply means storing each sample with greater precision, again giving more robustness against some sorts of processing such as changing the level (e.g., when mixing tracks together). Just before publishing the music, the audio is downsampled to 44.1 kHz and the bit depth reduced to 16.
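The point about 48 kHz dividing "more sensibly" can be checked with a little arithmetic. This is just an illustrative sketch (the function name `decimated_rates` is mine): dividing 48 kHz by small integers gives round numbers, including 16 kHz and 8 kHz, which are common sampling rates for speech, whereas 44.1 kHz gives awkward ones.

```python
def decimated_rates(sample_rate_hz, factors=(2, 3, 4, 6)):
    """Sampling rates obtained by dividing sample_rate_hz by each integer factor."""
    return {k: sample_rate_hz / k for k in factors}

for fs in (48000, 44100):
    print(fs, decimated_rates(fs))
```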
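Here is a rough sketch of why a high Nyquist frequency helps during processing. It assumes a very simplified situation: some processing step creates a new component at twice the frequency of a 15 kHz tone, and nothing filters that component out before it is represented at the working sampling rate. The function `alias_frequency` is my own illustration, not part of any particular audio tool.

```python
def alias_frequency(f_hz, sample_rate_hz):
    """Frequency at which a component at f_hz appears once sampled at
    sample_rate_hz, assuming no anti-aliasing filter is applied."""
    f = f_hz % sample_rate_hz
    nyquist = sample_rate_hz / 2
    # Anything above the Nyquist frequency folds back down below it.
    return sample_rate_hz - f if f > nyquist else f

tone_hz = 15000
artefact_hz = 2 * tone_hz  # assumed artefact: a component at 30 kHz

for fs in (44100, 96000):
    print(fs, alias_frequency(artefact_hz, fs))
# prints:
# 44100 14100
# 96000 30000
```

At 44.1 kHz the 30 kHz artefact folds back to 14.1 kHz, well inside the range of hearing. At 96 kHz it stays at 30 kHz, where it is inaudible and can be removed by the low-pass filter used when downsampling for release.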
Some people claim to be able to hear the difference between 48 kHz and 96 kHz. You would need a very well-produced example audio file, a good ear, and expensive equipment to try this for yourself.
Here is an example of reducing bit depth, so you can hear the effect.
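If you would like to experiment with the numbers as well as listen, the following is a minimal sketch of bit depth reduction using only the Python standard library. It is not how the audio example was produced: the test signal (a 997 Hz sine wave), the simple rounding scheme without dither, and all function names are assumptions made for illustration.

```python
import math

def sine(n_samples=48000, freq_hz=997.0, sample_rate_hz=48000, amplitude=0.9):
    """One second of a sine wave with sample values in the range [-1, 1]."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(n_samples)]

def requantise(samples, bits):
    """Round each sample to the nearest level of a signed `bits`-bit grid."""
    scale = 2 ** (bits - 1)
    out = []
    for x in samples:
        q = round(x * scale)
        q = max(-scale, min(scale - 1, q))  # clip to the representable range
        out.append(q / scale)
    return out

def snr_db(reference, degraded):
    """Signal-to-noise ratio in dB, treating the rounding error as noise."""
    signal = sum(x * x for x in reference)
    noise = sum((x - y) ** 2 for x, y in zip(reference, degraded))
    return 10 * math.log10(signal / noise)

original = sine()
for bits in (16, 8, 4):
    print(bits, "bits:", round(snr_db(original, requantise(original, bits)), 1), "dB")
```

The usual rule of thumb is that each bit removed costs roughly 6 dB of signal-to-noise ratio, which is why 16 bits is already generous for listening while very low bit depths sound obviously noisy.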