Digital audio reproduction started to take off in mass-produced products with the launch of the Compact Disc (CD) by Sony and Philips. In the late 1970s and early 1980s, digital audio moved on from just a few bits - six for the hi-hat in the Roland TR-909 drum machine, eight for the samples in the Fairlight CMI, 13 for the Sony PCM-1 - to finally land at 16 bits as the standard for home reproduction, and 24 bits for live sound and recording. But why 16? Why 24?

The dynamic range of an audio system - in this blog arbitrarily defined as the ratio between peak signal and noise floor - is an important factor in sound reproduction quality. The higher the dynamic range, the more accurate the reproduction, and the wider the span of levels that can be reproduced. Before digital sound came in, audio was reproduced from either vinyl records or magnetic tape, boasting a dynamic range of up to roughly 60dB.

As a rule of thumb, in digital systems - ignoring the noise of analogue circuits - the dynamic range in dB can be estimated by multiplying the number of bits by six. This gives 48dB for an eight-bit system (somewhat below magnetic tape), 96dB for a 16-bit system (e.g. the CD), 144dB for a 24-bit system and 192dB for a 32-bit system.
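The rule of thumb can be checked with a few lines of Python. This is only a sketch: the function names are made up for illustration, and the second function assumes the usual idealised view that each extra bit doubles the number of levels, so the ratio between full scale and one step is 20*log10(2^N).

```python
import math

def rule_of_thumb_db(bits: int) -> int:
    """Dynamic range estimate using the 6dB-per-bit rule of thumb."""
    return 6 * bits

def ideal_ratio_db(bits: int) -> float:
    """Ratio in dB between full scale and a single quantisation step."""
    return 20 * math.log10(2 ** bits)

for bits in (8, 16, 24, 32):
    print(f"{bits:2d} bits: rule of thumb {rule_of_thumb_db(bits)}dB, "
          f"20*log10(2^N) = {ideal_ratio_db(bits):.1f}dB")
```

Each bit is worth about 6.02dB in this idealised calculation, which is why rounding to six works well enough for quick estimates.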

To make sense of bit depths and dynamic range, we have to discuss three aspects of sound reproduction: the human hearing system’s dynamic range, environmental noise and the uncertainty of signal levels during processing.

First the human hearing system, which consists of a pair of ears and a brain. We can conclude from many reports that a ‘useful’ dynamic range of roughly 120dB can be detected by humans - the difference between an average listener’s pain-inducing peak level and hearing threshold level. So it stands to reason that a digital system should have at least 20 bits - multiplied by six this gives 120dB dynamic range. But today’s CDs and WAV files are still 16-bit and nobody has a problem with it. How does that compute?

The answer is environmental noise. For ‘domestic’ sound reproduction - in a home, kitchen, car, hotel lobby - the environmental noise level can be around 40dB. Even in a quiet listening room, environmental noise levels usually don’t drop below 20dB. Anything quieter than the room is masked by it, so with a 40dB noise floor only about 80dB of the 120dB range remains usable. So about 80dB could be a good match - requiring only 14 bits. Sony and Philips decided to play it safe and go for 16 bits when they designed the CD, which apparently was a good call as it still stands today.
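The arithmetic behind those numbers can be sketched as follows. The constants come from the text (120dB hearing range, 6dB per bit); the function names and the simple subtraction model are assumptions made for illustration, not a formal psychoacoustic model.

```python
import math

DB_PER_BIT = 6         # rule of thumb from the text
HUMAN_RANGE_DB = 120   # 'useful' hearing range quoted in the text

def usable_range_db(noise_floor_db: float) -> float:
    """Range left between the pain threshold and the room's noise floor."""
    return HUMAN_RANGE_DB - noise_floor_db

def bits_needed(dynamic_range_db: float) -> int:
    """Smallest whole number of bits covering the given range."""
    return math.ceil(dynamic_range_db / DB_PER_BIT)

for label, noise_db in (("domestic room", 40),
                        ("quiet listening room", 20),
                        ("no noise at all", 0)):
    usable = usable_range_db(noise_db)
    print(f"{label}: {usable}dB usable -> {bits_needed(usable)} bits")
```

By this arithmetic the 40dB case needs 14 bits and the 20dB case 17, so 16 bits sits between the two.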

So why do most of today’s live and recording systems use 24-bit A/D and D/A convertors, not 20-bit? 20-bit would be enough to reproduce the human dynamic range. The answer is the unpredictable behaviour of acoustic audio sources. For a CD or WAV file properly mastered to a peak level of 0dB relative to full scale, no headroom is required in the D/A convertor. However, things change when an A/D convertor is involved. If a preamp is set to feed an A/D convertor to capture exactly 120dB dynamic range from a microphone set up at a predicted distance and with an assumed sound source peak level, and the sound source for some reason is a little louder or moves a little closer to the microphone, then it’s very likely that the captured level exceeds the top of that 120dB range and clips. Of course the sound engineer can then quickly adjust the preamp’s gain, but a short period of distortion cannot be avoided. The solution is to apply some headroom - in today’s systems represented by four extra bits on top of the 20 bits. The total of 24 bits results in a headroom of 24dB on top of the 120dB full dynamic range, so the system’s processing can cope with uncertainty of signal levels without distortion.
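The headroom argument can be put into a small sketch. The 20-bit working range and 6dB per bit are taken from the text; the function names and the overshoot values are hypothetical, chosen only to show when a louder-than-expected source would clip.

```python
DB_PER_BIT = 6    # rule of thumb from the text
RANGE_BITS = 20   # bits reserved for the 120dB hearing range

def headroom_db(total_bits: int, range_bits: int = RANGE_BITS) -> int:
    """Headroom in dB provided by the bits above the working range."""
    return (total_bits - range_bits) * DB_PER_BIT

def clips(overshoot_db: float, total_bits: int) -> bool:
    """True if a source that is overshoot_db louder than planned exceeds full scale."""
    return overshoot_db > headroom_db(total_bits)

for total_bits in (20, 24):
    for overshoot_db in (10, 30):   # hypothetical level surprises
        state = "clips" if clips(overshoot_db, total_bits) else "ok"
        print(f"{total_bits}-bit convertor, source {overshoot_db}dB hotter: {state} "
              f"(headroom {headroom_db(total_bits)}dB)")
```

With 20 bits there is no margin, so any overshoot clips; with 24 bits a 10dB surprise is absorbed, while a 30dB one would still exceed the 24dB of headroom.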

It must be noted that 6dB per bit is just a rule of thumb. For several reasons, the dynamic range achieved in practice will always be a little lower than that. Also, even very well designed analogue circuits that have to support A/D and D/A conversion cannot support such a high dynamic range, limiting the total system dynamic range - including preamp, A/D and D/A convertors and output buffer amp - to a value of around 110dB.

It also has to be noted that the bit depth inside Digital Signal Processors (DSP) is always higher - usually from a minimum of 32 bits up to 64 bits. This is done mainly to keep the resolution of calculations high inside the DSP systems for complex algorithms.
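One way to see why internal calculations need more bits than the samples themselves is to look at how quickly intermediate results grow. The sketch below uses plain integer arithmetic as a simplifying assumption (many real DSPs use floating point instead), and the 64-term sum is a hypothetical stand-in for a mixing or filtering operation.

```python
def signed_bits_needed(value: int) -> int:
    """Smallest two's-complement word length that can hold value."""
    if value >= 0:
        return value.bit_length() + 1
    return (~value).bit_length() + 1

SAMPLE_BITS = 24
most_negative = -(2 ** (SAMPLE_BITS - 1))   # lowest 24-bit sample value

# Multiplying two samples (e.g. signal times gain coefficient)
product = most_negative * most_negative

# Adding up 64 such products (hypothetical mix or filter sum)
accumulated = 64 * product

print("one sample:     ", signed_bits_needed(most_negative), "bits")
print("one product:    ", signed_bits_needed(product), "bits")
print("sum of 64 terms:", signed_bits_needed(accumulated), "bits")
```

A single multiplication already doubles the word length, and summing many results adds a few bits more, which is consistent with processors working at 32 to 64 bits internally.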