Audio bit depth
In digital audio, bit depth is the number of bits of information recorded for each sample, and it corresponds directly to the resolution of each sample in a set of digital audio data. Examples include CD-quality audio, which is recorded at 16 bits, and Super Audio CD, DVD-Audio and Blu-ray Disc, all of which can support up to 24-bit audio.
Digital audio
A set of digital audio samples contains data that, when converted into an analog signal, provides the necessary information to reproduce the sound wave. In pulse-code modulation (PCM) sampling, the bit depth will limit signal-to-noise ratio (S/N). The bit depth will not limit frequency range, which is limited by the sample rate.
Increasing the sampling bit depth reduces quantization noise and so improves the S/N. As a rule of thumb, each 1-bit increase in bit depth raises the S/N by about 6 dB.[2][3] 24-bit digital audio has a theoretical maximum S/N of 144 dB, compared to 96 dB for 16-bit; however, as of 2007 digital audio converter technology is limited to an S/N of about 124 dB (21-bit)[4] because of real-world limitations in integrated circuit design. Still, this approximately matches the performance of the human ear.[5][6]
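The rule of thumb can be checked numerically. The following is a minimal sketch, assuming the S/N is taken as the ratio of full scale to one quantization step, 20·log10(2^N); the function names are illustrative and not taken from the cited sources.

```python
import math

def dynamic_range_db(bits: int) -> float:
    """Ratio of full scale to one quantization step, in dB: 20*log10(2**bits)."""
    return 20 * math.log10(2 ** bits)

def effective_bits(snr_db: float) -> float:
    """Invert the rule of thumb: the bit depth a given S/N corresponds to."""
    return snr_db / dynamic_range_db(1)  # one bit is worth about 6.02 dB

for bits in (8, 16, 24):
    print(f"{bits:2d}-bit: {dynamic_range_db(bits):6.2f} dB")

print(f"124 dB corresponds to about {effective_bits(124):.1f} bits")
```

Under this assumption the exact figures come out slightly above the rounded 96 dB and 144 dB quoted above, and 124 dB works out to between 20 and 21 bits, consistent with the "21-bit" figure.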
Technically speaking, bit depth is only meaningful when applied to pure PCM devices. Non-PCM formats, such as lossy compression systems like MP3, have bit depths that are not defined in the same sense as PCM. In lossy audio compression, where bits are allocated to other types of information, the bits actually allocated to individual samples are allowed to fluctuate within the constraints imposed by the allocation algorithm.
24-bit quantization
24-bit audio is sometimes used undithered, because for most audio equipment and situations the noise level of the digital converter can be louder than the required level of any dither that might be applied.
There is some disagreement over the recent trend towards higher bit-depth audio. It is argued by some that the dynamic range offered by 16-bit is sufficient to store the dynamic range present in almost all music. In terms of pure data storage this is often true, as a high-end system can extract an extremely good sound out of the 16 bits stored in a well-mastered CD. However, audio with very loud and very quiet sections can require dithering to fit it into 16 bits. This is not a problem for most recently produced popular music, which is often mastered so that it constantly sits close to the maximum signal (see loudness war); however, higher resolution audio formats are already being used, especially for applications such as film soundtracks, where there is often a very wide dynamic range between whispered conversations and explosions.
For most situations the advantage given by resolution higher than 16-bit is mainly in the processing of audio. No digital filter is perfect, but if the audio is upsampled and the processing is done in 24-bit or higher, then the distortion introduced by filtering will be much quieter, as the errors always creep into the least significant bits. A well-designed filter can also weight the distortion more towards the higher, inaudible frequencies, but a sample rate higher than 48 kHz is needed so that these inaudible ultrasonic frequencies are available for soaking up errors.
There is also a good case for 24-bit (or higher) recording in the live studio, because it enables greater headroom (often 24dB or more rather than 18dB) to be left on the recording without encountering quantization errors at low volumes. This means that brief peaks are not harshly clipped, but can be compressed or soft-limited later to suit the final medium.
Environments where large amounts of signal processing are required (such as mastering or synthesis) can require even more than 24 bits. Some modern audio editors convert incoming audio to 32-bit (both for an increased dynamic range to reduce clipping, and to minimize noise in intermediate stages of filtering).
Dynamic range
Dynamic range is the difference between the largest and smallest signal a system can record or reproduce. With the proper application of dither, digital systems can reproduce signals with levels lower than their resolution would normally allow.[7]
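The effect of dither on signals below one quantization step can be illustrated with a small simulation. This is a sketch only: it assumes triangular (TPDF) dither spanning ±1 step and a constant input of 0.3 steps, both of which are illustrative choices rather than details from the cited source.

```python
import math
import random

def quantize(x: float) -> int:
    """Round to the nearest integer step (1.0 represents one LSB)."""
    return math.floor(x + 0.5)

def average_output(level: float, n: int, dither: bool, seed: int = 0) -> float:
    """Quantize a constant input n times and return the mean output."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n):
        # Assumed TPDF dither: difference of two uniform values, within +/-1 LSB.
        noise = rng.random() - rng.random() if dither else 0.0
        total += quantize(level + noise)
    return total / n

level = 0.3  # a constant signal smaller than one quantization step
plain = average_output(level, 100_000, dither=False)
dithered = average_output(level, 100_000, dither=True)
print(plain, dithered)
```

Without dither every sample rounds to zero and the signal is lost. With dither the individual samples are noisy, but their average lands close to the original 0.3, which is the sense in which a dithered system can reproduce levels below its nominal resolution.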
Performance
8-bit resolution, as found in older computers and audio samplers, offers up to 48 dB of dynamic range under perfect recording and reproduction conditions. This is roughly equivalent to standard-grade audio cassette tape, but with more obvious quantization errors at low volumes unless a deliberate 1-bit background noise "dither" is introduced, which provides a greater perceived dynamic range despite the noise floor being at approximately -45 dB. 16-bit resolution, as used in CD and modern equipment, can provide up to 96 dB of dynamic range. Again, a deliberate noise floor may be introduced to soften perceived quantization error; in this case the floor is still below -90 dB, which is quiet enough to become lost in circuit distortion in cheap players, or in environmental background noise in all but the quietest rooms with the loudest playback volume. The 12- and 14-bit DV/NICAM standards (-72 and -84 dB respectively) were thought to be perfectly adequate for television and video camera applications at the time of their inception, particularly compared to VHS and Hi-8.
Audiophile-spec recording resolutions extend this to a theoretical -120 dB (20-bit) or -144 dB (24-bit). The latter exceeds the dynamic range between complete silence (signal energy below that which can be detected by the human ear) and noise of high enough intensity to cause almost immediate ear injuries. An ideal 24-bit DAC and associated amplifier would be able to accurately output signal values from 0, 1, 2 through 16777213, 16777214, 16777215.
Applications
Digital telephony frequently uses 8-bit quantization. That is, values of the analogue waveform are rounded to the closest of 256 distinct voltage values represented by an 8-bit binary number. This crude quantization introduces substantial quantization noise into the signal, but the result is still more than adequate for human speech. Mobile phones, however, use more complex schemes such as linear predictive coding.
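Rounding to the closest of 256 values can be sketched as follows. The evenly spaced (uniform) scale and the input range of -1.0 to 1.0 are assumptions made for illustration, since the text does not say how the levels are spaced; the function names are hypothetical.

```python
def quantize_uniform(x: float, bits: int = 8) -> int:
    """Map x in [-1.0, 1.0] to the nearest of 2**bits integer codes."""
    levels = 2 ** bits
    x = max(-1.0, min(1.0, x))  # clip out-of-range input
    return round((x + 1.0) / 2.0 * (levels - 1))  # code in 0 .. levels-1

def dequantize_uniform(code: int, bits: int = 8) -> float:
    """Map an integer code back to its voltage-like value in [-1.0, 1.0]."""
    levels = 2 ** bits
    return code / (levels - 1) * 2.0 - 1.0

sample = 0.1234
code = quantize_uniform(sample)
restored = dequantize_uniform(code)
print(code, restored, abs(restored - sample))
```

The difference between `sample` and `restored` is the quantization error; with this uniform scale it can never exceed half a step, which is 1/255 of the assumed range here.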
Standard DV audio is 12-bit (4096 levels), NICAM pseudo-14-bit (10-bit data + 4-bit gain signal, with 14-bit output DAC).
Compact discs use a 16-bit digital representation, allowing 65,536 distinct levels. This is far better than telephone quantization, but CD audio representing low signal levels would still sound noticeably 'granular' because of the quantizing noise. For this reason a small amount of noise is sometimes added to the signal before digitization. This deliberately added noise is known as dither. Adding dither eliminates the granularity and gives very low distortion, at the expense of a small increase in noise level. Measured using ITU-R 468 noise weighting, this noise is about 66 dB below alignment level, or 84 dB below FS (full scale) digital, which is somewhat lower than the microphone noise level on most recordings and hence of no consequence (see Programme levels for more on this).
Super Audio CD, DVD-Audio, and Blu-ray audio can use 20-bit or even 24-bit sampling (more than 16 million levels). CD Audio has also left a lasting impression on computer and other digital audio applications, where 16-bit is the default "hi-fi" sample resolution (as opposed to earlier 8, 6 or even 4-bit efforts). Higher precision is often considered the reserve of audiophiles, as the representable range of intensities rapidly exceeds the theoretical limits of human perception, particularly when environmental noise is considered.
Binary resolution
In computing parlance, bit is the abbreviation for a single binary digit, represented by a 0 or a 1. A word is a binary number with more than one digit. Binary numerics are base-2; thus, each digit can only be a 0 or a 1. In comparison, traditional decimal numerics are base-10, having digits that can only be 0 through 9. For example, the 16-bit binary number 0110111110111010 is equivalent to the 5-digit decimal number 28602. The number of bits per word is simply how many digits there are in the corresponding number. The words in commonly used PCM digital audio formats are 8, 16 or 24 bits long.
Larger words have higher resolution. The number of possible values that can be represented by the bit depth can be calculated by using 2^x, where x is the bit depth.[8] Thus, the resolution of a 16-bit system (2^16) is 65,536 and a 24-bit system (2^24) has a resolution of 16,777,216. PCM audio data is typically stored signed, in two's complement,[9] so a 16-bit audio sample represents a decimal number from -32,768 to 32,767 and a 24-bit sample represents a decimal number from -8,388,608 to 8,388,607.
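The figures in this section can be reproduced with a few lines of code. This is a minimal sketch; the function names are illustrative, and the signed range assumes the two's complement storage described above.

```python
from typing import Tuple

def level_count(bits: int) -> int:
    """Number of distinct values a word of the given length can hold."""
    return 2 ** bits

def signed_range(bits: int) -> Tuple[int, int]:
    """Smallest and largest value of a two's complement word."""
    half = 2 ** (bits - 1)
    return -half, half - 1

# The 16-bit binary word used as an example in the text.
print(int("0110111110111010", 2))

for bits in (16, 24):
    print(bits, level_count(bits), signed_range(bits))
```

The asymmetry of the signed range (one more negative value than positive) is a property of two's complement, where zero takes up one of the non-negative codes.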
Bit rate
Bit rate refers to the amount of data, specifically bits, transmitted or received per second.
One of the most common bit rates given is that for compressed audio files. For example, an MP3 file might be described as having a bit rate of 160 kbit/s or 160000 bits/second. This indicates the amount of compressed data needed to store one second of music.
The standard audio CD is said to have a data rate of 44.1 kHz/16, meaning that the audio data was sampled 44,100 times per second, with a bit depth of 16. CD tracks are usually stereo, using a left and right track, so the amount of audio data per second is double that of mono, where only a single track is used. The bit rate is then 44,100 samples/second/track × 16 bits/sample × 2 tracks = 1,411,200 bit/s or 1.4 Mbit/s.
This explains why, for example, a Minidisc recorder, which uses ATRAC compression, can store recordings lasting twice as long on a disc when it is switched from the default two-channel stereo mode to single-channel mono recording.
To fully define a sound file's digital audio bit rate, the format of the data, the sampling rate, the word size (bit depth), and the number of channels (e.g. mono, stereo, quad) must all be known.
Calculating values
An audio file's bit rate can be calculated given sufficient information. Given any three of the following four values, the fourth can be calculated.
- Bit rate = (sampling rate) × (bit depth) × (number of channels)
E.g., for a recording with a 44.1 kHz sampling rate, a 16 bit depth, and 2 channels (stereo):
- 44100 samples per second per channel × 16 bits per sample × 2 channels = 1411200 bits per second = 1411.2 kbit/s
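The formula above, and its rearrangement to recover one unknown from the other three values, can be sketched as follows. The function and parameter names are illustrative, and the calculation assumes uncompressed PCM.

```python
def bit_rate(sample_rate_hz: int, bit_depth: int, channels: int) -> int:
    """Bits per second of uncompressed PCM audio."""
    return sample_rate_hz * bit_depth * channels

def bit_depth_from(bit_rate_bps: int, sample_rate_hz: int, channels: int) -> float:
    """Rearranged form: recover the bit depth from the other three values."""
    return bit_rate_bps / (sample_rate_hz * channels)

cd = bit_rate(44_100, 16, 2)
print(cd, "bit/s =", cd / 1000, "kbit/s")
print(bit_depth_from(cd, 44_100, 2), "bits per sample")
```

The same rearrangement works for the sampling rate or the channel count: divide the bit rate by the product of the two known factors.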
The eventual file size of an audio recording can also be calculated using a similar formula:
- File Size (bits) = (sampling rate) × (bit depth) × (number of channels) × (seconds)
Because there are 8 bits in a byte:
- File Size (bytes) = File Size (bits) / 8
E.g., a 70-minute (4,200-second) CD-quality stereo recording will take up 740,880,000 bytes:
- 44,100 × 16 × 2 × 4200 / 8 = 740,880,000 Bytes
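The file size calculation can be written as a short function. This is a sketch under the assumption of raw, uncompressed PCM data with no file header or container overhead; the names are illustrative.

```python
def file_size_bytes(sample_rate_hz: int, bit_depth: int,
                    channels: int, seconds: int) -> int:
    """Size of raw PCM audio data in bytes (headers and metadata not counted)."""
    total_bits = sample_rate_hz * bit_depth * channels * seconds
    return total_bits // 8  # 8 bits per byte

seventy_minutes = 70 * 60  # 4,200 seconds
size = file_size_bytes(44_100, 16, 2, seventy_minutes)
print(f"{size:,} bytes")
```

Real audio files are usually slightly larger than this figure, since formats typically add a header and sometimes metadata on top of the sample data.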
References
1. Hodgson, Jay (2010). Understanding Records, p. 56. ISBN 978-1-4411-5607-5. Adapted from Franz, David (2004). Recording and Producing in the Home Studio, pp. 38-39. Berklee Press.
2. See Signal-to-noise ratio#Fixed point.
3. Kester, Walt (2007). "Taking the Mystery out of the Infamous Formula, 'SNR = 6.02N + 1.76dB,' and Why You Should Care". Analog Devices. Archived from the original on 16 June 2011. Retrieved 2011-07-26.
4. "PCM4222". Retrieved 2011-04-21.
5. Campbell, D. R. "Aspects of Human Hearing". Retrieved 2011-04-21. "The dynamic range of human hearing is [approximately] 120 dB"
6. "Sensitivity of Human Ear". Archived from the original on 4 June 2011. Retrieved 2011-04-21. "The practical dynamic range could be said to be from the threshold of hearing to the threshold of pain [130 dB]"
7. "Dithering in Analog-to-Digital Conversion". e2v Semiconductors. 2007. Retrieved 2011-07-26.
8. Thompson, Dan (2005). Understanding Audio. Berklee Press. ISBN 978-0-634-00959-4.
9. Smith, Julius (2007). "Pulse Code Modulation (PCM)". Mathematics of the Discrete Fourier Transform (DFT) with Audio Applications, Second Edition, online book. Retrieved 22 October 2012.
- Pohlmann, Ken C. (February 15, 2000). Principles of Digital Audio (4th ed.). McGraw-Hill Professional. ISBN 978-0-07-134819-5.
See also
- Sound
- Digital audio
- Bitrate
- Color depth, the corresponding concept for digital images