In digital audio, 44,100 Hz (alternatively written as 44.1 kHz) is a common sampling frequency. Analog audio is recorded by sampling it 44,100 times per second, and these samples are then used to reconstruct the audio signal on playback.
The 44.1 kHz sampling rate originated in the late 1970s with PCM adaptors, which recorded digital audio on video cassettes,[note 1] notably the Sony PCM-1600 (1979) and subsequent models in this series. This then became the basis for compact disc digital audio (CD-DA), defined in the Red Book standard (1980). Its use has continued as an option in 1990s standards such as the DVD, and in 2000s standards such as HDMI. This sampling frequency is commonly used for MP3 and other consumer audio file formats, which were originally created from material ripped from compact discs.
Why 44.1 kHz?
The rate was chosen following debate between manufacturers, notably Sony and Philips, and its implementation by Sony, yielding a de facto standard. The actual choice of rate was the point of some debate, with other alternatives including 44.1 × 0.999 ≈ 44.056 kHz (corresponding to the NTSC color field rate of 60 × 0.999 = 59.94 Hz) or approximately 44 kHz, proposed by Philips. Ultimately Sony prevailed on both sample rate (44.1 kHz) and bit depth (16 bits per sample, rather than 14 bits per sample). The technical reasoning behind the rate being chosen is as follows.
Human hearing and signal processing
The Nyquist–Shannon sampling theorem says the sampling frequency must be greater than twice the maximum frequency one wishes to reproduce. Since human hearing range is roughly 20 Hz to 20,000 Hz, the sampling rate had to be greater than 40 kHz.
In addition, signals must be low-pass filtered before sampling to avoid aliasing. While an ideal low-pass filter would perfectly pass frequencies below 20 kHz (without attenuating them) and perfectly cut off frequencies above 20 kHz, such an ideal filter is theoretically impossible (it is noncausal), so in practice a transition band is necessary, where frequencies are partly attenuated. The wider this transition band is, the easier and more economical it is to make an anti-aliasing filter. The 44.1 kHz sampling frequency allows for a 2.05 kHz transition band, between the 20 kHz upper limit of hearing and the 22.05 kHz Nyquist frequency.
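The transition-band arithmetic above can be checked with a few lines of Python. This is only an illustrative sketch: the function names are made up for this example, and the 20 kHz passband edge is the article's rough figure for the upper limit of human hearing, not a property of any standard.

```python
def nyquist_frequency(sample_rate_hz: float) -> float:
    """Highest frequency representable at the given sampling rate."""
    return sample_rate_hz / 2


def transition_band(sample_rate_hz: float, passband_edge_hz: float = 20_000.0) -> float:
    """Width of the band between the passband edge and the Nyquist frequency.

    The 20 kHz default is an assumption taken from the text above.
    """
    return nyquist_frequency(sample_rate_hz) - passband_edge_hz


if __name__ == "__main__":
    for rate in (44_100, 48_000, 88_200):
        print(f"{rate} Hz: transition band of {transition_band(rate)} Hz")
```

Under these assumptions, 44.1 kHz leaves 2,050 Hz for the filter to roll off, while 48 kHz leaves 4,000 Hz.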
Recording on video equipment
Early digital audio was recorded to existing analog video cassette tapes, as VCRs were the only available transports with sufficient capacity to store meaningful lengths of digital audio.[note 2] To enable reuse with minimal modification of the video equipment, these ran at the same speed as video, and used much of the same circuitry. 44.1 kHz was deemed the highest usable rate meeting the following criteria:
- Compatible with both PAL and NTSC video[note 3]
- Requires encoding no more than 3 samples per video line per audio channel[note 4]
The sample rate is composed as follows:[note 5]
- NTSC: 245 × 60 × 3 = 44,100
  - 245 active lines/field × 60 fields/second × 3 samples/line = 44,100 samples/second
  - (490 active lines per frame, out of 525 lines total)
- PAL: 294 × 50 × 3 = 44,100
  - 294 active lines/field × 50 fields/second × 3 samples/line = 44,100 samples/second
  - (588 active lines per frame, out of 625 lines total)
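The selection reasoning given in the notes (a rate of at least 40 kHz, no more than 3 samples per PAL line, and a multiple of 900 Hz so that it fits both 50 Hz and 60 Hz field rates) can be reproduced with a short Python sketch. The constant and function names are illustrative assumptions; only the numbers come from the article.

```python
from math import lcm  # available in Python 3.9+

NTSC_FIELD_RATE_HZ = 60            # monochrome NTSC
PAL_FIELD_RATE_HZ = 50
SAMPLES_PER_LINE = 3               # per audio channel
PAL_LINE_RATE_HZ = 625 * 50 // 2   # 15,625 lines per second

MIN_RATE_HZ = 40_000                                  # twice the 20 kHz hearing limit
MAX_RATE_HZ = SAMPLES_PER_LINE * PAL_LINE_RATE_HZ     # 46,875
STEP_HZ = lcm(NTSC_FIELD_RATE_HZ, PAL_FIELD_RATE_HZ) * SAMPLES_PER_LINE  # 900


def candidate_rates() -> list[int]:
    """Multiples of 900 Hz between the minimum and maximum rates."""
    first = -(-MIN_RATE_HZ // STEP_HZ) * STEP_HZ  # round up to a multiple of 900
    return list(range(first, MAX_RATE_HZ + 1, STEP_HZ))


def active_lines_per_field(rate_hz: int, field_rate_hz: int) -> int:
    """Lines per field needed to carry rate_hz at 3 samples per line."""
    lines, remainder = divmod(rate_hz, field_rate_hz * SAMPLES_PER_LINE)
    if remainder:
        raise ValueError("rate does not fit a whole number of lines per field")
    return lines


if __name__ == "__main__":
    for rate in candidate_rates():
        ntsc = active_lines_per_field(rate, NTSC_FIELD_RATE_HZ)
        pal = active_lines_per_field(rate, PAL_FIELD_RATE_HZ)
        print(f"{rate} Hz: {ntsc} lines/field (NTSC), {pal} lines/field (PAL)")
```

The sketch only enumerates the candidates; ruling out the lower rates (filter transition band) and the higher ones (lines lost to the vertical blanking interval) remains the engineering judgment described in the notes.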
44,100 is the product of the squares of the first four prime numbers (2² × 3² × 5² × 7²) and hence has many useful integer factors. In what appears to be a coincidence, the 44.1 kHz sampling rate is also exactly 4 times the line frequency of the old 441-line German TV standard, which had a line frequency of 441 × 50 ÷ 2 = 11,025 Hz (441 lines per frame, 50 fields per second, 2 fields per frame).
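The factorization and the 441-line relationship are easy to verify. The following Python sketch uses plain trial division; the helper names are illustrative and not taken from any source.

```python
from math import prod


def prime_factorization(n: int) -> dict[int, int]:
    """Return {prime: exponent} for n, using simple trial division."""
    factors: dict[int, int] = {}
    divisor = 2
    while divisor * divisor <= n:
        while n % divisor == 0:
            factors[divisor] = factors.get(divisor, 0) + 1
            n //= divisor
        divisor += 1
    if n > 1:  # whatever remains is itself prime
        factors[n] = factors.get(n, 0) + 1
    return factors


def divisor_count(n: int) -> int:
    """Number of positive integer divisors of n."""
    return prod(exponent + 1 for exponent in prime_factorization(n).values())


if __name__ == "__main__":
    print(prime_factorization(44_100))   # {2: 2, 3: 2, 5: 2, 7: 2}
    print(divisor_count(44_100))         # 81
    line_frequency_441 = 441 * 50 // 2   # 11,025 Hz
    print(4 * line_frequency_441)        # 44100
```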
Various multiples and submultiples of 44.1 kHz are used. The lower rates 11.025 kHz and 22.05 kHz are found in WAV files and are suitable for low-bandwidth applications, while the higher rates of 88.2 kHz and 176.4 kHz are used in mastering and in DVD-Audio. The higher rates are useful both for providing additional resolution (making the audio less sensitive to distortions introduced by editing) and for making low-pass filtering easier, since a much wider transition band (between the 20 kHz limit of human hearing and the Nyquist frequency) is possible. The 88.2 kHz and 176.4 kHz rates are primarily used when the ultimate target is a CD.
Several other sampling rates were also used in early digital audio. Soundstream used a 50 kHz sample rate in the 1970s, following a 37 kHz prototype. In the early 1980s, a 32 kHz sampling rate was used in broadcast (especially in the UK and Japan), because this was sufficient for FM stereo broadcasts, which had 15 kHz bandwidth. Some digital audio was provided for domestic use in two incompatible EIAJ formats, corresponding to 525/59.94 (44,056 Hz sampling) and 625/50 (44.1 kHz sampling).
The Digital Audio Tape (DAT) format was released in 1987 with 48 kHz sampling. This sample rate has become the standard sampling rate for professional audio. Sample rate conversion between 44.1 kHz and 48 kHz is complicated by the relatively large numbers in the ratio between the two rates (44,100:48,000 reduces to 147:160 in lowest terms), but can now be done competently and efficiently. Early consumer DAT machines did not support 44.1 kHz and exploited this difference to make it difficult to copy 44.1 kHz CDs using 48 kHz DAT equipment.
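The 147:160 ratio follows from dividing both rates by their greatest common divisor. A minimal Python sketch (the function name is illustrative) shows the reduction, along with the integer upsampling and downsampling factors a rational resampler would need:

```python
from math import gcd


def resampling_factors(source_rate_hz: int, target_rate_hz: int) -> tuple[int, int]:
    """Return (up, down) such that target = source * up / down, in lowest terms."""
    divisor = gcd(source_rate_hz, target_rate_hz)
    return target_rate_hz // divisor, source_rate_hz // divisor


if __name__ == "__main__":
    up, down = resampling_factors(44_100, 48_000)
    print(gcd(44_100, 48_000))  # 300
    print(up, down)             # 160 147
```

Converting 44.1 kHz material to 48 kHz therefore means, conceptually, interpolating by 160 and decimating by 147. The sketch says nothing about the filtering that a real converter needs between those two steps.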
Due to the popularity of CDs, a great deal of 44.1 kHz equipment exists, as does a great deal of audio recorded at 44.1 kHz (or multiples thereof). However, some more recent standards use 48 kHz in addition to or instead of 44.1 kHz. In video, 48 kHz is now the standard, but for audio targeted at CDs, 44.1 kHz (and its multiples) is still used.
The HDMI TV standard (2003) allows both 44.1 kHz and 48 kHz (and multiples thereof). This provides compatibility with DVD players playing CD, VCD and SVCD content. The DVD-Video and Blu-ray Disc standards use multiples of 48 kHz only.
Most audio processors/sound cards contain DACs for both 44.1 kHz and 48 kHz and can natively output either. Some older processors include only 44.1 kHz output, and some cheaper newer processors include only 48 kHz output, requiring digital sample rate conversion to output other sample rates. Similarly, processors may be able to record natively at only certain sample rates.
- Specifically U-matic cassettes
- Digital audio recorded using a VCR as the transport has been termed pseudo-video.
- It is simplest if the same number of lines is used in each field, and, crucially, it was decided to use a sample rate that could be used on both NTSC (monochrome) and PAL equipment. Since NTSC has a field rate of 60 Hz and PAL has a field rate of 50 Hz, their least common multiple is 300 Hz, and with 3 samples per line this yields a sample rate that is a multiple of 900 Hz. For NTSC the sample rate is 5m × 60 × 3, where 5m is the number of active lines per field, which must be a multiple of 5 (the rest being used for synchronization), and for PAL the sample rate is 6n × 50 × 3, where 6n is the number of active lines per field, which must be a multiple of 6. The sampling rates that satisfy these requirements, namely at least 40 kHz (so that 20 kHz sounds can be encoded), no more than 46.875 kHz (so that no more than 3 samples per line are required in PAL), and a multiple of 900 Hz (so that the audio can be encoded in both NTSC and PAL), are thus 40.5, 41.4, 42.3, 43.2, 44.1, 45, 45.9, and 46.8 kHz. The lower ones are eliminated because low-pass filters require a transition band, while the higher ones are eliminated because some lines are required for the vertical blanking interval; 44.1 kHz was the highest usable rate, and was eventually chosen.
- Audio samples were recorded as if they were on the lines of a raster scan of video, as follows. Analog video standards represent video at a field rate of 60 Hz (NTSC, North America; 60/1.001 Hz ≈ 59.94 Hz for color NTSC) or 50 Hz (PAL, Europe), which corresponds to a frame rate of 30 frames per second (frame/s) or 25 frame/s, since each field is half the lines of an interlaced image (alternating the odd lines and the even lines). Each of these fields is in turn composed of lines (see raster scan): a frame has 625 lines for PAL and 525 lines for NTSC, though some of the "lines" are actually for synchronizing the signal (see vertical blanking interval), and a field comprises half the visible lines in one vertical scan. Digital audio samples were then encoded along each line, thus allowing reuse of the existing synchronization circuitry; viewed as video, the resulting images look like lines of binary black and white (rather, gray) dots along each scan line. The line frequency (lines per second) was 15,625 Hz for PAL (625 × 50/2), 15,750 Hz for 60 Hz (monochrome) NTSC (525 × 60/2), and 15,750/1.001 Hz (approximately 15,734.26 Hz) for 59.94 Hz (color) NTSC. Recording audio at the required rate of over 40 kHz therefore meant encoding multiple samples per line, with 3 samples per line being sufficient, yielding up to 15,625 × 3 = 46,875 samples per second for PAL and 15,750 × 3 = 47,250 for NTSC. One wished to minimize the number of samples per line, so that each sample could have more space devoted to it, thus making it easier to have a higher bit depth (16 bits, rather than 14 or 12 bits, say) and better error tolerance; in practice the signal was stereo, requiring 3 × 2 = 6 samples per line.
However, some of these lines were devoted to (vertical) synchronization: specifically, the lines during the vertical blanking interval (VBI) could not be used, so a maximum of 490 lines per frame (245 lines per field) could be used in NTSC, and about 588 lines per frame (294 lines per field) in PAL. (Note that, in video, PAL has up to 575 visible lines while NTSC has up to 485.)
- In actual practice, different machines used different video standards – for example, the Sony PCM-1610 only used 525/60 monochrome video (NTSC, US), not 625/50 (PAL, Europe) or NTSC color.
- See Watkinson for detailed discussion of the history and diagrams.
- ITU-R BT.470-6
- AES5-2008 (r2013): AES recommended practice for professional digital audio - Preferred sampling frequencies for applications employing pulse-code modulation (revision of AES5-2003), Audio Engineering Society, 2014-06-16
- Larry Jordan. "Understanding Audio Sample Rate Conversions". Retrieved 2018-05-14.
- AES5-2008 (r2013): AES recommended practice for professional digital audio - Preferred sampling frequencies for applications employing pulse-code modulation (revision of AES5-2003), Audio Engineering Society, 2008
- The Art of Digital Audio, John Watkinson, 2nd edition
- Watkinson, section 1.14: "The PCM adaptor", pp. 22–24
- Watkinson, section 4.5: "Choice of sampling rate", pp. 207–209
- Watkinson, section 9.2: "PCM adaptors", pp. 499–502
- [2-35] Why 44.1 kHz? Why not 48 kHz?, CD-Recordable FAQ, by Andy McFadden et al.
- Henning Schulzrinne. "Explanation of 44.1 kHz CD sampling rate". Retrieved 2013-02-06.