Sample rate conversion


Sample rate conversion is the process of changing the sampling rate of a discrete-time signal to obtain a new discrete-time representation of the underlying continuous-time signal.[1] When applied to an image, this process is sometimes called image scaling.

Sample rate conversion is needed because different systems use different sampling rates, for engineering, economic, or historical reasons. The physics of sampling merely sets a minimum sampling rate (an analog signal can be sampled at any rate above twice the highest frequency contained in the signal; see Nyquist frequency), so other factors determine the rates actually used. For example, different audio systems use rates of 44.1, 48, and 96 kHz. As another example, American television, European television, and movies all use different numbers of frames per second. Users would like to transfer source material between these systems. Simply replaying the existing data at the new rate does not normally work: it changes the playback speed, which shifts the pitch of audio and the pace of motion in video. Hence sample rate conversion is required.

Two basic approaches are:

  1. Convert the signal to analog, then re-sample it at the new rate.
  2. Compute the values of the new samples digitally from the old samples.

Modern systems almost all use the latter, since it introduces less noise and distortion. Though the calculations can be quite complex, they are entirely practical with modern processing power.

A famous example of analog rate conversion was the conversion of the slow-scan TV signals from the Apollo moon missions to conventional TV rates for viewers at home. Another historical example, part analog and part digital, is the conversion of movies (shot at 24 frames per second) to television (roughly 50 or 60 fields[nb 1] per second). To convert a 24 frame/sec movie to 60 field/sec television, for example, alternate movie frames are shown 2 and 3 times, respectively. For 50 Hz systems such as PAL, each frame is shown twice. Since 50 is not exactly 2×24, the movie runs 50/48 ≈ 1.04 times as fast (about 4% faster), and the audio pitch is about 4% higher, an effect known as PAL speed-up. This is often accepted for simplicity, but more complex methods are possible that preserve the running time and pitch: every twelfth frame can be shown 3 times rather than twice, or digital interpolation (see below) can be used in a video scaler.
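The frame-repetition arithmetic above can be checked with a short sketch. The function name `pulldown_fields` and the representation of a cadence as a tuple of repeat counts are illustrative assumptions, not part of any broadcast standard.

```python
from fractions import Fraction

def pulldown_fields(num_frames, cadence):
    """Return, field by field, the index of the film frame being shown.

    `cadence` is a tuple of repeat counts applied cyclically to the frames
    (a hypothetical representation chosen for this sketch).
    """
    fields = []
    for i in range(num_frames):
        fields.extend([i] * cadence[i % len(cadence)])
    return fields

# One second of film (24 frames) with alternate frames shown 2 and 3 times.
fields_60 = pulldown_fields(24, cadence=(2, 3))

# Showing every frame twice yields only 48 fields per 24 frames, so a 50 Hz
# system plays the film faster by the ratio 50/48.
fields_48 = pulldown_fields(24, cadence=(2,))
speedup = Fraction(50, len(fields_48))

# Showing every twelfth frame 3 times instead of twice fills the gap.
fields_50 = pulldown_fields(24, cadence=(2,) * 11 + (3,))

print(len(fields_60), len(fields_48), float(speedup), len(fields_50))
```

The speed-up ratio reduces to 25/24, which is where the "about 4%" figure comes from.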

Digital sample rate conversion

There are at least two ways to perform digital sample rate conversion:

  1. If the two sampling rates are in a fixed rational ratio, the conversion can be done as follows: Let F be the least common multiple of the two rates. Generate a signal sampled at F by inserting zeros between the original samples. This also introduces spectral images of the baseband signal at multiples of the original sampling rate. Remove these with a digital low-pass filter, so that only components below half the lower of the input and output sampling rates remain. Then reduce the sample rate by discarding the appropriate samples.
  2. Treat the samples as a time series, and create any needed new points by interpolation. In theory any interpolation method can be used, though linear interpolation (for simplicity) and a windowed, truncated sinc function (motivated by sampling theory) are the most common. If samples are being removed (reducing the sample rate), a low-pass filter with a cutoff at half the output sampling rate can be applied to bandlimit the signal before interpolation, reducing or eliminating aliasing.
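A minimal sketch of method 1, assuming integer sampling rates and a Hann-windowed sinc as the low-pass filter (one possible filter choice among many). The names `windowed_sinc_lowpass`, `resample_rational`, and the parameter `taps_per_phase` are hypothetical. The code follows the three steps literally and is not optimized.

```python
import math
from math import gcd

def windowed_sinc_lowpass(cutoff, num_taps):
    """Hann-windowed sinc low-pass taps; cutoff in cycles/sample (0 < cutoff < 0.5)."""
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        t = n - mid
        ideal = 2 * cutoff if t == 0 else math.sin(2 * math.pi * cutoff * t) / (math.pi * t)
        window = 0.5 - 0.5 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    return taps

def resample_rational(x, rate_in, rate_out, taps_per_phase=16):
    g = gcd(rate_in, rate_out)
    up, down = rate_out // g, rate_in // g      # F = rate_in * up = rate_out * down

    # Step 1: insert up - 1 zeros after every input sample (signal now at rate F).
    stuffed = [0.0] * (len(x) * up)
    stuffed[::up] = x

    # Step 2: low-pass at half the lower of the two rates, expressed relative to F.
    # The gain of `up` restores the amplitude lost by zero insertion.
    cutoff = 0.5 / max(up, down)
    num_taps = 2 * taps_per_phase * max(up, down) + 1
    h = [up * c for c in windowed_sinc_lowpass(cutoff, num_taps)]
    delay = (num_taps - 1) // 2                 # compensate the filter's group delay

    # Step 3: keep every `down`-th filtered sample (only those are computed).
    out = []
    for m in range(0, len(stuffed), down):
        acc = 0.0
        for k, hk in enumerate(h):
            j = m + delay - k
            if 0 <= j < len(stuffed):
                acc += hk * stuffed[j]
        out.append(acc)
    return out

# Usage: convert a 440 Hz tone from 8 kHz to 12 kHz (up = 3, down = 2).
tone = [math.sin(2 * math.pi * 440 * n / 8000) for n in range(200)]
converted = resample_rational(tone, 8000, 12000)
```

Away from the edges of the block, the output should closely follow the same tone sampled directly at the new rate. The accuracy depends on the filter length chosen here.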

Although the two approaches may seem different, they are mathematically equivalent. Picking an interpolation function in the second scheme is equivalent to picking the impulse response of the digital filter in the first scheme. Linear interpolation is equivalent to a triangular impulse response; a windowed sinc() is an approximation to a brick-wall filter, and approaches the desirable "brick wall" response as the number of points increases.
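The equivalence for the linear case can be verified numerically. The sketch below, with hypothetical helper names, upsamples by an integer factor in two ways: zero insertion followed by a triangular filter, and direct linear interpolation between neighbouring samples.

```python
def zero_stuff(x, up):
    """Insert up - 1 zeros after every sample."""
    out = [0.0] * (len(x) * up)
    out[::up] = x
    return out

def triangular_taps(up):
    """2*up - 1 taps rising linearly to 1 at the centre and falling again."""
    return [1 - abs(k) / up for k in range(-(up - 1), up)]

def filter_centered(signal, taps):
    """FIR filtering with the filter's delay removed (taps assumed symmetric)."""
    half = len(taps) // 2
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, hk in enumerate(taps):
            j = n + half - k
            if 0 <= j < len(signal):
                acc += hk * signal[j]
        out.append(acc)
    return out

def linear_interpolate(x, up):
    """Method 2 with linear interpolation, for an integer upsampling factor."""
    out = []
    for i in range(len(x) - 1):
        for p in range(up):
            frac = p / up
            out.append((1 - frac) * x[i] + frac * x[i + 1])
    return out

x = [0.0, 1.0, 4.0, 9.0, 16.0]
via_filter = filter_centered(zero_stuff(x, 4), triangular_taps(4))
via_interp = linear_interpolate(x, 4)
# The two results agree wherever both are defined (up to rounding error).
```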

If the sample-rate ratios are known, fixed, and rational, method 1 is simpler. The length of the impulse response of the filter in method 1 corresponds to the number of points used in interpolation in method 2. In approach 1, a slow precomputation such as the Remez algorithm can be used to compute the "best" response possible given the number of points (best in terms of peak error in various frequency bands, and so on). Note that a truncated sinc() function, though correct in the limit of an infinite number of points, is not the most accurate filter for a finite number of points.

However, method 2 will work in more general cases, where the sample-rate ratios are not rational, or two real-time streams must be accommodated, or the sample rates are time-varying. An important distinction in method 2 is whether the result contains more or fewer samples: if the result is upsampled (to more samples), the form of interpolation has the most impact on the final result; conversely, if the result is downsampled (to fewer samples), the ability of the low-pass filter to bandlimit – or whether any filter is used – becomes increasingly important as the differential in sampling frequency increases.
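As an illustration of method 2 with a ratio that is not rational, the following sketch uses linear interpolation, the simplest of the interpolation choices mentioned above. The function name `resample_linear` is an assumption. No low-pass filter is applied, so as written it is only suitable for upsampling or for signals that are already bandlimited to the new rate.

```python
import math

def resample_linear(x, ratio):
    """Resample x by `ratio` = output_rate / input_rate (any positive real)."""
    out = []
    n = 0
    while True:
        pos = n / ratio            # position of output sample n, in input samples
        i = int(pos)
        if i >= len(x) - 1:        # stop once there is no right-hand neighbour
            break
        frac = pos - i
        out.append((1 - frac) * x[i] + frac * x[i + 1])
        n += 1
    return out

# Usage: an irrational ratio, which method 1 cannot express as a ratio of integers.
ramp = [float(i) for i in range(10)]
ratio = math.sqrt(2)
y = resample_linear(ramp, ratio)
```

Because the position `pos` is recomputed for every output sample, the same loop could accept a ratio that changes over time, which is the situation with two unsynchronized real-time streams.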

Because of the mathematical operations employed, the output samples of sample rate conversion are almost always computed to more precision than the output format can hold. Conversion to the output bit depth can be done by simple rounding, or by more sophisticated methods such as dither or noise shaping.
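A sketch of the two simplest options, assuming floating-point samples in the range [-1, 1] and a signed integer output format. The dither shown is triangular (TPDF) dither of ±1 least significant bit, which is one common choice and an assumption here, since the text does not name a specific scheme. Noise shaping is not shown.

```python
import random

def quantize_round(samples, bits):
    """Plain rounding to signed integers of the given bit depth."""
    scale = 2 ** (bits - 1) - 1
    return [max(-scale - 1, min(scale, round(s * scale))) for s in samples]

def quantize_tpdf_dither(samples, bits, rng=None):
    """Add triangular-PDF dither (difference of two uniform values) before rounding."""
    rng = rng or random.Random(0)   # fixed seed only to make the sketch repeatable
    scale = 2 ** (bits - 1) - 1
    out = []
    for s in samples:
        dither = rng.random() - rng.random()      # triangular on (-1, 1) LSB
        q = round(s * scale + dither)
        out.append(max(-scale - 1, min(scale, q)))
    return out

samples = [n / 200 - 0.25 for n in range(100)]
rounded = quantize_round(samples, 16)
dithered = quantize_tpdf_dither(samples, 16)
```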

Digital audio

In digital audio, sample rate conversion changes the sampling rate of a discrete-time signal to obtain a new discrete-time representation of the underlying continuous-time signal.[2] This is necessary when, for example, transferring a signal recorded on Compact Disc (which has a native sampling rate of 44.1 kHz) to Digital Audio Tape (which encodes audio at 48 kHz).[3]

The least common multiple of 44.1 kHz and 48 kHz is 7.056 MHz. Had the original audio signal been recorded at that sampling rate, the process would be simple. Since 7.056 MHz is 160 × 44.1 kHz and also 147 × 48 kHz, one would only need to take every 160th sample to get a 44.1 kHz sampling rate, and every 147th sample to get a 48 kHz sampling rate. Taking every Nth sample like this preserves the content, provided the audio signal has no content above half the lowest sampling rate used (22.05 kHz in this case).
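The figures in this paragraph follow directly from the greatest common divisor of the two rates, as this short check shows (variable names are illustrative):

```python
from math import gcd

rate_cd, rate_dat = 44_100, 48_000            # sampling rates in Hz

common_rate = rate_cd * rate_dat // gcd(rate_cd, rate_dat)
step_for_cd = common_rate // rate_cd          # keep every Nth sample for 44.1 kHz
step_for_dat = common_rate // rate_dat        # keep every Nth sample for 48 kHz
nyquist_limit = min(rate_cd, rate_dat) / 2    # highest frequency that survives both

print(common_rate, step_for_cd, step_for_dat, nyquist_limit)
```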

Since the original has only 1/160 of the samples needed, one must generate the 7.056 MHz sampled signal. If one simply interpolates between the existing points (for example, linearly), the frequency response is altered, and spectral images of the signal remain above 22.05 kHz. These high-frequency images are removed with a digital low-pass filter (usually a finite impulse response filter).

To avoid altering the frequency response, the missing samples are instead filled with zeros. So if the original audio samples were ..,a,b,c,.., then the 7.056 MHz sequence is ..,a,0,0,0,...0,0,b,0,0...0,0,c,.., with 159 zeros between each pair of original samples. This, too, leaves high-frequency images, and stronger ones than linear interpolation does. Once again, a digital low-pass filter removes them.

So inserting the zeros, then running the digital filter, gives the needed signal: sampled at 7.056 MHz, but with no content above 22.05 kHz, and hence none above the 24 kHz limit of the 48 kHz output. Taking every 147th sample from an arbitrary starting point yields the desired output.
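One consequence of this procedure, sketched below under the assumption of a causal FIR filter and a starting point of zero, is that each output sample depends on only a small subset of the filter taps: those that line up with an original (non-zero) sample in the 7.056 MHz stream. The helper names are hypothetical.

```python
UP, DOWN = 160, 147     # 44.1 kHz * 160 = 48 kHz * 147 = 7.056 MHz

def source_position(n):
    """Locate output sample n within the 7.056 MHz stream.

    Output n sits at index 147*n. Original sample k sits at index 160*k, so
    divmod gives the latest original sample k at or before that index and the
    offset p (0..159) from it.
    """
    return divmod(DOWN * n, UP)

def contributing_taps(n, num_taps):
    """Indices j of a causal FIR (y[m] = sum h[j] * s[m - j]) for which
    s[m - j] is an original sample rather than an inserted zero."""
    _, p = source_position(n)
    return list(range(p, num_taps, UP))

print(source_position(1), source_position(2), source_position(160))
print(contributing_taps(1, 480))
```

The offset pattern repeats every 160 output samples, so an implementation could skip the multiplications by zero entirely. This is an observation about the arithmetic, not a description of any particular converter.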

Notes

  1. ^ A field is half of an interlaced frame – just the odd or even lines.

References

  1. ^ Oppenheim, Alan V.; Schafer, Ronald W.; Buck, John R. (1989). Discrete-Time Signal Processing, Volume 2. Englewood Cliffs: Prentice Hall.
  2. ^ Trilla, Alexandre; Sevillano, Xavier (2010). "Multirate Discrete-Time Signal Processing" (PDF). Retrieved 29 December 2013.
  3. ^ Rajamani, K.; Yhean-Sen Lai; Furrow, C. W. (2000). "An efficient algorithm for sample rate conversion from CD to DAT" (PDF). IEEE Signal Processing Letters 7 (10): 288. doi:10.1109/97.870683.
