# Talk:Nyquist–Shannon sampling theorem/Archive 1

## Disagreeing with Kupfmuller filters

The following passage rings suspicious, it is obviously impossible to reconstruct arbitrary signals once they are sampled. If you want to keep this paragraph in, explain here why, or I will delete it.

The theory of reconstruction of the analog signal from samples requires a so-called Kupfmüller filter. Such a filter is a theoretical filter with certain properties, which can only be approximated in reality. The filter is supposed to be fed directly with the sample impulses and generates the original analog signal.

Loisel 07:09, 16 Sep 2004 (UTC)

I think there's some form of filtering or interpolation that can be used to *perfectly* reproduce a bandlimited sampled signal, instead of the usual approximate filtering of all harmonics. Related to the sinc function? I'm not sure, but assumed this is what they were talking about. - Omegatron 19:03, Sep 16, 2004 (UTC)
That would be the Nyquist-Shannon interpolation formula, Omegatron - Omegatron
The theory of discrete Fourier transforms tells you that if U is the discrete Fourier transform of u, then the discrete inverse Fourier transform of U is u again, without any error whatsoever. If you have a bandlimited signal f, sample it to u, take its DFT U, then take an IDFT of U, you'll recover u perfectly. If u has sufficiently many samples for the bandwidth of f, then f is given by the unique interpolating trigonometric polynomial for u. The coefficients of the interpolating polynomial are given by U. As you can see, there's no filter. Applying a sinc filter of the correct frequency to a bandlimited signal won't change it at all, since it's already bandlimited. So this Kupfmuler stuff is nonsense. Loisel 07:05, 20 Sep 2004 (UTC)
Oh I remember now. To convert the sampled signal back to analog you put it through a zero-order hold or the like, which creates the original signal in the baseband with harmonics every multiple of the sampling frequency, and you filter the harmonics. The sinc "filter" i was thinking of is actually what you convolute the signal by. sinc convolution in time domain = rectangular ideal filter in frequency domain. right? and no real filter is ideal, so technically you are only attenuating the higher frequencies, not destroying them completely. So I thought maybe this kupmuller stuff had to do with the sinc convolution, but nope. That's just an ideal filter. - Omegatron 13:29, Sep 20, 2004 (UTC)

Actually, since I will forget about this, I am deleting now. Restore it if you can justify it.

Loisel 07:11, 16 Sep 2004 (UTC)

Apparently people missed the word "theoretical". I actually tells much about the education of people who claim this is nonsens but scrap the bottom of the barrel to fabulate about Nyquist and Shannon. If it isn't in their beginner's textbooks it is not supposed to exist?

Karl Küpfmüller (*1897, +1977), professor in Gdansk (http://www.pg.gda.pl/~mjasina/pehgo2000/hist2en.html) and in Darmstadt, contributed basic research to the telecommunication system theory. One of his contributions is the Kupfmüller Filter. It is a theoretical construct, with an ideal behavior in the frequency domain. It shows exactly the properties one needs to convert an ideal(!) (did you get this: IDEAL, we are talking system theory here) sampling sequence - where each sampling is an infinit short impuls, and a sampled signal is a sequence of such impulses. And the amplitude of each such impuls resembles the amplitude of the original, sampled system at the moment it was sampled.

To reconstruct the original signal from such a series of impulses one needs a (mathematical, ideal) device "undoing the sampling". That device is called a Kupfmüller filter. It is a lowpass filter with an impulse response h(t) = A0 * omegac /pi * sin(omegac (t - t0)) / (omegac (t - t0)); for t = -infinit ... infinit. Since the impuls response starts at -infinit it is obvious that the filter is non-causal. It does not exist. It is an idealisation. It is system theory. You get it? System theory, like the sampling theorem, also system theory. However, processing (calculating) a sequence of sampling impulses, which have been sampled adhearing to the sampling theorem with a kupfmüller filter (you treat the sequence of samples as some signal), results in the original signal. You need this theoretical filter in the system theory, because if you take the sequence of samples as some signal, and each sample is infinit short (an impuls), one can easily see that it contains infinitly high frequencies due to the impulses. Frequencies, which were never in the original signal.

The fact that one can in theory, under ideal conditions reconstruct the original signal without loss of information is the whole key of the sampling theorem! If you stay within its limits (which one can't), if you manage to do an ideal sampling (infinit short sampling impulses, with infinit precision), and if you apply a Kupfmüller filter to the sequence of samples, you get exactly the original signal out. Do the math! That's why the theorem was such a great breakthrough. It provides the theoretical foundation for all digital signal processing. It establishes that there is a 1:1 relation between an analog and a digital signal. In practice it has shown that even without having the ideal sampling, without the ideal Kupfmüller filter, without infinit precision, etc. it works good enough to do serious things. This is what some people call the digital revolution. And what Loisel-Loser calls nonsens.

If you care, do the math. But I doubt that you care. I am to tired to change the article. The nonsens fanboys will delete it anyhow. I have just one question for you: Why the fuck do you write about sampling if you don't get the system theory behind it? 89.52.130.249 23:53, 14 March 2006 (UTC)

Please identify Yourself if making such accusations. At least use a wikipedia login. Next provide a reference for Your claim. Explain in which way the filter You want to introduce differs from the cardinal series introduced by E.T. Whittaker (prof. in Birnmingham) in 1915. The series itself dates back to E. Borel in 1898. I don't think that You want to claim that a one-year old has precedence on that result. This series is treated in this article as well as in Nyquist-Shannon interpolation formula.--LutzL 09:39, 16 March 2006 (UTC)

The following disputed psychobabble also deleted:

In practice, the sampling data is usually not available any more. It has been quantized ("squeezed" or broken into discrete values) and digitized (converted into symbols, such as numbers). Digital to analog converter circuits are used to decode the digital information into analog, which is then filtered. These circuits are usually based on op-amps. Each quantized value is associated with a certain voltage level, and the op-amp circuits generate that voltage level when seeing the particular digital input signal.

Loisel 07:14, 16 Sep 2004 (UTC)

Psychobabble? I think you are using the wrong word. Anyway, what's wrong with a description of digital sampling? - Omegatron 19:03, Sep 16, 2004 (UTC)
All right, technobabble then. The text above is handwavy, fails to explain the terminology (like "quantized") and, since it's trying to explain why Kupfmuller filters don't have the "sampling data", irrelevant (by virtue of Kupfmuller filters being nonsense.)
If a signal is quantized to floating points, many purposes the quantization does not matter and the IDFT gives u from U to extremely high precision. If a signal is quantized for the purpose of compression, u can obviously not be recovered from V (my notation for quantized U) but this has nothing to do with the sinc filter or anything. If v is a quantized u, then the DFT V of v will yield v again to high precision using the IDFT. Digital to analog converters do not affect the quantization, and are no more affected by it than the software IDFT. Injecting op-amps adds nothing to the discussion. If you want to have a reference to DACs, just put "see also Digital to analog converters." One possible statement would be, "Measurement devices like Charge-coupled devices, antennas, geiger counters, etc... and output devices such as Cathode ray tubes, loudspeakers, Light-emitting diodes, the brakes of an Electronic Stability Program, etc... will add some error to the input or output. Such error can make high frequency information irrecoverable even when we have very many samples. This effect can sometimes (but not always) be regarded as the Nyquist theorem applied to the Fourier transform of the signal." Such a comment would have to be made in a non-technical context or include the caveats that "high frequency" may actually not be correct. In some instances, the low frequency data is the noise, and the high frequency is good. In other cases, the noise does not correspond to a neat range of requencies. See aliasing for details.
If you want to talk about hardware, make sure that what you're saying is true and has its place in an article with "theorem" in the title. Loisel 07:05, 20 Sep 2004 (UTC)
Good enough for me. - Omegatron 13:29, Sep 20, 2004 (UTC)

## Disagreeing with the theorem.

I just read this and the aliasing article, and I must say that I disagree with what's written in there.

Perhaps in engineering this is gospel and sacred, but many of the assertions and hypotheses are unstated, unmotivated and even perhaps incorrect. Why would signals with high frequencies be wrong? Why should we choose, of all the signals that alias to the same sampled signal, the one that has zeroes for the high frequencies? There are reasons for that, but they vary from case to case, and stating that using zero for the high frequency as a canon isn't scientific.

If anyone feels strongly about these articles, continue the discussion in here. If nobody pipes up, I'll change the articles around significantly. Loisel 04:41 Jan 24, 2003 (UTC)

Dou you mean we can generalize to: If S(f) = 0 outside a given interval of width 2W, then s(x) can be recovered. ? Here S(f) is defined for positive and negative f, so if the interval is 3W to 5W, S should also be zero for -5W to -3W. We can add this, especially if you can describe an application for this it is interesting. - Patrick 10:38 Jan 24, 2003 (UTC)
Not quite. Here is the short version. A signal s(x) is just another word for a function. There are many functions. If you sample functions, there are a lot fewer sampled functions. The map you use to go from a signal s(x) to a sampled version (s_k) will necessarely collapse several different signals into the same sampled version by the pigeonhole principle, this is called aliasing. Two signals s(x) and t(x) which collapse to the same sampled signals are called aliases of one another. In general it is preferable that the map L:s(x)->(s_k) produce sampled versions which correspond as closely to the original s(x) according to some metric, or perhaps according to the human brain. Experimentation has suggested that in many cases, the signal that most resembles the sampled version is the one which has the least high frequency components. However, this is simply a heuristic, and in some cases, it is patently false. For instance, if one is working on the real line (not with periodic function) with signals that are compactly supported, it is completely absurd to hope that the spectrum such a signal is also compactly supported. Let us assume that L is linear. Note that the space of sampled signals is of dimension n. Then we would like to choose, for each sampled signal (s_k), a pre-image L^{-1)(s_k)=s(x) which we believe gave rise to (s_k). With the assumption above in italics, then we can choose a linear space of dimension n as the pre-image L^{-1}(S)of the space of sampled signals S={(s_k)}. This is in fact the Nyquist-Shannon sampling theorem (and we could write it explicitly sort of the way it's written, although I suspect there's an off-by-one bug in the current statement.) With this presumption, the Nyquist-Shannon interpolation formula follows. In view of this, I'm not sure how to split the discussion between aliasing and this theorem, which is why my original comment was about both articles. Loisel 19:18 Jan 24, 2003 (UTC)
Also, I should add that I don't intend to massacre the article, just to present the information so that the Fourier approach isn't presented as some sort of God-given truth, as well as offer a linear algebra explanation of the theorem. Loisel 05:51 Jan 25, 2003 (UTC)

Maybe, in the interest of conciseness, this article can stay as is, and the article under aliasing can fill in the blanks I discussed above. Loisel 08:34 Jan 27, 2003 (UTC)

---

What does "the sampling frequency must be original perfectly from the sampled version." in the page meen - this change was made almost two months ago, and I have no idea what it means - can we revert to what the older edits said in this place: "the sampling frequency must be greater than twice the highest frequency of the input signal in order to be able to reconstruct the original perfectly from the sampled version."

Probably a typo where a line got deleted. --Damian Yerrick

I suggest that we leave the psychoacoustic stuff out of this particular article and limit the discussion strictly to the objective, mathematical properties of sampling that Nyquist originally studied. He was actually studying the reverse problem to the one usually mentioned in connection with his theorem, which was how fast one could signal in a bandlimited channel like a telegraph cable without intersymbol interference. His result was equivalent to its modern application in digital sampling. To that end, I would merge the discussion of oversampling into the main body and restate the theorem as it is now stated in the oversampled section, i.e., the sampling rate must be at least twice as high as the bandwidth of the signal being sampled to avoid any loss of information. --Phil Karn, KA9Q

---

I liked the page very much but strongly suggest to remove the last paragraph (which I did) since the claim that the sampling problem is analogous to polynomial interpolation simply is wrong and misleading. -- Olivier Cappé, 11:23 Mar 8, 2004 (CET)

## Also is known

It also is known as Whittaker-Nyquist-Kotelnikov-Shannon sampling theorem (example in Russia is's known as Nyquist-Kotelnikov theorem).

• Nyquist - 1928
• Kotelnikov - 1933
• Whittaker - 1935
• Gabor - 1946
• Shannon - 1948

--.:Ajvol:. 09:50, 24 Oct 2004 (UTC)

## What is "W"?

The article states that

" F[s(x)] = S(f) = 0 for |f| ≥ W"

What is meant with the W? I think this should be explained in the article!

Thanks

W means bandwidth. So for example, an audio signal would have a one-sided bandwidth of 20 kHz; 20,000 cycles per second. If you took the frequency transform of an audio signal, it would be zero for everything higher frequency than 20 kHz (and for everything less than -20 kHz). Then, according to this section, you would need to measure that signal with a minimum time of 1/(2W) between each measurement to not lose any information. in this case 1/(2W) = 1/(2*20,000) = 1/40,000 = .000025 seconds. usually this is specified by just saying you have to measure it at least 40,000 times per second, or twice the bandwidth.
Is this clear enough? I will try to fit it into the article. - Omegatron 16:37, Nov 15, 2004 (UTC)
Hi Omegatron, thanks for your quick reply. It helped a lot! There only is one thing I haven't understood clearly: let's imagine a soundcard that samples audio input at 44 KHz (you see, the same with your explanation). What the soundcard does, is it takes a record of the voltage every .000025 seconds.
If you drew all those recorded values on a time-voltage diagram (time being the variable), you'd get a diagram full of dots. To reconstruct the original signal, you could draw lines between each dot and its next neighbor.
(*) That would be quite a close approximation, but you couldn't find out what happened between those .000025-second-snapshots. Maybe there was a high voltage burst (being a Dirac distribution for example) somewhere between the intervall 1.000025 s and 1.000030 s.
Maybe you understand the problem I see - but maybe I just made a mistake at some point in my thought.
Thank you for your help, --Abdull 18:57, 15 Nov 2004 (UTC)
Very correct. However, the dots with spaces between them actually DO have all the information in the original signal. As long as they are at most .000025 seconds apart. That's the whole essence of the theorem. You just have to get it back out in the correct way. I can't think of a simple explanation, but I will try. If you were to connect them as you described, with a straight line between (i think this is called a first-order hold), you would not reproduce the original signal. What you actually do in a real system is even cruder! You just make a horizontal line out from each dot like stairsteps (this is a zero-order hold, i think). The thing is, you are creating extra frequencies above 22 kHz when you do that, perfect multiples of the original frequencies. If you then filtered out those higher frequencies, you would smooth out the horizontal lines into exactly the original signal. As long as the original signal doesn't go above 22 kHz, you can reproduce it exactly with sampling at 44 kHz. It's hard to explain. (When you've figured it out, help me make the article easy for beginners to understand.) Here's some diagrams http://cnx.rice.edu/content/m10402/latest/ - Omegatron 19:23, Nov 15, 2004 (UTC)

In reply to (*): if the signal has no high frequencies (that is, if F, the Fourier transform of f is zero for any frequency w greater than W) then it is not possible to have a large spike between the samples. This is related to but not the same as the Nyquist sampling theorem. One manifestation of this truth is the Nyquist-Shannon interpolation formula.

If you don't have a very good understanding of the article as it is, please don't try to "make the article easy for beginners to understand." We'll just end up with something less good than what there is now.

Loisel 17:08, 23 Nov 2004 (UTC)

I meant "help us make the article easier to understand, by pointing out what needs to be added." It should be accessible to beginners while still being accurate. - Omegatron 20:33, Nov 23, 2004 (UTC)
I agree -- this is one of those things that I learned so long ago that I can't remember what it was like before I understood it. So, although it now seems obvious to me, it is difficult for me to explain to someone else.
Nevertheless, I'm going to try again, hoping something in what I say will inspire you to create a really good explanation. You, dear reader, whoever may be reading this.
OK, say that somewhere between the interval 1.000025 s and 1.000030 s, there's this huge spike that goes way up then comes back down again.
Every properly-designed sound card has an analog anti-aliasing filter to limit the bandwidth. If a signal is already band-limited, it goes right through that filter unchanged.
However, this spike is heavily "distorted" by that filter. If it's "small", it may be entirely clipped out by the filter. If it's "large", it will be rounded off and smeared over a much larger time period -- such a long time period that it effects at least one sample, probably more.
One could paraphrase the Nyquist-Shannon sampling theorem like this:
If the signal is so smooth and rounded off that it goes right through the antialiasing filter unchanged (or it's the output from such a filter), then nothing important happens between samples -- the signal simply rises or falls in a very smooth way between samples.
The mathematical details specify *how* smooth it has to be.
I like the pigeonhole principle explanation that Loisel gives, and hope it makes its way into the main article.
I hate to confuse people by bringing in further complexities, but I want to point out that this is not the only situation where a few samples can give us all the information there is to know about an infinite number of points.
For example:
• if you know some function is a straight line, you only need to know 2 points on it to know everything there is to know about that function, which contains an infinite number of points.
• If you know some shape is a perfect square, you only need to know a couple of points on each side of the square to know everything there is to know about the perimeter of the square, which contains an infinite number of points. (You don't necessarily need to be given the exact corners -- you can figure them out from other points).
• if you know that the signal is some kind of perfectly periodic triangle wave or sawtooth wave, and you know roughly what its repetition rate is, then it only takes around 6 (? or so) samples (a couple of samples on the falling line, a couple of samples on the next rising line, and a couple of samples on the next falling line) to measure everything there is to know about the the sawtooth wave -- the slope of the rising segments, the slope of the falling segments, the maximum value, the minimum value, the repetition frequency, and the "phase". (You don't necessarily need to be given the exact corners -- you can figure them out from the other points).
--DavidCary 22:58, 10 November 2005 (UTC)
I like that line. Here's my version: If the signal has already been smoothed enough by the anti-aliasing filter, then nothing important happens between samples — the signal simply rises or falls in a smooth, predictable way between samples. The mathematical details specify *how* smooth it has to be for this to happen.
I also like the geometric analogy.
• If you know that a function is a straight line, you only need to know two points and you can predict every other point along that line.
• If you know that a shape is a perfect circle, you only need to know three points to fully define it.
• If you know that a function is a sine wave, you only need to know three(?) points to fully define it.
• Similarly, it can be shown that if you know the signal is made out of sine waves that change no faster than fs/2, then you only need to know points every T seconds.
Something like that. — Omegatron 01:23, 11 November 2005 (UTC)

## Sampling theory for beginners

As a "beginner", I think the article is fairly good and accessible. However, I am left with a couple of questions that I think should be clarified in some way or another.

Suppose the signal consists of a simple sine wave, e.g. sin(2*pi*f*t) with f = 5Hz. Suppose I sample that signal at a frequency f = 11Hz. When plotted on graph paper, the sampled signal appears to be the product of two sine functions: one with the original frequency, and one with a lower frequency at 0.5Hz. The article states that the original signal is fully recoverable from the collection of sampling points. How is that possible? Is it fair to say that if we want a fairly accurate description of a signal, just by linear or quadratic interpolation between sampling points, then we need to sample at, say, 8 or 10 times the highest frequency? --Ahnielsen 09:55, 11 Mar 2005 (UTC)

Hi, that would be possible if there were a Fourier series for the Dirac comb. Since that's an impossibility, You cannot use the sampling theorem on sine or cosine functions. What You can do is to approximate those infinite waves by sinc(2*pi*0.000Hz*t)*sin(2*pi*5Hz*t), which decays at infinity. For perfect reconstruction, You will have to sum from minus to plus infinity, or at least over a very large intervall around the actual argument. In general, band limited functions are defined as ${\displaystyle f(x)=\int _{-W/2}^{W/2}g(s)e^{isx}ds}$, where g is square-integrable on the interval [-W/2,W/2]. So the above corresponds to two small boxes, each of area 1/2, around +-5Hz.
If You sample by averaging and reconstruct by step functions or linear interpolation, those oversampling factors of 8,10 or even 20 are appropriate (to get error levels below 1%). Because only local operations are involved, You can as well apply this procedure to sine functions.--LutzL 12:13, 24 May 2005 (UTC)
You can't use the sampling theorem on sinusoids? This is the first time I've heard that... - Omegatron 13:48, May 24, 2005 (UTC)
I've never heard it before, either. --DavidCary 22:58, 10 November 2005 (UTC)
Yes, that fact is right. It rises scepticism in everyone I tell this, but the minimal condition for the cardinal series to converge is that the sum ${\displaystyle \sum _{k\neq o}|f(kT)/k|}$ converges, the more common, but weaker condition is that ${\displaystyle \sum _{k\in \mathbb {Z} }|f(kT)|^{2}<\infty }$, which both are not satisfied by any sinoid. See papers and books by Butzer, Higgins and Benedetto. A weaker argument is already present in the article: if one samples a sinoid with frequency F at the Nyquist-frequency 2F, then result is an oscillating series with amplitude depending on the phase. For any sinoid with frequency equal to a fraction of the Nyquist frequency, the sampled series will be periodic so the reconstruction formula will diverge since it contains the harmonic series. The proof which Shannon gave — and which was quite common before people thought that a Dirac comb were a simple thing — involves the identity of a function on an interval with its Fourier-series in the L²-sense. A Dirac-Delta-distribution is not in L².--LutzL 15:00, 24 May 2005 (UTC)
I don't know L²-type math. Can you give a more laymen's explanation? You're saying that if I have a sinusoid at f and sample at 3f (>2f to fit the theorem but an exact multiple of f), there will be more than one possible waveform that fits those sample points using the Nyquist-Shannon interpolation formula? - Omegatron 15:36, May 24, 2005 (UTC)
In most cases, piecewise continuous is a good proxy for square integrable. At least the fourier-transform has to be a function, not a distribution. And no, there won't be any cardinal series, linked here under Shannon-Nyquist interpolation formula that fits those samples since they don't satisfy the convergence criterion mentioned above. So there would be divergence everywhere exept at the sampling points. Any partial sum of the cardinal series will of course exist, and divergence is very slow with the harmonic series, so numerical experiments will not at first glance show this behavior.--LutzL 16:10, 24 May 2005 (UTC)
Sorry, I don't understand. Don't know enough about distributions. Maybe someone else does. - Omegatron 17:43, May 24, 2005 (UTC)
Heh, well, if you ask me you're needing a formal proof to show that it, indeed, will work out correctly. The proof could use some plots to make it visually more understandable.
The frequency content of your signal can only have a sum of frequencies, but not products of frequencies. So even if it looks like there's a 0.5 Hz sine modulated on a 5 Hz wave, it's just your mind reading into it. :)
Plot a 5 Hz wave through those points and you'll see that it will match up. Keep in mind that you're nearing the nyquist frequency of 5.5 Hz. If you sample at 10 Hz then if you sample at the peaks of a sin wave then you'll get alternating spikes. So as you approach the nyquist frequency, you're samples will look more and more like the alternating spikes plot.
Re your interpolation question. The ideal interpolator is the sinc function because the freq domain equivalent is a rectangular function (a perfect low-pass filter). So anything less perfect of a low-pass filter will require extra room between your max frequency and nyquist frequency. You really would not want to sample @ 11 Hz if you have a real signal of 5 Hz because you'll be hard pressed to find a low pass filter that goes from 1 to 0 between 5 Hz and 5.5 Hz.
Perhaps when you understand this, you can change to article to answer the questions/confusions you have now...or post 'em and someone can work them in. Cburnett 10:22, 11 Mar 2005 (UTC)
Many thanks for your response, Cburnett. It was not quite what I was looking for, though. Many people have participated in the discussion on the Nyquist-Shannon Sampling Theorem. Most of these people are, I suspect, from an eletrical engineering background. I am a civil engineer, and I am not concerned with electronic signals. I am concerned with the dynamic behaviour of structures. By far the most popular method of structural analysis is finite element analysis. In this method, a solid structure is described by a (high) number of elements that are connected at nodes. The response of the structure is entirely determined by the displacement of the nodes. The displacement at any other point of the structure is usually given by linear or quadratic interpolation between nodes (this is inherent in the mathematical formulation of the elements). When a structure is subjected to dynamic excitation, stress waves propagate through the structure at finite speeds. Stress is accompanied by deformation. As a rule of thumb there should be at least 8 or 10 elements per wavelenght in order to capture the dynamic response of the structure. I was wondering, given the Nyquist-Shannon Sampling Theorem, why 3 or even 2.1 elements is not enough. But I think I am answering my own question. This is because the finite element method uses something as crude as linear interpolation to recover, as it were, the "true signal" (i.e. the deformed shape) from sampling points (nodes). Any opinion on this matter would be appreciated. --Ahnielsen 13:17, 15 Mar 2005 (UTC)
Yes, it definitely depends on the interpolation method used.
Actually, I think you can use linear interpolation, as long as you filter out the harmonics afterward. For electrical engineering you use a "stairstep" zero order hold. It creates harmonics which you filter to get the original signal. [1] The first order hold (these don't have articles???) is linear interpolation, and has a different harmonic structure, but you can treat it the same. [2]
Oops. Nope, that's wrong. The first order hold is not linear interpolation, as you can see in the image above. One of my professors told me that, once, though. Hmmph. Maybe it's somehow equivalent?
So I am not sure whether you can use linear interpolation and get the same results. I've heard of people using quadratic interpolation for quick processing of audio algorithms, but I believe they had a clause saying you should really oversample 4x or so if you want it to be accurate. - Omegatron 14:44, Mar 15, 2005 (UTC)
Here's some more info: http://cnx.rice.edu/content/m10402/latest/ http://cnx.rice.edu/content/m10788/latest/ - Omegatron 15:05, Mar 15, 2005 (UTC)
I have zero-order hold and first-order hold on my user page to be created along with Pass-band, stop-band, transition band. Surprised myself WP doesn't have them. :/ Cburnett 17:10, 15 Mar 2005 (UTC)
Some do. *creates redirects*  :-) Always check under similar names... Should we just group all the holds into nth-order hold or something? - Omegatron 17:44, Mar 15, 2005 (UTC)
Ahnielsen, it's always helpful to tell what you're really asking about to get a better response. :)
The nth order holds correspond to fitting n degree polynomial to the last n values (assuming a causal signal which is usually the case). I've not done the math but I suspect an infinite order hold would be the sinc function (perfect interpolator). Since you are using a low order hold/interpolator you'll need to have a higher sampling rate. Really, what you are doing is oversampling to account for your imperfect interpolator.
In the end, the point of the sampling theorem is to contain all the energy of the signal in the frequencies under the nyquist frequency. Oversampling gives you more samples to manipulate and less error (think law of large numbers). Cburnett 17:04, 15 Mar 2005 (UTC)
The infinite-order hold is a Gaussian function according to http://cnx.rice.edu/content/m10788/latest/ Check under the "Related info" box for more on that site. - Omegatron 17:44, Mar 15, 2005 (UTC)
The http://cnx.rice.edu/content/m10788/latest/ page doesn't make sense to me. I also expected a sinc() function, not a Gaussian function.
There seem to be 2 different kinds of "first-order hold". Both agree on drawing straight lines (perhaps with some non-zero slope) between sampling instants.
At least one Wikipedian, and also the author of [3] , place that straight line between sampling instants 3 and 4 is on a straight line extrapolating a straight line from sample 2 and 3 (because it's "no fair" using point 4 -- that would be non-causal). That line intersects sample 3, but is in general nowhere near point 4. I presume these people prefer "causal" filters involving the n preceding points. (Perhaps this is the "predictive first order hold"?)
Other wikipedians, and also the author of [4] and [5] , place that straight line between sampling instants 3 and 4 so it intersects both sample 3 and sample 4, using linear interpolation. I presume these people prefer "symmetric" filters involving roughly n/2 preceding and n/2 following points.
I wonder which type of "first order hold" that Lozano used?

## Disagreement with "Important Note"

I have two complaints regarding the paragraph labelled "important note" at the articles beginning. First, while I generally agree with what is said, I think the paragraph should mention that while it is a necessary condition for sampling to be twice the bandwidth (as opposed to maximum frequency for the case of bandpass signals), it is not a sufficient condition. For example, a real signal with Fourier support over [1,10] Hz requires a 10 Hz sampling frequency (not 9 Hz!!) to prevent aliasing. As it is, the paragraph is somewhat misleading on this fact.

My second complaint is on the existence of this paragraph, even in a clarified form. Correct me if I'm wrong, but I don't believe basic Nyquist/WKS sampling ever considers such bandpass signals. It is certainly a straightforward extension from the basic theory, but I do not believe this generalization was covered in any of these 4 original papers. Since this paragraph actually deals with an extended topic which is related but not part of the article topic I am going to remove the paragraph. I suggest a new article on generalized sampling should be created and linked to. If someone does find it necessary to replace the paragraph, please at least rewrite it so it is not misleading. 8/18/2005

Hi, in some sense Your first example is wrong. That is, if You consider complex signals which can have negative frequencies. Because then the 9Hz sampling frequency is sufficient. But Your concern is right if You consider real signals with a positive frequency support of [1,10] Hz, as this is acompagnied with a negative part of the frequency support of [-10,-1] Hz. In this case, the smallest period of a nonintersecting periodization of the support is 20 Hz, and this corresponds to the lowest sufficient sampling frequency.
As to the original paper, I've put in a link to a reprint some time ago into the german version, link section. So You can find, right below the fomulation of Theorem 1 on page 2 the statement "This is a fact which is common knowledge in the communication art", and in the second column "A similar result is true if the band W does not start at zero frequency but a higher value..." and mentions the possibilty of modulation. Of course, bandwith for "sampling=taking values" should be, as mentioned above, the smallest period for which a periodization of the frequency support is disjoint (disregarding boundaries). Because by then the Fourier transform of the signal coincides on its support with its periodization. Thus one can apply the identity of a sufficiently regular, periodic function to its Fourier series, which is at the heart of the sampling theorem, as You also will find in Shannons paper.--LutzL 10:55, 19 August 2005 (UTC)
Are you sure you got those right? If a signal is real, the negative and positive frequencies will be identical, so it would seem you only have to sample one side. If a signal is complex, you'd need to sample both positive and negative frequencies, since they can be different. - Omegatron 12:49, August 19, 2005 (UTC)
But You don't sample the frequencies, You sample the signal. You can have complex signals with frequency support only in [1,10] Hz. If You sample it with 9 Hz, You get 18 real numbers (symbols) per second. If You take the real part from such a signal, it has symmetric support, but You have to sample with 20 Hz, which coincides with 20 numbers/symbols per second. The negative frequencies of a real signal have the complex conjugate amplitude of the positive ones. Are You sure You understand what sampling does?--LutzL 14:18, 19 August 2005 (UTC)
Apparently not.
• If You sample it with 9 Hz, You get 18 real numbers (symbols) per second.
You must mean a single complex number per sample? The article assumes a real-valued function. I'm certainly only used to real signals.
• Complex signals with independent spectra from [-10,-1] and [1,10] only need to be sampled at 9 Hz, but a real signal with a mirrored spectrum has to be sampled at 20? I don't get that. — Omegatron 01:59, 11 November 2005 (UTC)
Please read again my first example. It said "complex function" and "frequency support in [1,10]Hz". That means zero on [-10,-1]Hz. Then 9Hz sampling frequency is theoretically correct. And yes, the samples will be complex numbers, that is 9 complex numbers=18 real numbers per second (QAM has a related sample counting system). In reality signals are real, so one cannot have a spectrum restricted to [1,10]Hz. If there is a nonzero part of the spectrum in [1,10]Hz, there has to be a mirrored and complex conjugate part in [-10,-1]Hz. Thus they are not independend. Following the note in the undersampling section, there is no chance to choose a sampling frequency below 2*10Hz=20Hz (contrary to other situations, fct. with spectrum in [-10,-9]Hz and [9,10]Hz] can be sampled with 2*1Hz). The same holds for complex signals with independend spectra in [-10,-1] and [1,10]. They also have to be sampled at 20Hz, only that one gets 20 complex numbers=40 real numbers per second. This all under the assumption that by sampling one means taking values of the signal. With frequency multiplex methods it is possible to sample real functions with spectrum restricted to [-10,-1]Hz and [1,10]Hz with one polysample per second, the polysample consisting of 18 real numbers. Those techniques compute scalar products of the signal with model functions, technically realized by oversampling (A/D) followed by convolution="digital" filtering followed by downsampling.--LutzL 09:42, 11 November 2005 (UTC)
• It said "complex function" and "frequency support in [1,10]Hz". That means zero on [-10,-1]Hz.
Oh. Oops.
• with spectrum in [-10,-9]Hz and [9,10]Hz] can be sampled with 2*1Hz
That doesn't make sense. If [-10,-1] and [1,10] needs 20 Hz, then so would [-10,-9] and [9,10]. What's the difference between these examples?
• If you know that the signal is real, then you know that the spectrum is mirrored and complex conjugate, so it seems that the extra samples are redundant. The real signal from [-10,-1] and [1,10] has the same number of unique frequency components as the complex signal from [1,10]. Given only the positive frequency components, you can calculate all the negative ones. Maybe this isn't sampling theory, though.
• Speaking of which, which of these situations are covered in the original sampling theory? — Omegatron 15:21, 11 November 2005 (UTC)
• For the difference in the examples: See the undersampling paragraph. In the first, N=0 and only the Nyquist frequency and anything above is a sampling rate. For [9,10]Hz, N=9 and the lowest possible sampling frequency is 1Hz. This is covered in the "proof" section, although this is a real mess. And as I tried to clarify earlier, You don't have the spectrum, there is only a signal that can be sampled by measuring it. Only after that one can compute an approximate spectrum via FFT. But we are concerned with the sampling phase. To throw away the negative part means to perform a Hilbert transform, which is not very well behaved numerically. And You need the already sampled function.
• "Original sampling theory": I don't know what You understand by this term, but Shannon in the 1949 paper knew perfectly well that a general signal subspace, one that can serve as the model of a communication channel, needs only to be a function subspace that has an orthonormal basis generated by a finite number of functions per unit intervall, that is a basis of the type ${\displaystyle g_{k}(t-nT)}$, T the time unit, k=1,...,M the different types of basis functions (QAM has M=2, DVB-T has M=4096) and n varying over all integers, orthonormal meaning
${\displaystyle \langle g_{k}(t-nT),\,g_{l}(t-mT)\rangle =\delta _{k,l}\delta _{m,n}}$, (using twice the Kronecker symbol).
Look it up, he explains the geometry of signal transmission directly after the citation, the link for the citation is a reprint of his 1949 paper. In the most simple cases, as in the basis band case, the only occuring basis function, here sinc with T=1 for its normalized version, is not only orthonormal to its nT-shifts, but also interpolating, so that the scalar products for a function in the signal subspace are actually values of this function. In other cases, sampling means orthogonal projection onto the signal space by means of computing the scalar products. However, in engineering books this is represented as analog filtering followed by "point-sampling". If You are interested in this topic, You should also read the paper of M. Unser (the guy that runs www.wavelet.org): Sampling:50 years after Shannon.--LutzL 16:03, 11 November 2005 (UTC)

## Shannons original theorem

Cited from: Claude Elwood Shannon: Communication in the Presence of Noise 1949, probably circulated since 1941

 Theorem 1: If a function f(t) contains no frequencies higher than W cps, it is completely determined by giving its ordinates at a series of points spaced 1/2W seconds apart. This is a fact which is common knowledge in the communication art. The intuitive justification is that, if f(t) contains no frequencies higher than W, it cannot change to a substantially new value in a time less than one-half cycle of the highest frequency, that is, 1/2W. A mathematical proof showing that this is not only approximately, but exactly, true can be given as follows. Let F(ω) be the spectrum of f(t). Then ${\displaystyle {\begin{matrix}f(t)&=&{\frac {1}{2\pi }}\int _{-\infty }^{+\infty }F(\omega )e^{i\omega t}\,d\omega &\qquad &(2)\\&=&{\frac {1}{2\pi }}\int _{-2\pi W}^{+2\pi W}F(\omega )e^{i\omega t}\,d\omega &\qquad &(3)\end{matrix}}}$ since F(ω) is assumed zero outside the band W. If we let ${\displaystyle t={\frac {n}{2W}}\qquad (4)}$ where n is any positive or negative integer, we obtain ${\displaystyle f\left({\frac {n}{2W}}\right)={\frac {1}{2\pi }}\int _{-\infty }^{+\infty }F(\omega )e^{i\omega {\frac {n}{2W}}}\,d\omega .\qquad (5)}$ On the left are the values of f(t) at the sampling points. The integral on the right will be recognized as essentially the nth coefficient in a Fourier-series expansion of the function F(ω), taking the interval -W to +W as a fundamental period. This means that the values of the samples f(n/2W) determine the Fourier coefficients in the series expansion of F(ω). Thus they determine F(ω), since F(ω) is zero for frequencies greater than W, and for lower frequencies F(ω) is determined if its Fourier coefficients are determined. But F(ω) determines the original function f(t) completely, since a function is determined if its spectrum is known. Therefore the original samples determine the function f(t) completely. There is one and only one function whose spectrum is limited to a band W, and which passes through given values at sampling points separated 1/2W seconds apart. The function can be simply reconstructed from the samples by using a pulse of the type ${\displaystyle {\frac {\sin 2\pi Wt}{2\pi Wt}}.\qquad (6)}$ This function is unity at t=0 and zero at t=n/2W, i.e., at all other sample points. Furthermore, its spectrum is constant in the band W and zero outside. At each sample point a pulse of this type is placed whose amplitude is adjusted to equal that of the sample. The sum of these pulses is the required function, since it satisfies the conditions on the spectrum and passes through the sampled values. Mathematically, this process can be described as follows. Let xn be the nth sample. Then the function f(t) is represented by ${\displaystyle f(t)=\sum _{n=-\infty }^{\infty }x_{n}{\frac {\sin \pi (2Wt-n)}{\pi (2Wt-n)}}.\qquad (7)}$ A similar result is true if the band W does not start at zero frequency but at some higher value, and can be proved by a linear translation (corresponding physically to single-sideband modulation) of the zero-frequency case. In this case the elementary pulse is obtained from sinx/x by single-side-band modulation.

Remarks: "Band (of frequency) W" means support of the Fourier-transform F in [-2πW,2πW]. "cps" is "cycles per second".

Compare this to a paper with the same formula, but 30 years earlier (cited by Shannon): Edmund Taylor Whittaker: On the functions which are represented by the expansion of interpolation theory (1915)

Probably this could go into the article if it, despite being rather long, could count as scientific citation.--LutzL 08:59, 25 August 2005 (UTC)--minor corrections--LutzL 09:22, 26 September 2005 (UTC)

## Frequency 'Support'

What the he** is a frequency support when its at home? Never heard of this one. can someone explain??--Light current 03:06, 2 October 2005 (UTC)

Don't look too hard... Support (mathematics). Cburnett 18:04, 17 October 2005 (UTC)
In fact, it is a little more interesting. A frequency support [A,B] means, that the fourier transform of the signal has its support (mathematics) in the intervall ${\displaystyle [2\pi A,2\pi B]}$. If an engineer speaks about it, she most likely is speaking about a real valued function with a fourier-transform that is zero outside the union of the intervalls ${\displaystyle [-2\pi B,-2\pi A]\cup [2\pi A,2\pi B]}$. Now try to find out what "bandwidth" means in each of these cases.--LutzL 13:40, 18 October 2005 (UTC)
Yes, it is 2*pi*B, fourier-transform is wrt. angular frequency.
Depends which transform you're using. f implies hertz; ω implies radians per second. This article uses f where you say it means radians per second?
Can we stick to only one of f or ω? Also can we stick to one of B or W? They are both used for the same thing here, right? I'm a little confused now. — Omegatron 20:05, 19 October 2005 (UTC)
Hi, I did believe that the consens on Wikipedia was to take the angular frequency fourier transform. The majority of the articles here and textbooks uses it. I too would prefer a transform wrt. normal frequency in Hz, because it is somewhat more "natural" and needs no constant factors. But by then all articles using the fourier transform, or at least all articles related to signal processing should be consistent in this.--LutzL 08:56, 20 October 2005 (UTC)
Well, we're allowed to use j instead of i for the imaginary number in electrical articles, so I don't know why we wouldn't be allowed to use f instead of ω in predominantly signal processing articles. I like consistency, too. But using the equations most commonly used by people who actually use them is good, too. As long as everything is tied together and explained I don't see why that would be a problem. Where was this consensus reached? — Omegatron 14:33, 20 October 2005 (UTC)
Well, by changing i to j, only the symbol has changed. By changing ω to f, not only the symbol, but the underlying function changes. This is a difference. By the way, from a mathematical point of view, both f and ω are simply symbols for variables, without further qualification. Sometimes f is even a function. I found this Fortran-like allocation of names always a little confusing. Also, one quickly runs out of "virgin" symbols.--LutzL 15:52, 20 October 2005 (UTC)
So we're using Ω now? I've never seen that used for frequency before except by mistake. — Omegatron 15:15, 24 October 2005 (UTC)
Hi, please change it back to anything that seems appropriate. I only changed it because some IP had it changed to H_s, which makes no sense to me. Ω is used in many recent papers in harmonic analysis dealing with sampling and interpolation formulas and generalisations of the Poisson summation formula. The space of bandlimited functions is then called a Paley-Wiener space ${\displaystyle PW_{\Omega }}$. One could suspect that Shannon used the capital letter W only because it was at the time complicated to realize greek letters with a typewriter and the small letters look identical. However, I never saw a manuskript of this time.--LutzL 20:38, 27 October 2005 (UTC)
The above quoted passage by Shannon says "frequencies higher than W cps", though. cps is the same as Hz. — Omegatron 05:53, 28 October 2005 (UTC)
Yes, and then he goes on to use the Fourier-integral wrt. the angular frequency, adjusting the support of the Fourier-transform to ${\displaystyle [-2\pi W,2\pi W]}$. But You are right in some way, since the index in PW is usually taken as angular frequency, so sampling frequency 1 corresponds in those papers to ${\displaystyle PW_{\pi }}$. Have I already said that for me it makes no difference, as long as transforms and sybols are used consistently? --LutzL 12:57, 28 October 2005 (UTC)
Yes, but it makes a difference to our readers. — Omegatron 13:57, 28 October 2005 (UTC)
Hi, I think I found a simple way for that. I also think ${\displaystyle [f_{L},f_{H}]}$ is easier to understand than ${\displaystyle [\mathrm {A} ,\Omega ]}$, which is an old joke (or pun?) dating back to biblical times meaning begin and end (of the greek alphabet).--LutzL 09:45, 11 November 2005 (UTC)
Definitely agree about ${\displaystyle [f_{L},f_{H}]}$, though remember that f implies cyclic frequency to most people; not angular. — Omegatron 14:37, 11 November 2005 (UTC)

## Notation

What's with the := notation? I've only seen that in computer algebra systems. — Omegatron 19:33, 8 December 2005 (UTC)

Hi, that is also frequently used in math texts, meaning "left side is defined as right side". Obviously, in math texts one should avoid constructions as "n:=n+1" (of course, except in code examples), as math texts are considered as static.--LutzL 07:53, 9 December 2005 (UTC)

Hello, := is widely used in mathematics, meaning "equal by definition". Other equivalent notation: the symbol produced by LaTeX \equiv command (i.e., 3 horizontal short segments stacked), the usual equal symbol (=) with "def" overwritten. The (small) logical advantage of := is being directional. (Luca, 28 March 2008) —Preceding unsigned comment added by 81.118.90.5 (talk) 09:30, 28 March 2008 (UTC)

## Edits by User metacomet

This article is listed as a mathematical theorem. So I hope one agrees that it should be mathematically correct and using consistent mathematical notations. To the points metacomet is unwilling to accept:

• The easiest to understand should be the variants of the Fourier transform. The articel this links to defines it using the usual mathematical convention as ${\displaystyle {\hat {s}}(w)={\mathcal {F}}(s)(w):={\frac {1}{\sqrt {2\pi }}}\int _{\mathbb {R} }e^{iwt}s(t)\,dt}$. So a "frequency component" of 1 Hz would show up at ${\displaystyle w=2\pi /sec}$. So there is a need to explain that the Fourier-transform as used in signal analysis is ${\displaystyle S(f)={\mathcal {F_{N}}}(s)(f):=\int _{\mathbb {R} }e^{i(2\pi f)t}s(t)\,dt}$.
• The more controversial point is function notation. In mathematics a function is introduced as "${\displaystyle f:D\to V}$ with f(x)=...". f(x) denotes the value at the point x, nothing else. Sometimes in school mathematics or engineering the function notation is abused. To write f(x) for clarity is perhaps tolerable, but to write ${\displaystyle /mathcalF\{s(t)\}}$ translates into "the Fourier transform of the constant s(t)", which does not exist (as function). The Fourier transform of the function s at frequency f has the notation used above.

--LutzL 18:47, 21 December 2005 (UTC)

By the way, using the notation ":=" is completely non-standard. The only place I have ever seen ":=" is as the assignment operator in the Pascal programming language. In that context, the notation ":=" means "is set equal to". You seem to be using ":=" to mean "is defined as". In my experience, it is acceptable to use the plain old "=" sign, or if you really want to be careful, then to use the "equivalent to" symbol, as in ${\displaystyle A\equiv B}$.—Preceding unsigned comment added by metacomet (talkcontribs)
Do your think it's possible that the reason the preceding comment was left unsigned is maybe just maybe that I forgot to sign it? So maybe instead of trying to make me look like a fool, you could just once cut me a break.... -- Metacomet 18:54, 21 December 2005 (UTC)
This is why I really think a translation page is necessary. The equiv sign is, for mathematically oriented people, only used in connection with remainder classes or to signigy that a function is identically constant to a given value.--LutzL 18:46, 21 December 2005 (UTC)

## Response from Metacomet

I don't know where you have gotton the strange notion that s(t) is a constant, but just plain s is a function. That makes no sense whatsoever. When I see just plain s, it indicates to me that you are talking about a complex number s which is the argument of a complex valued function. On the other hand, when I see s(t), it is absolutely clear to me, with no ambiguity whatsoever, that we are talking about a function s with an argument t. And in fact, by a very common convention, the argument t, especially in the context of this particular article, is very often meant to represent a real-valued, continuous domain that engineers and scientists like to call time.

So please don't lecture me about what is correct notation and what is incorrect notation. There is no such thing as correct or incorrect notation. Notation is to a large degree a subjective matter of taste and an objective matter of effectiveness. The test is whether a given notation is easy for the reader to understand clearly and unambiguously, not whether it meets the author's view of mathematical orthodoxy.

One more thing: throughout this entire article, the frequency domain is discussed using the symbol f in units of hertz. Nowhere does the article refer to angular frequency ω in radians per second. Yet all of a sudden, you want to introduce the Fourier transform in terms of angular frequency ω instead of frequency f because you think that angular frequency is more correct than plain old frequency. I have news for you: go out in the real world and talk to people who design real signal processing systems. They deal in the world of hertz, not radians per second. Oh, and they have no trouble converting from one to the other when they need to. It isn't all that difficult to multiply or divide by 2π. On the other hand, why not simply use the frequency form of the FT instead of the angular frequency form? It is just as real, and just as valid. And in many ways, it is far easier and more intuitive than the other.

-- Metacomet 16:31, 21 December 2005 (UTC)

Agree that the article needs more from a signal processing perspective. Mathematical rigor can be a good thing, but the article should be accessible to the people who actually use the theorem, and should use their conventions as well, at least at the beginning. The more concise definitions can be later in the article, perhaps.
The article is still confusing and cluttered.
Wikipedia:Make technical articles accessible applies to mathematics, as well. — Omegatron 16:57, 21 December 2005 (UTC)
Well, I hope You could just cool down a bit. Although sampling is something that is more frequently done in signal processing then in mathematics, the sampling theorem is a purely mathematical theorem telling something about some very ideal situation involving sharply bandlimited functions that are nowhere to be found in practice. That's why it has a mathematical category in the bottom. In practical sampling You can forget about the factor-2-rule of this theorem, since it is a theoretical, ideal limit for situations where You can wait for an indefinite amount of time and perform an indefinite number of operations. I even very much doubt that factors <3 are meaningful if near perfect reconstruction is demanded.
For the Fourier transform: please check where the link leads to and how the transform is defined there. Your assumptions and foreknowledge don't matter, since this is intended to be also read by people who don't have them. If You want consistency then make an article ((Fourier transforms in signal processing)) where this other definition with its slightly different properties is to be found and link to this one.
I always thought that even in FORTRAN they marked predefined meanings for variable names beginning with certain letters as old style. So bad news for You: in any scientific or technical article the symbol s can stand for anything, an integer, a real or complex number, a matrix or a sequence or even a function, just anything. That's why well written articles explain at least informally what every symbol stands for. But Your attitude explains why so few students in computer science, even with signal processing as specialty, don't even properly understand the Fourier transform or even polynomials and convolution. They just aren't able to read common math textbooks. So I expect that "${\displaystyle {\mathcal {F}}:L^{2}(\mathbb {R} ,{\mathbb {C}})\to L^{2}(\mathbb {R} ,{\mathbb {C}})}$ is a unitary linear function" is an incomprehensible statement for You, even though it is one of the fundamental properties that make the Fourier transorm useful in signal processing.
--LutzL 17:28, 21 December 2005 (UTC)

You are unable argue your case on the merits, so you revert to attacking me personally. Instead of presenting a solid case in favor of you ideas, or even showing the flaws in my arguments, you resort to demonstrating your superior intellect by putting down my intellect. That is not a very effective approach to convincing anyone that your ideas have any merit whatsoever. -- Metacomet 17:34, 21 December 2005 (UTC)

Please skim through the discussions above this one, I don't want to repeat myself on the same page. I told You in the post directly above this section that notation as s(t) is misleading, espacially if there is no explanation what s and t are, and that there are two different (by constants) versions of the Fourier transform in common use, so one has to state clearly which one one uses. I would agree that such distinctions are confusing in the introduction, but then one should not use the FT there at all and restrict the text to some diffuse but perhaps "intuitive" phrases like "frequency component". It was no insult to You but a general observation that very few people in signal processing know what a Lp-function space is. However, without this theory one can't understand in which sense the FT is invertible, which is a very important property in signal analysis. Please read everything above, omegatron explicitely stated that he doesn't know about those spaces.--LutzL 17:52, 21 December 2005 (UTC)

First, when I have some free time, I will read through all of the discussion above, as you suggested.

Second, I am well aware that there is more than one form of the Fourier transform. In fact, there are at least four different forms in common usage. I personally think that there is way too much time and energy expended arguing endlessly and mindlessly about which form is better, and which form we should use. Frankly, who cares? Everyone makes such a big f---ing deal about what really just amounts to simple scaling factors. Is it really all that difficult to undersand the difference between frequency in hertz and angular frequency in radians per second (and sampling frequency in radians per sample for that matter)? I have a confession to make: I use all three of these conventions all the time, and I readily switch back and forth amongst them without getting confused (most of the time). Wow. Again, who cares?

Third, I do know what Lp function space is, although I only just recently learned about it in a formal setting, and I must admit that I do not understand it at any great level of depth, nor do I really see why it is all that important or relevant. But that is beside the point. I would guess that most electrical engineers and signal processing practitioners do not have a deep background in function spaces. So if you start throwing around a lot of obscure notation without clearly defining what you mean (in English, not in math jargon), then you are absolutely going to confuse people. Worse yet, you will actually lose them right from the start, because they will immediately stop reading the article.

Fourth, I agree with Omegatron that this article is confusing and cluttered. I would like to make it less confusing and less cluttered. If you want to help achieve this objective, I would welcome the help. But please, let's not bury the general reader with a bunch of esoteric notation and confusing jargon in the very first three paragraphs!

-- Metacomet 18:08, 21 December 2005 (UTC)

I nowhere claimed that the "mathematical" version of the FT should be used. Quite to the contrary, even before You began editing the article, I've put in the conversion to the "real frequency" transform in the first occurence of the FT, again please read above for earlier discussions on this topic. But, I repeat myself, if You put direcly below the link to the "mathematical" FT without any further notice a formula using the "real frequency" FT, I call that confusing. Not that the state as it is now is better readable. As I said, leave out the FT off the introduction and use such equally obscure but better to imagine phrases as "frequency component".
If You want to note functions by s(t), fine, make a note at the top telling "In this article, mathematical notation as common in communication technology is used except in places marked "mathematical formulation"" and refer to a translation page, so that also a mathematician can understand what people in signal analysis are talking about. There it should also explained how You want to refer to a function value, say at 10. Is it s(10), which is confusing, since the letter t is missing from the function symbol, or is it s(t=10) or ...?--LutzL 18:34, 21 December 2005 (UTC)

This is a waste of time. I am done. Have a nice day. -- Metacomet 18:42, 21 December 2005 (UTC)

It seems to me that you (LutzL) do not really understand the purpose of Wikipedia. But I am really tired of arguing about it with you and several others. I have had enough. If you want to make Wikipedia into a Mathematics textbook, then go for it. I am not going to stop you. I will find a better way to use my time and energy. See 'ya. -- Metacomet 19:07, 21 December 2005 (UTC)

Sorry, I had the same impression about You. You seem to regard anything published here as Your private notepad, so that notations and necessary conversions don't need to be explained. If I may tell Ya, there are people in the outside world that don't have an education in what signal processing people believe to be mathematics (As there are people that don't know math, but they wouldn't be much interested in the sampling thm). So they have perhaps problems understanding ordinary mathematics notations, and then they are confronted with a seemingly different kind of math. Goodbye, farewell and a happy new year.--LutzL 19:21, 21 December 2005 (UTC)

1. Laymen who don't know anything about anything remotely related to sampling
2. Laymen who know a little about signal processing, but not much; probably audio-related
3. Engineers
4. Mathematicians

LutzL, what percentage of each do you think exist in our readership? — Omegatron 20:17, 21 December 2005 (UTC)

## The sampling theorem for beginners

I have added a new section which contains almost no math but which hopefully describes the core of the sampling theorem in a sufficiently simple manner to be useful to someone not familiar with the concept. My suggestion is that this, or something like it, can be put somewhere in the beginning and that its content can be developed in more mathematical details later on in the article --KYN 01:20, 2 January 2006 (UTC)

Thank you for writing this section. I think you did an excellent job in explaining the concepts at an introductory level without oversimplifying things. It is also well written and presents the information in a logical manner. I hope you don't mind the edits that I made, but different people can bring additional ideas to the party, which often improves things. Again, well done. -- Metacomet 21:54, 2 January 2006 (UTC)
BTW, I agree with your assertion that a similar section at or near the beginning of many math and technical WP articles would make a major improvement in terms of accessbility and clarity. -- Metacomet 21:56, 2 January 2006 (UTC)

## Historical background

I moved this section to the end of the article since it didn't make sense to me to have it between two technical sections. --KYN 21:31, 2 January 2006 (UTC)

## Strict or non-strict inequality

There are now two versions of the condition for the Nyquist rate:

• It should be strictly larger than the largest frequency component of the sigal (intro)
• It should be larger or equal to the largest frequency component of the signal (section "Formal statement of the theorem")

My understanding of the theorem is that the second condition is not correct since it implies that the largest frequency component of the signal will in fact be aliased (as described in the "Critical frequency" section).

Is suggest that only the first version of the condition is used consistently in the article, and also that the "Critical frequency" section is removed since it only describes a special case of aliasing which is not really more interesting than any other. Also, "critical frequency" is here defined as synonomous with the Nyquist rate. Maybe move that to where the Nyquist rate is defined in the intro? --KYN 23:02, 2 January 2006 (UTC)

I agree with you on the issue of strict inequality. In practice, it really doesn't matter that much, because you will always tend to oversample at least a little bit to provide some margin of safety. Nevertheless, the article should state the theorem as a strict inequality, which is theoretically correct and avoids ambiguity. -- Metacomet 23:16, 2 January 2006 (UTC)
The section entitled "Critical frequency" is not all that significant to the article, but I am inclined not to remove it because I think it makes an interesting point, and helps justify why the sampling condition is a strict inequality. It may make sense to incorporate it within another, existing section. I am not sure it merits its own section. -- Metacomet 23:20, 2 January 2006 (UTC)
Maybe in the "Alias" section as an example? --KYN 23:38, 2 January 2006 (UTC)
I would eliminate the use of the term "critical frequency" and use the term "Nyquist rate", which is standard and far more common. -- Metacomet 23:23, 2 January 2006 (UTC)
Maybe not eliminate, since it has worked its way into wikipedia, see critical frequency. I don't use it myself, but I guess some do. Also: "Nyquist frequency" is a relatively common synonym (according to Google even more common than Nyqust rate). Maybe all three can be presented in the intro but use "Nyquist rate" throughout the rest of the article? --KYN 23:38, 2 January 2006 (UTC)
We need to be careful with regard to "Nyquist rate" and in particular "Nyquist frequency". In my experience, both terms are and should be used interchangeably. Some people, however, seem to define "Nyquist frequency" as one-half of the sampling rate whereas they define "Nyquist rate" as twice the maximium frequency component of the signal. Take a look at Nyquist rate and Nyquist frequency and you will see what I mean. Personally, I find this to be extremely confusing, and that is why I prefer to use only a single definiiton. Unfortunately, the genie is already out of the bottle, and as an encyclopedia, WP needs to report things as they are, not as they should be. So somehow we need to address these different meanings (which you have done already in the "for beginners" section) without confusing people. -- Metacomet 23:52, 2 January 2006 (UTC)

In fact, a frequency component at a frequency f has a Fourier-transform that is different from zero in some neighborhood of f, so this component has a highest frequency that is strictly larger then f. To demand that the highest frequency ${\displaystyle f_{H}}$ does not exceed ${\displaystyle f_{s}/2}$ is the same as to demand that ${\displaystyle f_{H}\leq f_{s}/2}$ is the same as to demand that there are no frequency components above ${\displaystyle f_{s}/2}$. The value at the critical frequncy is not important since first, the support of a function is always closed, and second the Fourier transform is a measurable function, that is a class of classical functions that differ only on a set of measure zero. That is, ${\displaystyle X(f_{s}/2)=0}$ is not a well-defined condition, but

${\displaystyle \int _{|f|>f_{s}/2}|X(f)|^{2}\,df=0}$

is well-defined and characterizes bandlimited functions with highest frequency ${\displaystyle f_{s}/2}$. For example, ${\displaystyle sinc(f_{s}t)}$ is bandlimited with highest frequency ${\displaystyle f_{s}/2}$, its Fourier transform is a rectangular function, and its supprot is the closed intervall ${\displaystyle [-f_{s}/2,f_{s}/2]}$.--LutzL 11:43, 10 February 2006 (UTC)

## Rearranging the order of the sections

I also think that the section entitled "Formal statement of theorem" should come much earlier in the article, perhaps following the new section called "The sampling theorem for beginners" and before the section on "Aliasing." Any thoughts? -- Metacomet 23:29, 2 January 2006 (UTC)

You know, the more I think about it, maybe some of the material in the new "for beginners" section should actually move up to the introductory section of the article (above the table of contents). Then we could combine some of the material from the current introduction, which is a bit math-heavy, with the "Formal statement" section. As currently organized, I am afraid the article might scare away non-technical or non-mathematical people before they even get to the new "for beginners" section. We could eliminate this risk by rearranging the presentation a bit. Any thoughts? -- Metacomet 23:35, 2 January 2006 (UTC)

Before starting to make too much work, I would like to have some clarifications about the relation between the Nyquist–Shannon sampling theorem and the Nyquist-Shannon interpolation formula. To me they appear to describe more or less the same thing (even more so after the new section). I guess that from a historical point of view they have developed along different paths by different people or been applied to different problems, but I don't really see the point in having two articles on the same subject. Or is it the case, that we should interpret the sampling theorem as only talking about the sampling frequency and not about how the sampling is being made or about how the signal is being reconstructed? It is possible to realize that the signal should be reconstructed by means of "sinc-interpolation" but at the same time fail to realize that this is only possible if the sampling frequency is larger than twice the bandwidth of the signal? To me, the description of the sampling process and the reconstruction process together with the relation between the sampling frequency and the signal's bandwidht form one single concept which I like to think about as the "sampling theorem". It states conditions on all of these three issues in order to accomplish perfect reconstruction. On the other hand, this is only my own mental picture of what is going on. --KYN 00:24, 3 January 2006 (UTC)

I think your mental model is perfectly fine, but I don't agree with your conclusion that they should all be lumped together into a single article. One of the great things about a web-based encyclopedia is the ability to create hyper-links between related articles. I do see the need for separate articles on each of the three topics, with appropriate links between them. Of course it is always possible to merge related articles together, or to break up sub-sections of one article into several smaller articles. The issue is to find the right balance between the length of the article versus keeping logically-related concepts together. Often, that means creating a single high-level overview article, in this case maybe Sampling, and then several related and more detailed sub-articles, such as Nyquist-Shannon sampling theorem and Nyquist-Shannon interpolation formula and Bandlimited, etc. -- Metacomet 01:16, 3 January 2006 (UTC)

BTW, if you are sure that you want to merge the articles, you should post the appropriate tag at the top of each article and allow people time to comment. There are three relevant templates: Merge, Mergeto, and Mergefrom. -- Metacomet 01:20, 3 January 2006 (UTC)

In principle I agree, but still, before even thinking about merging or not, I would like to see a statement about what is the topic of this article. It could either be
• "How do I sample a signal and then reconstruct it again" (maybe "Sampling and Reconstruction" is better) which presents the basic operations and, at sutiable places, introduces concepts such as bandlimitation, sampling frequency, etc. In this case "the sampling theorem" is interpreted in a rather general sense.
• "The Nyquist-Shannon sampling theorem" which, I guess, could focus only on the statement about the sampling frequency being larger than twice that of the signals's bandwidth. In this case "the sampling theorem" is given a more narrow interpretation.
One way to proceed is to start a new article, e.g. "Sampling and Reconstruction" which could be based on the "Beginners..." secton and extended with some more formal statements. Then there are (already) separate articles on the different concepts which relate to the sampling and reconstriction processes which can go into much more technical/mathetmatical depth. However, these must then be
• Consistent in notation
• Consistent cross-linking between the basic article and the technical articles
• As little duplication of information as possible, and instead use cross-links.
Is this a way forward? --KYN 09:01, 3 January 2006 (UTC)

## Consequences of the theorem

The topic of this section is the fact that a function (signal or not) cannot be truncated in both domains simultaneously. This is a very important fact, which sometimes have practical consequences, and it deserves to be mentioned in the wikipedia. However, I do not see how this result is a consequence of the sampling theorem. It is rather a general property of functions, even functions of arbitrary many variables. A proof of the statement could be based on the sampling theorem, but it must be possible to prove it in other ways as well. I suggest to move this section to the bandlimited article, since it relates more to that stuff than specifically to the sampling theorem. --KYN 10:01, 4 January 2006 (UTC)

I have now moved the "Consequences of the theorem" section and incorporated it into the Bandlimited article. --KYN 22:20, 7 February 2006 (UTC)

The following discussion is an archived debate of the proposal. Please do not modify it. Subsequent comments should be made in a new section on the talk page. No further edits should be made to this section.

The result of the debate was don't move. —Nightstallion (?) Seen this already? 18:50, 23 April 2006 (UTC)

## Requested move

### Survey

Add *Support or *Oppose followed by an optional one-sentence explanation, then sign your opinion with ~~~~
• Oppose. Shouldn't have "The" in any case. Need to distinguish Fourier sampling theorem,[6] perhaps others (or explain that this one is also known by that name). Use redirects to accommodate the various combinations of the four names associated with this theorem. Gene Nygaard 13:06, 18 April 2006 (UTC)
Good point. Statistical sampling is another example. Wikipedia currently redirects sampling theory not to here, but to statistics. --Bob K 02:19, 23 April 2006 (UTC)
• Oppose per Gene. Also, my experience from music sampling is that it's known as "Nyquist sampling", which I've just created as a redirect. Regards, David Kernow 18:24, 18 April 2006 (UTC)
• On the fence, redirects cover a lot of the aliases and "Nyquist-Shannon sampling theorem" appears to be sufficiently disambiguous and most common. Cburnett 02:23, 19 April 2006 (UTC)
• Oppose. Agree with Gene; and add that I've actually used the theorem. — Arthur Rubin | (talk) 18:34, 22 April 2006 (UTC)

### Discussion

• Using &ndash;s in article titles seems ill-advised to me. Anyone else agree?  Regards, David Kernow 18:54, 18 April 2006 (UTC)
• I agree. It should be a hyphen. Gene Nygaard 15:19, 19 April 2006 (UTC)
• It seems to be the convention in WikiProject Mathematics that theorems, conjectures, etc., named after more than one person are separated by &ndash;s rather than hyphens, to disdisguish them from being named after a single hyphenated person. I don't like it, either, but it's done in a number of places. — Arthur Rubin | (talk) 16:21, 23 April 2006 (UTC)
• The article mentions two other aliases. But actually there are many more than that, according to Google. That is what bothers me about the current name... its arbitrariness. And it appears to be one of the least popular forms. The one I proposed turns out to be the most popular form, but I agree that something more specific would be better. Digital sampling theory? Waveform sampling theory? Signal sampling theory? I realize it applies to other types of functions, but Function sampling theory would be too vague. --Bob K 02:19, 23 April 2006 (UTC)
I'd say a solution is to ensure the article has a sufficiently accurate name, then have all less accurate, more everyday names redirect to it – or, if some of these less accurate names might refer to other sampling methods, have them redirect to a disambiguation page. Regards, David Kernow 11:45, 23 April 2006 (UTC)
The above discussion is preserved as an archive of the debate. Please do not modify it. Subsequent comments should be made in a new section on this talk page. No further edits should be made to this section.

## Aliasing in Undersampled Data

It would be useful to have a section explaining where aliased frequencies appear in undersampled data. This might be a curiosity for the main body, but in the undersampling section it would be useful to explain where the baseband signal goes within the undersampled dataset.

The general case would be that frequency F appears as an alias Fa = mod(abs(Fs - F), Fs).

To pick up with the FM radio band example, if you undersample at 43.2MHz (n=0) then a station at F=94.7MHz appears at 8.3MHz. The entire band covers 1.6 - 21.6MHz.

## Mathematical basis of sampling and reconstruction theorem.

BobK, i dunno who did the previous "proof", so if it was you, i apologize in advance if this isn't flattering (i'm hoping it was someone else's contribution). but, that proof did not even come close to it. it made assertions without support and it really didn't connect its assertions to proof of the sampling and reconstruction theorem. in other words, some of its assertions were non sequitur. (an example that i can recall is that the DTFT mentioned has nothing to do with it. the DTFT or the Z transform comes after the issue of sampling is settled.) Rbj 03:22, 4 May 2006 (UTC)

It is my attempt to simplify the 18-Jan version, shown below. Don't worry about flattery. It's more important to know what you really think. I fully expect to find disagreements and irreconcilable differences here. And I don't expect to get my way, because I have limited patience for argument. It's all just temporary anyway, because whatever we do will inevitably erode under the constant bombardment from other contributors. Anyhow, I won't attempt to defend the technical correctness of "proof 1". I think this is more correctly framed as a labelling issue, rather than an issue of inclusion or exclusion. Call it "Intuitive explanation of the sampling theorem and aliasing", if you like. I think there is a need for it, because in my experience it is the most common way the theorem is introduced to newbies. It is the verbal equivalent of the frequency-domain pictorial explanation. --Bob K 12:42, 4 May 2006 (UTC)
The introduction of the proof below [labelled 18-Jan version] is not entirely correct. It takes only some minor modifications in Shannons original proof, the one that only uses mesurable, square-integrable functions, to also include aliasing etc. pp. Probably I will never understand why communication engineers, computer scientists or, beware, systems engineers feel more comfortable with distributions then with "normal" functions. To be constructive: One could make use of the Poisson summation formula to unify all proofs. And someone should explain what the difference of a sine wave of some frequency and a frequency component at that frequency is. This is quite a big error in the article where it explains the Nyquist frequency. But well, I made those points above and nothing happened.--LutzL 13:45, 4 May 2006 (UTC)
Just to make sure we are on the same page, that introduction has been gone since 18-Jan. My point was to show that although I created the version Rbj replaced on 30-Apr, it wasn't from scratch. I was just simplifying an existing article. The simplification is now copied below the 18-Jan version, for easier reference. --Bob K 22:50, 4 May 2006 (UTC)
Yes, and what is written in the recent proof in great length is plain wrong, especially the claimed identity ${\displaystyle T\sum _{n\in \mathbb {Z} }\delta _{t-nT}=\sum _{k\in \mathbb {Z} }e^{i(2\pi k/T)t}}$. Those chain of symbols has no sense. That is, if one wants it as a mathematical proof. There is no Fourier-series for the Dirac comb. One can construct approximating sequences for this distribution and those have a fourier series and their coefficients converge to 1 and this construction is used to prove the convergence of the Fourier-series (without ever mentioning distributions). But the series as written in the article does not converge in any meaningful sense. Its use does not constitute a proof. This formula is just a rule of thumb for mathemematically challenged students or profs. One may perform calculations with this rule that have a correct result. But to get there one has to obey strict rules that amount to the different statements of the Poisson summation formula or the identity of the fourier development of a periodic function to that function. Also in the latter case there are different instances of this identity for different classes of functions and different types of convergence of the Fourier series. But if one has to include all that precautions in the proof one arrives at Shannons original proof, only in a highly bloated version. I suggest to put Shannons proof in the first place and to shorten those calculatations using the rule of thumb into something readable.--LutzL 12:04, 5 May 2006 (UTC)
well, i'm not in agreement with you and am utterly unimpressed with any appeal to "the dirac delta function is not a function therefore you cannot treat it as one" sort of argument. this is a dispute of concept that many electrical engineering scholars (some that publish in IEEE as well as write textbooks) have with mathematicians. i've had courses in Real Analysis, too. i know that, from the POV of mathematicians, that when f(x) = g(x) almost everywhere (that is everywhere but a countable number of discrete values of x in the domain) that the integrals of f(x) and g(x) must be equal over the same region or set of x. this of course means that if one treats the Dirac delta function as a "function" and let f(x) = δ(x) and g(x) = 0, you have one integral that is equal to 1, yet the other integral is 0 and Lebesgue integration tells us that this cannot be so.
but it's what is done in countless textbooks on Linear System theory, all the time. and if you were to take a bunch of "mathemematically challenged students" and try to use a nuanced argument involving distributions to explain what a driving impulse is (and the impulse response) of a linear, time-invariant system, you will accomplish nothing, pedagogically. you may as well make them learn it by rote. in addition, such differentiation of the Dirac delta from the "nascent" delta functions is not useful in the context of physics/engineering/signal_processing, and is numerically unimportant. if you don't allow making a proof from the nascent definition because "the dirac delta function is not a function", then i'll pick a nascent impulse function that has an area of 1 and a width of 1 Planck time (which is a legitimate function) and there will be no measurable difference in any conceivable physical application, of which sampling a reasonably well-behaved function of time is one.
even if one were to stick to the distribution/functional definition, the Dirac delta is perfectly meaningful delayed by an arbitrary finite time as long as it eventually finds itself in an integral where the strict definition of the Dirac delta then has meaning. given that, it is most certainly meaningful and true that ${\displaystyle T\sum _{k}\delta (t-kT)=\sum _{n}e^{i(2\pi n/T)t}\ }$. If there is such a thing as a Dirac comb, it is certainly the case that it is periodic, has a Fourier series, and the Fourier coefficients are easy to get (which is where the Dirac deltas get integrated). you can count me as one of your "mathematically challenged" (i've had more math courses, including Probablity, Complex Variables, Real Analysis as well as Metric Spaces and Functional Analysis, Approximation theory, etc. than the average electrical engineer), but your appeal to "the Dirac delta isn't even a function" argument gets nowhere, fast. BTW, feel free to de-bloat it. Rbj 05:47, 6 May 2006 (UTC)
Another reason the proof is bloated is that it includes the derivation of sinc() as the inverse transform of rect(). And it's deja vu all over again in "Shannon's original proof". Why not reference the table of Fourier transforms? It would be good to have the derivation somewhere in Wikipedia, but it only needs to appear once, and this is not the logical place for it. --Bob K 01:48, 9 May 2006 (UTC)

Getting back to Rbj's objection to use of the DTFT, I have to disagree. The issue of sampling is "settled" once you state that your model is:
${\displaystyle x_{s}(t)=\delta _{T}(t)x(t)\ }$
You can quibble with the model, but the proof only claims to apply to the model, not to the actual samples. It deals with the question of recovering ${\displaystyle x(t)\,}$ from ${\displaystyle x_{s}(t)\,}$... or equivalently the question of recovering ${\displaystyle {\mathcal {F}}\{x(t)\}\,}$ from ${\displaystyle {\mathcal {F}}\{x_{s}(t)\}\,}$.   And ${\displaystyle {\mathcal {F}}\{x_{s}(t)\}\,}$ is a DTFT. --Bob K 01:24, 5 May 2006 (UTC)
i agree that the FT of ${\displaystyle x_{s}(t)=\delta _{T}(t)x(t)\ }$ is equivalent to and motivates the definition of the DTFT (with the proper normalization of ω, there is no ${\displaystyle f_{s}\ }$ or ${\displaystyle T\ }$ in the DTFT or Z transform). it's just that pedagogically (and that is my primary motivation) the DTFT is further on. the Sampling Theorem is the interface between the continuous-time domain and the discrete-time domain and it needs to be presented (and proven) without appealing to a future concepts, even the not-so-distant future concepts. This remains a concept purely in the continuous-time domain that, when conquered, allows one to go to the discrete-time domain with full confidence of how that person got there (with no ostensibly circular references). Rbj 05:47, 6 May 2006 (UTC)
Your objection to the DTFT is understood. But it is superficial, because the DTFT is just a special case of the Fourier transform. It is a notational convenience. I can make the same argument with the continuous Fourier transform. --Bob K 01:48, 9 May 2006 (UTC)

### 18-Jan version

A more modern approach to proving the theorem uses a distribution known as an infinite impulse train, or Dirac comb. Although this approach seems more complicated than Shannon's, it has a couple of important advantages: it provides additional insight into the not-exactly-bandlimited case and it explains the undersampling case.

• Consider the signal ${\displaystyle x(t)\,}$, which represents any continuous-time signal with fast decay at infinity.
• Further, consider a second continuous-time signal, ${\displaystyle q(t)\,}$, defined as a Dirac comb:
${\displaystyle q(t)=\Delta _{T}(t)=\sum _{n=-\infty }^{\infty }\delta (t-nT)}$
with Fourier transform
${\displaystyle Q(f)={\mathcal {F}}\{q(t)\}=\sum _{k=-\infty }^{\infty }\delta (f-kf_{s})}$
where ${\displaystyle f_{\mathrm {s} }={\frac {1}{T}}}$ is the sampling rate (in hertz = cycles per second).
• The result of their multiplication in the time domain is
${\displaystyle v(t)=x(t)\cdot q(t)=x(t)\cdot \sum _{n=-\infty }^{\infty }\delta (t-nT)=\sum _{n=-\infty }^{\infty }x(nT)\,\delta (t-nT)}$.
${\displaystyle V(f)=X(f)*Q(f)=X(f)*\sum _{k=-\infty }^{\infty }\delta (f-kf_{s})=\sum _{k=-\infty }^{\infty }X(f-kf_{\mathrm {s} })}$
which follows from the shifting property of the Dirac delta under convolution.
• In order to reconstruct the original signal ${\displaystyle x(t)\,}$ from its samples ${\displaystyle v(t)\,}$, we need to find a function ${\displaystyle h(t)\,}$ that we can use as an interpolation kernel. The reconstructed signal will then take the form:
${\displaystyle {\tilde {x}}(t)=v(t)*h(t)=\sum _{n=-\infty }^{\infty }x(nT)\,h(t-nT)}$.
The goal is to find a minimum set of conditions on x and h that will result in perfect and complete reconstruction. In other words, we want to have
${\displaystyle {\tilde {x}}(t)=x(t)}$
• Taking the Fourier transform of both sides, and applying the multiplication/convolution property of the Fourier transform, we have
${\displaystyle {\tilde {X}}(f)={\mathcal {F}}\{v(t)*h(t)\}=V(f)\cdot H(f)=\left[\sum _{k=-\infty }^{\infty }X(f-kf_{\mathrm {s} })\right]\cdot H(f)}$
• Thus ${\displaystyle {\tilde {X}}(f)\,}$ is the product of the periodized Fourier transform X of x and a "windowing" function H. For the sampling theorem, one wants to obtain the identity of ${\displaystyle X\,}$ and ${\displaystyle {\tilde {X}}\,}$. To simplify matters, one assumes that in the periodization sum at most one term is different from zero. That is, the replicated ${\displaystyle X(f)\,}$ shifted by integer multiples of ${\displaystyle f_{\mathrm {s} }}$ do not overlap.
• This is true, among other possibilities,
• in the standard case where X is zero outside the interval ${\displaystyle \left[-{\frac {1}{2}}f_{\mathrm {s} },{\frac {1}{2}}f_{\mathrm {s} }\right]}$ or
• in the undersampling case where X is zero outside the union
${\displaystyle \left[-{\frac {N+1}{2}}f_{\mathrm {s} },-{\frac {N}{2}}f_{\mathrm {s} }\right]\cup \left[{\frac {N}{2}}f_{\mathrm {s} },{\frac {N+1}{2}}f_{\mathrm {s} }\right]}$ for some ${\displaystyle N\in \mathbb {N} }$.
The standard case is contained in the undersampling case for N=0.
• Since the periodization of X is now identical to X, the "windowing" factor H has to be chosen as ${\displaystyle H(f)=T}$ wherever ${\displaystyle X(f)\neq 0}$.
• In the standard case, the cardinal sine function ${\displaystyle h(t)=\operatorname {sinc} _{f}\left({\frac {t}{T}}\right)}$ satisfies this condition. Since then ${\displaystyle {\tilde {X}}(f)=X(f)}$, we get the Nyquist–Shannon interpolation formula for the reconstruction of x(t):
${\displaystyle x(t)={\tilde {x}}(t)=\sum _{n=-\infty }^{\infty }x(nT)\operatorname {sinc} _{f}\left({\frac {t}{T}}-n\right)}$.
• In the undersampling case a similar formula holds with
${\displaystyle h(t)=(N+1)\operatorname {sinc} _{f}\left({\frac {(N+1)t}{T}}\right)-N\operatorname {sinc} _{f}\left({\frac {Nt}{T}}\right)}$.

If the spectrum X(f) of x(t) has its support inside some interval ${\displaystyle [-f_{\mathrm {H} },f_{\mathrm {H} }]}$, and if ${\displaystyle f_{\mathrm {s} }}$ is not sufficiently large to satisfy ${\displaystyle f_{\mathrm {s} }\geq 2f_{\mathrm {H} }}$, then the terms of the periodizing summation will overlap and aliasing will be introduced.

### simplified version (from 18-Jan to 30-Apr)

The Fourier transform of the discrete-time sequence is called the discrete-time Fourier transform (DTFT), which is periodic, with period ${\displaystyle f_{s}\,}$. As shown in the DTFT article, it can be constructed by placing copies (aka aliases) of ${\displaystyle X(f)\,}$ at intervals of ${\displaystyle f_{s}\,}$ and summing them all together:

${\displaystyle X_{dtft}(f)={1 \over T}\sum _{k=-\infty }^{\infty }X(f-kf_{\mathrm {s} })}$

Clearly, if the original and the copies are bandlimited, and ${\displaystyle f_{s}\,}$ is sufficiently large to prevent overlap, the original signal can be recovered by simply filtering out the aliases. That is the essence of the sampling theorem. There are also situations where overlap is allowed to occur, and due to the particular shape of ${\displaystyle X(f)\,}$ the aliases coincide with null regions of ${\displaystyle X(f)\,}$. Then neither the alias nor the original spectrum is irreparably affected. See Undersampling.

${\displaystyle X(f)\,}$ is proportional to the k=0 term of ${\displaystyle X_{dtft}(f)\,}$. So when ${\displaystyle f_{s}>2f_{H}\,}$ as previously discussed, we may conclude:

${\displaystyle X(f)=X_{dtft}(f)\cdot T\cdot \operatorname {rect} \left({f \over f_{s}}\right)\quad =X_{dtft}(f)\cdot T\cdot \operatorname {rect} \left(Tf\right)\,}$
where ${\displaystyle T\cdot \operatorname {rect} \left(Tf\right)\,}$ is a rectangle function, whose inverse transform is ${\displaystyle \operatorname {sinc} \left(\pi t/T\right)\equiv {\frac {\sin(\pi t/T)}{\pi t/T}}\,}$

And that essentially proves the theorem, because ${\displaystyle x(t)\,}$ can be recovered from ${\displaystyle X(f)\,}$ . E.g., the inverse transform of the product is the convolution of the inverse transforms:

 ${\displaystyle x(t)\,}$ ${\displaystyle =\left(\sum _{n=-\infty }^{\infty }x(nT)\,\delta (t-nT)\right)*\operatorname {sinc} \left(\pi t/T\right)\,}$ ${\displaystyle =\sum _{n=-\infty }^{\infty }x(nT)\cdot \operatorname {sinc} \left[\pi (t-nT)/T\right]}$

which is known as the Nyquist–Shannon interpolation formula.

### further simplified proof

Using the Fourier-transform in the normalization ${\displaystyle X(f)={\mathcal {F}}(x)=\int _{\mathbb {R} }x(t)e^{i(2\pi f)t}\,dt}$, the Poisson summation formula (PSF) can be stated for any ${\displaystyle T>0}$ as

${\displaystyle T\sum _{n\in \mathbb {Z} }x(nT)e^{-i(2\pi f)nT}=\sum _{k\in \mathbb {Z} }X(f-k/T)}$.

Further, the pi-normalized cardinal sine function is the Fourier-transform of the box-function of height 1 and support [-1/2,1/2].

After multiplication of the PSF with ${\displaystyle e^{i(2\pi f)t}}$ and integration for f over the intervall ${\displaystyle \left[-{\frac {1}{2T}},{\frac {1}{2T}}\right]}$ one gets on the left side

${\displaystyle T\sum _{n\in \mathbb {Z} }x(nT)\int _{-1/(2T)}^{-1/(2T)}e^{i(2\pi f)(t-nT)}\,df=\sum _{n\in \mathbb {Z} }x(nT)\operatorname {sinc} (t/T-n)}$.

By partitioning the inverse Fourier-transform one obtains

${\displaystyle x(t)=\int _{\mathbb {R} }X(f)e^{i(2\pi f)t}\,df=\sum _{k\in \mathbb {Z} }\int _{-1/(2T)}^{-1/(2T)}X(f-k/T)e^{i2\pi (f-k/T)t}\,df}$.

Thus the right side can be transformed into

${\displaystyle \sum _{k\in \mathbb {Z} }\int _{-1/(2T)}^{-1/(2T)}X(f-k/T)e^{i(2\pi f)t}\,df=x(t)+\sum _{k\in \mathbb {Z} ,\,k\neq 0}\int _{-1/(2T)}^{-1/(2T)}X(f-k/T)e^{i(2\pi f)t}\left(1-e^{-i(2\pi k/T)t}\right)\,df}$.

If the Fourier-transform X(f) of x(t) is zero outside the intervall [-1/(2T),1/(2T)], the identity of x(t) to its cardinal interpolation series or Nyquist–Shannon interpolation formula follows. If the support of X(f) is not contained inside this intervall, then the application of the cardinal interpolation formula results in non-vanishing alias terms.

--LutzL 11:42, 5 May 2006 (UTC)

### Short proof as proposed by BobK

Using the Fourier-transform in the normalization ${\displaystyle X(f)={\mathcal {F}}(x)=\int _{\mathbb {R} }x(t)e^{i(2\pi f)t}\,dt}$, the Poisson summation formula (PSF) can be stated for any bandlimeted function x(t) and any ${\displaystyle T>0}$ as

${\displaystyle T\sum _{n\in \mathbb {Z} }x(nT)e^{-i(2\pi f)nT}=\sum _{k\in \mathbb {Z} }X(f-k/T)}$.

The left hand side is a Fourier series, the right hand side is the periodisation of the Fourier-transform X(f) of x(t). Both sides are well-defined for bandlimited functions. Now suppose that the highest frequency ${\displaystyle f_{H}}$ of x(t) is ${\displaystyle f_{N}={\frac {1}{2T}}}$ or smaller, that is, the support of X(f) is bounded to the intervall ${\displaystyle [-f_{N},\,f_{N}]}$. After multipliation of both sides with the rectangular function ${\displaystyle \operatorname {rect} (2Tf)=\operatorname {rect} (f/f_{N})}$, one obtains

${\displaystyle T\sum _{n\in \mathbb {Z} }x(nT)e^{-i(\pi f/f_{H})n}\cdot \operatorname {rect} (f/f_{N})={\begin{cases}{\frac {1}{2}}(X(-f_{N})+X(f_{N}))&{\mbox{for }}f=\pm f_{N},\\\\X(f)&{\mbox{for }}-f_{N}f_{N}.\end{cases}}}$

Therefore, the right hand side coinsides with X(f), except possibly at the boundary points of the intervall ${\displaystyle [-f_{N},\,f_{N}]}$. For the integration of a function, values at a finite number of points (or at a set of measure zero) can be changed without influencing the integral. For the purpose of the inverse Fourier-transform this means that both sides represent X(f). Applying the inverse Fourier transform, and using the rules of the Fourier transform, one thus arrives at the Nyquist–Shannon interpolation formula.

--LutzL 10:26, 9 May 2006 (UTC)

#### discussion

I had to revert the restriction of the generality by Rbj.
okay fine. i'll post it as an alternative
Please note that ${\displaystyle f_{H}=f_{N}}$ is mathematically possible via this proof.
no. it's in error. your proof requires that there be no sinuosoids at Nyquist (when there is no problem with sinusoids at any other frequency below Nyquist). fH is the highest possible frequency component in x(t). no sinusoidal component of x(t) may be as large or larger than 1/(2T) or aliasing potentially will occur.
And please tell me any textbook that makes this restriction ${\displaystyle f_{H}.
let's start with what many people might call the DSP "bible". Oppenheim & Schafer, Discrete-Time Signal Processing. my revision is the first, (c) 1989. p. 83, Section 3.2, eq. (3.7):
${\displaystyle \Omega _{s}-\Omega _{N}>\Omega _{N}\ }$, or ${\displaystyle \Omega _{s}>2\Omega _{N}\ }$,
then there is Shanmugam Digital and Analog Communication Systems (c) 1979; Carlson Communication Systems, 3rd edition, (c) 1986 and earlier; Haykin, Communication Systems, (c) 1986. sorry about all of the old dates (i'm 5 decades old and one of those "mathematically challenged profs" or former prof), but since this is a mathematical truism, any more current text that claims that you can have any non-zero frequency components at Nyquist is simply wrong.
The argument against it in the article is somewhat beating a self-constructed strawman. The WKS-sampling theorem only ensures the perfect reconstruction of bandlimited ${\displaystyle L^{2}(\mathbb {R} )}$-functions, the function ${\displaystyle \sin(2\pi f_{N}t)=\sin(\pi t/T)}$ does not belong to that space. See my comments high above on this talk page on this topic.--LutzL 06:59, 10 May 2006 (UTC)
that's also in error. the sampling theorem as no problem with sinusoids (that result in dirac impulses in the frequency domain and certainly violate L2R) as long as their frequency is strictly below Nyquist. you also have a problem simultaneously implying time-limitedness (which is often, but not always how people get finite integrals of the time-domain signal) and frequency band-limitedness. what is necessary is frequency bandlimitedness (for sampling in the time domian). r b-j 18:16, 10 May 2006 (UTC)

• OK, I'll look the books up, if they are available to me. However, in my experience, the math in books written by engineers for engineers is often, well, questionable. There is a difference in explaining the formal handling of e.g. the Fourier transform and providing the fundamentals of Fourier analysis. The former, practically oriented part is often acceptable to well done, the latter part is missing or contains critical ommissions and factual errors. Needless to say, the WKS-sampling theorem belongs to the fundamentals part. The general sampling theory is a different story. But since it deals with approximations, never with perfect reconstructions, there is no critical frequency. I.e., applying a practical sampling method at the Nyquist frequency will always have grave errors.
• You still have to point out where the argument I'm using goes wrong. This is math, not Dogma, so everything is provable. And no, it is not possible to sample sinusoids in the way this sampling theorem does. The WKS-sampling theorem only works for "finite energy" functions, that is measurable functions that are square integrable. At least the reconstruction part fails to work for tempered distributions like a sine function (which has only as a tempered distribution a Fourier transform). Thus we are back to the topic of Fourier series expansions for tempered distributions. But I don't want discuss distributions with You, since You have still problems dealing with measurable functions and Lp-spaces.
• And no, I'm fully aware that one has to use the full real axis to reconstruct a sampled measurable function.
• Please consult Higgins: "Five short stories on the sampling theorem" and Benedetto/Zimmermann:: "Sampling operators and the Poisson summation formula". The latter is available on the net, exact references for both are in Unser: "Sampling--50 years after shannon", also available on the net. Those I consider state-of-the-art treatments of the sampling theorem/theory.--LutzL 08:38, 11 May 2006 (UTC)
• And by the way, You too confuse a sine wave with a frequency component. A frequency component has a fourier transform with an intervall around the given frequency inside its support. So a frequency component at some frequency f has a highest frequency that is larger than f. To say that there are no frequency components above f is equivalent to saying that the highest occuring frequency is f. (In fact, that there is a modification of the Fourier-transform of the signal on a set of measure zero such that the support of the modified function is bounded by f. Modify X(f) to zero if You are more comfortable with it).--LutzL 11:43, 11 May 2006 (UTC)
• Well, I went to our library and had a look at Oppenheim/Schafer, the second german edition from 1995 and the Prentice Hall reprint of the first 1975 edition. Also at some books nearby. First, You are right that O/S is among the best of those books. Second, there is no proof of any kind, only a justification which omittes all mathematical subtleties. Third: Please check, the equality is nowhere discussed. It is only stated that ${\displaystyle f_{H} is sufficient and that ${\displaystyle f_{H}>f_{N}}$ leads to aliasing, which is of course correct. In the text but not in the formulas it is further stated that one needs "at least" the Nyquist frequency, for me this is a hint that O/S believe that equality is also admissible. As to the other books: By now You know that I close a book if it starts about the Fourier series of a Dirac pulse (or worse: Dirac function). There was also at least one book that correctly stated and used that the Fourier transform of a Dirac comb (as a tempered distribution) is again a Dirac comb. But it failed to mention the Schwarz-space of the corresponding test functions. Infinite series and integrals were commuted without justification, from the equality of the Fourier coefficients of two functions it was concluded that the functions must be equal, without mentioning possible differences on a set of measure zero, and so on. You see, this is not mathematics, this is justification with hand waiving at best and voodoo in the general case. This is bad, from a certain angle. Since students tend to forget the cautious formulation in the text (if present at all) and only remember the bold and simple statements/formulas, we end up with neverending confusion.--LutzL 12:54, 11 May 2006 (UTC)
LutzL, does your O&S have a chapter called (or translated) Sampling of Continuous-Time Signals (it's chapter 3 in my book) and a section named Frequency-Domain Representation of Sampling (section 3.2 in my book)? it's there.
as indicated before, i am well aware of mathematicians objections to the usage of the dirac impulse function as done by engineers and engineering texts. i am, for the most part, unimpressed with the objections. some of it is semantics. i don't care whether or not you allow me to call a dirac delta a "function" or not. i realize that it is not a regular, normal function, but if you do not allow us to define the dirac delta to be a function of virtually zero width and integral of 1, i say "what's the point?" we want to use these concepts and functions to get something done and we know that, in the limit, the width of the nascent dirac delta (the engineering kind of definition) never quite makes it all the way to zero. as i said before, a nascent delta function with width of 1 Planck time is a legitimate function for your purposes, and, for any physical or engineering purpose is indistinguishable in effect from the dirac delta.
if you consider that the Fourier transform of the Dirac comb is a legitimate expression (another Dirac comb), that is functionally equivalent to my "bloated proof" where i use the Fourier series equivalent of the dirac comb and use the shifting property of the F.T.
lastly, there is no need for the function being sampled to be L2(R). we have the Fourier transform of a few "functions" that aren't L2(R). it needs to be sufficiently bandlimited so that no frequency component (and i am using the terminology correctly) of one image occupies the same frequencies of any component of another image (or the baseband). if that happens, you do not know what the original frequency and phase was. even in the case of a frequency component right at Nyquist, there is aliasing:
${\displaystyle x(t)={\frac {A}{\cos(\theta )}}\cos(2\pi {\frac {1}{2T}}t+\theta )}$
if you sample that at times that are multiples of T, you will get x(nT) = +A, -A, +A, -A, +A, -A, +A, -A, +A, -A no matter what θ is. but the amplitude is not A unless θ is a multiple of π. you cannot determine both the amplitude and phase of the original sinusoid. that information is lost forever and you cannot reconstruct x(t) from the samples x(nT). this is why, any decent textbook on the subject is careful to state that the sampling frequency must exceed twice the highest frequency. equality is not good enough. fH is a property of the signal and 1/(2T) is a property of the sampler. they are not the same thing.
r b-j 17:01, 11 May 2006 (UTC)
• To reiterate things I already said: Of course You can sample a sinusoid. You just cant reconstruct it from the samples. But this is also not claimed by the sampling theorem. You can also sample a sinusoid at the double Nyquist-frequency. Those values are equally well open to interpretation and don't characterize the sinusoid uniquely. You can perhaps make a lucky guess if You know that it is a single sinusoid. But also in this case, the reconstruction formula diverges almost everywhere.
• To make things worse, the (corrected) proof as You understand it, that is with equality of functions in the pointwise sense, only works for functions x(t) in ${\displaystyle L^{2}(\mathbb {R} )}$ that are bandlimited and thus continuous and have additionally a continuous and piecewise differentiable Fourier transform. For this it is necessary that ${\displaystyle (1+|t|)|x(t)|}$ is a bounded function, ${\displaystyle (1+t^{2})(|x(t)|+|x''(t)|)}$ bounded is largely sufficient. Of course no nonzero sinusoid satisfies those conditions, since periodic.
• Next reiteration: That the Dirac pulse and the Dirac comb have a Fourier-transform as tempered distributions is a triviality resp. a reformulation of the PSF for tempered test functions. The use is a question of style and time, since You would have to define the Schwartz space and its topology and to show that those linear functions on the Schwartz space are bounded. It is entirely a different beast to say that the Dirac comb equals (pointwise I expect) a highly oszillating series. Let's have a look at the partial sums:
${\displaystyle \sum _{k=-M}^{N}e^{i(2\pi f)k}={\frac {e^{i(2\pi f)(N+1)}-e^{-i(2\pi f)M}}{e^{i(2\pi f)}-1}}=e^{i(\pi f)(N-M)}\cdot {\frac {\sin(\pi (N+M+1)f)}{\sin(\pi f)}}}$
And now please tell me in which sense this converges as M and N go (independently) to infinity!
• A hint: Don't use only the box pulse as approximation of the Dirac pulse. Especially if You want discuss Fourier series. At least, You should know the Gibbs oszillations at jumps. And the Fourier coefficients are not nice. There are two parametrized sequences of Fourier coefficients that have easily computable Fourier series and that are also very convincing approximations to the Dirac pulse. Look for "Poisson kernel" and "Fejer kernel" or read the best available net source for it: Carl Offner: A little harmonic analysis (Postscript).
• But the point is: You first have to apply the approximation of the distribution to a test function. From this value, which depends on the approximation parameter, You then take the limit in the approximation parameter. This is by the topology of the Schwartz space.
• Semantics: Any distribution is also a function. But the point space in question is the space of test functions. But that is not the point of confusion. The point of confusion is this: In engineering they exchange to carelessly limit processes. It is not always possible to exchange infinite sums with integrals or integrals with limits. Especially when dealing with distributions, because most of the exchange theorems for limit processes fail systematically for them.
--LutzL 07:41, 12 May 2006 (UTC)
i am not going to argue this endlessly by use of proof by verbosity. i will say this, the sampling theorem does not require L2(R) unless you are also going to require that for the continuous Fourier Transform. but we engineers apply (in an indirect manner by indirect use of dirac delta functions) the F.T. to DC and sinusoids, none of which are L2(R). we recognize that
${\displaystyle {\mathcal {F}}\{\delta (t-\tau )\}=e^{-i2\pi \tau f}}$
and by use of the duality property
${\displaystyle {\mathcal {F}}\{e^{i2\pi f_{0}t}\}=\delta (f-f_{0})}$
similarly, for a real sinusoid
${\displaystyle {\mathcal {F}}\{{\frac {A}{\cos(\theta )}}\cos(2\pi f_{0}t+\theta )\}={\frac {A}{2\cos(\theta )}}\left(e^{-i\theta }\delta (f+f_{0})+e^{+i\theta }\delta (f-f_{0})\right)}$
no matter how you (legitimately) describe the sampling theorem, that sinusoid can be completely reconstructed from the samples
${\displaystyle {\frac {A}{\cos(\theta )}}\cos(2\pi f_{0}nT+\theta )\ }$
if and only if |f0| is strictly less than 1/(2T). if |f0| equal to 1/(2T), it cannot be reconstructed from the samples (which is the point of contension) because the samples are indistiguishable from another set with a different θ. i think we both agree that if |f0| is greater than 1/(2T), it cannot be reconstructed. but i have shown clearly a counter example to two claims of yours: 1. sinusoids can be sampled and reconstructed from the samples if their frequency is strictly less than Nyquist. 2. a sinusoid with frequency of exactly Nyquist cannot have its phase and amplitude both uniquely determined from the samples (that information is lost) and thus cannot be reconstructed from the samples.
just to be clear about strawmen, red herrings, or other distractions i am not conceding either point and i am not addressing any other point. i don't have the time. r b-j 15:47, 12 May 2006 (UTC)
Hi, I'm very confident that every calculation You do for the practical part of Your job will be correct. Unconciously, You are doing the right thing, and You describe it very clearly. You use an approximation of the Dirac delta, and thus place Yourself inside ${\displaystyle L^{2}(\mathbb {R} )}$. There You can do anything you have to do, Fourier transforms, Fourier series, sampling formulas, etc. and after You got a result from this approximate computation, You take the limit in the approximation parameter. Since this amounts in most cases to just setting this parameter to zero, You even need not be aware of performing a limit computation.
But pity for Your students, at least for those that understand the theoretical part of the Fourier transform and its connection to distributions. Because to be correct in front of You, they will have to tell things that are wrong to them.
Also, You seem to have problems in evaluating an argument. Because You insist on things that are not disputed and fail to address things where You risk to admit being wrong. As for the Fourier series of the Dirac comb and the question of the kind of convergence of this oscillating series.
Please notice that a small rectangular spike around some frequency f0 and with width e is in ${\displaystyle L^{2}(\mathbb {R} )}$, even if it is of Planck width. But its inverse Fourier transform ${\displaystyle \sin(2\pi f_{0}t)\mathrm {sinc} (et)}$ (edit: in fact the inverse transform of ${\displaystyle {\frac {1}{i2e}}(\mathrm {rect} ((f-f_{0})/e)-\mathrm {rect} (f+f_{0})/e)}$) has a maximum frequency f0 +e slightly greater than f0. To sample it, You will need at least the double of this maximum frequency which is greater than 2f0, which is what You insist in. For any ${\displaystyle f_{s}>2f_{0}}$ and ${\displaystyle 0, the interpolation formula
${\displaystyle \sin(2\pi f_{0}t)\mathrm {sinc} (et)=\ \sum _{k\in \mathbb {Z} }\sin \left(2\pi {\frac {f_{0}k}{f_{s}}}\right)\mathrm {sinc} \left({\frac {ek}{f_{s}}}\right)\mathrm {sinc} (f_{s}t-k)}$
holds. The convergence is very slow, so any finite part of that series will, for e small enough, be terribly wrong. But for e=0 it fails in the infinite series too. There is no problem in taking the values of the sine function, but no way to apply the reconstruction formula to them. (By the way, this is another topic where You didn't provide an answer. How are You going to reconstruct the sine function.)
One last remark to distributions. Since You are happy to call any function bandlimited that has a compactly supported Fourier transform in the sense of tempered distributions, You should also have no problem calling a polynomial a bandlimited function. Its Fourier transform is a linear combination of derivatives of the Dirac distribution, so its support is the point 0.

--LutzL 09:46, 16 May 2006 (UTC)--some further notational and error corrections as demanded below--LutzL 16:57, 30 May 2006 (UTC)

what's with the capital "Y"? will you stop being silly?
did you mean to say ${\displaystyle \sin(2\pi ft)\mathrm {sinc} (\pi et)\ }$ and
${\displaystyle \sin(2\pi ft)\mathrm {sinc} (\pi et)=\sum _{k=-\infty }^{+\infty }\sin(2\pi fkT)\mathrm {sinc} (\pi ekT)\mathrm {sinc} \left(\pi {\frac {t-kT}{T}}\right)}$ ?
why not use ${\displaystyle f_{0}\ }$ instead of ${\displaystyle f\ }$ so to keep consistent with convention and so we might leave the latter for the fourier integral or argument of a F.T. function of f if needed?
also ${\displaystyle f_{H}\ }$ is a property of the signal being sampled, not the Nyquist frequency which is ${\displaystyle f_{s}/2={\frac {1}{2T}}\ }$. we need to keep our notation correct and consistent with the topic of discussion.
if the rectangular spikes are centered at ±f, it's ${\displaystyle \sin(2\pi ft)\mathrm {sinc} (\pi et)\ }$ not ${\displaystyle \sin(\pi ft)\mathrm {sinc} (\pi et)\ }$. this also means that the width of the spike is e, so i think you mean to say ${\displaystyle f+e/2=1/(2T)\ }$. no? if you're trying to make a point by saying ${\displaystyle f_{H}=f_{s}/2={\frac {1}{2T}}\ }$, that will work since there is no dirac delta precisely on frequency ${\displaystyle f_{s}/2\ }$ (or above) so it doesn't matter much. my point is that ${\displaystyle \delta (f-f_{0})\ }$ is fine for any ${\displaystyle |f_{0}| (even though the inverse F.T. is not ${\displaystyle L^{2}(\mathbb {R} )}$) but is not okay for ${\displaystyle |f_{0}|=f_{s}/2\ }$ and the simple proof was made above (all i need is a counter example).
i know there are pathological functions that do not work with the sampling theorem. another is what would get reconstructed from this infinite sequence: ... -1, +1, -1, +1, -1, +1, -1, +1, -1, +1, +1, -1, +1, -1, +1, -1, +1, -1, +1 ... it doesn't converge either. the inverse F.T. of the doublet (or higher "derivatives" or the dirac delta) are among those pathological functions. it is still not the issue. i'm not the one clouding the subject. i'm also not the one appealing to pathological functions to try to make a point. i am saying that DC and sinusoids can be sampled and reconstructed in the sampling theorem (and we both know that they are not ${\displaystyle L^{2}(\mathbb {R} )}$) as long as their frequency is strictly less than Nyquist. and i shown why, if the frequency is exactly Nyquist, that there is information lost (aliased) that prevents reconstruction. again, those are the only two points i am making and i refuse to be distracted from it.
a long while ago, on the pertinent USENET newsgroup comp.dsp i had a drawn out debate [7] [8] [9] with a person named "Jay Rabkin" about the nature of the sampling and reconstruction and of the dirac impulse and he kept appealing to the "distribution" definition of the dirac delta as grounds to make his point which was hard to know what point he was trying to make other than that i could not sensibly claim that the dimension of quantity of the dirac delta function, such as it is, is the reciprocal dimension of the argument. i fear that this sorta appeal to esoterism to obscure the simpler reality what you're doing. you "pity [My] students" (i'm not teaching at the moment), for fear that some esoteric nature of the dirac distribution will mess them up with the sampling theorem. this is telling. you have absolutely no concept of pedagogy in an electrical engineering curriculum. we have trouble getting these kids to care about region of convergence of the Laplace transform or Z transform. they rarely will take a formal Real Analysis course in the math department (you know, the "given an epsilon, find a delta" so that some thing is proved) or will ever learn of Lebesque integration or the difference between countably infinite and uncountably infinite. you try to describe the Dirac delta in terms of "distribution" (instead of as a thin limit of the nascent delta functions with unit area) and their eyes will glaze over immediately and you will end up actually conveying no knowledge to them. this is why the engineering texts (in Linear System Theory) do it the way they do. even if it makes the mathematicians blanch.
so please make your point concisely and use the conventions for notation common to the subject (as in the article). again, i spent far more time on this silly argument than i had expected to. r b-j 05:21, 17 May 2006 (UTC)
OK, my last comment should be better for your conventions now. So I have fortunately not to take pity on your actual students, but only past event pity on Jay Rabkin. The point both sides were missing is, if t has a dimension, this should be factored out in the argument of the "dirac function". More precisely, you should always define what you mean by a specific dirac process. Something like ${\displaystyle \int _{\mathbb {R} }f(t)\delta (t/T-k)\,dt/T=f(kT)}$ would be appropriate. -- If You can't teach your students what a continuous function is (e.g., something for which a constant function is a good approximation) then they won't understand the concept of a nascent dirac pulse. And you even don't need it. Sampling gets You a sequence of pairs of time and value, reconstruction restores a function, optimization theory tells you how to best combine both. -- BTW., it was not Dirac that came up with formalized distributions but Laurent Schwartz. And he wrote extensive textbooks about it that are still readable today. Not that I have any hope that an old silly as you will read them. -- I don't expect you to do fundamental research or teach your students more than they need for their practical calculations. But i miss your professional interest into the fundamentals of the things you teach. The doubt on the common sense and fishy formulations in textbooks. A real analysis course in the students days of yours is not enough for that. -- PS: You still need to indicate how you wish to reconstruct a sum of sine functions from their samples, not approximatively but in a very theoretical sense with infinite precision.--LutzL 16:57, 30 May 2006 (UTC)
i hadn't seen this addition when it was made (possible because i was out due to holiday and many other edits to this page where made since). there are still other format mistakes that i will try to fix.

### Proof using PSF

Defining

${\displaystyle X(f)\ {\stackrel {\mathrm {def} }{=}}\ {\mathcal {F}}\{x(t)\}\ {\stackrel {\mathrm {def} }{=}}\ \int _{-\infty }^{+\infty }x(t)e^{-i2\pi ft}\,dt}$.

The Poisson summation formula (PSF) says

${\displaystyle \sum _{n=-\infty }^{+\infty }X(nf_{0})e^{i2\pi nf_{0}t}={\frac {1}{f_{0}}}\sum _{k=-\infty }^{+\infty }x\left(t-{\frac {k}{f_{0}}}\right)}$

and using the duality property of the Fourier transform, the PSF can be restated as

${\displaystyle T\sum _{n=-\infty }^{+\infty }x(nT)e^{-i2\pi nTf}=\sum _{k=-\infty }^{+\infty }X\left(f-{\frac {k}{T}}\right)}$ .

Given the condition that the highest frequency of x(t) is ${\displaystyle B<{\frac {1}{2T}}}$, that is, the support of X(f) is bounded to the interval ${\displaystyle [-B,\,B]}$, and multiplying of both sides with the rectangular function ${\displaystyle \operatorname {rect} (Tf)}$, one obtains

${\displaystyle T\sum _{n=-\infty }^{+\infty }x(nT)e^{-i2\pi nTf}\cdot \operatorname {rect} (Tf)=\sum _{k=-\infty }^{+\infty }X\left(f-{\frac {k}{T}}\right)\cdot \operatorname {rect} (Tf)=X\left(f-{\frac {0}{T}}\right)}$

because only the term for k=0 survives the rectangular windowing operation. This means

${\displaystyle T\sum _{n=-\infty }^{+\infty }x(nT)e^{-i2\pi nTf}\cdot \operatorname {rect} (Tf)=X(f)\quad \forall f\ }$

and that, given the bandlimit and sampling condition above ${\displaystyle B<{\frac {1}{2T}}}$, the samples of x(t), or x(nT) are sufficient to describe fully the spectrum of x(t) and knowing that the continuous Fourier transform is a fully invertable operator, the samples of x(t), or x(nT) are sufficient to fully describe x(t).

Reconstructing x(t) requires the inverse Fourier Transform:

${\displaystyle x(t)={\mathcal {F}}^{-1}\{X(f)\}=\int _{-\infty }^{+\infty }X(f)e^{+i2\pi ft}\,df}$.
${\displaystyle x(t)={\mathcal {F}}^{-1}\left\{\sum _{n=-\infty }^{+\infty }x(nT)e^{-i2\pi nTf}\cdot T\operatorname {rect} (Tf)\right\}}$
${\displaystyle =\sum _{n=-\infty }^{+\infty }x(nT)\cdot {\mathcal {F}}^{-1}\{T\operatorname {rect} (Tf)e^{-i2\pi nTf}\}}$
${\displaystyle =\sum _{n=-\infty }^{+\infty }x(nT)\cdot \operatorname {sinc} \left({\frac {t-nT}{T}}\right)}$

where ${\displaystyle \operatorname {sinc} (u)}$ is the normalized sinc function. This is the Nyquist–Shannon interpolation formula.

this is how to use the PSF to (quickly) prove the sampling and reconstruction theorem. but no matter how you do it, the sampling frequency must be strictly greater than twice the highest frequency of the baseband signal being sampled. that is fundamental. r b-j 18:43, 10 May 2006 (UTC)

## BobK, can you demonstrate how nearly any of the edits made in the past week or two...

...either reduce bloat, or make the mathematical justification of the sampling theorem more straight-forward? the article is getting bigger and less concise. it is getting less clear. these edits, overall, are not helping. r b-j 04:58, 26 May 2006 (UTC)

An example of bloat is deriving the inverse transform of rect(), which can be looked up in a table. Each proof that I did is concise. It is not bloat when there are several interesting perspectives that arrive at the same place. Nobody is forced to read them all. And it avoids the hassle of trying to agree on just one of them. But if you would like me to pick just one, I can do that too. --Bob K 03:17, 30 May 2006 (UTC)
Since the Dirac comb is not actually what sampling is (not even close), I think it is blatently not straightforward to use it as the definition of sampling. You can start with the concept of a periodic extension of ${\displaystyle X(f)\,}$ without claiming anything about its relationship to sampling. The Dirac comb then emerges in a more mathematical (i.e. "straightforward"), way. --Bob K 17:30, 30 May 2006 (UTC)
i am not sure exactly what to do about this "mathematical basis ..." section. do we include Shannon's original proof? why not Wittaker's? what was Nyquist's mathematical treatment? that's historical stuff. the proof by use of PSF is sufficient mathematically (and brings no objections from the mathematicians from our bonehead engineering usage of the dirac delta or dirac comb), but might seem a little indirect. i think there needs to be a reasonably rigorous (from an engineers POV) and straight forward proof that applies the concepts pretty much in the pedagogical chronological order:
1. we have a continuous-time signal.
2. we uniformly sample that signal. how do we represent that sampling process mathematically in a simple and effective manner? is there another means or a better means other than multiplying by a dirac comb?
Certainly. Shannon's proof for example. The actual samples are the finite-valued, discrete-time sequence, ${\displaystyle x(nT)\,}$, not the infinite-valued ${\displaystyle x(nT)\cdot \delta (t-nT)\,}$. --Bob K 14:17, 30 May 2006 (UTC)
3. using whatever means of sampling, we need to show that this sampling process actually samples the continuous-time signal - that it actually discards the information in the c.t. signal between the sampling instances.
Done. ${\displaystyle x(nT)\,}$ is just one [real or complex] number. Everything nearby is necessarily discarded. --Bob K 14:17, 30 May 2006 (UTC)
4. then we need to show that, somehow from that sampled signal representation or just the discrete sequence of samples, that somehow the original c.t. function can be reconstructed from only those samples and under what condition(s) that reconstruction is guaranteed to be accurate. now, i think the only way to do that from the dirac sampled function, is with a frequency-domain approach. that is to show that multiplying the original c.t. function by the dirac comb causes this repeating and overlapping of the spectrum and that, if there is no overlapping, that LPFing the result gets your original spectrum back.
The discrete-time sequence is not a Dirac comb. Unlike a comb, it is physically realizable. The comb is a questionable attempt to model the discrete-time sequence as a continuous-time function so that we can use the continuous-time transform, only because it is comfortable territory for most readers. As I showed, ${\displaystyle x(nT)\cdot \delta (t-nT)\,}$ can also be viewed as a mathematical result, rather than a [flawed] premise. But Shannon showed that the ${\displaystyle x(nT)\,}$ samples are also the coefficients of a Fourier series expansion that, under the stated conditions, fully represent ${\displaystyle X(f)\,}$ and therefore fully represent ${\displaystyle x(t)\,}$. No delta functions are needed. --Bob K 14:17, 30 May 2006 (UTC)
that's my take on this issue. r b-j 06:51, 30 May 2006 (UTC)
Sounds good.
ad 1.) full qualification: continuous, finite-energy, continuous-time signal.
ad 2.) more emphasis on general shift invariance, not only in sampling but also in reconstructing. As for the sampling: ${\displaystyle S_{T}:L^{2}(\mathbb {R} ,\mathbb {C} )\cap C(\mathbb {R} ,\mathbb {C} )\to \ell ^{2}(\mathbb {Z} ,\mathbb {C} )}$ defined by ${\displaystyle S_{T}(x)=\{x(nT)\}}$ does the trick as well and requires no play with tempered distributions. On the other hand, no objection to using a Dirac comb, only the notation is a little misleading. (In mathematics, one usually writes δx for the Dirac distribution centered at x, δx(φ):=φ(x) for all test functions (continuous at least).
ad 3.) unclear. The sampling theorem is there to show that no information is lost between the samples. Or the remark is too simple for me to understand.
ad 4.) resp. ad PSF): While not reasonably clear from the PSF article here (I corrected this in deutsch), the PSF contains the statement about the Dirac comb to be its own Fourier-transform. But as a tempered distribution, the Dirac comb approach is only applicable to Schwartz-test functions. As long as You are comfortable with the situation where the spectrum has compact support and has bounded derivatives of any order, You can freely compute with this formalism. Spectrum repetition is ok for Schwartz test functions since all series converge in a locally uniform manner. Problems arise if the spectrum is less than a Schwartz test function.
ad 4.) resp. ad overlap): for a continuous spectrum function, the values at the highest frequency are zero, so spectrum repetition is pointwise exact inside the frequency band even for the critical frequency. But this is theoretical. Sticking to the best eng. books one can formulate that sampling at strictly higher than critical is ok, strictly lower than critical fails (visible overlap), equality is for the experts. But I hope that You can sample the sinc-function at sampling frequency 1.
ad Nyquist: perhaps one should include a word or two about the reconstruction of periodic, frequency bounded functions. This amounts to some kind of polynomial interpolation.

--LutzL 08:36, 30 May 2006 (UTC)

## Concise summary of sampling and the Fourier transform

• That special case of the continuous-time Fourier transform is called discrete-time Fourier transform (DTFT), which is a periodic function. But it is also a continuous-frequency function, which means that a computer cannot evaluate it at every frequency (because it is a continuum).

• The continuous-time Fourier transform of ${\displaystyle x_{s}(t)\,}$ (or ${\displaystyle X_{s}(f)\,}$) is essentially the discrete-time Fourier transform (DTFT) of ${\displaystyle x[n]\,}$, which is a periodic function. But, because ${\displaystyle x[n]\,}$ is infinitely long, it is also a continuous-frequency function, which means that a computer cannot evaluate it at every possible frequency (because it is a continuum).

The main problem is that the continuous-frequency nature of:

${\displaystyle X_{s}(f)={\frac {1}{T}}\sum _{n=-\infty }^{\infty }X(f-nf_{s})\ }$

has nothing to do with the duration of ${\displaystyle x[n]\,}$. It only depends on the continuous-frequency nature of ${\displaystyle X(f)\,}$, which depends on the periodicity (or lack thereof) of ${\displaystyle x(t)\,}$.

perhaps it should be reworded to be if x[n] is completely general and infinitely long. that condition means that, in general, Xs(f) is contiuous. if x[n] is finite in length, then there are issues about how to define or extend x[n] outside of its original length. in and of itself, a finite and discrete set of frequency-domain values (those returned by the DFT) suffice to completely describe x[n], but that essentially makes one of a couple of assumptions. one is that the DFT is sampling the DTFT, a continuous but repeating function, at equally spaced values and then the remainder of x[n] must be defined (usually zero). the other is that the DFT is only a mapping from one discrete vector to another discrete vector (with no mention of the DTFT or anything continuous). the latter invites debate about the periodic nature (or lack of, for those on the other side of the debate) inherent to the DFT but says nothing to the issue you bring here. the former says that, even if nearly all of the values of x[n] are zero, they are still defined for the countably infinite set.
I don't get the point of all this. My original statement references the DTFT article, and extracts a couple of concise, non-controversial points (periodicity and continuous-frequency). --Bob K 12:20, 20 June 2006 (UTC)
it really is the case that because x[n] is infinitely long (in definition) and assumed general means that the DTFT is continuous in the frequency domain. even if x[n] was periodic, the DTFT would be composed of dirac impulses with zero in between, so it's still a continuous-frequency function. because x[n] is discrete is the reason that the DTFT is periodic with period 2π.
I understand all of that. Let's not get drawn into a quibble over how to describe a Dirac comb. No doubt we agree that for mathematical purposes it is a "function of a continuous variable". And for equivalence purposes, it is like a function of a discrete variable. I.e., it can be "fully-represented" by a discrete sequence (no loss of information) and vice-versa. --Bob K 12:20, 20 June 2006 (UTC)
then does this mean that you agree with me that "... because ${\displaystyle x[n]\,}$ is infinitely long, [ ${\displaystyle X_{s}(f)\,}$ ] is also a continuous-frequency function..."?r b-j 15:57, 20 June 2006 (UTC)
Let's make sure we don't have a semantics issue. ${\displaystyle X_{s}(f)\,}$ is clearly a function of a continuous variable (by definition). So I assume you are referring to whether that function is a Dirac comb or a "normal" function. Also, ${\displaystyle x(t)\,}$ and ${\displaystyle x[n]\,}$ are "infinitely long" (counting zero-valued samples), by definition. So I assume you are implying that the non-zero values extend to infinity. Between you and me, I suggest we refer to that concept as non-windowed, for lack of a better term. Anyhow, my answer is the same as before. "${\displaystyle x[n]\,}$ is infinitely long" is neither a necessary nor sufficient condition to make ${\displaystyle X_{s}(f)\,}$ "continuous-frequency". Rather it is a necessary (but not sufficient) condition to make ${\displaystyle X_{s}(f)\,}$ a Dirac comb (the very antithesis of continuous frequency). --Bob K 17:23, 20 June 2006 (UTC)

I would also add that the word essentially is unnecessary (at best), perhaps misleading (at worst). The DTFT is exactly what the continuous-time Fourier transform of a modulated Dirac comb reduces to mathematically. It is a special case of the more general transform.

the word "essentially' was put in because Xs(f) is dependent on T and there is no T (or x(nT)) in the DTFT, only x[n] which is what is left when any sense of physical time t is removed from the problem. the output of the DTFT looks like X(eiω) and is closely related to Xs(f), but is not the same function. when a digital filter is executing its algorithm, there is no t or T, only x[n], x[n-1], etc. likewise, with the DTFT there is only x[n] in the "time" domain and only X(eiω) in the frequency domain. r b-j 02:10, 20 June 2006 (UTC)
But all we're really talking about is frequency normalization (by ${\displaystyle f_{s}\,}$) or not. Other than that superficial difference, ${\displaystyle X_{s}(f)\,}$, or ${\displaystyle X_{s}({\omega \over 2\pi })\,}$ if you wish, is a DTFT (by mathematical equivalence). And that point is clearly made in the DTFT article. Since this is intended as a "concise summary", my opinion is that it's potentially more confusing than helpful. --Bob K 12:20, 20 June 2006 (UTC)

DTFT:

${\displaystyle X(e^{i\omega })=\sum _{k=-\infty }^{\infty }x[k]\,e^{-i\omega k}\,}$ (same as X(z) in double-sided Z transform which is nice.)

FT of sampled function xs(t) (using the conventional scaling that leaves out T):

${\displaystyle X_{s}(f)=\sum _{k=-\infty }^{\infty }x(kT)\,e^{-i2\pi fkT}\ }$
${\displaystyle =\sum _{k=-\infty }^{\infty }x[k]\,e^{-i2\pi fkT}\ }$
${\displaystyle ={\frac {1}{T}}\sum _{n=-\infty }^{\infty }X(f-nf_{s})\ }$ (but not the same function, X(.) as above)

The relationship between the two is:

${\displaystyle X_{s}(f)=X(e^{i2\pi fT})\,}$

essentially the same function, but not "exactly" the same function. r b-j 15:57, 20 June 2006 (UTC)

Same function;   i.e. plot them on the same f-axis. The notational difference is superficial and serves no purpose here [my opinion]. But now I know (for the first time) that we fundamentally agree. Yay! --Bob K 18:46, 20 June 2006 (UTC)
actually, Bob, i don't think we agreed specifically about the correct word. you said "exactly ", which i said is not strictly true, and i said "essentially" which you said is "misleading". i continue to stand by my adverb as being more correct. r b-j 02:44, 22 June 2006 (UTC)
By fundamentally agree, I mean: ${\displaystyle X_{s}(f)=X(e^{i2\pi fT})\,}$. The equals sign means exactly equal, so it's just two different notations for the same function of independent variable, ${\displaystyle f\,}$. I wasn't sure if you were getting that or not. Thus I was confused/misled by the word essentially. --Bob K 11:55, 22 June 2006 (UTC)

You also replaced this paragraph:

• The discrete [frequency] Fourier transform (DFT) is a formula for computing regularly-spaced values (i.e. at discrete frequencies) of the DTFT function. Because of DTFT periodicity, a finite number of values is sufficient to fully characterize the DFT.

• The discrete [frequency] Fourier transform (DFT) is equivalent to evaluating (or sampling) the DTFT at regularly-spaced discrete frequencies. Because of DTFT periodicity, a finite number of values is sufficient to fully characterize the DFT. Because of this sampling in the frequency domain, this has the effect of periodically extending the data ${\displaystyle x[n]\,}$ in the time domain (in the same way that sampling in the time domain periodically extends the spectrum in the frequency domain).

Instead of that last sentence (which I would prefer to leave out), I would say that when ${\displaystyle x(t)\,}$ happens to be a periodic function, then ${\displaystyle X(f)\,}$ and ${\displaystyle X_{s}(f)\,}$ are effectively discrete-frequency functions. In that special case, it is possible for the DFT to fully represent the DTFT. And the inverse DFT reproduces the [periodic] samples of ${\displaystyle x(t)\,}$. Conversely, when ${\displaystyle x(t)\,}$ is not periodic, e.g. ${\displaystyle \mathrm {rect} (t)\,}$, the DTFT is a continuous-frequency function, and the DFT cannot fully represent it. The inverse DFT still produces a periodic sequence, but it is not the [aperiodic] x(nT) sequence. When a function of a continuous variable is approximated by a function of a discrete variable in one domain, the manifestation of the approximation error in the other domain is periodicity. --Bob K 12:10, 16 June 2006 (UTC)

i'm still decoding what you're saying as an alternate, but i continue to stand by that sentence for both accuracy and conciseness. essentially, sampling in one domain always causes a periodic extension in the other domain. and, since the Fourier transform is invertible, the converse is true.
BTW, i am not sure whether or not i agree with Dicklyon's perspective on this, but i do think it should be discussed. probably this best belongs in DTFT or similar articles. the sampling theorem does not need the DTFT to support its thesis. it just so happens that there is a clean and linear relationship between the output of the DTFT, X(eiω), and Xs(f). mentioning this as a note (with link to DTFT) makes sense, but i wouldn't build much about the sampling theorem on it.
actually, i think we should include the proof using the PSF (way above), to satisfy mathematicians who don't like our use of the dirac delta function in the mathematical treatement. we could include it as an alternative proof. r b-j 02:10, 20 June 2006 (UTC)

### Simple Solution

None of this rambling discussion of properties of various Fourier transforms has anything to do with a concise statement of the sampling theorem. So I took it out. Dicklyon 14:51, 16 June 2006 (UTC)

one thing, Dick. it is customary to include a statement or rationale when deleting a large section of content (particularly content added recently by someone else). to delete a section of content added recently by someone else, along with other "innocuous" edits and then cite only the innocuous edits in the edit summary, might be considered to be a little shady. like either 1. you're so dismissive of the added section you think it deserves no mention to delete it or 2. you're trying to sneak it out and see if no one notices. maybe you just forgot to mention it (a legit oversight). r b-j 02:10, 20 June 2006 (UTC)

You better remind me what section you're referring to. I'm not seeing it. Dicklyon 05:28, 20 June 2006 (UTC)

diff: [10] r b-j 15:57, 20 June 2006 (UTC)

OK, I confess. I don't recall if I deleted the "Note about scaling" section on purpose, or whether it might have been accidental.

??? ${\displaystyle \quad :-/\ }$

In truth, I don't remember ever reading that section. I can't say that I particulary care for its cryptic contents, so its possible I just took it out to see if anyone cared about it. It seems to be quite irrelevant to the topic at hand. Dicklyon 20:01, 20 June 2006 (UTC)

perhaps you can begin by asking yourself how you would design and construct an approximation to a brickwall filter (that title deserves an article) with cutoff fs/2 and with passband gain of T= 1/fs. T is not dimensionless.
or you can ask yourself how to model a practical conventional DAC, or why, in a simple DSP system, (you know those DSKs that TI and ADI sell with ADC and DAC built in) where the ADC and (conventional) DAC have identical Vref (so their F.S. voltage is the same). program the DSP to simply pass the number from the ADC to the DAC ("talkthru.asm" is a common filename) and then use a scope and signal generator to measure the frequency response. besides some constant delay, if the DAC is conventional (not sigma-delta), you will notice a nearly 4 dB drop in gain as your frequency gets close to Nyquist.
this is treated in zero-order hold, but you will notice a difference in scaling factor for xs(t) compared to this article (and 95% of the communications/DSP texts out there). why? that is what the cryptic contents of that Note on scaling are about. r b-j 20:50, 20 June 2006 (UTC)

Yes, I can see what it is talking about. But crude reconstruction filters like zero-order hold have very little to do with the sampling theorem. This whole section is a narrow aside on approximate reconstruction techniques, I think. I don't see how it helps one understand the sampling theorem or its applications.

no one is building "crude reconstruction filters like zero-order hold". the behavior of the ZOH is what you get stuck with when you "reconstruct" with a conventional DAC. whether you like it or not or think it is crude or not, any practical reconstruction (and this article is about reconstruction but presents only a hypothetical and impractical method of reconstruction) will not be done with a string of dirac impulses. the discussion of the operation of the ZOH really belongs in an article about ZOH, but the article (as well as the FOH article) begins with a slightly modified premise regarding the sampled signal, xs(t). where is this difference in premise to be bridged? r b-j 22:37, 20 June 2006 (UTC)
Interesting viewpoint. I thought this article was about the sampling theorem, which totally requires sinc functions for reconstruction, rather than about approximate reconstruction methods, about which the sampling theorem has little to say. Dicklyon 23:01, 20 June 2006 (UTC)

POLL: Does anyone here think the "Note about scaling" subsection is useful? Dicklyon 21:41, 20 June 2006 (UTC)

at least he's asking questions first before shooting. r b-j 22:37, 20 June 2006 (UTC)
• No. Just define:   ${\displaystyle \Delta _{T}(t)\equiv T\cdot \sum _{n=-\infty }^{\infty }\delta (t-nT)\ }$   to begin with and be done with it. --Bob K 23:16, 20 June 2006 (UTC)
if i were king of the world, that's what i would do. but the problem is that except for Pohlmann, Principles of Digital Audio (a book with some of my influence in it BTW), no textbook that i know of scales the sampling operator that way (that, along with the conventional DFT scaling, has always disappointed me). they always put the T factor either in the passband gain of the brickwall reconstruction filter or put it as a separate gain block. if you put it in ΔT, then when you use that model to describe the DTFT (that needs work there, but i don't want to think about it now) or the Z transform, you'll have this extraneous T factor to deal with there and i don't think there should be two different definitions for the sampling operator. i really think we need to stick with the prevalent convention in the books, even if a better convention can be thunk of. r b-j 23:34, 20 June 2006 (UTC)
The separate gain block is fine with me too. So I still vote "no" to the "Note about scaling" section. --Bob K 15:12, 21 June 2006 (UTC)
please illustrate exactly how (with the "separate gain block") you will explain the zero-order hold effect of a conventional DAC without a resulting scaling error of of 1/T in the result. (the frequency response has to be dimensionless and the gain at DC is unity or 0 dB.) r b-j 15:37, 21 June 2006 (UTC)
It's you who said that a separate gain block is a "prevalent convention". If "the books" aren't concerned with the scaling error you're worried about, why should I be? --Bob K 22:29, 21 June 2006 (UTC)
no, Bob. i said two things. that the
${\displaystyle x_{s}(t)=\Delta _{T}(t)x(t)=\sum _{n=-\infty }^{\infty }x(nT)\delta (t-nT)\ }$
is the "prevalent convention" and that there are two ways (that i know of) of dealing with that, either "put the T factor in the passband gain of the brickwall reconstruction filter or put it as a separate gain block." of the two, i believe the former is much more common. but let's assume the latter for the purpose of imagining a way to do this:
i came up with a means of representing the sampling theorem that is comparible to EE texts (except they mostly convolve with the Dirac comb in the frequency domain to get the repeated images where i use a simpler and more familiar property of the cont. Fourier transform) and added a note about scaling to bridge between the placement of the T factor that is common in texts and represented in the main section to where it is placed in zero-order hold and first-order hold (and i noticed where you do it in DTFT, which, BTW has an inconsistent definition of Δ(t) to this article and Dirac comb). also, i didn't put in this Note on scaling until there was a conceivable need to because of the difference in scaling between the common convention and that in the ZOH and FOH articles.
now, i did that and it satisfies the common convention (at first) and it explains an alternate convention that really is necessary (until someone comes up with alternative that works just as well) to explain how to begin to approach this other practical reconstruction (than brickwall filtering a sequence of dirac impulses). that is the best way i can think of to do it. if you want to use the "separate gain block" but otherwise common convention, Bob, to explain why the conventional ideal DAC has a ZOH frequency and impulse response, the impetus is on you to do that. not me. r b-j 02:37, 22 June 2006 (UTC)
Yes, if that was something I cared about, and if I felt that this article was the appropriate place for it, then I guess the impetus would be mine. But neither condition is true. Anyhow, since the separate gain block is not a prevalent convention, then I am back to my first stance:   Just sample with   ${\displaystyle T\cdot \sum _{n=-\infty }^{\infty }\delta (t-nT)\ }$. If you don't want to call it ${\displaystyle \Delta _{T}(t)\,}$, then give it another name. --Bob K 05:34, 22 June 2006 (UTC)
okay. someone can always revert it if they hate it. if you want to change it that way, it is fine with me, but i was conceding to convention. someone who knows me from comp.dsp is gonna think i did it because stood on a soapbox and advocating changing the lit. to that way. dunno if some people are gonna think that the common convention is the Wikipedia way. (i wouldn't change the definition of ΔT(t) but just put a T beside it in "xs(t) = ...") r b-j 06:26, 22 June 2006 (UTC)
As I told you before, rbj, and there was no reply from you, if you want the continuous variable to carry a dimension you have also to care that the input of the "delta function" is dimensionless. That's because this "function" is purely theoretical and in its theory defined to be dimensionless on both sides, input and output. In the theoretical arguments of this article, it is assumend that all occuring variables are dimensionless, so there is no problem. If one is to apply the theory to practical data with dimensions, then those practical data have to be divided by a reference scale, thus removing the dimension. In most cases this reference scale is 1s or some other unit scale, so this removal of dimension does not show up in the decimals of the data, which seems to cause some of your confusion.--LutzL 07:03, 21 June 2006 (UTC)
i am not confused at all. in engineering practice (as well as physics) we deal with quantities and functions that are dimensionful all of the time. certainly when an argument is applied to a mathematical elementary function like sin(), cos(), exp(), the argument must be dimensionless for the function to make sense. but not so for power functions (including roots) or for multiplication and division. when terms are added or substracted they must be of identical dimension.
using the "engineering" form of the dirac delta function (as a limit of "nascent" delta functions, what i'll call a dirac impulse), it is clear that the "height" or dependent variable (the quantity that comes out of the operator after receiving an argument) must be the reciprocal dimension than that of the argument. this is because the integral (or area) of the dirac impulse is the dimensionless unity and if the width is measured by some dimensionful quantity (say, seconds), the height must be measured in the reciprocal dimension.
a simple degenerate example: if we have a filter or LTI system in which the output signal is the same species of animal as the signal going in, the transfer function must be dimensionless. in that case:
${\displaystyle H(s)=\int _{-\infty }^{+\infty }h(t)e^{-st}dt\ }$
the dt is of dimension time, H(s) is dimensionless, est is dimensionless, then that means that h(t) must be of dimension 1/time. indeed for a simple RC low-pass filter (voltage in - voltage out):
${\displaystyle H(s)={\frac {1}{1+RCs}}\ }$
${\displaystyle h(t)={\frac {1}{RC}}e^{-{\frac {t}{RC}}}u(t)}$ where u(t) is the (Heaviside) unit step function.
since RC is time, it is clear that the dimension of h(t) is 1/time.
now, regarding the dirac impulse, what if the filter is instead simply a pair of wires: vout = vin . now what is the impulse response? and then why should the dimension of the impulse response be any different?
LutzL, i've been over this so many times with so many different people. those USENET citings i listed above are but one (but one that was quite public and had a record kept by Deja News which was bought by Google). you can appeal to the distribution definition of the delta function all you want and it does not affect this argument at all, because i will just pick my favorite nascent definition and let a be whatever positive number i want, as small as i want, and in any physical system, the difference of effect from that finite width nascent and any other nascent delta of even smaller width, will be unmeasurable. if the delta function has argument of time, i'll pick a to be a Planck time and, at that point, the argument is over. it's not a true dirac delta "function", but it doesn't matter from the point-of-view a physical system. r b-j 15:24, 21 June 2006 (UTC)
Hell and heaven, yes, you can do what you like, you can invent your own flavor of mathematics/calculus if you want. But don't claim that other people are wrong and should be corrected if they stick to the usual conventions. And please consider that, if I understand that policy correctly, that original homemade research is not wellcome on wikipedia. And you yourself admitted that your point of view is not shared even by the gross majority of the engineering literature. You can go on your own crusade on usenet, you have a sufficiently correct intuition with nascend deltas to be helpful sometimes, but wikipedia is not the place for it.--LutzL 17:12, 21 June 2006 (UTC)
sorry to fluster you (i did recognize that the use of the dirac impulse common in electrical engineering signal-processing or communications texts make mathematicians blanch). but that usage is there in very respected and legitimate texts. this is a dispute of usage between two professional groups: mathematicians and electrical engineers. i fully disagree that Wikipedia is no place for the EE usage of δ(t) or that the EE community has no claim to how the Nyquist–Shannon sampling theorem should be represented. in fact, i think the EE community has the primary claim (but not the only claim) to how the sampling theorem should look. if you want to respond, please take it to the bottom of the talk page since the same dispute of δ(t) usage is also heating up there. r b-j 21:21, 21 June 2006 (UTC)

### Fourier transform exists for Dirac comb?

I can't think of any sense in which this statement can be considered true: "To use that analysis tool, a continuous-time function is contrived conceptually (not actually nor numerically) by using the samples to modulate the 'teeth' of a Dirac comb function, which does have a continuous-time Fourier transform." Can someone explain what is intended here? Even if the 'teeth' are replaced by finite pulses, the comb extending infinitely in time is not square integrable and does not have a Fourier transform.

It means whatever row 25 and Dirac_comb#Fourier_transform mean. --Bob K 22:29, 21 June 2006 (UTC)

Or was the "which does have..." meant to refer to the modulated comb, under the assumption that the orignal signal was square integrable and had a Foureir transform, and the teeth were nascent delta functions? Too bad if so, since it means the analysis no longer applies to those functions, such as outputs of stationary random processes, that are NOT square integrable but DO have a well defined spectral density.

This section needs to be rewritten, or abandoned, I think. Dicklyon 17:36, 21 June 2006 (UTC)

Dick or LutzL, does (non-zero) DC have a fourier transform? is it square-integrable? if you say it does not have a F.T., do the standard engineering communications texts agree? r b-j 17:52, 21 June 2006 (UTC)
No, a constant function does not have a Fourier transform, unless you extend the concept to allow delta-function-type singularities. It's not an unreasonable thing to do, and Khinchin did it, if I recall correctly, but it's not really a Fourier transform. It does have a spectral density, however, which is the Fourier transform of its autocorrelation function (defined in terms of expected value of a product, which is finite everywhere). Most texts agree when they are at all rigorous, but some may not be. See Wiener-Khinchin theorem. Dicklyon 18:24, 21 June 2006 (UTC)
There is this thing called tempered distributions, which is the dual space to the Schwartz space of test functions. As tempered distributions, polynomials and polynomially bounded, locally integrable functions have Fourier transforms. The Fourier transform of a polynomial is a sum of derivatives of the delta distribution at the point zero. Tempered distributions are a common theme of functional analysis. In this regard, electrical engineering is applied functional analysis.--LutzL 14:59, 22 June 2006 (UTC)
well, not every EE curriculum oriented signal processing book can be Papoulis, "Signal Processing". if polling is your method of choice (in deciding what Wikipedia should do regarding this article), i wonder how many EE signal processing and communications texts will say somewhere in the book:
${\displaystyle {\mathcal {F}}\left\{x(t)=c\right\}=X(f)=c\delta (f)\ }$
${\displaystyle {\mathcal {F}}\left\{s(t)=\sum _{n=-\infty }^{+\infty }\delta (t-nT)\right\}=S(f)={\frac {1}{T}}\sum _{k=-\infty }^{+\infty }\delta (f-k/T)\ }$
or their equivalent with angular frequency. ??
are you saying that more of your above qualified books don't say that than those that do? how 'bout if we add "undergraduate level" to that list of qualifications?
Why is it that all these authors get away with saying it? r b-j 20:56, 21 June 2006 (UTC)
OK, I think I now see the sense in which the Fourier transform of a Dirac comb can be said to exist: it leads to a treatment in terms of Dirac functions that "works OK", so in that sense it exists, and is often found in books, even in Fourier transform pair tables. I'd still feel a lot better to have an explanation, derivation, and proof that doesn't require that artifice. On other hand, what most do, like Shannon did, is to implicitly assume the signal to be sampled is square integrable, and therefore not address the applicability of the sampling theorem to stationary random processes. Can we fix both problems? Has someone done so rigorously? I'll be looking for it.
I just found where the key inconsistency in the wikipedia is: the Fourier transform article section [11] says "the unqualified term 'Fourier transform' refers to the continuous Fourier transform, representing any square-integrable function...", but the distribution (mathematics) article [12] says "all tempered distributions have a Fourier transform, but not all distributions have one." So is the Dirac comb a "tempered distribution"? I see it's a "Schwarz distribution", but that term is nowhere defined.
Dicklyon 04:03, 22 June 2006 (UTC)
It's meant to be the same. Tempered distribution is the more common term. And yes, it is a tempered distribution as any text on harmonic or functional analysis will tell. Even using the Dirac comb, one can only prove the sampling theorem for square integrable functions. Of course there is the Nyquist sampling theorem for periodic functions, which is a special version of polynomial interpolation: a polynomial of degree N is determined by N+1 values at different points, a trigonometric polynomial with at most N voices is related to a polynomial of degree 2N, evaluated on the unit circle, and thus determined by 2N+1 samples at different phases of the period.--LutzL 14:59, 22 June 2006 (UTC)

i think the proof based on Poisson summation formula (PSF) does not use dirac impulses at all. a couple are listed above. we can argue about whatever concise version of the proof is clearer. r b-j 05:02, 22 June 2006 (UTC)
The PSF is a generalization of the Dirac comb. Or the Dirac comb with its Fourier transform, the Dirac comb, is an interpretation of the PSF in terms of tempered distributions. While the dirac comb is only applicable to Schwartz test functions, the PSF is applicable to fast falling (4. degree) twice differentiable functions. To prove the theorem, convolution of the Dirac comb with a fast falling locally integrable function is the shortest way to go using distributions. Just splitting up the inverse Fourier integral into a sum of integrals over periods is the fastest way to go without using distributions.--LutzL 14:59, 22 June 2006 (UTC)
also, i think stationary random processes that are the sum of equally-spaced sinc functions (like normal reconstruction) where the coefficients of the sinc functions are random numbers coming out of some ideal random number generator, i think that stationary process can be sampled. i would think that some IEEE or Bell-system Journal would have someone describing this. i dunno. r b-j 05:12, 22 June 2006 (UTC)
Yes, of course random processes can be sampled. Their Fourier transforms need not exist. And I think they can be unambiguously reconstructed, too, if they are bandlimited. I thought this was well known and proven, but I see that Shannon's original proof seems to only apply to square integrable functions. The only real issue is the convergence of the infinite sum of sincs, which as someone pointed out has the potential to diverge due to the 1/x nature; I think that for any frequency strictly less than half the sample rate, however, convergence can be shown. Dicklyon 05:42, 22 June 2006 (UTC)
The convergence condition for sinc-series is given in the reconstruction formula article. An infinite series of random events violates with probability one this condition. Thus there is no function to speak of, even not in a generalized sense. One can't take samples from a non-functions. - From a non-function, one cannot determine the Fourier transform, there is no highest frequency to speak of.--LutzL 14:59, 22 June 2006 (UTC)
i don't think any random process can be sampled and accurately reconstructed. About the 1/x like convergence of the sinc, we already know we have crossed a line with the strict mathematicians when we engineers treat the dirac delta precisely like a function that violates one of the properties of Lebesgue integration: If f, g are functions such that f = g almost everywhere, then f is integrable if and only if g is integrable and the integrals of f and g are the same. If we're past that, then i think we can use the invertibe or one-to-one mapping of the Fourier Transform to support the point (indirectly) of what the sum of sincs must convert to if it converts to anything. they criticized Fourier for lacking proof of convergence, too. even though the 1/xenvelope is insufficient to guarantee convergence (but it doesn't precude it, we know that ((-1)n)/n converges) of the reconstructed value in between samples, i am comfortable with the lack of direct proof. r b-j 06:06, 22 June 2006 (UTC)
You are again building up strawmen and beating them to death. Of course, the Fourier transform is also one to one on the space of tempered distributions. And the periodization of a tempered distribution with compact support is again a tempered distribution. This allows to write things like
${\displaystyle \sum _{n\in \mathbb {Z} }\delta '_{n}=i2\pi \sum _{k\in \mathbb {Z} }k{\hat {\delta }}_{k}}$
where ${\displaystyle \delta _{n}[x]=x(n),\;\delta '_{n}[x]=x'(n)\;and\;{\hat {\delta }}_{k}[x]=X(k)}$ for any Schwartz test function x=x(t). For any partial sum of the right hand side one can give a regular distribution, that is a functional consisting of integration over a product with a normal function. In the limit, this does not hold. It would serve you well, rbj, if you would look up the details of this dispute over the convergence of the Fourier series. Carl Offners script "A little harmonic analysis" is a good starting point, if read back to front. To solve the problems of this dispute it needed several revolutions in mathematics from the mid-19th to the first half of the 20th century, the most severe revolution was Cantors set theory and the definition of a function as a relation resp. a subset of the set of pairs.--LutzL 14:59, 22 June 2006 (UTC)
Consider the reading of a thermometer, measuring the temperature of the room you are sitting in. It is a bandlimited random process, which we could sample ${\displaystyle 10^{6}\,}$ times per second, if we wished. I realize, of course, that the samples must have finite precision and that we cannot reconstruct the future. But the past is not random. It is a specific instance of a process that we can only describe probabilistically, until it happens. That specific instance can of course be reconstructed, like any other B/L function. [OK. I'm in my flame-resistant suit... flame away!] --Bob K 12:53, 22 June 2006 (UTC)
Yes, and you need a sampling modell and a sampling theory for this specific instance of sampling that is only marginally related to this, Shannons, sampling theorem. And I doubt that you can reconstruct the process to any precision, since that would require to be able to reconstruct every single collision of an air molecule with the thermometer. And since the collisions lead to jumps in the inner energy of the thermometer, this process is bandlimited only in a generalized sense, where the Fourier transform is bounded below 3 dB (of what?) outside the condidered frequency band.--LutzL 14:59, 22 June 2006 (UTC)
I'm not talking about the instantaneous temperature at a molecular level. I am talking about the filtered temperature, time-averaged by the lowpass response of the measurement device. That is a random process too. And I already acknowledged that realistic sampling cannot represent it to infinite precision (quantization noise). But that has nothing to do with whether the underlying process is random or not. If that's your best shot, I probably won't be needing the flame-suit afterall.   :-)   --Bob K 15:24, 22 June 2006 (UTC)

## Tune ups

I hope my "tune ups" and such meet with approval. I found lots of things unclear or not quite right. The main part that now remains bugging me is the "Mathematical basis..." section that starts out claiming that "To prove this, a mathematical representation of the uniform sampled signal that effectively discards the information between samples must be constructed." Is this really so? Aren't there other ways to prove it? Can we at least agree that this is just an approach, not something that MUST be done? I can't begin to get my head around what is right and what is unclear until I clear this up.

I hope I'm not being too picky or rigorous; I think it's worth converging on something that's really right and consistent.

Dicklyon 05:37, 22 June 2006 (UTC)

## Frequencies versus sinusoidal components

Rbj, I used "frequencies" just like Shannon did (see box above). This means only and exactly what the math says (the Fourier transform being zero). To say "sinusoid components" muddies the waters, because a component is usually understood to be discrete in frequency; a pure tone component. Any square integrable signal, i.e. any signal that can actually be Fourier transformed with delta functions, has NO sinusoidal components. This is, if you look at the energy in any band, in the limit as the bandwidth goes to zero (toward sinusoidal), that energy is zero in the limit. So "no frequencies above..." is a much stronger condition than "no sinusoid components above...".

Fix it back? Dicklyon 05:50, 22 June 2006 (UTC)

i did it to clear the waters, but do with it what you want. i like to think that my body contains matter that has the property of warmth rather than my body contains warmth but i might just be anal. r b-j 06:10, 22 June 2006 (UTC)

## Random processes

New section because the complexity of topics got out of hand above. Here's the issue: is there a sampling theorem for (samples functions of) random processes, which are not in L2?

LutzL says above: "From a non-function, one cannot determine the Fourier transform, there is no highest frequency to speak of." But a sample function of a random process is not a "non function", and the random process itself (if stationary in a wide sense) can have a spectral density. So what's the problem?

He also points out that the reconstruction formula does not converge; with probability 1, the conditions for convergernce are not met. Does that really mean it does not converge? Or just that we can't prove some technicality?

So two questions, really: Is there no mathematical sample theoreom provable for wide-sense stationary random processes? Is there any problem in an engineering sense with using the Shannon sampling theorem with such processes?

Dicklyon 15:28, 22 June 2006 (UTC)

So, I looked at what was said about convergence at Nyquist–Shannon interpolation formula, thought about it, and wrote up what I believe to be true for random processes. Which is that it sounds like a non-problem and that convergence is practically guaranteed. Any comments? Can I copy some of that to here, or is someone more mathematical than I going to quibble with it? Dicklyon 00:35, 23 June 2006 (UTC)

Who, us?... quibble? --Bob K 04:01, 23 June 2006 (UTC)
I have a question. WSS processes are all fine and good for some things, but they have no beginning and no end. And isn't that what leads to their exclusion from L2? I.e., what about a windowed WSS process (which of course is no longer WSS)? In the real world it is always windowed processes that we are dealing with. I have done a lot of DFTs on them, and it seems to work just fine. So what is all the fuss about? --Bob K 04:01, 23 June 2006 (UTC)

Yes, you've got it right. The fuss is just that some kinds of analytic statistical techniques work best with WSS processes, rather than with finite signals. So it's handy to have some applicable theorems, such as the Wiener–Khinchin theorem to help out.

I scanned some books, and found what I was looking for in Wozencraft and Jacobs, Principles of Communication Engineering, 1965, pp.598–603, after not finding it in Gallager nor Sage and Melsa. They prove the sampling theorem in an appendix, but in the main text explicity discuss extending it to random processes, so they can analyze random waveform sources. They conclude that the process doesn't even have to be stationary to be bandlimited and subject to the sampling theorem, but I'm not sure how that result can be formalized. However, they also admit that "The assumption that a process is ideally bandlimited is not completely realistic."

Their page 2 also has a nice historical summary of Nyquist's and Hartley's contributions to information theory. I think I'll say something more in this article about what Nyquist actually showed.

Dicklyon 04:55, 23 June 2006 (UTC)

"They conclude that the process doesn't even have to be stationary to be bandlimited and subject to the sampling theorem..." i didn't think that stationary was necessary or really an issue (as long as the changing parameters or moments of the process didn't make the bandwidth of the random process equal or exceed Nyquist). i still think that given virtually any discrete random process, x[n], that is these numbers come out of some kind of random number generator, even the kind that samples diode noise with an ADC or a numerical pseudo-random number generator that
${\displaystyle x(t)=\sum _{n=-\infty }^{+\infty }x[n]\mathrm {sinc} \left({\frac {t-nT}{T}}\right)\ }$
is both necessary and sufficient for sampling at virtually fs = 1/T (you might have to be a hair higher). if x[n] is stationary then x(t) is. now not all possible sequences of x[n] are legit (but i think they're also bloody unlikely). if such a sequence came out of an RNG:
${\displaystyle x[n]={\begin{cases}(-1)^{n},&{\mbox{if }}n\geq 0\\-(-1)^{n},&{\mbox{if }}n<0\end{cases}}}$
it could not be reconstructed so i am not sure how one could speak of it being sampled. but it's also not bloody likely to come out of any RNG. r b-j 20:13, 23 June 2006 (UTC)
As long as the probability distribution of the random generator has a positive deviation (2. moment), then the square sum of almost any infinite sequence of random samples will have infinity as square sum. The same goes for the weaker criterium. What is legal and possible is to exchange the sinc function for a compactly supported function. One can find such functions that are almost bandlimited, or bandlimited in the 3dB sense. The corresponding series will still not converge in L²(R), but it converges locally to a continuous function, since the infinite sum has at each point only a finite number of nonzero terms. One can apply Fourier analysis to windowed portions of this function, make a generalized approximative sampling theory about it,...--LutzL 08:51, 24 June 2006 (UTC)
I disagree with what you seem to be implying from your opening sentence: "As long as the probability distribution of the random generator has a positive deviation (2. moment), then the square sum of almost any infinite sequence of random samples will have infinity as square sum." True, but that's not the issue. You need to look at those terms weighted by the sinc. If the infinite sum of the squares of the infinite weight sequence (the since) converge (which they do, since their magnitudes are bounded by the square of 1/n), then the infinite sum of the random variables will have a finite variance, and the variance of the terms truncated in the tail will converge to zero. Or so it seems to me. What's wrong with my reasoning? Dicklyon 14:51, 24 June 2006 (UTC)
Yes, there will always be pathological sequences for which the reconstruction does not converge. That's why you need a characterization of the process, so you can conclude that those have probability 0.
The reason you need stationary, or so I thought, was so that the notion of bandlimited could be defined. What do you use as a spectrum characterization for a non-stationary process? While it may be true that any random sequence of samples can be converted back to a provably bandlimited signal, that doesn't help you make the theorem unless you can show some class of signals to be bandlimited to start with. Dicklyon 22:03, 23 June 2006 (UTC)
${\displaystyle x(t)=\sum _{n=-\infty }^{+\infty }x[n]\mathrm {sinc} \left({\frac {t-nT}{T}}\right)\ }$
is not guaranteed to be bandlimited? r b-j 02:06, 24 June 2006 (UTC)
yes, it is. But what class of original signals does that correspond to if you don't have stationarity to have a theorem about frequency spectrum content? 24.6.152.39 05:33, 24 June 2006 (UTC)

## Aliasing

at an earlier time, i remember bringing up the subject with BobK. i think we should just be clear that aliased frequency components (of frequency f) are those that could be mistaken for the legitimately sampled component at mfs - f where m =floor(f/fs +1/2) - 1/2. it is a distortion but a very specific kind where some frequencies masquerade for others. r b-j 20:13, 23 June 2006 (UTC)

Well, I liked how they put it: "the resulting distortion is called aliasing". This distortion is very simple only in cases where the signal consists of sinusoid components. In general, a spectral inversion via aliasing is pretty messy, though conceptually simple in terms of what it does to frequency components as you point out. That's why moire patterns and jaggies in images can look so weird. Distortion is a good all purpose term for the difference between the ideal or original and what your system actually does. Dicklyon 22:07, 23 June 2006 (UTC)

## The Introduction section

Just made some modifications for the following reasons:

• A band-limited signal can changes arbitrarily fast, just use a sufficiently high amplitude since the rate of change is the product of amplitude and frequency.
• The Fourier transform used here is not unitary, and there is no requirement that this particular version of the FT is used to prove the point.
• I have converted to the notation ${\displaystyle {\mathcal {F}}\{x\}}$ for the Fourier transform of the signal x, as this is the notation used in the Fourier transform article.

--KYN 19:22, 2 August 2006 (UTC)

And after your changes got reverted for unknown reasons, I changed it differently. In particular, there's no need to introduce Fourier transform at all in stating or understanding the theorem or the concept of bandlimited. The existance of the FT is a big stumbling block that limits the domain of applicability of the theorem, so let's do without it. Dicklyon 02:04, 3 August 2006 (UTC)
And then I changed the symbol for bandwidth to B, including in the image of the hypothetical spectrum. Dicklyon 02:24, 3 August 2006 (UTC)
although nearly anything is possible, it is pedagogically ridiculous to try to teach anyone about the sampling theorem without the concept of frequency spectrum. both the straight-forward proof ("let's sample the SOB with a dirac comb and see what happens") and the more indirect proof of the Poisson summation formula require the use of the continuous Fourier transform (the latter doesn't use a dirac comb). there are different notations and even philophies between electrical engineers (and the scholars among them) and mathematicians. this very talk page shows evidence about that (regarding the nature of the dirac delta function. this article should not be relegated to the impenitrable language of the pure mathematicians because i can guarantee that 99% of those who will profit from it will only from the simpler and more straight-forward EE perspective of it. r b-j 16:30, 3 August 2006 (UTC)
The concept of a frequency spectrum is a good one, but it is very easy to use it incorrectly while pretending to be rigorous. I'm no mathematician, however, so have no fear about me adding that sort of impenetrable language. I'm on your side in trying to avoid it. As to pedagogically ridiculous, however, I think that goes too far. It may be possible to explain the sampling theorem quite precisely, without proof, using the concept of frequency without the concept of spectrum. For example, Harold S. Black's discussing of sampling, including many correlary theorems, is very clear without mentioning the concept of a spectrum. But it's without proof. Dicklyon 17:59, 3 August 2006 (UTC)
My opinion is that we should not allow stationary processes to obfuscate the introduction in any way. They can be dealt with later or elsewhere, which is how the subject is normally taught. Nobody samples or reconstructs an infinitely long signal anyway, so as a practical matter, the existence of the FT is irrelevant. What we really do is sample a windowed signal, which is no longer stationary, and its FT does exist. But we don't like to talk about that either, because a windowed signal is not strictly bandlimited, so aliasing is inevitable. But that is the reality of the situation. That is what most people need to know. --Bob K 13:21, 10 August 2006 (UTC)

## Mathematical basis

What's up with this long mathy section? Is it really required to go into all this gory math with Dirac distributions to understand or prove the sampling theorem? Can we do it more concisely? Has anyone actually read the whole thing and can verify that it's even right? Dicklyon 02:24, 3 August 2006 (UTC)

for someone who says you don't need the FT, that's an odd statement. what do you want to use? Poisson summation formula? (that proof is on this page.) r b-j 05:00, 3 August 2006 (UTC)
I'm not sure which statement you think is odd; I wrote nothing but questions. The FT is not needed to state or understand the sampling theorem. It may well be needed to prove it. Do you know for sure? Does the Poisson technique avoid FT or Dirac? Dicklyon 05:06, 3 August 2006 (UTC)
there were (presumptive) statements imbedded in the questions that should be supported. as for your last questions in each of your two posts here, can you answer that yourself? (i should think you should be able to if you're editing this article.) you might also check out some of the back and forth between User:LutzL and myself about this issue. r b-j 05:15, 3 August 2006 (UTC)
R, OK, but I'm still not sure what presumptive statements you find odd. My point is that the mathy section is extremely long-winded, and something that long and mathy is not going to help anyone understand the subject. I'm perfectly capable of verifying and understanding every line of it, but it seems to me that there's got to be a better, or at least more concise way to do this. I'll try to find one. Or two. Do we really need to use different math for the different cases of finite-energy signals versus stationary processes? By taking the FT out of the intro, I believe I used math and language that is simple and correct for both. Do you agree? Dicklyon 17:48, 3 August 2006 (UTC)
Talk:Nyquist–Shannon_sampling_theorem#Proof_using_PSF above is an alternative proof using the PSF that is about as consise as it can get. but it is not as straight-forward to understand as the present "long winded" proof. and it uses the FT (but does not use the Dirac Comb). i personally do not see how it can be proved without the FT. some proof that is well accessable to undergrad EE students (and others with a similar level of mathematical sophistication) belongs in the article. User:BobK put in this "concise overview" which i am not as certain is useful, but i didn't want to fight that fight.
i will fight any fight that tries to remove all proofs of the the theorem that are rigorous enough to not have big holes in it. and i will fight any fight (with the pure mathematics crowd) to obfuscate this with objections to the use of the "nascent" dirac delta function that engineers and engineering texts commonly use (EE's treat the dirac less rigorously - like a "regular" function - whereas mathematically it is not strictly a function like that). that is the basis of my dispute with User:LutzL above (and you can see there were other objections from other EE-background editors). i am willing to let the pure math guys take over the content of the continuous Fourier transform and some others, but not this article. r b-j 18:15, 3 August 2006 (UTC)
also, i do basically disagree with the removal of FT references. the most fundamental facts of the sampling theorem that should be put in the introduction is that: if the continuous-time signal is bandlimited (how do we define that without the use of FT or the concept of spectrum?) and the sampling rate exceeds twice the bandlimit, that there is sufficient information in those discrete samples to reconstruct the original. i believe that you are crapping up the article and making the concept less accessable by removing FT from it. r b-j 18:20, 3 August 2006 (UTC)
I disagree with removal of FT references, because it is typically not taught that way in schools. Typically, stationary processes are not dealt with right away (or at all). For people who actually sample and reconstruct signals, it is completely irrelevant, because nobody reconstructs infinitely long signals. (I am repeating myself, because the discussion has become fragmented.) A windowed stationary process is non-stationary, and its FT exists, so we can use it. The real problem for signal reconstruction is aliasing. I.e., as we all know, there is no such thing as perfect recontruction, because it requires both a bandlimited signal and sampling forever (so it never happens). --Bob K 13:21, 10 August 2006 (UTC)
R, you sound like you're trying to pick a fight. Nobody's trying to remove proof or banish FT. My point is in answer to your parenthetical question "(how do we define that without the use of FT or the concept of spectrum?)". Did you see how I wrote it? It looks just like an FT, but it doesn't require the FT to exist to be meaningful in the stated region. Is that too close to pure math? Dicklyon 18:27, 3 August 2006 (UTC)

I re-did this section a bit. Just tightened up a few things, fixed some informalities, made it more clear what's true and what's required, I hope, and just a bit shorter. I still think it can be a lot more concise, especially the first part that gets about 10 equations in before showing a sequence of dirac impulses weighted by sample values. I may work on that later. Dicklyon 08:41, 4 August 2006 (UTC)

Hi. Dicklyon, if you say that you understand every line of the mathematical basis article and find it correct then you don't understand enough basic calculus. See the discussions above on the Fourier series of a Dirac comb. The formula given is only correct in a very weak sense that would require a lengthy comment or at least a link to the Dirichlet kernel. As written it claims the identity of a tempered distribution to a mathematical impossibility. In the introduction, to avoid the Fourier transform, you give a Fourier integral as the more general transform. Unfortunately, the converse is true. There is a Fourier transform for functions, for which the Fourier integral does not exist. A commonly known example is the sinc function. And to end my comment a general remark: the sampling theorem is also a fundamental theorem in the mathematical discipline of harmonic analysis, it was known and heavily used there even before communication engineers discovered the importance of the concept of the bandwidth of a signal (around 1920).--LutzL 10:08, 4 August 2006 (UTC)
I never said it was correct. I understand your point exactly, which is that the concept of delta functions does not integrate trivially with transforms and other mathametical operations; yet it is conventional (in engineering at least) to use them as if they do. I was just trying to make the writeup consistent and a little more concise within this framework, e.g. by adding the condition that x be square integrable so that at least its baseband tranform exists.
As to your remark about the history, yes, we've tried to capture that in the article already. Please add to it if we missed some critical points. Dicklyon 17:20, 4 August 2006 (UTC)
Dicklyon at least has a point in that a formally correct proof appears rather lengthy. Is it really necessary to present a formally correct proof in the article? Can the theorem be stated without a formally correct proof? I believe that it is sufficient to let the article include the standard informal proof in the form of a graphical illustration of what happens in the signal and frequency domains when the signal is sampled and then reconstructed, with and without aliasing. Examples of such informal proofs can be found in most textbook on the subject. Apparently, the theorem can be proven in different ways, why not consider the option of writing separate articles for these proofs? --KYN 11:03, 4 August 2006 (UTC)
Yes, given the turbulent history of this article. A lot of time has been wasted fighting over proofs, as if there is only one "right" way to do it. Now the article doesn't even have Shannon's original proof, which requires neither the controversial "distributions" nor Dirac delta functions. It just requires a simple Fourier series expansion. For proof lovers, there should be a [different] place for anyone's and everyone's favorites. --Bob K 02:44, 9 August 2006 (UTC)
Bob, I think you should pull that from an old version and put it back in a section of its own on "Shannon's proof". But if I recall correctly, he did need to use a representation of the complex spectrum, which means his proof probably is not applicable to stationary processes. I still haven't found one that works for that. Dicklyon 02:54, 9 August 2006 (UTC)
Don't misinterpret my point. I did not say it needs to be lengthy to be formally correct. I'm not sure that's true. I also don't think it needs to be so lengthy to be "engineering correct". I just haven't had time to make it much shorter. I've tried hard already to make the statement of the theorem concise and understandable, but with so many contributors adding what they think helps, it is likely to be an ongoing process. The statement I added about the integral being zero, as a way to define bandlimited without introducing two types of signals and transforms, is one such attempt, and I'm still waiting for a mathematician to tell me why it's really not quite correct; we'll see. The idea of a separate article for detailed proofs and discussions of the engineer/math perspectives is something I've considered. Anyone else like that? Dicklyon 17:20, 4 August 2006 (UTC)

## Introduction section again

The Introduction section still contains some statements which I get stuck on:

1. A signal that is bandlimited is constrained in terms of how rapidly it changes in time ... I take it that the author means that "rapidly" refers to frequency. However, I believe that most readers think of "rapidly" in terms of rate of change per time unit, that is, derivative, and this is not bounded for a bandlimited signal. Also, I don't see that the point of the sentence is lost by simply writing, shorter and less confusing:

A signal that is bandlimited is constrained in terms of much detail it can convey in between discrete instants of time.

It doesn't matter to me which way this informal statement is expressed. If you interpret "how rapidly it changes in time" relative to the sample values nearby, then reading it as a derivative limit is about right. I'm not sure the idea of "detail" is any more correct. Dicklyon 17:27, 4 August 2006 (UTC)
OK, this may be a problem related to how I parse the entire sentence. I am guessing now that "and therefore how much detail it can convey" is a subordinate clause. If yes, it makes sense but I would prefer to explicitly mark it as such, with commas, to make my parser put thing together in the right order. If not, I am still lost. --KYN 19:08, 5 August 2006 (UTC)

2. DickLyon has introduced the idea that the sampling theorem does not have to be restricted to only signals for which the Fourier transform is well-defined. I have no idea if this is correct or not, but the statement is confusing from the point of view that the theorem is based on bandlimited signals. Is it even possible to talk about a bandlimited signal which does not have Fourier transform? Even if it is, should such generalization be considered already in the introction?

--KYN 11:25, 4 August 2006 (UTC)

In my experience, engineers in information and communication theory deal mostly with statistical signals; "stationary" signals have a well-defined concept of power spectral density and band limit, but do not have Fourier transforms. Yet the sampling theorem, including reconstruction formula, certainly does apply to them. I just want to make sure they don't get left out, but they need not be mentioned unless the way a particular passage is written would otherwise exclude them by requiring the existence of the Fourier transform. Dicklyon 17:27, 4 August 2006 (UTC)
I am one of those engineers. As I have said before, we only deal with time-limited waveforms (e.g., a windowed stationary process), which are necessarily neither stationary nor strictly bandlimited. Aliasing can be managed but not eliminated, so perfect reconstruction is theoretically impossible. The way I would teach the sampling theorem is:
1. Teach the Fourier transform.
2. Teach the discrete-time Fourier transform. I.e. show how sampling changes the Fourier transform. Specifically:
${\displaystyle X_{\mathrm {T} }(f)=\sum _{k=-\infty }^{\infty }X(f-{kf_{s}})}$
3. Point out that aliasing is inevitable, except for a strictly bandlimited signal (which implies infinite duration, so what good is it?).
• Mention that there is a famous theorem that treats the strictly bandlimited case anyway, and it derives a formula for perfect reconstruction.
• Also mention that nobody actually uses that formula, because it requires infinite time.
--Bob K 13:55, 10 August 2006 (UTC)
Bob, that's a perfectly sensible engineering approach. Where I learned engineering, however, we tended to also study the serious math behind the serious theorems, more like the treatments that Shannon and Wiener liked to use. The most common concept for statistical signal processing was the stationary process, even though in practice you would do computation only on finite parts of sample functions from such processes. Both approaches have value, and both should be represented here, but they shouldn't get confused with each other. Dicklyon 16:02, 10 August 2006 (UTC)
I was not describing how I learned it. I believe it was introduced (to me) in terms of just bandlimited functions with nice Fourier transforms. Random processes came later, and I don't recall the issue of sampling and reconstruction being revisited. You probably learned it the same way. But after almost 40 years of doing this stuff, and particularly after witnessing the rampant misconceptions amongst Wikipedians, my conclusion is that there is a better way. The three steps above do not preclude a fourth step, which would extend the concept to non-windowed stochastic processes, if you are inclined and able to do that. But since it is already too late to start sampling such a process, there is no need to obfuscate the first three steps on its behalf.   --Bob K 20:41, 10 August 2006 (UTC)
OK, here's an idea. How about an article on imperfect sampling/reconstruction for not-quite-bandlimited functions. Bringing those concepts in here, where the theorem is about something different, is why we have so much trouble settling on an approach, I think. Make an article such as Sampling and reconstruction, crosslinked here, and develop the theory of windowing stationary processes for spectral estimation, time/frequency uncertainty, quantification of aliasing, etc.? Dicklyon 21:02, 10 August 2006 (UTC)
Yes. Divide and conquer. Make each battle (article) small enough to win, but large enough to matter. If you have the time and inclination, go for it. Otherwise I will get to it eventually, but probably not for at least a week. --Bob K 22:58, 10 August 2006 (UTC)
Can you provide an example of a stationary signal which is bandlimited but does not have a Fourier transform. --KYN 19:48, 4 August 2006 (UTC)
Yes, since ALL stationary processes don't have Fourier transforms (unless you count the identically zero process), and SOME of these are bandlimited. A stationary process has a power that doesn't depend on when you observe it, so has infinite energy. A Fourier transform requires that the signal to be transformed have finite energy (can be extended with tricks to work for infinite-energy signals like dirac functions and combs, but that will not apply to typical random processes). A specific example: the first-order autoregressive process with autocorrelation function equal to exp(|tau|), which has unit power and a spectrum that you'd get by putting white noise through a simple first-order RC filter with R*C = tau. If you take a sample function from this process and try to do an FT on it, the expected variance of the transformed value will be infinite at every frequency. But that one's not bandlimited. To get one that's bandlimited, you need an autocorrelation function whose Fourier transform is zero above the frequency B. So take such a spectral density, say a rect function, inverse transform it to get the autocorrelation function, and find a filter such that that is then the convolution of the impulse response of the filter with itself. The easiest such filter is the sinc filter. I.e. put white noise through a sinc filter and you have a bandlimited stationary process with no Fourier transform. In practice, just as with finite-energy signals, getting a signal to be exactly bandlimited is a mathematical near impossibililty; this just means that the sampling theorem seldom holds exactly. Dicklyon 20:13, 4 August 2006 (UTC)
I am still not certain that when you say "don't have Fourier transforms" you mean that the transforms do not exist as proper functions but can be formally defined in terms of distributions, or if you mean that they cannot be defined at all.
No, they don't exist at all, in the sense that they are unbounded at essentially all frequencies, not just a discrete set of dirac functions. Signals that are stationary have infinite energy at any frequency where they have any power. Dicklyon 20:27, 5 August 2006 (UTC)
In either case, I don't believe that it is necessary, already in the introduction, to introduce the reader to some of the generalizations which are possible. It is sufficient to say that x is a signal (which we have to assume that the reader has some familiarity with) but not to specify that it can be complex-valued (which is correct, but why not vector valued also?) or that it can be a stochastic process instead of a just a realization of such a process. I would prefer to see a section later on which lists these and other generalizations which are possible for the theorem, once all the important concepts are in place. --KYN 19:08, 5 August 2006 (UTC)
Yes, good point. We shouldn't need to mention complex- or vector-valued alteratives until we need to say real-valued, which should not be in the intro. They should be an aside later when the domain of the proof is specified. Dicklyon 20:27, 5 August 2006 (UTC)

Some further points on the Introduction section:

3. I see that there are overlaps between sections "Introduction" and "The sampling process": statements about relations between sampling frequency and bandwidth occur in both as does the relation between sampling interval and sampling frequency. Also, "Introduction" makes use of the concept "alias-free sampling" without explaining what it means, while it is reasonably intuitively described in "The sampling process". Here is my point:

Can we delete the "Introduction" section entirely, possibly after moving a few non-redundant parts of it to "The sampling process" section? The latter section was written with the intent of being an introduction, saying "Hi and Welcome" to anyone who has never heard of the sampling theorem and wants to know what the fuss is about. I believe that it can serve as a good introduction, providing all the necessary pieces and statements with a minimum of math.

All the juicy math and other technicalities which appear to be so interesting to all you good people who are editing the article can then be developed at depth in the following sections.

I am not averse to combining those first two sections into one, but it needs to be done carefully so avoid making a long section with too much extraneous info. I think I agree that aliasing is an extraneous concept when first stating the sampling theorem precisely, since it's an effect that happens only in situations outside the conditions of the theorem; so leave it for later. Instead of alias-free sampling, talk about sampling with perfect reconstruction. Dicklyon 21:53, 5 August 2006 (UTC)
I think even better is to talk about sampling and what it does to the frequency spectrum (i.e. the DTFT) in the general case. Then behold!... look what happens to the DTFT if the original spectrum is bandlimited and the sample-rate is greater than twice the highest frequency... the original spectrum is preserved intact! Could it be that the original signal is recoverable? --Bob K 03:00, 9 August 2006 (UTC)

## Mathematical basis changes

As I mentioned above, I made some changes to tighten up the text of the mathematical basis section a bit; that is, to make it a bit more concise, a bit more clear, and a bit more correct. Please comment if you see any changes that make it less clear or less correct.

Rbj has reverted all the changes, saying they don't help, and I've re-reverted them back. On looking at the history, I see that it was Rbj around April 30 who made this section so long and verbose, so he has perhaps too much personal attachment to the details. Personally, I find the much shorter versions before April 30 to be more appropriate to the article, in terms of length and level of coverage. Perhaps we should go back to one of those versions and tune it up, instead of trying to work back down from the verbose version?

Dicklyon 18:17, 4 August 2006 (UTC)

you've actually made it more verbose, except where you deleted real information that BobK put in, in lieu of that Note about scaling section you didn't like. we worked out a change that dealt with his concern and mine and you deleted that whole thing. it's like you're driving a huge dumptruck down the middle of the road completely unawares of other people are doing to make this article exactly what you think it should be without doing the politics necessary to get concensus. you need to read Wikipedia:Why stable versions. what you are doing is "edit creep" which moves this article to something approximating a good article to something unrecognizable. r b-j 04:57, 5 August 2006 (UTC)
Would you be so kind as to point out the edits you are referring to? Dicklyon 05:01, 5 August 2006 (UTC)
Anybody? Opinions have been solicited here, and nobody will engage, not even the guy who calls me a dumptruck. Dicklyon 05:28, 5 August 2006 (UTC)
Well now Rbj has reported me for violating the three reversions rule, so maybe I'll have to let him drive it off in a new direction for a while. Still seeking opinions on whether the long mathy section can't be made more clear, correct, and even eventually concise. Dicklyon 05:57, 5 August 2006 (UTC)
Since nobody's talking at this hour, let me review the compromise proposal I made a while back, splitting the difference with two of my four sequential edits. Here's the latest diff relative to where Rbj left the page: [13]. Note that the final paragraph has been deleted, because it had little or nothing to do with the "mathematical basis" section, and the other edits are all minor, clarifications, grammar, etc. For example, using /fs instead of *T in a subsection that had no locally apparent T. And I pointed out that it all works for complex-valued signals, too, since it's all written in complex math that applies equally there (if I'm not right on this, please say so). Anyone think this corresponds to driving a dumptruck over the work of others? Please comment. Dicklyon 06:23, 5 August 2006 (UTC)
Seeing no objection, I will proceed to resurrect the next step of my changes. Please comment. There's one more after that. Dicklyon 18:17, 5 August 2006 (UTC)
Now I've done the last of the four that Rbj didn't like. I'm still waiting for any reactions, one way or the other, to the content of these changes. This one is about conditions on the H(f) reconstruction filter, and that leads to the constraint that relates sample rate to bandwidth. The way it was, the notion of "overlap" was used, without an explanation of why it might matter; and it concluded or stated "the reconstruction filter H(f) must be:..." followed by a function that is not actually required. By stating the actual constraint on what H(f) needs to do, we get a looser constraint on the form of H(f), and a logical way to conclude what relationship must be satisfied about the sample rate and bandwidth, without appeal to the notion of "overlap". The next step should be to prune the long-winded start of this section, and maybe appeal to an external derivation of the sinc result, as was suggest by others above. Dicklyon 21:42, 5 August 2006 (UTC)
OK, one more major shortening by not belaboring the derivation of the reconstruction part, and not repeating and summarizing a bunch of stuff that's well stated elsewhere in the article. This is change set 5. I took out the distinction between a sampling half and a reconstruction half, because I didn't think the theorem really had such two halves. I kept the parts, but characterized more explicitly what they are. Save 1 KB or so, according to the too-long-article complaint. Dicklyon 22:36, 5 August 2006 (UTC)

## Concise summary section

I finally figured out what the "concise summary" section was about. It wasn't clear that it was a summary of the preceding section on mathematical proof, since it didn't say so. Having worked on that section now, the relationship became clear, so I changed the title to refer to it (or close, anyway). I changed some wording too, as it erroneously said the comb could have an FT; that may have been my own mistake, but now it's fixed. Dicklyon 23:15, 5 August 2006 (UTC)

In its original form (14-June), it was more than that. But the reason for its creation was to restore the highlights of the proofs deleted by Rbj on 26-May. Those proofs avoided use of the Dirac comb as a model of the sampling process (because it isn't!). And one of them (Shannon's) completely avoids any use of Delta functions. --Bob K 05:27, 10 August 2006 (UTC)
Sorry, Bob, I can't make any sense of that version. It's not clear what it means by Xs(f) if not the spectrum of the modulated impulse train. The fact that it can be calculated via the DTFT doesn't explain what it means or how it relates to the concepts in the proof. The introduction of the DFT into the discussion further confuses things. At least Rbj's writeup is about math, not computing. Thanks for reminding me of why I did what I did on 16 June. Dicklyon 05:38, 10 August 2006 (UTC)
Well of course it was not my preference either. I obviously preferred (and still do) my 25-May version of the "mathematical basis". --Bob K 06:01, 10 August 2006 (UTC)
Yes, it's not bad. But as it says, it's sort of backwards, going from the result you want to show back to how one might represent the samples as a train of impulses. Is it more concise and easier to follow than what we have now? Probably so; the opening bit we have now is still too verbose. I still feel we're missing a better way to express that would be clear, concise, and easy to follow. I had to make one for a talk once, but that was over 30 years ago, and I can't say I remember which approach I adopted at the time. Dicklyon 06:42, 10 August 2006 (UTC)

### Shannon's version

I've put back Shannon's proof from a much earlier version, with appropriate changes like f_H to B. It could still use some more work. I'll reread Shannon's book first. Dicklyon 05:59, 10 August 2006 (UTC)
The Shannon and Weaver book just refers to the paper for the proof, so I went there. As it turns out, I like that version a lot, so I just copied it verbatim, including Shannon's variable names, as a quote. I think this works better, since it becomes a puzzle for the reader to try to understand Shannon, rather than a temptation to rewrite his elegant work. Dicklyon 06:19, 12 August 2006 (UTC)
Interesting. But let me play devil's advocate. It works as you say for some of us, who can handle the terseness, the symbolic changes, the switch to radian frequency, and the mixture of units (rad/sec and cycles/sec). Plus we had the benefit of seeing the deciphered version. And it seems remiss not to close the loop... i.e., point out that the periodic function produced by the Fourier series expansion is the DTFT of the samples. Why stop here? Why not also provide a more accessible translation (since we already have it) of Shannon's logic? Puzzles are not for everybody. --Bob K 11:30, 12 August 2006 (UTC)
Yes, you're right, and I expected to hear from you. Puzzles are not for everyone. At this point, however, the article is quite long, and I felt there was more value in seeing Shannon's terse proof than in seeing an alternative long detailed development. I think that it might be interesting to see the proof expanded, rewritten in consistent notation, and loop close on DTFT; however, that is also not for everyone, since 99% of readers are going to skin over these proofs altogether, rather than attempt to understand them. Since we have the other one done out in detail, is that enough? If we also do this one, should we keep it all in this aricle, or make a subsidiary article or two about proofs? It's a tough call. At this point, I'll sit back and see what others want to do. Dicklyon 16:39, 12 August 2006 (UTC)
Glad I didn't disappoint. It is a tough call. The problem, IMO, is the "policy" (I guess that's what it is) against articles such as Single-sideband_modulation/Proofs. I don't understand why that is supposedly a bad thing. And if it is, why does that article still exist? If the policy is not really a policy, maybe that's really the best way to manage growth without discarding good information. --Bob K 17:11, 12 August 2006 (UTC)

## Introduction rewrite

I my view the start of the article obscures the interesting parts by talking about all possible origiators. When I read an article, I want to get to the topic fast (here: the theorem) and if I am insterested I can read about how the topic came about later. Here is a proposed rewrite of the introduction, i.e., the text which starts the article:

The Nyquist–Shannon sampling theorem is a fundamental result in the field of information theory, in particular telecommunications and signal processing. <- OK, gives a context to the theorem.

The theorem is commonly called Shannon's sampling theorem, and is also known as Nyquist–Shannon–Kotelnikov, Whittaker–Shannon–Kotelnikov, Whittaker–Nyquist–Kotelnikov–Shannon, WKS, etc., sampling theorem, as well as the Cardinal Theorem of Interpolation Theory. <- OK, explains that the theorem is known under several names. Maybe more redirects are needed? In addition to its mathematical originator E. T. Whittaker, and its American engineering originators Claude Shannon and Harry Nyquist, it is also attributed to the Russian engineering originator V. A. Kotelnikov and sometimes to its German engineering originators Karl Küpfmüller and H. Raabe, or its Japanese originator I. Someya. J. M. Whittaker developed it further and called it the Cardinal theorem. <- This sentece is just a repeat of the one before which already implicitly suggests that there are several originators. This does not have to be explained here. Better to just refer to the historical section. It is often referred to as simply the sampling theorem. See the historical background section.

Sampling is the process of converting a signal (e.g., a function of continuous time or space) into a numeric sequence (a function of discrete time or space). <- OK, gives context in terms of sampling. A signal that is bandlimited is constrained in terms of how rapidly it changes in time, and therefore how much detail it can convey, in between discrete instants of time. <- Moved the first sentence of the "Introduction" section to here. Gives context in terms of "bandlimited" and also an intuitive explanation of the theorem. The theorem states conditions under which the samples [represent no loss of information and] can [therefore] be used to reconstruct the original signal [with arbitrarily good fidelity]. <- By deleting the indicated words the sentece not only becomes shorter, but also less complicated to a reader who has no clear understanding of "information" and "arbitrary good fidelity". Note that to a non-technical/math reader the idea of "reconstructing the original signal" should have the same interpretation as "reconstruct the original signal with arbitrarily good fidelity". ("exact reconstruction" should also be understood by most readers) More precisely, it states that [unambiguous] reconstruction is possible if the signal is bandlimited and the sampling frequency is greater than twice the signal bandwidth. Remove "unambiguous" which to me has an ambiguous interpreation. Some fomulations of the theorem also include the procedure for how the reconstruction is made, see below. <- Added the last sentence to explain that the theorem sometimes only gives the statment about the sampling frequency, sometimes presents the entire process of sampling and reconstruction. The last idea is consistent with the rest of the article and appears to be established from Meijering who is cited further down in the article.

This gives:

The Nyquist–Shannon sampling theorem is a fundamental result in the field of information theory, in particular telecommunications and signal processing. The theorem is commonly called Shannon's sampling theorem, and is also known as Nyquist–Shannon–Kotelnikov, Whittaker–Shannon–Kotelnikov, Whittaker–Nyquist–Kotelnikov–Shannon, WKS, etc., sampling theorem, as well as the Cardinal Theorem of Interpolation Theory. It is often referred to as simply the sampling theorem. See the historical background section.

Sampling is the process of converting a signal (e.g., a function of continuous time or space) into a numeric sequence (a function of discrete time or space). A signal that is bandlimited is constrained in terms of how rapidly it changes in time, and therefore how much detail it can convey, in between discrete instants of time. The theorem states conditions under which the samples can be used to reconstruct the original signal. More precisely, it states that reconstruction is possible if the signal is bandlimited and the sampling frequency is greater than twice the signal bandwidth. Some fomulations of the theorem also include the procedure for how the reconstruction is made, see below.

--KYN 22:20, 10 August 2006 (UTC)

Go for it. I suggest tightening up the second paragraph even more:

Sampling is the process of converting a signal (for example, a function of continuous time or space) into a numeric sequence (a function of discrete time or space). The theorem states that exact reconstruction of a continuous signal from its samples is possible if the signal is bandlimited and the sampling frequency is greater than twice the signal bandwidth. The theorem also leads to an effective reconstruction formula.

Dicklyon 06:27, 12 August 2006 (UTC)

## Welcome back, Rbj

R, welcome back. I never did get feedback from you on the substance of my changes, and still interested if you have particulars to mention.

Your current edits are coming in much bigger batch than even I am used to. A lot to digest. Mostly looks OK, but I have my quibbles, I'm sure you won't be surprised to hear. For starters, what does it mean "The theorem also constructs an effective reconstruction formula." How does a theorem construct anything?

there is such a thing as a constructive proof as opposed to an existence proof. the former is stronger and, at least for this electrical engineer, more straight forward. it proves that something exists by explicitly constructing that something.
in a constructive proof, the theorem doesn't do the constructing. Dicklyon 17:49, 16 August 2006 (UTC)
try putting that into constructive proof and see how long that sticks before reversion. r b-j 20:19, 16 August 2006 (UTC)

The theorem as stated by Shannon leaves the reconstruction implicit, from the FT. So I had said it "leads to". Wouldn't that be better than overloading the theorem with more than its simple self? Dicklyon 05:09, 16 August 2006 (UTC)

the Sampling Theorem as put forth in nearly any EE text on the subject, does not use Shannon's original proof, but one much more like the "mathematical basis" proof given herein. two differences: they don't put the T scaling factor in the sampling operator like User:BobK and i do (i've been harping about this scaling and dimensionality problem since November 1988 - if you can get to a library and look up the Journal of the Audio Engineering Society for that month, you'll see). the other difference is that rather than determine the Fourier series of the sampling function (Dirac comb) and use the frequency shifting property of the Fourier transform, they instead use the F.T. of the sampling fuction (another Dirac comb) and then use convolution in the frequency domain to show that the original spectrum is shifted and repeated at all multiples of the sampling frequency. i am convinced that my way is simpler that that is salient for an encyclopedia article.
The approach is fine. I just want to see the details right. How you want to handle the scale factors has never been an issue with me. Dicklyon 17:49, 16 August 2006 (UTC)
the details are right, unless you want to take the position of User:LutzL and assert that it is too sloppy to say:
${\displaystyle \sum _{n=-\infty }^{\infty }\delta (t-n)=\sum _{k=-\infty }^{\infty }e^{i2\pi kt}\ }$
or
${\displaystyle {\mathcal {F}}\left\{e^{i2\pi f_{0}t}\right\}=\delta (f-f_{0})\ }$
Those are fine; those aren't the details I'm complaining about. My complaints are explicitly detailed in these comments. It's the logic that's at issue, not the equations. Dicklyon 21:36, 16 August 2006 (UTC)
but i think you said that you agree we can rest our math on such expressions. (User:LutzL does not, i believe, and if you insist on L2 neither do you.) if you accept these two expressions of "fact", then the details i put forth are all correct (save for any typos, but i think i took care of them). r b-j 20:19, 16 August 2006 (UTC)

Here's a nit: "and asserts the necessary conditions that insure that such reconstruction can happen" should really say the sufficient conditions. The bandlimit as stated in the theorem is sufficient, but not always necessary, for exact reconstruction. Dicklyon 05:13, 16 August 2006 (UTC)

it's both necessary and sufficient. i thought "sufficient" was explied with the reconstruction half. change it if you want.
"Sufficient" is implied, but might as well be said. "Necessary" is not true. It is quite possible to perfectly reconstruct signals that are sampled at a rate lower than twice their highest frequency, given suitable other conditions. The conditions of the sampling theorem are not necessary for exact reconstruction. Until you understand that, arguing about the details of mathematical implication will remain futile. Dicklyon 17:49, 16 August 2006 (UTC)
when someone reconstructs x(t) by use of the 1-to-1 (i think bijective might be the formal term) property of the F.T. from the samples x[n], that, in a constructive manner, establishes "sufficient".

I see you're racing ahead with putting everything back your way. Weren't you the guy who kept saying some discussion would be good? Seems like we haven't heard from you since then. Dicklyon 05:24, 16 August 2006 (UTC)

i said discussion is good before making wholesale changes. this is undoing that. if you think some major change to this is justified, the onus is upon you. you get to say, point-by-point, what it is you think would be better if changed and the justification for that and we get to say if we're convinced or not. some things you did i tried to keep (like x[n] to BobK's dismay).

You changed a sentence to a fragment when you started it with "Assuming that the continuous signal..." Dicklyon 05:26, 16 August 2006 (UTC)

You've re-introduced the unnecessary concept of "components" where you wrote "A signal is bandlimited if it contains no frequency components of frequency higher than some bandlimit or bandwidth B." This condition is not sufficient to make a signal bandlimited, and doesn't agree with the math formalization that follows. The concept of components only muddies the water when you have continuous spectra. Dicklyon 05:29, 16 August 2006 (UTC)

frequency "components" is not unnecessary if we're sampling signals that are more general than a single sinusoid.
a "component" is a sinusoid, i.e. a discrete frequency. The theorem should be stated and proven for more general continuous spectra. Dicklyon 17:49, 16 August 2006 (UTC)
that's your definition. "component" is more general than that. it doesn't have to be Fourier Series or a collection of discrete sinusoids (with δ(f-f0) in the frequency domain) in the ordinary use of the continuous Fourier Transform:
${\displaystyle x(t)=\int _{-\infty }^{\infty }X(f)e^{i2\pi ft}\,df}$
${\displaystyle X(f)e^{i2\pi ft}\ }$ is a frequency component of x(t). or maybe i should say that ${\displaystyle X(f)e^{i2\pi ft}df\ }$ is. r b-j 20:19, 16 August 2006 (UTC)
Maybe, but which is it? Do you have a definition anywhere that you could share? Dicklyon 21:37, 16 August 2006 (UTC)

You missed a factor of two in "upper bound for frequency components, B<fs, of the signal to allow for perfect reconstruction. This upper bound is the Nyquist frequency." Dicklyon 05:37, 16 August 2006 (UTC)

it probably was correct in the original. i can't find where you mean. fix it if you like.
search for some words I quoted, or look at your first diff. You'll find it. I'm not planning to clean up after you unless I get some support from others. Dicklyon 17:49, 16 August 2006 (UTC)
it's fixed and was a legitimate typo you spotted. thanks. r b-j 20:19, 16 August 2006 (UTC)

I presume you never actually read my edits or tried to understand the point of them.

no, only about a dozen times when my disgust was intolerable and i felt compelled to act. i didn't want to do it until the article was seemingly stable in case you were still in process of making changes. i waited until the article fell off the edge on my watchlist. i was not gone (as my contribs will tell). just lurking.

For example, where you've brought back the long-winded proof of two imagined half theorems, the latter relying on: "With that condition satisfied, there is no overlap of images in Xs(f) and X(f) (and thus x(t)) can be reconstructed from Xs(f) (or xs(t)) by low pass filtering out all of the images of X(f) in Xs(f) except for the original image at the baseband. To do that, fs > 2B (to prevent overlap)..." in which you appeal to the intuitive notion of "overlap" without even an illustration,

feel free to make an illustration. i had been trying to learn both Dia and Inkscape (the only PC graphics programs i can get my hands on for free) and am having trouble just duping an image and translating to to another location. i wanted to replace the drawing bandlimitedB with one that has "tails" so that when the tails overlap, aliasing can be illustrated by adding the tails and reforming another spectrum that would be properly bandlimited and have the same repeated spectrum (an "alias"). r b-j 16:02, 16 August 2006 (UTC)
i'm working on some drawings. both Dia and Inkscape are really unintuitive. i can get Dia to do some things and Inkscape to do others. the result is an SVG file. there will be a redrawing of the bandlimitedB drawing so that the shape of the original spectrum is consistant. r b-j 16:49, 16 August 2006 (UTC)

instead of the simple mathematical condition that was so much clearer and more concise as I had it, and then "and the frequency response of the reconstruction filter H(f) must be: ..." where you state a condition on the filter that is NOT implied by the math, and NOT necessary for the given parameters. Oh well, I guess we have to watch you go through this process... Dicklyon 05:44, 16 August 2006 (UTC)

the steps of the "long-winded" proof that you deleted are easily defended. these are just bits and pixels, why take out a step that causes someone to have to get a piece of paper to check something out? r b-j 06:20, 16 August 2006 (UTC)
the reason I took them out was as discussed for several days before I did so (around 3 Aug in the talk history). Just too many micro steps, making for a very large section for something that ought to be more concise. I'm now OK backing off on that, leaving your section verbose, and juxtposing Shannon's brief proof. But still let's try to get the details to be correct. Dicklyon 17:49, 16 August 2006 (UTC)

Your use of iff in the previously true statement that now reads "This reconstructed signal will match the original signal iff the original signal contains no frequencies equal to or above half the sampling frequency" displays a fundamental lack of grasp of math or the theorem, or something. It is quite possible to get exact reconstruction when the signal contains a frequency component at half the sampling frequency; it just has to be phased right. It's best not to screw up the math by forgetting what it says is true, and adding stuff that it doesn't say. Dicklyon 05:58, 16 August 2006 (UTC)

no, Dick, it's demonstrative of your lack of understanding of mathematical rigor. you take a serious analysis course (yeah, i know you will find there that the Dirac delta is not a real function, but that is another fight between EEs and mathematicians that should be fought somewhere else) and you get a lot of limit proofs where there is "given an ε>0 find a δ>0" so that some condition is satisfied. now you don't get to choose the ε, the devil hands it to you and you have to deal with it, or your proof is not proven. just because you can find a Nyquist component (essentially a cosine with the peaks lined up perfectly with the sampling instants) that can be sampled and exactly reconstructed, you don't have the luxury of making that choice. the phase of the Nyquist component is chosen by your worst adversary and if you cannot exactly reconstruct, you have failed to reconstruct. the statement i made is precisely correct. r b-j 16:02, 16 August 2006 (UTC)
You're very confused here. Changing the "iff" back to "if" does not say that one can in general reconstruct a signal when frequencies equal to the critical frequency are present. It just removes the converse that you can NOT reconstruct a signal when such a component is present. That converse is usually true, but is not always true, and is not implied by the theorem, nor is it provable, because there exist signals for which the reconstruction IS exact. It's simple logic, really. Dicklyon 17:49, 16 August 2006 (UTC)
you're the confused. when such a component is present (at Nyquist), you don't have enough information, in the samples, to reconstruct it. r b-j 20:19, 16 August 2006 (UTC)

I see that by "chaff" you meant the statement of assumptions on which the math depends. OK, we can live with that. Oh, and the simplified way of finding the necessary conditions on a reconstruction filter, which lead to the criterion that the proof relies on. Oh, well, it's just a mathematical basis, not a proof. Dicklyon 06:05, 16 August 2006 (UTC)

there is nothing about stocastic process that the math depends on to prove the sampling theorem. we need to show what happens to the spectrum when it is sampled and then what conditions are needed to prevent aliasing (2B<fs) and then how the original x(t) is reconstructed from the samples. r b-j 16:02, 16 August 2006 (UTC)
You can say what happens to the spectral density when the process is sampled, but that's about as far as you can go with that approach; you can appeal to limits of finite records, of course. Dicklyon 17:49, 16 August 2006 (UTC)
the theorem is not about finite records. it's about discrete-"time" sampling. x[n] is no finite record. r b-j 20:19, 16 August 2006 (UTC)
listen Dick, my DSL connection cannot keep up with your internet connect. Edit conflicts. i will wait a day before answering any more. r b-j 06:20, 16 August 2006 (UTC)
OK, sleep on it. While you're at it, please think about what the theorem actually says, so that maybe we can get the implications in the right order. For example, where you say "It says that the sampling frequency, fs, must be strictly greater than twice the bandwidth, B, of the continuous-time signal, x(t), for no information to be lost (or "aliased")" one has to wonder where you got that.
i proved it. information is lost when the images overlap before they are added. there are a zillion possibilities of how that resulting sum could have been so overlapping and adding images loses information. pretty simple, but a diagram would be useful.
I don't follow that style of proof. I know exactly what you mean, I think, which is that the constructive reconstruction approach fails when there is spectral overlap. That doesn't imply that information is lost or that some other technique might not recover it. Usually it is true, I agree, but the sampling theorem doesn't say so, and you don't have a proof for it, because it is not true in general. Dicklyon 17:49, 16 August 2006 (UTC)
fine, i think that most people understand that knowing the sum a+b is less information than knowing both a and b separately. i'll make it explicit in a drawing. i think though yours and BobK's discussion of what happens when you do not sufficiently sample maybe should go into an ancillary article. this will get long. maybe not, it's just a couple or three drawings that will look a lot like what you'll get out of a good EE textbook on the subject. r b-j 20:19, 16 August 2006 (UTC)
The converse of the sampling theorem is not true in general. It is quite possible to have a signal of bandwidth B greater than the Nyquist frequency of a sampling system and still have no information loss, and still have perfect reconstruction;
only if there are holes in the spectrum where those images can go. that is not appropriate for an encyclopediac article introducing someone who doesn't know the sampling theorem to understand it. this is not an IEEE Transaction or some graduate level textbook.
Right, and you don't have to get into that at all if you just prove the theorem, and don't pretend that its converse is also true. In general when discussing a theorem, it is wise to avoid the question of whether the converse is true, because it gets complicated. So don't say so and we don't have a problem. Dicklyon 17:49, 16 August 2006 (UTC)
you also have to know apriori where those holes are to guarantee that the images slide nicely into them. this is all a much more advanced topic than the fundamentals of sampling and the mathematical basis for why it works. this is an encylopedia, not an academic paper nor a graduate level text. r b-j 20:19, 16 August 2006 (UTC)
the set of possible conditions to allow perfect reconstruction is infintely larger, not just the one condition for which Shannon proved his theorem. A small subset of such conditions is represented in the so-called "Undersampling" section. And having aliasing does not imply information loss, in general, as that section should make clear.
if there is uncontrolled overlapping of images, certainly information is lost. when i know the sum c = a+b and neither a nor b, i have less information than knowing a and b. adding two numbers and throwing away the information of what those two numbers are, yet retaining the sum, loses information.
I already responded to this handwave above. Dicklyon 17:49, 16 August 2006 (UTC)
it's not a handwave. but a picture or three will help. r b-j 20:19, 16 August 2006 (UTC)
Even all my careful editing to make true statements about the critical frequency has gone out the window with your changes.
1. this usage of the Dirac impulse that are in dozens of EE textbooks is well established despite the objections that pure mathematicians bring. even if Lebesgue integration says "no", we EEs say that there is a function that is zero almost everywhere, yet has integral of 1. this article is not the place to confuse people with that difference of usage.
I have long since accepted the Dirac deltas as sampling functions and as transforms of sinusoids. Did I ever have a problem with that? If so, long before any of these edits we're talking about. Dicklyon 17:49, 16 August 2006 (UTC)
when you brought up this L2 stuff, you essentially had a problem with it. r b-j 20:19, 16 August 2006 (UTC)
2. your insistence to bring in stocastic processes and some attempt to hedge the sampling theorem with some language regarding it (and to yank out the Fourier transform at the beginning yet include precisely the same thing in integral form without calling it the "Fourier transform" offers no help and only confuses. your stocastic process is essentially a sample space of which each element of that space is, after it is chosen and yanked out, simply a signal that has a Fourier transform and has to be bandlimited to B. none of that language adds anything useful in an encyclopedia article about the sampling theorem and only serves to confuse.
I backed off this a lot, too. I just want the math to indicate what scope it applies to, so we can leave the option of also proving the theorem for stationary processes. I took out the Fourier transform, and put the explicit integral that needs to be zero for f exceeding B, as as attempt to state the conditions of the theorem in a way that was not biased against statinary processes, where the same integral for f below B does not exist, and therefore the FT does not exist.
what does that gain. you still needed a a trial or outcome of your stochastic space to represent x(t). that x(t) is then no different than any other x(t) that we can pass to the Fourier transform. what you did made no sense whatsoever: ("let's use the Fourier transform explicitly but let's also be sure not to call it the 'Fourier transform'.") r b-j 20:19, 16 August 2006 (UTC)
If instead you start by stating the conditions to which the proof applies, and use the FT only when it exists, that's an acceptable alternative; but you took out that, too. And when you take a sample function of a stationary process, that function does NOT have an FT. It's got infinite energy at any frequency where it has nonzero power spectral density,
so do sinusoids, yet we F.T. the bastards anyway.
not in the form of sinusoids that have dirac deltas as transform.
untrue. it's the same thing once you yank your sample or outcome or whatever it is out of the sample space. now it turns out that if you know a few statistical properties of your stationary process (like some Markovian joint probabilities such as p{x(t1), x(t2)} or p{x(t)|x(t-τ)} = func(τ), then you can construct an autocorrelation and a power spectrum (but still never have knowledge of phase, so you are still missing information you need to make x(t) deterministic), but every sample of x(t) has the same requirements for sampling that a deterministic x(t) has. bringing all this crap into an article introducing sampling and reconstruction to people who know F.T. but not sampling just bloats and confuses. r b-j 20:19, 16 August 2006 (UTC)
It's very easy to write about your x(t) and its FT in way that is correct and does not over-extend the claims to areas that are just wrong. Might as well try that. Dicklyon 17:49, 16 August 2006 (UTC)
nothing i wrote is "just wrong" unless you want to take LutzL's position and say i can't express the Dirac comb as a Fourier series (and then use the frequency shifting property). many things you put in are not about the fundamental sampling theorem (but the historical stuff is nice) and serve only to complicate and some things you took out causes reader to "skip a step" or two in the mathematical process. r b-j 20:19, 16 August 2006 (UTC)
Decide what the theorem is, first, then write about it in a way that doesn't have mathematical inconsistencies, and you'll be fine. I'll stop worrying about the longwinded bloat, but let's try to at least make it correct. Dicklyon 06:42, 16 August 2006 (UTC)
Dick, your "careful editing" craps it up. you're trying to bring in stuff that isn't even about the sampling theorem into it. you delete relevant material crippling the rigor in the proof. i'm trying to return it to something that you might see in an EE textbook (with those two exceptions i mentioned above, the T scaling and using the frequency shifting property of the F.T. rather than convolution in the frequency domain). you need to get a good textbook (i recommend O&S, but there are others) and keep the article to the concepts therein. r b-j 16:02, 16 August 2006 (UTC)
You defecatory characterization could easily be applied from the alternate POV.
doesn't make it correct.
By the way, Al Oppenheim and Ron Shafer are friends of mine; I read their first edition cover-to-cover when it came out in 1975, and am not ignorant of the many other excellent texts in this field. In examining them, however, I do find an appalling lack of rigor in some, to the extent of being incorrect. For example my friend Bob Otnes co-authored Digital Time Series Analysis, (Otnes and Enochson, 1972) which covers stationary processes the way you suggest: They start by assuming that the signal x(t) is a stationary random process whose power spectral density exists and is a zero outside the bandwidth B; this is good. But then they say in their "heuistic (plausible but incomplete) proof" that this ensures that the Fourier transform of x exists!!!
of course it doesn't because x(t) doesn't exist until you yank one of them out of the sample space. but once you do, then the F.T. does exist. however, there is a bit of rigor to go through to show that if the power spectrum is zero outside of [-B, B], then all outcomes, x(t), of the sample space {x} must also be at least as bandlimited to the same extent. this article should not go there at all! whether x(t) is deterministic or pulled out of a sample space, we should WP:KISS it and just require at the outset that it's bandlimited to B.
So, you're not alone in glossing over the truth. Dicklyon 17:49, 16 August 2006 (UTC)
i'm just trying to separate, pedagogically, what it is we need to teach the newbie and what it is that only confuses or unnecessarily piles on. i am approching this nearly the same way as one would do it in an undergraduate "Signals and Systems" or "Linear System Theory" course. i am trying to pick up on the concerns that User:Metacomet and User:Omegatron and User:Light current have above. particularly Wikipedia:Make technical articles accessible. that plus sufficient rigor so no one can accuse us of skipping salient steps is my intent. r b-j 20:19, 16 August 2006 (UTC)
I support those goals, but maybe have a different viewpoint on what steps are salient and on what amount of rigor is sufficient. But I'll be very flexible on those if we can just avoid saying things that are untrue, like the various converse statements that are sprinkled throughout. Dicklyon 21:24, 16 August 2006 (UTC)
It is my intention to not mess with this article unless someone supports my efforts. I'd like to see it fixed, but I don't want another 1-on-1 edit war with you, as that does nobody any good. Dicklyon 17:49, 16 August 2006 (UTC)
hooray! r b-j 20:19, 16 August 2006 (UTC)
I can understand how you would feel that way after a marathon session of debate. But after the euphoria wears off, I would like you to consider the possibility that victory by attrition, though good for your ego, is not in the best interest of the encyclopedia. Alas, decision-making by assertiveness seems to be a serious weakness in the Wikipedia process model. --Bob K 23:13, 16 August 2006 (UTC)
it's push-back Bob. i waited until i was confident that Dick had his say before responding (> 3 days of inactivity). doing it interactively was wasting my time (since he was just pushing ahead despite objections). this was not how it was going before, even if you and i or i and LutzL or whoever were disagreeing. let's keep the encyclopediac goal of trying to convey some information, maybe even teach somebody something they didn't know before. especially with a couple more images, this will resemble the normal textbook treatment (with some history and other topics added in other sections) more now than before yesterday. and the two things i can think of that are not in the textbooks (the T scaling factor of the sampling operator and the use of Fourier series of the sampling operator and frequency-shifting property of F.T.) is better, simpler to understand than convolution in the frequency domain. other than that, it's like the textbooks: multiply by Dirac comb, show that this causes repeating and shifting of the baseband to all multiples of the sampling frequency, recover the original by ideal low-pass filtering of that result, and derive what that means in the time domain. the other proofs (Shannon's, the PSF proof) are nice, but this is the one done in the textbooks (except for those two differences). i am still pointing to Wikipedia:Make technical articles accessible and most of the comments above (before any of us, other than LutzL, came on the scene) as impetus/evidence of concern over such a direction for the article. r b-j 23:43, 16 August 2006 (UTC)

## Any help to fix the article?

If anyone wants to help fix any of the points mentioned above about how Rbj is putting back a bunch of errors into the math, please say so. I'll be happy to help on any point for which there is support, and will leave it alone otherwise. Dicklyon 17:49, 16 August 2006 (UTC)

## Converse of the sampling theorem

The sampling theorem is very often misstated as its converse, e.g. at National Instruments:

"The Nyquist theorem states that a signal must be sampled at a rate greater than twice the highest frequency component of interest in the signal to capture the highest frequency component of interest; otherwise, the high-frequency content will alias at a frequency inside the spectrum of interest (pass-band)."

This is actually NOT what the theorem says, but is a common over-interpretation that has even crept into our article in a number of places. Anyone who fails to appreciate the difference should study it a bit more, or feel free to ask for help.

They also mess up the terminology with their own confusion:

"Note: The definition of Nyquist Frequency is not consistent in the measurement world. It is sometimes being used to describe the sampling rate in the theorem and other times it is used to describe the highest frequency component in the theorem. In this tutorial we will use Nyquist Frequency to describe the highest frequency component allowed to avoid Aliasing for a given sampling frequency."

None of these interpretations are actually correct. The Nyquist frequency is half the sampling frequency, which is NOT "the highest frequency component allowed to avoid Aliasing".

actually we talked about this some time ago in comp.dsp [14] and, to my chagrin, i discovered some texts that did define the Nyquist frequency to be 1/2 the Nyquist rate (O&S is one such text, such a shame). (that is what we call "B".) but i don't think such usage is currently common and my hope that it dies a death of disuse and that the Nyquist freqency will forevermore be unambiguously understood to be fs/2 . r b-j 22:06, 16 August 2006 (UTC)

But they don't understand what the theorem says well enough to state such fine details correctly.

so your friends O&S, don't understand the fine details correctly? r b-j 22:06, 16 August 2006 (UTC)
That's a fair question. Am I being too hard on you, or are you essentially as good as Oppenheim and Shafer?
dunno how you extrapolated that. i am not name-dropping nor trying to establish authority by association (i'll stand on my own experience and knowledge of both the concepts and of the pedagogy). it's just that you made a clear point that they are your friends when i suggested to refer to "O&S" for the normative treatment of the sampling theorem in an EE text. it's also that just after the name-dropping, you mentioned a peeve (which i also share - i think that the Nyquist frequency should always be simply defined to be fs/2 and never B) that i remembered existed in some textbooks and when i looked it up, O&S (1989) was one of them (a shame, to be honest).
So, I pulled the latest edition and read the chapter "Sampling of Continuous-Time Signals". Conclusion: if we had their version, I wouldn't have much to complain about. Three quibbles: first, they never say what class of signals their proof applies to; they just assume the existence of the FT, just as you like to do.
surprize, surprize!
I'll ask them to fix that in the next edition.
i'm sure they'll do it because you ask.
Second, in one paragraph, the description of the figure showing spectral overlap could be interpreted, if a sentence was taken out on its own, as a claim for the truth of the converse of the sampling theorem; I'd ask them to clarify that it describes the depicted situation, but may not be a true statement more generally.
dunno what the big deal is. the historical sampling theorem requires that fs > 2B. if the condition is not satisfied, something bad happens, accurate reconstruction is not guaranteed, and showing what does happen (that is, aliasing) is not off-topic.
Not a big deal at all. But in your treatment, the statements are converse are clear, unneeded, and incorrect. It's OK to illustrate the bad that usually happens, but not OK to claim that the theorem implies that something bad MUST happen. Dicklyon 08:11, 17 August 2006 (UTC)
Third, they say that "Nyquist frequency" is the name of the band limit itself, as opposed to fs/2; I think they were asleep at the switch in the editing. That's all.
they do it also for the 1989 version and i think in the 1975 version. there are other books of other authors making the same "mistake" (it's actually a difference in traditional definition, but i think we both agree that the traditions are not of equal value.) i don't think it's a copy editing mistake. i think, at one time, that is the definition they meant. maybe you can get them to change that. that would actually be useful.
OK, I'll work on that, too. Dicklyon 08:11, 17 August 2006 (UTC)
Everywhere else they use the correct "if" without also claiming the converse.
it's a pretty big book, filled with lot'sa good stuff. "Everywhere else" is a sweeping generalization. Sweeping generalizations are hard to defend. just one counter-example will pull the legs out from under it.
Everywhere else in the chapter that I said I read; I checked pretty carefully. Dicklyon 08:11, 17 August 2006 (UTC)
They use fewer lines of equations than you do, by about 20%, to arrive at the same place.
they get to refer to derivations in other parts of the book (or in other books). Wikipedia should, in the ideal, be reasonably self-contained. but stuff should not be repeated. e.g. i didn't "derive" (quotes used in deference to LutzL who says it ain't true) the coefficients for the Fourier series of the Dirac comb because the Dirac comb article does it. where in Wikipedia do we show that the rect() and sinc() are a Fourier transform pair? what's a newbie supposed to think when the article just states it without proof? take it on faith? certainly not every physical/mathematical fact can be proved in Wikipedia, but this is a reasonably simple and short one and it may save some stress for someone who would read this without the proof. it's just bits and pixels. people who don't need it can skip over any steps that are there. people who do need it can't read the steps if they aren't there.
If I may borrow your expression, I think it really craps up the real proof, when you embed the transform pair proof within it. I agree that the transform pair should be proved somewhere, for the reasons you gave, but this is not the right place, because that transform is much more widely used. --Bob K 07:23, 17 August 2006 (UTC)
If the sinc articles doesn't explain its FT, that would be an ideal place for it. Dicklyon 08:11, 17 August 2006 (UTC)
They cover reconstruction without calling it half of the sampling theorem.
so Wikipedia is supposed to be a verbatim quote? this is not unreasonable. first "half" or "part" proves that the condition will lose no information in the frequency domain (and thus the time domain since the F.T. is virtually one-to-one) and second "half" proves explicitly how the samples are combined to get the original x(t) back. it's the "Nyquist-Shannon Sampling and Reconstruction theorem, actually. the article just makes it clear.
No, of course it shouldn't be a quote. But when the theorem is well known and well defined, the wikipedia shouldn't tack on a second half and call it part of the theorem. That's not what Shannon does, not what O&S do, etc. The reconstruction formula has its own article already, even, so you can say a few words about it and refer to it. Dicklyon 08:11, 17 August 2006 (UTC)
They specify a more flexible range of cutoff frequencies for the lowpass filter that will reconstruct the signal, rather than claiming that it MUST be fs/2.
oookaaay. i'll change that. but if you push the limit and put B right up against fs/2 (which you can, theoretically) then that spec for the brickwall filter remains. remember, within the conditions that make a theorem valid, you must be prepared for the devil to hand you a possible situation and be able to deal with it. if you let the cutoff frequency of the reconstruction LPF be lower than fs/2, the devil can hand you a B that is higher and then you're screwed.
Yes, and O&S did a very good job of explaining what MUST be true and what is typically taken for convenience. I had that in there before very clearly too. Now it feels like it's awkwardly tacked on, but maybe someone will smooth it out some day. Dicklyon 08:11, 17 August 2006 (UTC)
They don't divert the reader's attention by a long derivation of the sinc as the impulse response of the rect filter.
they derive it in Example 2.17 on p. 49 of my 1989 version. then they refer to it.
Right, but not in the sampling section. We could do that, too. Dicklyon 08:11, 17 August 2006 (UTC)
And they use precise clear language.
i'm not opposed to that. that's why i have been objecting to the wholesale changes you were making. not every change, but it was like a dumptruck running over the article.
Yet you never made a single specific comment about any of the language changes I put in. You just take them out, also without comment other than the dumptruck generalization. I'm not really familiar with that particular language metaphor, by the way; perhaps you can explain. Dicklyon 08:11, 17 August 2006 (UTC)
Generally, they get right what you got wrong.
asserted and not supported. you can be satisfied with your association with authority. it's not the same as authority.
The support is all in the discussion above, of your specific logical errors. Dicklyon 08:11, 17 August 2006 (UTC)
No big surprise, as they've been continuously improving it over the years with the help of readers and students.
like me.
No, I don't.
We should try that here. Dicklyon 04:11, 17 August 2006 (UTC)
again, i continue to maintain that the "standard" or "normative" method that we teach the sampling theorem to undergrads that have either had a course in Linear System Theory (sometimes called "Signals and Systems") with continuous-time emphasis or in the latter half of such a course (depends on if you're doing the Van Valkenburg pedagogy of "teach digital first" or not), is that we start with the possibly bandlimited continuous-time signal, we multiply that by the sampling operator (called "Dirac comb" here and some other places), show one way or another (my way is simpler - no convolution in the frequency domain needed) that this causes the original spectrum to be shifted by all integer multiples of the sampling frequency and added (the root of possible aliasing) and iff there is no overlapping of the images, the baseband image can be recovered simply by brick-wall LPFing and then show in the time-domain what that would mean. that's how O&S and virtually every other text do it and that's how this article should do it. you say that it's wordy, but if you take out any step, someone else will have to fill in the gaps in their thinking. you cannot both claim rigor and take out necessary steps. User:LutzL can claim the use of the dirac delta is not rigorous and that is an ongoing difference between pure math guys and EE guys in pedagogy. i s'pose to be truly rigorous, we should stick to Shannon's original proof or the one using Poisson summation formula and not mention dirac impulses at all. r b-j 05:45, 17 August 2006 (UTC)
The approach is fine. But being rigorous does not require stating every microstep. And the "iff" does not quite follow logically from the precise conditions stated. Close, of course, but certainly not rigorously true. Even worse in other places where you presume that not satisfying the theorem will produce loss of information, even though we include a section that illustrates how that is often not the case. Dicklyon 08:11, 17 August 2006 (UTC)

I argue that just because these things are not uncommon, does not mean we want to reproduce them in the encyclopedic wikipedia article. Yet that's exactly what has happened to the article in the last day, thanks to the aggressive re-insertion of errors by Rbj.

Dicklyon 21:47, 16 August 2006 (UTC)

"aggressive re-insertion of errors by Rbj" is again asserted but not supported. r b-j 06:49, 17 August 2006 (UTC)
The support is all above. Read and understand, or ask where you don't get it. Dicklyon 08:11, 17 August 2006 (UTC)
no, Dick, it's you who doesn't get it (pedagogy, that is). if you think your "improvements" to the article made it more compatible with they way it's done in the EE texts and they way it's taught in EE classes, you're mistaken. you say "the support is all above", but that says nothing. you have utterly failed to support your claims of "errors" other than the typos that i have acknowlegded and fixed immediately. this "converse" crap is exactly that. distracting crap. i am trying to make the mathematical treatment compatible with what is done in textbooks with the two differences i mentioned above (and justified many times) and the fact that some "micro-steps" are included rather than "left to the reader" (which is not much of a proof). there is no substative error as long as we agree about the usage of the dirac delta function (and you actually did appeal to that issue as well as the L2 distraction in a feeble attempt make your case). i am trying to keep you from crapping up the article and making it inaccessable to the typical person that could make use of it.
OK, I will take the time to carefully re-document each of remaining logical errors that you have re-inserted, since you have no idea what I mean by converse and logic. Give me time; I'll make a new section below in a day or two. By the way, I just fixed such an error that you put into aliasing last night; see if you can understand that one. Dicklyon 15:58, 17 August 2006 (UTC)
the historical stuff is fine. i don't plan to interact with that. r b-j 15:39, 17 August 2006 (UTC)
How generous of you to allow some of my work to go unmolested. Dicklyon 15:58, 17 August 2006 (UTC)
i forgot to point out yesterday that the irony of this sarcasm is pretty thick. r b-j 05:26, 18 August 2006 (UTC)

## More errors

Sorry, I'm a bit behind in pointing them out. The latest "The continuous signal varies over time (or space as in a bitmap or another independent variable)" is not only a long awkward construction, it's also nonsense. And seems to be attributed to me in the edit comment--what's that about? Have you ever heard of a bitmap having continuous (and infinite) space coordinates? Dicklyon 21:57, 16 August 2006 (UTC)

find a better term for a digitized photograph and change it. be my guest. r b-j 22:09, 16 August 2006 (UTC)

The updated "The continuous signal varies over time (or space as in a digitized image or another independent variable in some other application)" misses both points, about the awkward construction and the fact that the domain of a digitized image is not analogous to continuous time. And as I said, I'm not interested in cleaning up your mess unless I get some support for that idea. The best I could do at this point would warrant quoting you: "rv. none of that helps". Dicklyon 22:16, 16 August 2006 (UTC)

the sampling theorem does not apply only to functions of continuous-time even though it is convenient to assume that for the rest of the article. the article should point out that the sampling can be done over the (x,y) loci of points of an image, or of a probability distribution (that's how you can prove that Nth order dither decouples the first N moments of the quantization error from the signal getting quantized), or of some completely different independent variable. (i can't think of some other, but i must believe that they exist.) r b-j 22:23, 16 August 2006 (UTC)
i am too. coninuous t, or continuous (x,y) in an image to be digitized, or continuous x in a p.d.f. r b-j 23:26, 16 August 2006 (UTC)
That's good. My point was it's a continuous image to be digitized, not bitmpas or digitized images, that it applies to. Dicklyon 01:03, 17 August 2006 (UTC)

Your spectrum-images pix, and the whole concept of aliasing for that matter, don't make a lot of sense until you introduce the notion of spectrum image shifting in the mathy section. That's where the illustration is needed, to support the otherwise opaque claims, since you took out my explanation of the conditions that the lowpass filter needs to satisfy and replaced it by the overlap words. If you move the pix there, and put the aliasing section later, it might help. Oh, and the plural of spectrum is spectra. Dicklyon 04:54, 17 August 2006 (UTC)

"Schwarz" not "Schwartz", when referring to the mathematician! (Luca, 28 March 2008) —Preceding unsigned comment added by 81.118.90.5 (talk) 09:36, 28 March 2008 (UTC)

## Logic errors

Errors of logic, generally due to assuming the converse of the theorem, or to writing interpretations that would follow from the converse, remain in this article. I'll try to collect them here. Here's the first:

"It says that the sampling frequency, fs, must be strictly greater than twice the bandwidth, B, of the continuous-time signal, x(t), for no information to be lost (or 'aliased')."

Not only does the theorem not say this, but we have a whole section called Undersampling (don't blame me for the section name) that provides a detailed description of conditions under which the sampling frequency is NOT greater than twice the bandwidth (as defined w.r.t the first quote) and in which no information is lost in spite of the fact that aliasing exists. You can use the later definition of bandwidth B to make it usually true in that context, but then it says not much useful (neither necessary nor sufficient), and is still not a representation of what the sampling theorem says, which has an implication in the other order. Dicklyon 18:16, 17 August 2006 (UTC)

Rbj, you still have the inappropriate "iff" (if-and-only-if, implying that the converse is true as well as the main statement) in this sentence:

"This reconstructed signal will match the original signal iff the original signal contains no frequencies equal to or above half the sampling frequency;"

There are several problems here, if you consider the case of a signal having a discrete component with frequency exactly equal to half the sampling frequency. Formally, the reconstruction formula (which is where "this reconstructed signal" came from in that context) does not converge at all for this frequency, so the thing you're stating a result about does not exist. However, it appears that if the terms of the summation are taken in pairs extending symmetrically to plus and minus infinity (one could define the reconstruction formula that way), it can be made to converge. If you accept that, then it is easy to construct a counter-example to the "only if" part of the claim, but choosing a signal with that frequency component in a phase such that samples are taken at its peaks. That shows there exists a signal with that frequency, such that the reconstruction DOES match it exactly. This signal is a counterexample to what is stated as if it's part of the theorem. Therefore, the "only if" part is incorrect. It may be a subtle point, but in an article about a theorem, it is best to state as true only things that are true and provable, even if some authors (e.g. Leland Jackson in Digital Filters and Signal Processing) state broad converses by saying "if and only if" where they mean just "if". You're not the only one to make this mistake, but may be the only one too dense to hear it. Most textbooks get it exactly right, as Oppenheim and Shafer do. Dicklyon 05:03, 18 August 2006 (UTC)

Rbj, your latest addition is an example not so much of a logic error as a failure to find clear language due to not understanding the logic of English usage and punctuation that would make it intelligible. This sentence would rate about 6 or 8 red marks from my favorite copy editor:

"Normally what practical digital-to-analog converters output is a sequence of sample scaled rectangular pulses that can be modeled as a zero-order hold."

Let me know if you need them explained. Dicklyon 05:16, 18 August 2006 (UTC)

Your edit comment "different words. probably won't satisfy Dick" is prescient, but I must admit you made it better. Changing "the sample scaled sinc-functions" to "scaled and delayed sinc functions" solved both the missing hyphen and the extra hyphen problem; if you wanted to say "sample-scaled" that would also be correct English. Your replacement to my main complaint sentence now reads "Practical digital-to-analog converters output neither scaled and delayed sinc functions nor ideal impulses (that if ideally low-pass filtered would yield the original signal), but a sequence of scaled and delayed rectangular pulses," which is a long run-on sentence, but avoids some of the worst structural problems of the original. It would still benefit from the use of a real verb instead of the verbized "output" that makes its opening alias to a malformed noun phrase that one would easily misread as an attempt at "Practical digital-to-analog converter outputs neither scaled and delayed sinc functions nor..." until the unexpected noun interrupts that garden path.[15]

Thanks for trying. Dicklyon 05:54, 19 August 2006 (UTC)

Another bug: your link to step function via piecewise constant goes to an article that contradicts what you just said over. It doesn't seem reasonable to me, but that article says a step function can have only a finite number of values, while reconstructing from samples of a bandlimited function via a zero-order hold necessarily creates an infinite number of steps.

(i didn't know of any DACs that we leave turned on forever.) yeah, i didn't like that. i also don't like that title "step function" (i would use it for a single step like a simple scaling and translation of the Heaviside thing), but the math guys have a right to their own jargon. but piecewise constant is the best term for it and it links to that article. r b-j 06:41, 19 August 2006 (UTC)

One more grammar nit, too: when you use zero-order hold as an adjective, as in "zero-order hold filter", you need a hyphen or en dash to make it grammatical. See dash. Dicklyon 06:05, 19 August 2006 (UTC)

since zero-order-hold is a red link, it was easier just to link the title as it is.

But why not fix the logic errors I pointed out as long as you're editing? Dicklyon 06:06, 19 August 2006 (UTC)

perhaps, because, it isn't an error? especially in the context of the explicit baseband qualification. if the sampling rate is sufficient, you can reconstruct. if it isn't you cannot. (is that the converse you keep harping about?)
Yes, that's what I'm talking about. And as I pointed out above, I was not able to imagine any definition of baseband for which your converse would be true, other than the trivial one in which baseband means all signals frequencies are less than fs/2, in which base no baseband signal will ever cause aliasing. In a logical sense, it is always possible to easily find signal spectra and specific signals and reconstruction formulae such that some signals CAN be reconstructed uniquely from their samples even when their bandwidth (measured from 0 if you like) exceeds fs/2. Would you like me to construct the examples for you, since you can't imagine them and persist in invoking this converse that nobody has ever proved? Dicklyon 07:02, 19 August 2006 (UTC)
That non-baseband stuff really should be in a different article. and the language you are insisting on to be inclusive of that non-baseband sampling only confuses.
I agree most of it could be in a separate article about sampling, as I mentioned before. However the part that says that the sampling theorem is easily generalized to other situations such as bandpass spectra makes a lot of sense, since it is generalization of the theorem. I'm not sure what language you are saying that I'm insisting on; please quote me. I'm mostly just insisting on removing the language that claims the converse is true, because it is not. Dicklyon 07:02, 19 August 2006 (UTC)
i'm gonna repeat something from Omegatron speaking to LutzL a while ago:
1. Laymen who don't know anything about anything remotely related to sampling
2. Laymen who know a little about signal processing, but not much; probably audio-related
3. Engineers
4. Mathematicians
...what percentage of each do you think exist in our readership?
The answer to that question has no bearing on whether we should include non-true claims. Nevertheless, please give me your estimates so I don't have to guess. Dicklyon 07:02, 19 August 2006 (UTC)
now the specific issues with you are different than with LutzL, but my goal is the same as Omegatron and Metacomet (may he/she RIP). i want this accessable and accurate. the historical stuff is fine. the historical proof of Shannon is fine, but not as the principal proof which should be like the textbook proofs. even though this isn't the general Sampling (signal processing) article, we need to keep it simple (KISS) and conventional. the non-baseband is not what you see in the sampling chapter of O&S or any of the other texts that i have. the non-baseband stuff is what you might get in a multirate signal processing book (or should i hyphenate the hell outa it: multirate-signal-processing book?), i dunno. but it isn't the conventional sampling and reconstruction theorem as published in texts for readers who having seen it before. that is who we should be targeting this too. r b-j 06:41, 19 August 2006 (UTC)
I don't have any trouble with that set of goals. So why do you ignore my accuracy pleas? Dicklyon 07:02, 19 August 2006 (UTC)

## Baseband???

Rbj, why did you add baseband in a bunch of places? It's a completely unnecessary (and not defined, as far as I can see) restriction on the theorem. If you're trying to add it to make a true converse, it is not sufficient for that. It serves no purpose, mathematical or pedagogical (as currently deployed). It just craps up the article, as you would say. If it's just so you can rename the Undersampling section to be non-baseband, it's really not needed for that; just explain the distinction in that section. And I can't figure out your point in asking me to see WP:POINT; for the dense, maybe you can explain? Dicklyon 04:10, 18 August 2006 (UTC)

ya know, Dick, i'm damned if i do it and damned if i don't. you like to have it both ways. you point out the poorly-titled Undersampling section as "evidence" that this simple statement that is pretty fundamental that i (and O&S) make (that you insist is some kind "logic error" - i think you mean "factual error" but neither is correct, it's not an error at all in the context of the main topic of the article which is about sampling baseband signals, that one section is the exception and really should be in a different article) is wrong because there are cases that B > fs/2 leading you to conclude that that "iff" is not correct. it is correct for baseband sampling, so i qualified it in the sentence you objected to, at the beginning of the article and maybe one other place. you should be happy, i fixed the "error" you were objecting to by adding the qualification explicitly where it was implicit in the whole article except that section which should be in a different article. i even fixed the bad title of the section. maybe, someday, it will get moved to it's own article to keep this one clean and more about one focussed topic. i'm going to bed, i can't compete with PDT since i live in the east. i got work to do tomorrow. r b-j 05:21, 18 August 2006 (UTC)
Yes, you're damned either way, since you pay no attention to the logical notions of theorem, conditions, implication, converse, proof, necessary, sufficient, etc. The term baseband is not necessary to the sampling theorem. And it is not sufficient to make the converse true. So I have to wonder what was the point of adding it. The change of section title is no big deal, but as I mentioned, that change did not requuire the insertion of baseband all over the stuff about the sampling theorem of Shannon. Sleep well. Dicklyon 07:22, 18 August 2006 (UTC)
By the way, to make the converse true with so-called "baseband" sampling, whatever that means, you need some stricter conditions requiring that certain large bands of frequencies have non-zero spectral energy; otherwise, one can construct a counterexample with holes in the spectrum where no information is lost. Or you have to define baseband sampling in terms of the sampling frequency exceeding twice the highest frequency, in which case there can be no baseband sampling with aliasing, so no converse is possible. And the approach of trying to make the spectrum nonzero over enough space that you can prove information must be lost will preclude applying the theorem to signals composed of discrete components. There's really no way you can construct a meaningful converse theorem, even though you keep claiming that the converse is true in your article sentences. That's the kind of logic I'm talking about; mathematical logic; theorems, proofs, etc. Isn't this supposed to be an article about a theorem? If you're going to take stuff out, rather than taking out a potentially useful generalization of the theorem, take out the "practical" stuff to an article on sampling (OK, that's most of the non-baseband section, too, I'll grant you). Dicklyon 07:22, 18 August 2006 (UTC)

## That T factor

Rbj, your factor of T on the sampling impulse is a good idea; you seemed to think that was something that was bothering me, but it never was. In your campaign since 1988 to get it accepted, you might want to rely on the precedent in Charles H. Wilts's book Principles of Feedback Control, 1960 (Addison–Wesley), p.197. He specifically says that the factor of T is usually not used but that it works out better if it is (because the sampler then has unity gain at low frequencies). He computes the Laplace transform of the semi-infinite sampling-impulse train, and derives a bunch of stuff about sampling that's applicable to signals that are zero before time zero. But he doesn't prove the sampling theorem, since there's no such thing as a nontrivial bandlimited signal that's zero before time zero. You might find it interesting. He's pretty careful about the math, but refers to some results (like how to justify using Dirac impulses) without actually doing it out formally. The nice thing about the Laplace transform of the impule train, as opposed to the Fourier transform, is that it's actually finite and analytic. Chuck Wilts was an EE prof., but also an accomplished mountaineer and rock climber (he wrote the guide to Tahquitz Rock and Suicide Rock, where I climbed with him). Dicklyon 06:36, 19 August 2006 (UTC)

it's not the conventional way textbooks do it. this is why i originally left it out (but snuck it in the zero-order hold and First-order hold articles) and first resisted BobK when he pushed to just put it in. but if it wasn't in, there needed to be some explanation (that "Note about scaling" section you didn't like) for when to put it in to get the scaling of the final output right and how to deal with it regarding the ZOH and FOH. i remember back in the 90's some student innocently posted a question on comp.dsp about how to design an analog reconstruction LPF of gain T so that he could recover the original correctly (he was naturally confused because the gain appeared to be different if different units were used for T), and when i said that there was a fundamental problem (that a voltage in, voltage out filter could not possibly have a passband gain of T measured in units of time), we got into a real serious slugfest about it that i mentioned recently to LutzL above (some of the dispute had to do with the dimension of the dirac delta function if the argument to it is of dimension: time). in fact, in that article i did in the JAES, a much more famous AE named Barry Blesser scooped my idea (i didn't know it until a reviewer pointed it out) but Barry's result was off by a factor of T and this was the reason why. the simplest thing is to make sure that the gain of the brickwall filter is the dimensionless 1. then the modeling of the ZOH and FOH is foolproof. going to bed now. r b-j 06:55, 19 August 2006 (UTC)
Well, I think one precedent that says it's a better way is enough to make it part of "conventional" even if most authors ignored that advice. Wilts does say the way "in the literature" is "confusing and misleading" because the values are measured in different units. I agree with him and Bob K that we should avoid that confusion, even if it's conventional. And that note about scaling I will admit bothered me, as it was a big distracting aside that I didn't see the point of. Dicklyon 07:11, 19 August 2006 (UTC)

## Ancient history

Speaking of unconventional campaigns, Robert, I see you've been referring to the "Nyquist/Shannon sampling and reconstruction theorem" since at least 1998; maybe earlier, though in 2002 you say you are "starting to name it that...". Most (or all?) of the books I've looked at just talk about the sampling theorem, and treat the reconstruction formula as an extra add-on, as Shannon did. None treat two half theorems the way you do. Or do they?

they all derive the reconstruction formula. maybe Shannon did not, but O&S, Rabiner and Gold, Bruce Carlson, whoever, they derive the ("Wittaker-Shannon") reconstruction formula as part of the chapter or section on sampling continuous-time signals. as far as textbooks, i don't see a single exception.
Shannon did, too, briefly. But as in the books, it was AFTER proving the theorem, not the second half of a two-half theorem. That's your invention. Having it in the same discussion, chapter, or section does not make it part of the theorem. And if you do want to include it as part of the theorem, it's still just one theorem; it's just a conversion of the final step of the proof into the time domain, not a different thing to be proved.
The reason I would prefer to keep it out of the sampling theorem is that it makes it so much harder to write the more general version of the sampling theorem if you have to construct the interpolation formula in general. The generalized theorem would be something like this: "A continuous-time signal can be reconstructed perfectly from regularly spaced samples taken at rate fs if the spectrum of the signal is not nonzero at frequency fs/2+Nfs for any N and is not nonzero at any pair of frequencies f1 and f2 with f1+f2 = Mfs for any nonzero integer M, and is not nonzero at any pair of frequencies f1 and f2 with f1-f2 = Mfs for any nonzero integer M." I'll have to do some checking to see if I got it exactly right, but it's something like that; I've seen it somewhere that way (one of my books, perhaps). Notice that if all frequencies with nonzero energy are less than fs/2, these conditions are satisfied, which is the usual Shannon baseband case. The more general case can be proved the same way, but the notion of overlap and choice of reconstruction filter need to be generalized and fitted to teh signal spectrum, which obviously has to get known to get a reconstruction. It does NOT have to be known to make the sample sequence unique within the space of signals; that is, to avoid aliasing. Dicklyon 06:44, 20 August 2006 (UTC)

You have references I should check? I also just noticed that large chunks of your writing come directly from the "SRC and sampling theorem FAQ entry" of that era, including the detail I objected to about what the reconstruction filter MUST be. Did you write that FAQ entry?

i don't know what you meant by "SRC and sampling theorem FAQ entry" or where. i did do an entry in an Audio Effects FAQ about the sampling theorem. it's pretty old. i have no contribution to the comp.dsp FAQ, but do have one in the comp.realtime FAQ defining "real time" for DSP. since that flap with this guy named Jay Rabkin in, i dunno, '98 maybe, i have on a few occasions posted updated versions of that sampling theorem FAQ on comp.dsp for reference.
SRC appears to stand for sampling rate conversion. I was quoting something google found me, where you wrote "here is that SRC and sampling theorem FAQ entry": [16]

At least back then it didn't resort to the appeal to "it is clear."

look at context. it is clear that multiplying something by 1 changes it not and multiplying something else by 0 changes it to 0. but i'm happy for improving language to make it even more clear.
Yes, thanks for accepting that one improvement. It's usually good to just say why rather than tease the reader. When it's clear to the writer, the reader needs to study the equation to see why it is so. I did that, from the context as you say, and then just wrote why. Dicklyon 06:44, 20 August 2006 (UTC)

And I see you've been in some interesting arguments about over-claiming what the sampling (and reconstruction) theorem says, as far back as 2001 at least, for example when you said "anything represented by samples must be 'sampled' from a bandlimited function".

should have had the baseband qualification in there. the bandlimited function that those samples represent are either the original bandlimited function if it was properly sampled or it is the aliased version of it, which is a bandlimited function, if it was insufficiently sampled.
I'm still unclear on how you think that adds meaning to the statement. Any old crappy signal, baseband or bandlimited or not, can be represented by samples. What you forgot to say what was meaning you intended by your words. Dicklyon 06:44, 20 August 2006 (UTC)

It's clear enough to me what you meant at the time, but the statement wasn't true and led to a lot of argument that never did settle out. You're going to have to open up to the possibility that some of things you assert are not quite true if you expect to make any progress from where you've been stuck for the last decade. Dicklyon 08:06, 19 August 2006 (UTC)

i do not concede the point to you at all (regarding sampling baseband signals). if you do not sample sufficiently, aliasing occurs because you cannot constrain nor define (other than the bandlimit) anything about the signal being sampled. it is no proof if you get to define the signal to be sampled (except that the bandlimitedness is a "given" or axiomatic) and then "show" that this special case can be reconstructed. the devil hands you the signal and you still have to show that it can be sampled and reconstructed and that can and can only be done for baseband signals with the sampling rate strictly exceeds twice the bandwidth. i'm not changing the "iff" back, but i wasn't changing it for you because you're wrong about this. but it's small enough potatoes, at least at the moment. but i am making it clear that it is your logic that is incorrect because if you say "I can sample and reconstruct baseband signals that don't satisfy the Nyquist criterion." which i don't think you own that statement but i continue to see as equivalent to your position, you don't get to choose the signals. even in the Nyquist critical case fs = 2B. i (the devil) gets to choose the signal and that's where your position fails.
You're mixing up the notion of proof with the notion of a counter example aren't you? If you claim something to be true, and I can pick one condition under which it is not true, then your statement is not true. For that, I get to pick the signal. The converse of the sampling theorem says "if a signal has energy at frequencies at or above half the sampling frequency, then it is not possible to perfectly reconstruct the signal from its samples." This is usually true, if you have no information to help you do a better reconstruction than the typical baseband reconstruction; you make it much more usually true by further restricting it to the case of baseband reconstruction via the W-S reconstruction formula. But that still leaves a tiny crack at the critical frequency where it can be falsified with a particular phase (and any amplitude). Such is logic. Dicklyon 06:44, 20 August 2006 (UTC)

Strangely enough, I notice also that you got into an argument with my near-namesake Rick Lyons back in 2002; something about him not using the words "Nyquist frequency" in his book. I don't know him, but I understand he's a good guy. Dicklyon 08:23, 19 August 2006 (UTC)

actually i was ribbing Rick a bit for (i think his deliberate) not taking a position on the issue of defining "Nyquist frequency" after my discovery that it was not mentioned or defined in his book. i ribbed him a little too hard and he took it personally, but Randy Yates came to my defense and i profusely apologized. i think Rick and i are on good terms although we are clearly not in the same (or neighboring) political camps. in the later revision of Rick's book he made use of some things i put out on comp.dsp (namely first-order noise-shaping with a zero at DC) and i reviewed a few chapters or appendices in advance that resulted in some changes in the book (i reviewed his first book for the JAES). i learned and created, using Remez exchange (will not claim to be the first to), a good way to approximate arctan(x) to any precision with only one division. it's useful for a real-time phase vocoder because the atan(x) function in the supplied run-time libraries is so damn slow (and usually bit-accurate which is not necessary in many cases). r b-j 05:58, 20 August 2006 (UTC)
all good. Good night. Dicklyon 06:44, 20 August 2006 (UTC)

## Counterexample to converse of sampling theorem

And I see that in 2004 you (Robert b-j) provided a constructive counterexample to the converse of the sampling theorem. You showed that perfect reconstruction of a class of non-bandlimited signals is possible (easy even), by defining such a class, namely the class of continuous-time signals that is piecewise linear with slope changes only at multiples of T. Since you yourself provided that example, do you believe it now as a counterexample to the converse of the sampling theorem? You even discussed the corresonding reconstruction method. Dicklyon 08:23, 19 August 2006 (UTC)

the sampling theorem is not about reconstructing with piecewise linear functions, but the first-order hold has something to do with it. be specific about this counterexample you are saying i made. anything i said on comp.dsp or any USENET is public record. r b-j 06:09, 20 August 2006 (UTC)

## Hey Dick, can you remove some (most) of the sinusoids in the Critical frequency illustration?

it's very cluttered. it only needs a couple or 3 or 4 with the intersection points well marked. i uploaded one:

400 px

but the lines are too thin (and i don't know how to get MATLAB or Octave to make them thicker). also, check where you end up uploading. unless it's specific to the English language, i am not sure why you wouldn't upload to the top level at Wikimedia Commons. anyway, i just think that the one you uploaded is very busy making it a bit harder for a reader/viewer to see the point. i did 3, one at θ = 0, π/4, and -π/3 and put little circles at the common points.

give it a consideration. r b-j 06:09, 20 August 2006 (UTC)

OK, I'll work on redoing it. I'm not sure about how to get to a better place to upload, as I just use the "Upload file" link that shows up on the article page. Are there choices there, or do I have to know a better place to do?
iirc, in Matlab you can just make the figure window real small before saving as PNG, and you'll get a small image with relatively thicker lines. Sometimes I re-hack saved figures in Photoshop. I don't recall how I made this one, but I'll see what I can find (probably still have the .m file at least). Dicklyon 06:48, 20 August 2006 (UTC)
OK, I uploaded a new version, but to the usual place. Tell he how to put it in a better place. File:Critical Frequency Aliasing.png (you may need to refresh to see the new version) Dicklyon 07:28, 20 August 2006 (UTC)

## Critical freq change, etc.

Robert, we actually had an edit conflict on the change to critical frequency. We're in sync for once. But I went a little further. Please check and let me know if you agree, for instance, that fN is Nyquist frequency fs/2, not Nyquist rate 2B. Dicklyon 05:27, 20 August 2006 (UTC)

i'm too tired. i'm going to bed. r b-j 06:14, 20 August 2006 (UTC)
one comment: "This upper bound is the Nyquist frequency, denoted fN." you better go everywhere where that BandlimitedB.png image exists and either changed the image or revert it back to

which is the one i uploaded over Metacomet's nice image. fN is shown in that pic as the Nyquist rate not Nyquist frequency. i would just prefer to kill fN as an expression completely and call 2B the Nyquist rate and fs/2 the Nyquist frequency. now i'm outa here. r b-j 06:25, 20 August 2006 (UTC)

just checked. Nothing uses that image, if I believe the "what links here". Please verify. Good catch. Dicklyon 06:55, 20 August 2006 (UTC)

## Mathematical basis tweaks

Robert, I made minor edits to the opening of this section. Please let me know if you have any objections. Trying to make it a little more precise.

And here's an idea on the opening math. Did you consider putting the T inside the arg of the delta, or is that too mind-blowing?

${\displaystyle x_{s}(t)=x(t)\cdot \left(\Delta _{T}(t)\right)\ }$
x(t) is the signal to be sampled.
xs(t) is the sampled signal
ΔT(t) is the sampling operator called the Dirac comb and, being periodic with period T and a mean value of 1, can be formally expressed as a Fourier series:
 ${\displaystyle \Delta _{T}(t)\,}$ ${\displaystyle \equiv \sum _{n=-\infty }^{\infty }\delta ({t-nT \over T})\ }$ ${\displaystyle =\sum _{k=-\infty }^{\infty }e^{i2\pi kf_{s}t}\ }$

Dicklyon 06:05, 20 August 2006 (UTC)

is it consistent with the definition of ${\displaystyle \Delta _{T}(t)\ }$ in Dirac comb? i don't think so. and i don't think we should change it in Dirac comb. someone else will object, we don't know (without checking) who else is using it. r b-j 06:12, 20 August 2006 (UTC)
good point. We'd have to call it the normalized Dirac comb, and that would surely never converge. Dicklyon 06:57, 20 August 2006 (UTC)

### Phasing out "sampling and reconstruction theorem"

I've now made minor adjustments to the wording, not touching the math except for its logical interpretation in text, to conform to standard practice. In particular, after the proof is complete, I say so and restate the theorem, where before its converse was stated even though the converse was not what had just been proved. Further, I removed the name "sampling and reconstruction theorem" that was unique to this section and was Robert b-j's admitted original research; I've put words that he or others should feel free to tweak if they see a better way to phrase it without introducing a new nonstandard theorem and nonstandard name. Dicklyon 16:59, 20 August 2006 (UTC)

## Revised concise summary and non-baseband section

I moved most of the non-baseband (undersampling) section into sampling (signal processing), where it still needs to be cut down and tightened up some. I hope the small section I left is a lot more clear. Dicklyon 01:12, 22 August 2006 (UTC)

I editted the "concise summary" section because I didn't really understand what the two levels of bullets were about. Seems like paragraphs work better. If anyone wants to re-impose some sort of bullet structure, feel free, but I hope it will come out better than what was there. Dicklyon 01:12, 22 August 2006 (UTC)

## "As we have seen" statement

The article says: As we have seen, Borel also used around that time what became known as the cardinal series... This looks like copied and pasted from somewhere. Borel is not referred earlier in the article. The "As we have seen" mention should be removed. —Preceding unsigned comment added by 80.168.247.52 (talkcontribs)

Is it not clear that that's part of an attributed block quote? Dicklyon 20:16, 7 June 2007 (UTC)

## Assessment comment

The comment(s) below were originally left at Talk:Nyquist–Shannon sampling theorem/Comments, and are posted here for posterity. Following several discussions in past years, these subpages are now deprecated. The comments may be irrelevant or outdated; if so, please feel free to remove this section.

 This topic connects several basic (in the sense of fundamental) results of Fourier theory, and in extension, of harmonic analysis. At the same time it is a fundamental result in signal transmission theory. A more math directed exposition could be assembled at the topic Kluvánec's sampling theorem in harmonic analysis.--LutzL (talk) 08:18, 5 December 2007 (UTC)

Last edited at 08:18, 5 December 2007 (UTC). Substituted at 21:49, 3 May 2016 (UTC)