Data reduction

Data reduction is the transformation of numerical or alphabetical digital information, derived empirically or experimentally, into a corrected, ordered, and simplified form. The basic idea is to reduce a large volume of data to its meaningful parts.

When information is derived from instrument readings, there may also be a transformation from analog to digital form. When the data are already in digital form, the "reduction" typically involves some editing, scaling, encoding, sorting, collating, and producing tabular summaries. When the observations are discrete but the underlying phenomenon is continuous, smoothing and interpolation are often needed. Data reduction is often undertaken in the presence of reading or measurement errors, so some understanding of the nature of these errors is needed before the most likely value can be determined.
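As a rough illustration of the smoothing and interpolation mentioned above, the following sketch applies a centred moving average to noisy discrete readings and then estimates a value between two sample times by linear interpolation. The function names, the window size, and the sample data are assumptions chosen for illustration; other smoothing and interpolation methods are equally valid.

```python
def moving_average(values, window=3):
    """Smooth a sequence with a centred moving average (window should be odd).

    Near the ends the window is truncated, so fewer points are averaged.
    """
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        chunk = values[lo:hi]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def linear_interpolate(xs, ys, x):
    """Estimate y at x from samples (xs, ys); xs must be strictly increasing."""
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("x lies outside the sampled range")
    for (x0, y0), (x1, y1) in zip(zip(xs, ys), zip(xs[1:], ys[1:])):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


# Hypothetical instrument readings taken at one-second intervals.
times = [0.0, 1.0, 2.0, 3.0, 4.0]
readings = [10.0, 12.0, 11.0, 13.0, 12.0]

smoothed = moving_average(readings)
estimate = linear_interpolate(times, smoothed, 2.5)
```

Smoothing first and interpolating second is only one possible order; with a known error model, a fitted curve may be a better choice than a moving average.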

An example in astronomy is the data reduction performed on board the Kepler satellite. This satellite records 95-megapixel images once every six seconds, generating dozens of megabytes of data per second, which is orders of magnitude more than the downlink bandwidth of 550 KBps. The on-board data reduction consists of co-adding the raw frames for thirty minutes, which reduces the required bandwidth by a factor of 300. Furthermore, interesting targets are pre-selected and only the relevant pixels, about 6% of the total, are processed. The reduced data are then sent to Earth, where they are processed further.
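The figures above can be checked with a short calculation, and the two on-board steps can be sketched on toy data. The frame interval, co-adding window, and pixel fraction come from the text; the helper names, the tiny four-pixel frames, and the assumption that the two reductions combine multiplicatively are illustrative only and do not describe the actual flight software.

```python
FRAME_INTERVAL_S = 6        # one raw frame every six seconds
COADD_WINDOW_S = 30 * 60    # raw frames are co-added for thirty minutes
PIXEL_FRACTION = 0.06       # about 6% of the pixels are kept

# Number of raw frames summed into one co-added frame.
frames_per_coadd = COADD_WINDOW_S // FRAME_INTERVAL_S

# Assumption: co-adding and pixel selection combine multiplicatively.
combined_factor = frames_per_coadd / PIXEL_FRACTION


def coadd(frames):
    """Sum equally sized frames pixel by pixel into a single frame."""
    return [sum(pixels) for pixels in zip(*frames)]


def select_pixels(frame, target_indices):
    """Keep only the pixels at the pre-selected target positions."""
    return [frame[i] for i in target_indices]


# Toy data: three frames of four pixels each, with two target pixels.
raw_frames = [[1, 2, 3, 4], [2, 3, 4, 5], [3, 4, 5, 6]]
coadded = coadd(raw_frames)
downlinked = select_pixels(coadded, [0, 3])
```

Here `frames_per_coadd` reproduces the factor of 300 quoted above, since 1800 seconds divided by 6 seconds per frame gives 300 frames per co-added image.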

Research has also been carried out on the use of data reduction in wearable (wireless) devices for health monitoring and diagnosis applications. For example, in the context of epilepsy diagnosis, data reduction has been used to increase the battery lifetime of a wearable EEG device by selecting, and only transmitting, EEG data that is relevant for diagnosis and discarding background activity.[1]
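The idea of transmitting only the relevant part of a signal can be sketched in software as follows. This is a hypothetical illustration, not the method of the cited chip, which works in the analog domain: the peak-to-peak criterion, the window size, the threshold, and the sample values are all assumptions.

```python
def select_windows(samples, window_size, threshold):
    """Return (start_index, window) pairs worth transmitting.

    A window is kept when its peak-to-peak amplitude exceeds the threshold;
    everything else is treated as background activity and discarded.
    Trailing samples that do not fill a whole window are ignored.
    """
    kept = []
    for start in range(0, len(samples) - window_size + 1, window_size):
        window = samples[start:start + window_size]
        if max(window) - min(window) > threshold:
            kept.append((start, window))
    return kept


# Hypothetical signal: quiet background with one burst of activity.
signal = [0, 1, 0, -1, 0, 9, -8, 7, 1, 0, -1, 0]
kept = select_windows(signal, window_size=4, threshold=5)

# Fraction of the original samples that would be transmitted.
fraction_sent = sum(len(window) for _, window in kept) / len(signal)
```

In this toy case only the middle window is kept, so one third of the samples would be sent. The saving in a real device depends on how often relevant activity occurs and on the cost of the selection step itself.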

Best practices

These are common techniques used in data reduction.

  • Order by some aspect of size.
  • Table diagonalization, whereby rows and columns of tables are re-arranged to make patterns easier to see.
  • Round drastically to one, or at most two, effective digits (effective digits are ones that vary in that part of the data).
  • Use averages to provide a visual focus as well as a summary.
  • Use layout and labeling to guide the eye.
  • Remove chartjunk, such as pictures and lines.
  • Give a brief verbal summary.[2]
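Several of the techniques listed above (ordering by size, drastic rounding, and averages as a summary) can be sketched on a small table. The code below is a minimal illustration: it rounds to two significant digits, which only approximates the "effective digits" rule, and the function names and sales figures are hypothetical.

```python
from math import floor, log10


def round_sig(value, digits=2):
    """Round a number to the given count of significant digits."""
    if value == 0:
        return 0.0
    exponent = floor(log10(abs(value)))
    return round(value, digits - 1 - exponent)


def reduce_table(rows, digits=2):
    """Summarise a table given as {label: [numbers]}.

    Returns (label, rounded values, rounded average) tuples,
    ordered by the unrounded row average, largest first.
    """
    summarised = []
    for label, values in rows.items():
        average = sum(values) / len(values)
        rounded = [round_sig(v, digits) for v in values]
        summarised.append((average, label, rounded))
    summarised.sort(key=lambda item: item[0], reverse=True)
    return [(label, rounded, round_sig(average, digits))
            for average, label, rounded in summarised]


# Hypothetical quarterly figures for three regions.
sales = {
    "North": [123.4, 118.9, 131.2],
    "South": [412.7, 398.1, 405.5],
    "East": [56.3, 61.8, 58.9],
}
table = reduce_table(sales)

for label, values, average in table:
    print(f"{label:<6} {values}  average {average}")
```

Sorting uses the unrounded averages so that rounding cannot change the order of rows whose averages are close.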

References

  1. ^ Iranmanesh, S.; Rodriguez-Villegas, E. (2017). "A 950 nW Analog-Based Data Reduction Chip for Wearable EEG Systems in Epilepsy". IEEE Journal of Solid-State Circuits. 52 (9): 2362–2373. doi:10.1109/JSSC.2017.2720636. hdl:10044/1/48764.
  2. ^ Data, but No Information: Presentation really is everything — or close to it. By Andrew Ehrenberg

Further reading


  • Ehrenberg, Andrew S. C. (1975, 1981). Data Reduction. John Wiley, Chichester. Reprinted in the Journal of Empirical Generalisations in Marketing Science, 2000, 5, 1-391.
  • Ehrenberg, Andrew S. C. (1982). A Primer in Data Reduction: An Introductory Statistics Textbook.