Data binning
Data binning (also called discrete binning or bucketing) is a data pre-processing technique used to reduce the effects of minor observation errors. The original data values that fall into a given small interval, a bin, are replaced by a value representative of that interval, often the central value. It is a form of quantization.
Statistical data binning groups a large number of more or less continuous values into a smaller number of "bins". For example, given data about a group of people, their ages can be arranged into a smaller number of age intervals (for example, grouping every five years together). Binning can also be used in multivariate statistics, where values are binned in several dimensions at once.
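As a minimal sketch of the age example above, the following Python snippet replaces each value by the central value of the fixed-width interval it falls into. The function name `bin_value`, the five-year width and the sample ages are illustrative assumptions, not part of any standard library.

```python
def bin_value(x, width, origin=0.0):
    """Return the centre of the fixed-width bin that contains x.

    Bins are half-open intervals [origin + k*width, origin + (k+1)*width).
    """
    index = (x - origin) // width          # which bin x falls into
    return origin + (index + 0.5) * width  # midpoint of that bin


# Hypothetical ages, grouped into five-year intervals.
ages = [23, 24, 27, 31, 38, 40]
binned = [bin_value(a, 5) for a in ages]
print(binned)
```

Ages 23 and 24 fall into the same interval (20 to 25) and so receive the same representative value, which is the loss of detail that binning trades for robustness to small errors.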
Image data processing
In the context of image processing, binning is the procedure of combining a cluster of pixels into a single pixel. For example, in 2×2 binning, an array of 4 pixels becomes a single larger pixel,[1] reducing the overall number of pixels.
This aggregation, although it loses information, reduces the amount of data to be processed, which facilitates analysis. Binning may also reduce the impact of read noise on the processed image, at the cost of lower resolution.
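The 2×2 case can be sketched in plain Python as follows. This is an illustration only: it assumes the image is a list of equal-length rows and that the pixels in each block are summed; averaging them instead is an equally plausible choice. The name `bin_pixels` is hypothetical.

```python
def bin_pixels(image, factor=2):
    """Combine each factor x factor block of pixels into one pixel by summing."""
    rows, cols = len(image), len(image[0])
    if rows % factor or cols % factor:
        raise ValueError("image dimensions must be multiples of factor")
    return [
        [
            # Sum every pixel inside the block whose top-left corner is (r, c).
            sum(image[r + dr][c + dc]
                for dr in range(factor)
                for dc in range(factor))
            for c in range(0, cols, factor)
        ]
        for r in range(0, rows, factor)
    ]


# A hypothetical 4x4 image becomes a 2x2 image.
image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
binned_image = bin_pixels(image)
print(binned_image)
```

The output has one quarter as many pixels as the input, which is the reduction in data volume and resolution described above.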
Example usage
Histograms are an example of data binning used to observe underlying distributions. They are typically built over one-dimensional data, with intervals of equal width for ease of visualization.
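A one-dimensional, equal-width histogram can be sketched as below. The function name `histogram`, the convention that the last bin includes its upper edge, and the sample values are assumptions made for this illustration.

```python
def histogram(values, low, high, n_bins):
    """Count how many values fall into each of n_bins equal-width bins.

    Bins are half-open, except the last one, which also includes `high`.
    Values outside [low, high] are ignored.
    """
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for v in values:
        if not low <= v <= high:
            continue
        # Clamp so that v == high lands in the last bin.
        i = min(int((v - low) / width), n_bins - 1)
        counts[i] += 1
    return counts


samples = [0.5, 1.2, 1.9, 2.0, 3.7, 4.0]
counts = histogram(samples, 0.0, 4.0, 4)
print(counts)
```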
Data binning may be used when small instrumental shifts in the spectral dimension of mass spectrometry (MS) or nuclear magnetic resonance (NMR) experiments would otherwise be falsely interpreted as representing different components when a collection of data profiles is subjected to pattern recognition analysis. A straightforward way to cope with this problem is to use binning techniques in which the spectrum is reduced in resolution enough to ensure that a given peak remains in its bin despite small spectral shifts between analyses. For example, in NMR the chemical shift axis may be discretized and coarsely binned, and in MS the measured mass values may be rounded to integer atomic mass unit values. Also, several digital camera systems incorporate an automatic pixel binning function to improve image contrast.[2]
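The mass-spectrometry case can be sketched as follows, assuming each spectrum is a list of (mass, intensity) pairs and that intensities falling into the same integer-mass bin are summed. The peak values and the name `bin_spectrum` are invented for illustration.

```python
from collections import defaultdict


def bin_spectrum(peaks):
    """Sum the intensities of (mass, intensity) peaks per integer-mass bin."""
    bins = defaultdict(float)
    for mass, intensity in peaks:
        # round() maps each mass to its nearest integer; exact .5 ties
        # round to the even neighbour in Python.
        bins[round(mass)] += intensity
    return dict(bins)


# Two hypothetical runs of the same sample with a small instrumental shift.
run_a = [(100.02, 5.0), (150.98, 2.0)]
run_b = [(100.07, 5.0), (151.03, 2.0)]

print(bin_spectrum(run_a))
print(bin_spectrum(run_b))
```

Before binning, the two runs share no identical mass values; after binning, both reduce to the same profile, so a pattern recognition step would no longer treat the shifted peaks as different components.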
Binning is also used in machine learning to speed up[3] the decision-tree boosting method for supervised classification and regression in algorithms such as Microsoft's LightGBM and scikit-learn's Histogram-based Gradient Boosting Classification Tree.
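The general idea behind histogram-based boosting can be illustrated by mapping each continuous feature value to a small integer bin index, so that candidate split points only need to be evaluated at bin boundaries. The sketch below uses simple quantile-based edges; it is an assumption-laden illustration and not the binning procedure actually implemented in LightGBM or scikit-learn. The names `quantile_edges` and `to_bin_index` are hypothetical.

```python
from bisect import bisect_right


def quantile_edges(values, n_bins):
    """Pick up to n_bins - 1 thresholds so bins hold roughly equal counts."""
    ordered = sorted(values)
    n = len(ordered)
    edges = [ordered[(i * n) // n_bins] for i in range(1, n_bins)]
    return sorted(set(edges))  # drop duplicate thresholds


def to_bin_index(x, edges):
    """Return the index of the bin containing x (0 .. len(edges))."""
    return bisect_right(edges, x)


# A hypothetical continuous feature, reduced to four integer codes.
feature = [0.1, 0.4, 0.35, 2.0, 7.5, 3.3, 0.9, 5.1]
edges = quantile_edges(feature, 4)
codes = [to_bin_index(x, edges) for x in feature]
print(edges)
print(codes)
```

With the feature reduced to a handful of integer codes, a tree learner has at most `len(edges)` split candidates to consider for this feature instead of one per distinct value.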
See also
- Histogram
- Grouped data
- Level of measurement
- Quantization (signal processing)
- Discretization of continuous features
References
- ^ "Small explanation of binning in image processing". Steve Cannistra. Retrieved 2011-01-18.
- ^ "Use of binning in photography". Nikon, FSU. Retrieved 2011-01-18.
- ^ "LightGBM: a highly-efficient gradient boosting decision tree". Neural Information Processing Systems (NIPS). Retrieved 2019-12-18.