# Dilution (neural networks)


Dilution and dropout (also called DropConnect[2]) are regularization techniques for reducing overfitting in artificial neural networks by preventing complex co-adaptations on training data. They are an efficient way of performing model averaging with neural networks.[3] The term dilution refers to the thinning of the weights.[4] The term dropout refers to randomly "dropping out", or omitting, units (both hidden and visible) during the training process of a neural network.[5][6][3] An earlier, closely related technique is the stochastic delta rule (SDR).[1] Both the thinning of weights and the dropping out of units trigger the same type of regularization, and the term dropout is often used when referring to the removal of weights as well.

## Types and uses

Generally, dropout and SDR regularize a network by adding damping noise to its connections or nodes.

These techniques are also sometimes referred to as random pruning of weights, but pruning is usually a non-recurring, one-way operation. In SDR, by contrast, each weight is represented as a probability distribution with a mean and a standard deviation. Both parameters are modified through gradient descent, with the standard deviations collapsing to zero through simulated annealing; the network therefore converges to a single network containing only the mean values, with all variances at zero. Dropout is a special case of this type of search and regularization. The output from a layer of linear nodes in an artificial neural network can be described as

${\displaystyle y_{i}=\sum _{j}w_{ij}x_{j}}$

(1)

• ${\displaystyle y_{i}}$ – output from node ${\displaystyle i}$
• ${\displaystyle w_{ij}}$ – real weight before dilution, also called the Hebb connection strength
• ${\displaystyle x_{j}}$ – input from node ${\displaystyle j}$

This can be written in vector notation as

${\displaystyle \mathbf {y} =\mathbf {W} \mathbf {x} }$

(2)

• ${\displaystyle \mathbf {y} }$ – output vector
• ${\displaystyle \mathbf {W} }$ – weight matrix
• ${\displaystyle \mathbf {x} }$ – input vector

Equations (1) and (2) are used in the subsequent sections.
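As a concrete illustration of equations (1) and (2), the following NumPy sketch (with arbitrary example shapes) computes a layer output both element-wise and in vector form:

```python
import numpy as np

rng = np.random.default_rng(0)

# Example shapes are arbitrary: a layer with 3 output nodes and 4 inputs.
W = rng.standard_normal((3, 4))  # weight matrix W of equation (2)
x = rng.standard_normal(4)       # input vector x

# Equation (2): y = W x.
y = W @ x

# Equation (1), element-wise: y_i = sum_j w_ij * x_j gives the same values.
y_elementwise = np.array([sum(W[i, j] * x[j] for j in range(4)) for i in range(3)])
assert np.allclose(y, y_elementwise)
```

The matrix form is what the later sections modify: dilution perturbs individual entries of W, while dropout removes entire rows at once.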

## Stochastic Delta Rule

In SDR, each weight is represented as a probability distribution with a mean and a standard deviation.

During learning, both the mean and the standard deviation are updated with the partial derivative of the error with respect to each parameter. The algorithm thus introduces weight noise that is adaptive and eventually converges to a single network with mean-valued weights as the standard deviations decay toward zero over learning. This adaptively removes weight connections over learning, depending on prediction error, while injecting adaptive noise into the network.
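The update loop described above can be sketched as follows. This is an illustrative toy only (one linear layer, one input/target pair, squared error); the parameter names and values (`lr`, `anneal`, the step count) are assumptions for the demo, not taken from Hanson (1990):

```python
import numpy as np

rng = np.random.default_rng(0)

# Each weight is a Gaussian with its own mean and standard deviation.
mu = rng.standard_normal((1, 4))   # mean of each weight's distribution
sigma = np.full((1, 4), 0.5)       # standard deviation of each weight
x = rng.standard_normal(4)         # fixed example input
target = np.array([1.0])           # fixed example target
lr, anneal = 0.05, 0.95            # learning rate; S.D. decay factor (assumed)

for _ in range(500):
    eps = rng.standard_normal(mu.shape)
    w = mu + sigma * eps           # sample weights: w ~ N(mu, sigma^2)
    err = w @ x - target           # prediction error for loss 0.5 * err^2
    # Update both parameters with the partial derivative of the error:
    mu -= lr * err[:, None] * x[None, :]
    sigma -= lr * err[:, None] * x[None, :] * eps
    sigma = np.clip(sigma * anneal, 0.0, None)  # anneal S.D. toward zero

# By the end, the noise has collapsed (sigma ~ 0) and the mean network
# alone fits the target, as described in the text.
```

Note how the annealing factor drives every standard deviation to zero regardless of the gradient, which is what makes the search converge to a single deterministic network of mean values.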

## Dropout

Dropout is a special case of the weight equation (2) above, adjusted so that a whole row of the weight matrix is removed at once, rather than a single random weight.

See https://direct.mit.edu/neco/article/32/5/1018/95589/The-Stochastic-Delta-Rule-Faster-and-More-Accurate for a proof that dropout is a special case of SDR.
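The row-removal view above corresponds to standard "inverted" dropout on the output of equation (2). A minimal NumPy sketch, with the keep probability p = 0.8 chosen arbitrarily for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

# Layer with 5 output units and 4 inputs; p is an assumed example value.
p = 0.8
W = rng.standard_normal((5, 4))   # weight matrix of equation (2)
x = rng.standard_normal(4)        # input vector

mask = rng.random(5) < p          # keep each unit (row of W) with prob. p
y_train = (W @ x) * mask / p      # training pass: dropped units output zero,
                                  # survivors scaled by 1/p to keep E[y] fixed
y_test = W @ x                    # test pass: all units active, no scaling
```

Zeroing an entry of `mask` removes the contribution of an entire row of W, i.e. a whole unit, rather than an individual weight.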

There are publications prior to Hinton's work, including Hanson's (1990) stochastic delta rule, which predates the technique introduced under the name dropout by Geoffrey Hinton et al. in 2012.[3] Google currently holds the patent for the dropout technique.[7][note 1] Rutgers is presently disputing the Google patent.

## Notes

1. ^ The patent may not be valid due to prior art (Hanson, 1990), which predates Hinton's paper. Rutgers University has filed a challenge to the patent against Google.

## References

1. ^ Hanson, Stephen (1990). "A stochastic version of the delta rule". Physica D: Nonlinear Phenomena: 265–272.
2. ^ Wan, Li; Zeiler, Matthew; Zhang, Sixin; Le Cun, Yann; Fergus, Rob (2013). "Regularization of Neural Networks using DropConnect". Proceedings of the 30th International Conference on Machine Learning, PMLR. 28 (3): 1058–1066 – via PMLR.
3. ^ a b c Hinton, Geoffrey E.; Srivastava, Nitish; Krizhevsky, Alex; Sutskever, Ilya; Salakhutdinov, Ruslan R. (2012). "Improving neural networks by preventing co-adaptation of feature detectors". arXiv:1207.0580 [cs.NE].
4. ^ Hertz, John; Krogh, Anders; Palmer, Richard (1991). Introduction to the Theory of Neural Computation. Redwood City, California: Addison-Wesley Pub. Co. pp. 45–46. ISBN 0-201-51560-1.
5. ^ "Dropout: A Simple Way to Prevent Neural Networks from Overfitting". Jmlr.org. Retrieved July 26, 2015.
6. ^ Warde-Farley, David; Goodfellow, Ian J.; Courville, Aaron; Bengio, Yoshua (2013-12-20). "An empirical analysis of dropout in piecewise linear networks". arXiv:1312.6197 [stat.ML].
7. ^ US 9406017B2, Hinton, Geoffrey E., "System and method for addressing overfitting in a neural network", published 2016-08-02, issued 2016-08-02