Rectifier (neural networks)

From Wikipedia, the free encyclopedia

Plot of the rectifier (blue) and softplus (green) functions near x = 0.

In the context of artificial neural networks, the rectifier is an activation function defined as

    f(x) = \max(0, x)

where x is the input to a neuron. This is also known as a ramp function, and it is analogous to half-wave rectification in electrical engineering. This activation function has been argued to be more biologically plausible[1] than the widely used logistic sigmoid (which is inspired by probability theory; see logistic regression) and its more practical[2] counterpart, the hyperbolic tangent. The rectifier is, as of 2015, the most popular activation function for deep neural networks.[3]
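As a minimal sketch (in NumPy; the helper name relu is ours, chosen for illustration), the rectifier is a single elementwise operation:

    import numpy as np

    def relu(x):
        # Rectifier: f(x) = max(0, x), applied elementwise.
        return np.maximum(0.0, x)

    print(relu(np.array([-2.0, -0.5, 0.0, 1.5])))  # [0.  0.  0.  1.5]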

A unit employing the rectifier is also called a rectified linear unit (ReLU).[4]

A smooth approximation to the rectifier is the analytic function

    f(x) = \ln(1 + e^x),

which is called the softplus function.[5] The derivative of softplus is

    f'(x) = \frac{e^x}{1 + e^x} = \frac{1}{1 + e^{-x}},

i.e. the logistic function.
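A short sketch of softplus and its logistic derivative (NumPy; the numerically stabilized rewriting of softplus is our choice, not something the sources prescribe):

    import numpy as np

    def softplus(x):
        # ln(1 + e^x), computed as max(x, 0) + ln(1 + e^{-|x|}) to avoid overflow.
        return np.maximum(x, 0) + np.log1p(np.exp(-np.abs(x)))

    def logistic(x):
        # Derivative of softplus: 1 / (1 + e^{-x}).
        return 1.0 / (1.0 + np.exp(-x))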

Rectified linear units find applications in computer vision[1] and speech recognition[6][7] using deep neural nets.

Variants

Noisy ReLUs

Rectified linear units can be extended to include Gaussian noise, making them noisy ReLUs, giving[4]

    f(x) = \max(0, x + Y), \quad \text{with } Y \sim \mathcal{N}(0, \sigma(x))

Noisy ReLUs have been used with some success in restricted Boltzmann machines for computer vision tasks.[4]
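A minimal sketch of a noisy ReLU (NumPy; the choice of the logistic of the input as the noise scale sigma(x) is an illustrative assumption — the formula above leaves sigma(x) model-dependent):

    import numpy as np

    rng = np.random.default_rng(0)

    def noisy_relu(x, sigma):
        # f(x) = max(0, x + Y) with Y ~ N(0, sigma(x)).
        y = rng.normal(0.0, sigma(x))
        return np.maximum(0.0, x + y)

    # Illustrative sigma: the logistic of the input (an assumption).
    print(noisy_relu(np.array([1.0, -1.0]), lambda x: 1.0 / (1.0 + np.exp(-x))))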

Leaky ReLUs

Leaky ReLUs allow a small, non-zero gradient when the unit is not active:[7]

    f(x) = \begin{cases} x & \text{if } x > 0 \\ 0.01x & \text{otherwise} \end{cases}

Parametric ReLUs take this idea further by making the coefficient of leakage into a parameter a that is learned along with the other neural network parameters:[8]

    f(x) = \begin{cases} x & \text{if } x > 0 \\ ax & \text{otherwise} \end{cases}

Note that for a \le 1, this is equivalent to

    f(x) = \max(x, ax)

and thus has a relation to "maxout" networks.[8]
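A brief sketch of both variants (NumPy; slopes follow the formulas above, and the max-form equivalence for a ≤ 1 is checked directly):

    import numpy as np

    def leaky_relu(x, alpha=0.01):
        # f(x) = x for x > 0, alpha * x otherwise.
        return np.where(x > 0, x, alpha * x)

    def parametric_relu(x, a):
        # Same shape as leaky ReLU, but a is a learned parameter.
        return np.where(x > 0, x, a * x)

    x = np.array([-2.0, 3.0])
    # For a <= 1 the piecewise form equals max(x, a * x).
    assert np.allclose(parametric_relu(x, 0.25), np.maximum(x, 0.25 * x))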

Advantages

  • Biological plausibility: One-sided, compared to the antisymmetry of tanh.
  • Sparse activation: For example, in a randomly initialized network, only about 50% of hidden units are activated (having a non-zero output); see the sketch after this list.
  • Efficient gradient propagation: No vanishing or exploding gradient problems.
  • Efficient computation: Only comparison, addition and multiplication.
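A quick numerical check of the sparse-activation claim (NumPy; the layer width, input distribution, and zero-mean weight initialization are illustrative assumptions):

    import numpy as np

    rng = np.random.default_rng(42)
    W = rng.normal(0.0, 1.0, size=(1000, 1000))   # zero-mean random weights
    x = rng.normal(0.0, 1.0, size=1000)           # random input
    h = np.maximum(0.0, W @ x)                    # ReLU hidden layer

    # Pre-activations are symmetric about zero, so roughly half are positive.
    print(f"fraction active: {np.mean(h > 0):.2f}")  # ~0.50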

In 2011,[1] the use of the rectifier as a non-linearity was shown for the first time to enable training deep supervised neural networks without unsupervised pre-training. Compared to the sigmoid or similar activation functions, rectified linear units allow faster and more effective training of deep neural architectures on large and complex datasets.

Potential problems

  • Non-differentiable at zero: however, it is differentiable everywhere else, including at points arbitrarily close to (but not equal to) zero.
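In practice the non-differentiability is handled by fixing a subgradient at zero; the convention of using 0 there, shown below, is a common implementation choice rather than something the sources above specify:

    import numpy as np

    def relu_grad(x):
        # Derivative of max(0, x): 1 for x > 0, 0 for x < 0.
        # At x == 0 the function is not differentiable; the usual convention
        # is to return 0 (any value in [0, 1] is a valid subgradient).
        return (x > 0).astype(x.dtype)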

References

  1. ^ a b c Glorot, Xavier; Bordes, Antoine; Bengio, Yoshua (2011). "Deep Sparse Rectifier Neural Networks" (PDF). AISTATS.
  2. ^ LeCun, Yann; Bottou, Léon; Orr, Genevieve B.; Müller, Klaus-Robert (1998). "Efficient BackProp" (PDF). In Neural Networks: Tricks of the Trade. Springer.
  3. ^ LeCun, Yann; Bengio, Yoshua; Hinton, Geoffrey (2015). "Deep learning". Nature. 521: 436–444. doi:10.1038/nature14539.
  4. ^ a b c Nair, Vinod; Hinton, Geoffrey E. (2010). "Rectified Linear Units Improve Restricted Boltzmann Machines" (PDF). ICML.
  5. ^ Dugas, Charles; Bengio, Yoshua; Bélisle, François; Nadeau, Claude; Garcia, René (2001). "Incorporating Second-Order Functional Knowledge for Better Option Pricing". NIPS 2000.
  6. ^ Tóth, László (2013). "Phone Recognition with Deep Sparse Rectifier Neural Networks" (PDF). ICASSP.
  7. ^ a b Maas, Andrew L.; Hannun, Awni Y.; Ng, Andrew Y. (2014). "Rectifier Nonlinearities Improve Neural Network Acoustic Models".
  8. ^ a b He, Kaiming; Zhang, Xiangyu; Ren, Shaoqing; Sun, Jian (2015). "Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification".