Softmax activation function

The softmax activation function is a neural transfer function. In neural networks, a transfer function calculates a layer's output from its net input. The softmax function is a biologically plausible approximation to the maximum operation [1]. It is used to simulate an invariance operation of complex cells [2], where it is defined as


y = g\left( \frac{\sum_{j=1}^n x_j^{q+1}}{k + \sum_{j=1}^n x_j^q} \right) \text{,}

where g is a sigmoid function.

In neural network simulations, the term softmax activation function refers to a similar function defined by [3]


p_i = \frac{\exp(q_i)}{\sum_{j=1}^n\exp(q_j)} \text{,}

where p_i is the value of output node i, q_i is the net input to that node, and n is the number of output nodes. This ensures that every output value p_i lies between 0 and 1 and that the outputs sum to 1. The function is a generalization of the logistic function to multiple variables.
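The definition above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name softmax is chosen here, and subtracting the largest input before exponentiating is an added precaution against overflow that is not part of the formula (it cancels in the ratio, so the result is unchanged).

```python
import math

def softmax(q):
    """Map net inputs q_1..q_n to outputs p_1..p_n that are positive and sum to 1."""
    # Assumption: shift by max(q) for numerical stability; the shift cancels
    # between numerator and denominator, so p_i is the same as in the formula.
    m = max(q)
    exps = [math.exp(v - m) for v in q]
    total = sum(exps)
    return [e / total for e in exps]

p = softmax([1.0, 2.0, 3.0])
print(p, sum(p))
```

With two inputs, one of which is fixed at 0, the first output reduces to the logistic function 1 / (1 + exp(-x)), which is the sense in which softmax generalizes it.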

See Multinomial logit for a probability model which uses the softmax activation function.

Reinforcement learning

In the field of reinforcement learning, a softmax function can be used to convert action values into action probabilities. The function commonly used is [4]:


P_t(a) = \frac{\exp(q_t(a)/\tau)}{\sum_{i=1}^n\exp(q_t(i)/\tau)} \text{,}

where the action value q_t(a) corresponds to the expected reward of following action a and \tau is called a temperature parameter (in allusion to chemical kinetics). For high temperatures (\tau\to \infty), all actions have nearly the same probability; the lower the temperature, the more the expected rewards affect the probabilities. For a low temperature (\tau\to 0^+), the probability of the action with the highest expected reward tends to 1.
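The effect of the temperature can be seen in a small sketch. The function name action_probabilities and the example action values are hypothetical, chosen only for illustration; as an added precaution the scaled values are shifted by their maximum before exponentiating, which does not change the probabilities.

```python
import math

def action_probabilities(q_values, tau):
    """Softmax action probabilities P_t(a) for action values q_t(a) at temperature tau."""
    if tau <= 0:
        raise ValueError("temperature tau must be positive")
    scaled = [q / tau for q in q_values]
    # Assumption: shift by the maximum for numerical stability (cancels in the ratio).
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

q = [1.0, 2.0, 3.0]                      # hypothetical action values
hot = action_probabilities(q, 1000.0)    # high temperature: nearly uniform
cold = action_probabilities(q, 0.01)     # low temperature: nearly greedy
print(hot)
print(cold)
```

At the high temperature the three probabilities differ only slightly from 1/3, while at the low temperature almost all of the probability mass sits on the action with the largest value.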

Smooth approximation of maximum

When parameterized by a constant α > 0, the following function is a smooth, differentiable approximation of the maximum function:


\mathcal{S}_{\alpha}\left(\left\{x_i\right\}_{i=1}^{n}\right) = \frac{\sum_{i=1}^{n}x_i e^{\alpha x_i}}{\sum_{i=1}^{n}e^{\alpha x_i}}

For any real α, \mathcal{S}_{\alpha} has the following properties:

  1. \mathcal{S}_{\alpha}\to \max as \alpha\to\infty
  2. \mathcal{S}_{0} is the average of its inputs
  3. \mathcal{S}_{\alpha}\to \min as \alpha\to -\infty
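The three properties listed above can be checked numerically with a short sketch. The name smooth_max and the sample inputs are assumptions made for this example; the exponents are shifted by their maximum before exponentiating, which leaves the weighted average unchanged but avoids overflow for large |α|.

```python
import math

def smooth_max(xs, alpha):
    """Weighted average of xs with weights proportional to exp(alpha * x_i)."""
    # Assumption: shift exponents by their maximum for numerical stability.
    m = max(alpha * x for x in xs)
    weights = [math.exp(alpha * x - m) for x in xs]
    total = sum(weights)
    return sum(x * w for x, w in zip(xs, weights)) / total

xs = [1.0, 2.0, 5.0]                 # hypothetical inputs
print(smooth_max(xs, 100.0))         # close to max(xs)
print(smooth_max(xs, 0.0))           # arithmetic mean of xs
print(smooth_max(xs, -100.0))        # close to min(xs)
```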

The components of the gradient of \mathcal{S}_{\alpha} are given by:


\nabla_{x_i}\mathcal{S}_{\alpha}\left(\left\{x_j\right\}_{j=1}^{n}\right) = \frac{e^{\alpha x_i}}{\sum_{j=1}^{n}e^{\alpha x_j}}\left[1 + \alpha\left(x_i - \mathcal{S}_{\alpha}\left(\left\{x_j\right\}_{j=1}^{n}\right)\right)\right] \text{,}

which makes the softmax function useful for optimization techniques that use gradient descent.
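As a sanity check, the closed-form gradient can be compared against a central finite-difference estimate. The sketch below is illustrative only: the function names, the step size h, and the sample inputs are assumptions made for this example.

```python
import math

def smooth_max(xs, alpha):
    """Weighted average of xs with weights proportional to exp(alpha * x_i)."""
    m = max(alpha * x for x in xs)  # shift for numerical stability
    weights = [math.exp(alpha * x - m) for x in xs]
    total = sum(weights)
    return sum(x * w for x, w in zip(xs, weights)) / total

def smooth_max_gradient(xs, alpha):
    """Closed-form partial derivatives of smooth_max with respect to each x_i."""
    m = max(alpha * x for x in xs)
    weights = [math.exp(alpha * x - m) for x in xs]
    total = sum(weights)
    s = sum(x * w for x, w in zip(xs, weights)) / total
    return [(w / total) * (1.0 + alpha * (x - s)) for x, w in zip(xs, weights)]

def numerical_gradient(xs, alpha, h=1e-6):
    """Central finite-difference estimate of the same partial derivatives."""
    grads = []
    for i in range(len(xs)):
        up = list(xs)
        up[i] += h
        down = list(xs)
        down[i] -= h
        grads.append((smooth_max(up, alpha) - smooth_max(down, alpha)) / (2 * h))
    return grads

xs, alpha = [0.5, 1.0, 2.0], 1.5     # hypothetical inputs
analytic = smooth_max_gradient(xs, alpha)
numeric = numerical_gradient(xs, alpha)
print(analytic)
print(numeric)
```

The partial derivatives also sum to 1, because the α-dependent terms cancel when summed over i; this gives a second quick check on an implementation.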

References

  1. Cadieu C, Kouh M, Pasupathy A, Conner CE, Riesenhuber M, and Poggio T. A Model of V4 Shape Selectivity and Invariance. J Neurophysiol 98: 1733-1750, 2007.
  2. Serre T, Kouh M, Cadieu C, Knoblich U, Kreiman G, and Poggio T. A theory of object recognition: computations and circuits in the feedforward path of the ventral stream in primate visual cortex. CBCL Paper 259/AI Memo 2005-036. Cambridge, MA: MIT, 2005.
  3. ai-faq. What is a softmax activation function?
  4. Sutton RS and Barto AG. Reinforcement Learning: An Introduction. Cambridge, MA: The MIT Press, 1998. Softmax Action Selection.

