The LogSumExp (LSE) (also called RealSoftMax or multivariable softplus) function is a smooth maximum – a smooth approximation to the maximum function, mainly used by machine learning algorithms. It is defined as the logarithm of the sum of the exponentials of the arguments:

$$\mathrm{LSE}(x_1, \dots, x_n) = \log\left(e^{x_1} + \cdots + e^{x_n}\right).$$

It approximates the maximum with the bounds

$$\max\{x_1, \dots, x_n\} \le \mathrm{LSE}(x_1, \dots, x_n) \le \max\{x_1, \dots, x_n\} + \log(n).$$
The first inequality is strict unless $n = 1$. The second inequality becomes an equality exactly when all the arguments are equal. Proof: Let $x = \max_i x_i$. Then $e^x \le \sum_{i=1}^n e^{x_i} \le n e^x$. Applying the logarithm to the inequality gives $x \le \mathrm{LSE}(x_1, \dots, x_n) \le x + \log(n)$.
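As an illustrative sketch in plain Python (the function name `lse` is ours, not a standard library routine), the definition and both bounds can be checked numerically:

```python
import math

def lse(xs):
    # Naive definition: the log of the sum of exponentials.
    return math.log(sum(math.exp(x) for x in xs))

xs = [0.5, -1.2, 3.0, 3.0]
m, n = max(xs), len(xs)

# max(x) <= LSE(x) <= max(x) + log(n)
assert m <= lse(xs) <= m + math.log(n)

# With a single argument the first inequality is an equality: LSE(x) = x.
assert abs(lse([2.7]) - 2.7) < 1e-12

# With all arguments equal, the upper bound is attained: LSE = max + log(n).
assert abs(lse([1.0] * 4) - (1.0 + math.log(4))) < 1e-12
```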
In addition, we can scale the function to make the bounds tighter. Consider the function $\frac{1}{t}\mathrm{LSE}(tx_1, \dots, tx_n)$ for $t > 0$. Then

$$\max\{x_1, \dots, x_n\} \le \frac{1}{t}\mathrm{LSE}(tx_1, \dots, tx_n) \le \max\{x_1, \dots, x_n\} + \frac{\log(n)}{t}.$$
Proof: Replace each $x_i$ with $tx_i$ for some $t > 0$ in the inequalities above, to give

$$\max\{tx_1, \dots, tx_n\} \le \mathrm{LSE}(tx_1, \dots, tx_n) \le \max\{tx_1, \dots, tx_n\} + \log(n).$$

Since $t > 0$, $\max\{tx_1, \dots, tx_n\} = t \max\{x_1, \dots, x_n\}$; finally, dividing by $t$ gives the result.
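The tightening effect of the scale parameter can be sketched numerically (plain Python; `scaled_lse` is our own helper name):

```python
import math

def lse(xs):
    return math.log(sum(math.exp(x) for x in xs))

def scaled_lse(xs, t):
    # (1/t) * LSE(t*x): for t > 0 the gap to max(x) is at most log(n)/t.
    return lse([t * x for x in xs]) / t

xs = [0.1, 0.9, 1.0]
for t in (1.0, 10.0, 100.0):
    gap = scaled_lse(xs, t) - max(xs)
    assert 0 <= gap <= math.log(len(xs)) / t  # bound shrinks as t grows

# For t < 0 the same expression brackets the minimum instead.
t = -50.0
assert min(xs) + math.log(len(xs)) / t <= scaled_lse(xs, t) <= min(xs)
```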
Also, if we multiply by a negative number instead, we of course find a comparison to the $\min$ function: for $t < 0$,

$$\min\{x_1, \dots, x_n\} + \frac{\log(n)}{t} \le \frac{1}{t}\mathrm{LSE}(tx_1, \dots, tx_n) \le \min\{x_1, \dots, x_n\},$$

where the inequalities reverse because dividing by the negative $t$ flips them, and $\log(n)/t < 0$.
Writing $\mathbf{x} = (x_1, \dots, x_n)$, the partial derivatives are

$$\frac{\partial}{\partial x_i}\mathrm{LSE}(\mathbf{x}) = \frac{e^{x_i}}{\sum_{j=1}^n e^{x_j}},$$

which means the gradient of LSE is the softmax function.
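The gradient identity can be verified against a finite-difference approximation; this is a minimal sketch in plain Python, with `lse` and `softmax` as our own helper names:

```python
import math

def lse(xs):
    return math.log(sum(math.exp(x) for x in xs))

def softmax(xs):
    # The gradient of LSE: exp(x_i) / sum_j exp(x_j).
    z = sum(math.exp(x) for x in xs)
    return [math.exp(x) / z for x in xs]

xs = [0.2, -0.5, 1.3]
grad = softmax(xs)
h = 1e-6
for i in range(len(xs)):
    bumped = list(xs)
    bumped[i] += h
    # Forward difference of LSE in coordinate i matches the softmax entry.
    numeric = (lse(bumped) - lse(xs)) / h
    assert abs(numeric - grad[i]) < 1e-4
```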
log-sum-exp trick for log-domain calculations
Just as multiplication in the linear domain becomes simple addition in the log domain, addition in the linear domain becomes LSE in the log domain:

$$\mathrm{LSE}(\log x_1, \dots, \log x_n) = \log(x_1 + \cdots + x_n).$$
A common purpose of using log-domain computations is to increase accuracy and avoid underflow and overflow problems when very small or very large numbers are represented directly (i.e. in a linear domain) using limited-precision floating point numbers.
Unfortunately, using LSE directly in this case can itself cause overflow or underflow: the intermediate terms $e^{x_i}$ may overflow even when the final result is representable. The following equivalent formulation must therefore be used instead (especially when the accuracy of the above 'max' approximation is not sufficient):

$$\mathrm{LSE}(x_1, \dots, x_n) = x^* + \log\left(\sum_{i=1}^n e^{x_i - x^*}\right), \quad \text{where } x^* = \max\{x_1, \dots, x_n\}.$$

Many math libraries, such as IT++, provide a default routine for LSE and use this formula internally.
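The shift-by-the-maximum trick can be sketched as follows in plain Python (the function name `logsumexp` is ours; production code would typically use a library routine such as SciPy's):

```python
import math

def logsumexp(xs):
    # Shift by the maximum so the largest exponent is exp(0) = 1;
    # this prevents overflow for large inputs and preserves precision.
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

# The naive log(sum(exp(x))) would overflow here, since exp(1000) is
# infinity in double precision, but the shifted form is exact up to rounding.
big = [1000.0, 1000.0]
assert abs(logsumexp(big) - (1000.0 + math.log(2))) < 1e-9
```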
A strictly convex log-sum-exp type function
LSE is convex but not strictly convex. We can define a strictly convex log-sum-exp type function by adding an extra argument set to zero:

$$\mathrm{LSE}_0^+(x_1, \dots, x_n) := \mathrm{LSE}(0, x_1, \dots, x_n) = \log\left(1 + \sum_{i=1}^n e^{x_i}\right).$$
This function is a proper Bregman generator (strictly convex and differentiable). It is encountered in machine learning, for example, as the cumulant of the multinomial/binomial family.
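As a small illustrative sketch (plain Python; the function name `lse0` is ours), the extra zero argument reduces, in the one-dimensional case, to the familiar softplus function:

```python
import math

def lse0(xs):
    # LSE with an extra argument fixed at zero:
    # LSE0+(x_1, ..., x_n) = log(1 + sum_i exp(x_i)).
    return math.log(1.0 + sum(math.exp(x) for x in xs))

# With a single argument this is softplus, log(1 + exp(x)), the
# cumulant (log-partition) function of the Bernoulli/binomial family.
assert abs(lse0([0.0]) - math.log(2.0)) < 1e-12
```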
- Zhang, Aston; Lipton, Zack; Li, Mu; Smola, Alex. "Dive into Deep Learning, Chapter 3 Exercises". www.d2l.ai. Retrieved 27 June 2020.
- Nielsen, Frank; Sun, Ke (2016). "Guaranteed bounds on the Kullback-Leibler divergence of univariate mixtures using piecewise log-sum-exp inequalities". Entropy. 18: 442. arXiv:1606.05850. Bibcode:2016Entrp..18..442N. doi:10.3390/e18120442.
- El Ghaoui, Laurent (2017). Optimization Models and Applications.
- "convex analysis - About the strictly convexity of log-sum-exp function - Mathematics Stack Exchange". stackexchange.com.
- Nielsen, Frank; Hadjeres, Gaetan (2018). "Monte Carlo Information Geometry: The dually flat case". arXiv:1803.07225. Bibcode:2018arXiv180307225N.