Universal approximation theorem
In the mathematical theory of artificial neural networks, the universal approximation theorem states that a feed-forward network with a single hidden layer containing a finite number of neurons can approximate any continuous function on a compact subset of $\mathbb{R}^n$ to arbitrary accuracy, under mild assumptions on the activation function. The theorem thus states that simple neural networks can represent a wide variety of interesting functions when given appropriate parameters; however, it says nothing about whether those parameters can be learned algorithmically.
Kurt Hornik showed in 1991 that it is not the specific choice of the activation function, but rather the multilayer feedforward architecture itself which gives neural networks the potential of being universal approximators. The output units are always assumed to be linear. For notational convenience, only the single output case will be shown. The general case can easily be deduced from the single output case.
Although feed-forward networks with a single hidden layer are universal approximators, such networks may need to be exponentially wide. In 2017, Lu et al. proved a universal approximation theorem for width-bounded deep neural networks. In particular, they showed that ReLU networks of width $n + 4$ can approximate any Lebesgue-integrable function on $n$-dimensional input space with respect to the $L^1$ distance, provided the depth of the network is allowed to grow. They also showed that expressive power is limited when the width is at most $n$: all Lebesgue-integrable functions, except for a set of measure zero, cannot be approximated by width-$n$ ReLU networks.
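Let $\varphi : \mathbb{R} \to \mathbb{R}$ be a nonconstant, bounded, and continuous function. Let $I_m$ denote the $m$-dimensional unit hypercube $[0,1]^m$. The space of real-valued continuous functions on $I_m$ is denoted by $C(I_m)$. Then, given any $\varepsilon > 0$ and any function $f \in C(I_m)$, there exist an integer $N$, real constants $v_i, b_i \in \mathbb{R}$ and real vectors $w_i \in \mathbb{R}^m$ for $i = 1, \ldots, N$, such that we may define:
$$F(x) = \sum_{i=1}^{N} v_i \, \varphi\left(w_i^T x + b_i\right)$$
as an approximate realization of the function $f$; that is,
$$|F(x) - f(x)| < \varepsilon$$
for all $x \in I_m$. In other words, functions of the form $F(x)$ are dense in $C(I_m)$.
This still holds when replacing $I_m$ with any compact subset of $\mathbb{R}^m$.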
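The density statement above can be made concrete in one dimension. The following is a minimal sketch, not the argument used by Cybenko or Hornik: it builds a single-hidden-layer network as a sum of terms v * sigmoid(w * x + b) that traces a staircase through samples of a target function on [0, 1]. The logistic sigmoid is chosen here because it is nonconstant, bounded, and continuous. The names `build_network`, `evaluate`, `max_error`, and the `steepness` parameter are assumptions introduced for this illustration.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function (nonconstant, bounded, continuous)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def build_network(f, n_steps, steepness):
    """Return hidden units (v, w, b) so that sum(v * sigmoid(w*x + b)) tracks f on [0, 1].

    Illustrative construction: one steep sigmoid per grid cell adds the
    increment of f across that cell, producing a smoothed staircase.
    """
    h = 1.0 / n_steps
    grid = [i * h for i in range(n_steps + 1)]
    # A unit with w = 0, b = 0 outputs sigmoid(0) = 0.5, so v = 2*f(0) adds the constant f(0).
    units = [(2.0 * f(grid[0]), 0.0, 0.0)]
    w = steepness / h  # steeper sigmoids give sharper steps
    for i in range(1, n_steps + 1):
        jump = f(grid[i]) - f(grid[i - 1])
        center = 0.5 * (grid[i - 1] + grid[i])
        units.append((jump, w, -w * center))
    return units


def evaluate(units, x):
    """Evaluate the network output: a linear combination of hidden activations."""
    return sum(v * sigmoid(w * x + b) for v, w, b in units)


def max_error(f, units, n_test=1000):
    """Largest absolute error on a uniform test grid over [0, 1]."""
    return max(abs(evaluate(units, k / n_test) - f(k / n_test))
               for k in range(n_test + 1))


if __name__ == "__main__":
    target = lambda x: math.sin(2.0 * math.pi * x)
    for n in (20, 200):
        net = build_network(target, n, steepness=20.0)
        print(n + 1, "hidden units, max error:", max_error(target, net))
```

Increasing the number of hidden units shrinks the step size and therefore the error, which is the qualitative behaviour the theorem guarantees. The sketch says nothing about how such parameters would be found by training.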
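The universal approximation theorem for width-bounded networks, in mathematical terms:
For any Lebesgue-integrable function $f : \mathbb{R}^n \to \mathbb{R}$ and any $\varepsilon > 0$, there exists a fully-connected ReLU network $\mathcal{A}$ with width $d_m \le n + 4$ such that the function $F_{\mathcal{A}}$ represented by this network satisfies
$$\int_{\mathbb{R}^n} \left| f(x) - F_{\mathcal{A}}(x) \right| \, dx < \varepsilon.$$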
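The theorem of limited expressive power for width-$n$ networks, in mathematical terms:
For any Lebesgue-integrable function $f : \mathbb{R}^n \to \mathbb{R}$ such that $\{x : f(x) \neq 0\}$ is a set of positive Lebesgue measure, and any function $F_{\mathcal{A}}$ represented by a fully-connected ReLU network $\mathcal{A}$ with width $d_m \le n$, the following equation holds:
$$\int_{\mathbb{R}^n} \left| f(x) - F_{\mathcal{A}}(x) \right| \, dx = +\infty \quad \text{or} \quad \int_{\mathbb{R}^n} |f(x)| \, dx.$$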
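Some intuition for the limited-expressive-power result comes from the simplest case: one input dimension and width 1. This is an illustration only, not the proof given by Lu et al. Each hidden layer then computes relu(a * h + b) for scalars a and b, which is a monotone function of h. A composition of monotone functions is monotone, so the whole network is monotone on the real line and cannot follow a bump such as max(0, 1 - |x|). The sketch below checks this numerically on randomly drawn networks. The helper names and the weight range [-2, 2] are assumptions made for this example.

```python
import random


def relu(z):
    return z if z > 0.0 else 0.0


def random_width1_network(depth, rng):
    """Hypothetical helper: scalar (a, b) per hidden layer plus a linear output (c, d)."""
    hidden = [(rng.uniform(-2.0, 2.0), rng.uniform(-2.0, 2.0)) for _ in range(depth)]
    output = (rng.uniform(-2.0, 2.0), rng.uniform(-2.0, 2.0))
    return hidden, output


def evaluate(network, x):
    """Forward pass of a width-1 ReLU network with a linear output unit."""
    hidden, (c, d) = network
    h = x
    for a, b in hidden:
        h = relu(a * h + b)
    return c * h + d


def is_monotone(values, tol=1e-12):
    """True if the sequence is nondecreasing or nonincreasing (up to tol)."""
    pairs = list(zip(values, values[1:]))
    nondecreasing = all(v2 >= v1 - tol for v1, v2 in pairs)
    nonincreasing = all(v2 <= v1 + tol for v1, v2 in pairs)
    return nondecreasing or nonincreasing


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [-10.0 + 0.05 * k for k in range(401)]
    nets = [random_width1_network(rng.randint(1, 6), rng) for _ in range(100)]
    print("all sampled networks monotone:",
          all(is_monotone([evaluate(net, x) for x in xs]) for net in nets))
    print("bump function monotone:",
          is_monotone([max(0.0, 1.0 - abs(x)) for x in xs]))
```

A monotone function on the whole real line is either identically zero or stays away from zero on an unbounded interval, which matches the two alternatives in the theorem: the integrated error is either the integral of |f| or infinite.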
- Balázs Csanád Csáji (2001) Approximation with Artificial Neural Networks; Faculty of Sciences; Eötvös Loránd University, Hungary
- Cybenko, G. (1989) "Approximations by superpositions of sigmoidal functions", Mathematics of Control, Signals, and Systems, 2(4), 303–314. doi:10.1007/BF02551274
- Kurt Hornik (1991) "Approximation Capabilities of Multilayer Feedforward Networks", Neural Networks, 4(2), 251–257. doi:10.1016/0893-6080(91)90009-T
- Lu, Z., Pu, H., Wang, F., Hu, Z., & Wang, L. (2017) "The Expressive Power of Neural Networks: A View from the Width", Neural Information Processing Systems, 6231–6239.
- Hanin, B. (2017) "Universal function approximation by deep neural nets with bounded width and ReLU activations", arXiv preprint arXiv:1708.02691.
- Haykin, Simon (1998). Neural Networks: A Comprehensive Foundation, Volume 2, Prentice Hall. ISBN 0-13-273350-1.
- Hassoun, M. (1995) Fundamentals of Artificial Neural Networks, MIT Press, p. 48.