= Kolmogorov–Arnold representation theorem =

In real analysis and approximation theory, the Kolmogorov–Arnold representation theorem (or superposition theorem) states that every multivariate continuous function $f\colon[0,1]^n\to \R$ can be represented as a superposition of continuous single-variable functions.

The works of Vladimir Arnold and Andrey Kolmogorov established that if $f$ is a multivariate continuous function, then $f$ can be written as a finite composition of continuous functions of a single variable and the binary operation of addition. More specifically,

$f(\mathbf x) = f(x_1,\ldots,x_n) = \sum_{q=0}^{2n} \Phi_{q}\!\left(\sum_{p=1}^{n} \phi_{q,p}(x_{p})\right),$

where $\phi_{q,p}\colon[0,1]\to \R$ and $\Phi_{q}\colon \R \to \R$.

In this representation, the inner functions $\phi_{q,p}$ are continuous and universal, that is, independent of $f$, while the outer functions $\Phi_q$ depend on the specific function being represented. The same representation formula extends to all multivariate functions $f$, including discontinuous ones. If $f$ is continuous, then the corresponding outer functions $\Phi_q$ are continuous; if $f$ is discontinuous, the outer functions are generally discontinuous, while the inner functions $\phi_{q,p}$ remain the same universal functions.

Proofs that give specific constructions of the functions involved are also known.

The theorem solved a more constrained form of Hilbert's thirteenth problem, so the original Hilbert's thirteenth problem is a corollary. In a sense, Kolmogorov and Arnold showed that the only true continuous multivariate function is the sum, since every other continuous function can be written using univariate continuous functions and summation.
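
As a loose illustration of this idea, the product of two numbers can be computed with univariate functions and addition alone, using the quarter-square identity $xy = \tfrac14(x+y)^2 - \tfrac14(x-y)^2$. The sketch below is only an analogy: its inner functions are chosen for this particular $f$, whereas the theorem provides universal inner functions, and all helper names are hypothetical.

```python
def square_quarter(s):
    """Univariate function: s -> s^2 / 4."""
    return s * s / 4.0

def neg(v):
    """Univariate function: v -> -v."""
    return -v

def product_via_addition(x, y):
    """Compute x*y using only univariate functions and addition."""
    s_plus = x + y           # plain addition of the two inputs
    s_minus = x + neg(y)     # addition after a univariate map applied to y
    return square_quarter(s_plus) + neg(square_quarter(s_minus))
```

For example, `product_via_addition(2.0, 3.0)` evaluates $25/4 - 1/4$ and returns `6.0`.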

==History==
The Kolmogorov–Arnold representation theorem is closely related to Hilbert's 13th problem. In his Paris lecture at the International Congress of Mathematicians in 1900, David Hilbert formulated 23 problems which in his opinion were important for the further development of mathematics. The 13th of these problems dealt with the solution of general equations of higher degrees. It is known that for algebraic equations of degree 4 the solution can be computed by formulae that only contain radicals and arithmetic operations. For higher orders, Galois theory shows that the solutions of algebraic equations cannot be expressed in terms of basic algebraic operations. It follows from the so-called Tschirnhaus transformation that the general algebraic equation
$x^{n}+a_{n-1}x^{n-1}+\cdots +a_{0}=0$
can be translated to the form $y^{n}+b_{n-4}y^{n-4}+\cdots +b_{1}y+1=0$. The Tschirnhaus transformation is given by a formula containing only radicals and arithmetic operations. Therefore, the solution of an algebraic equation of degree $n$ can be represented as a superposition of functions of two variables if $n<7$ and as a superposition of functions of $n-4$ variables if $n\geq 7$. For $n=7$ the solution is a superposition of arithmetic operations, radicals, and the solution of the equation

$y^{7}+b_{3}y^{3}+b_{2}y^{2}+b_{1}y+1=0.$

A further simplification with algebraic transformations seems to be impossible, which led to Hilbert's conjecture that "A solution of the general equation of degree 7 cannot be represented as a superposition of continuous functions of two variables". This explains the relation of Hilbert's thirteenth problem to the representation of a higher-dimensional function as a superposition of lower-dimensional functions. In this context, it has stimulated many studies in the theory of functions and other related problems by different authors.

==Variants==
A variant of Kolmogorov's theorem that reduces the number of outer functions $\Phi_{q}$ is due to George Lorentz. He showed in 1962 that the outer functions $\Phi_{q}$ can be replaced by a single function $\Phi$. More precisely, Lorentz proved the existence of functions $\phi _{q,p}$, $q=0,1,\ldots, 2n$, $p=1,\ldots,n,$ such that

$f(\mathbf x) = \sum_{q=0}^{2n} \Phi\!\left(\sum_{p=1}^{n} \phi_{q,p}(x_{p})\right).$

David Sprecher replaced the inner functions $\phi_{q,p}$ by one single inner function with an appropriate shift in its argument. He proved that there exist real values $\eta, \lambda_1,\ldots,\lambda_n$, a continuous function $\Phi\colon \mathbb{R} \rightarrow \R$, and a real increasing continuous function $\phi\colon [0,1] \rightarrow [0,1]$ with $\phi \in \operatorname{Lip}(\ln 2/\ln (2N+2))$, for $N \geq n \geq 2$, such that

$f(\mathbf x) = \sum_{q=0}^{2n} \Phi\!\left(\sum_{p=1}^{n} \lambda_p \phi(x_{p}+\eta q)+q \right).$

Phillip A. Ostrand generalized the Kolmogorov superposition theorem to compact metric spaces. For $p=1,\ldots,m$ let $X_p$ be compact metric spaces of finite dimension $n_p$ and let $n = \sum_{p=1}^{m} n_p$. Then there exist continuous functions $\phi_{q,p}\colon X_p \rightarrow [0,1]$, $q=0,\ldots,2n$, $p=1,\ldots,m$, and continuous functions $G_q\colon [0,1] \rightarrow \R$, $q=0,\ldots,2n$, such that any continuous function $f\colon X_1 \times \dots \times X_m \rightarrow \mathbb{R}$ is representable in the form

$f(x_1,\ldots,x_m) = \sum_{q=0}^{2n} G_{q}\!\left(\sum_{p=1}^{m} \phi_{q,p}(x_{p})\right).$

The Kolmogorov–Arnold representation theorem and its aforementioned variants also hold for discontinuous multivariate functions.

==Continuous form==

In its classic form the Kolmogorov–Arnold representation has two layers, where the first, called the inner layer, is a vector to vector mapping

$s_q = \sum_{p=1}^{n} \phi_{q,p}(x_{p}), \quad q=0,1,\ldots,2n,$

and the second, outer layer, is a vector to scalar mapping

$f(x_1, \ldots, x_n) = \sum_{q=0}^{2n} \Phi_q\left(s_q\right).$
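
The two-layer structure can be sketched in code as follows. This is a minimal illustration that only shows how the inner and outer layers compose; the function `ka_evaluate` and the toy inner and outer functions are hypothetical placeholders, not the universal inner functions whose existence the theorem asserts.

```python
import math

def ka_evaluate(inner, outer, x):
    """Evaluate sum_q outer[q]( sum_p inner[q][p](x[p]) ).

    inner: list of 2n+1 rows, each a list of n univariate callables
    outer: list of 2n+1 univariate callables
    x:     sequence of n numbers in [0, 1]
    """
    n = len(x)
    if len(inner) != 2 * n + 1 or len(outer) != 2 * n + 1:
        raise ValueError("expected 2n+1 inner rows and 2n+1 outer functions")
    total = 0.0
    for phi_row, Phi_q in zip(inner, outer):
        s_q = sum(phi(xp) for phi, xp in zip(phi_row, x))  # inner layer
        total += Phi_q(s_q)                                 # outer layer
    return total

# Toy placeholders (NOT Kolmogorov's inner functions): n = 2, so 2n+1 = 5.
n = 2
toy_inner = [[(lambda v, q=q, p=p: (q + 1) * v / (p + 1)) for p in range(n)]
             for q in range(2 * n + 1)]
toy_outer = [math.sin] * (2 * n + 1)

value = ka_evaluate(toy_inner, toy_outer, [0.25, 0.5])
```

With these toy choices each inner sum equals $(q+1)/2$, so `value` is simply $\sum_{k=1}^{5} \sin(k/2)$.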

The transition from discrete to continuous form for the inner layer gives the Urysohn equation with a kernel $F$ of three arguments

$s(q) = \int_{p_1}^{p_2} F[x(p),p,q]dp, \quad q \in [q_1, q_2],$

the same transition for the outer layer gives its particular case

$f = \int_{q_1}^{q_2} G[s(q),q]dq.$

The generalization of the Kolmogorov–Arnold representation known as the Kolmogorov–Arnold network is, in continuous form, a chain of Urysohn equations, where the outer equation may also return a function or a vector as multiple related targets.

The Urysohn equation was introduced in 1924 for a different purpose: as a function-to-function mapping, together with the problem of finding a function $x(p)$ given $s(q)$ and $F[x(p),p,q]$.
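
The forward direction of the Urysohn equation, computing $s(q)$ from a given $x(p)$, can be approximated numerically. The following is a minimal sketch using the trapezoidal rule; the function name `urysohn`, the kernel, and the input function are hypothetical and chosen only so that the exact integral is known.

```python
def urysohn(F, x, p1, p2, q, steps=1000):
    """Approximate s(q) = integral_{p1}^{p2} F(x(p), p, q) dp (trapezoidal rule)."""
    h = (p2 - p1) / steps
    total = 0.5 * (F(x(p1), p1, q) + F(x(p2), p2, q))
    for k in range(1, steps):
        p = p1 + k * h
        total += F(x(p), p, q)
    return total * h

# Hypothetical kernel and input function, chosen so the integral is known:
# F(u, p, q) = q * u^2 and x(p) = p give s(q) = q / 3 on [0, 1].
kernel = lambda u, p, q: q * u * u
x_func = lambda p: p

s_half = urysohn(kernel, x_func, 0.0, 1.0, q=0.5)
```

Here `s_half` approximates $0.5/3$; the inverse problem of recovering $x(p)$ from $s(q)$ is not addressed by this sketch.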

==Limitations==

The theorem does not hold in general for complex multivariate functions. Furthermore, the non-smoothness of the inner functions and their "wild behavior" have limited the practical use of the representation, although there is some debate on this.

== Applications ==
In the field of machine learning, there have been various attempts to use neural networks modeled on the Kolmogorov–Arnold representation. In these works, the Kolmogorov–Arnold theorem plays a role analogous to that of the universal approximation theorem in the study of multilayer perceptrons.

== Proof ==

Only the case of functions depending on two variables is proved here, as the generalization is immediate.

=== Setup ===

- Let $I$ be the unit interval $[0, 1]$.
- Let $C[I]$ be the set of continuous functions of type $[0, 1] \to \R$. It is a function space with supremum norm (it is a Banach space).
- Let $f$ be a continuous function of type $[0, 1]^2 \to \R$, and let $\|f\|$ be the supremum of it on $[0, 1]^2$.
- Let $t$ be a positive irrational number. Its exact value is irrelevant.

We say that a 5-tuple $(\phi_1, \dots, \phi_5) \in C[I]^5$ is a Kolmogorov–Arnold tuple if and only if for any $f \in C[I^2]$ there exists a continuous function $g\colon \R \to \R$ such that

$f(x,y) = \sum_{i=1}^5 g(\phi_i(x) + t \phi_i(y)).$

In this notation, the statement to be proved is that Kolmogorov–Arnold tuples exist.

=== Proof ===

Fix an $f \in C[I^2]$. We show that a certain subset $U_f \subset C[I]^5$ is open and dense, namely the set of 5-tuples $(\phi_1, \dots, \phi_5)$ for which there exists a continuous $g$ such that $\|g\| < \frac{1.01}{7} \|f\|$ and

$\Big\| f(x,y) - \sum_{i=1}^5 g(\phi_i(x) + t \phi_i(y)) \Big\| < \frac{6.01}{7} \|f\|.$

We can assume that $\|f\| = 1$ with no loss of generality.

By continuity, the set of such 5-tuples is open in $C[I]^5$. It remains to prove that they are dense.

The key idea is to divide $[0, 1]^2$ into an overlapping system of small squares, each with a unique address, and define $g$ to have the appropriate value at each address.

==== Grid system ====
Let $\psi_1 \in C[I]$. For any $\epsilon > 0$, for all large $N$, we can discretize $\psi_1$ into a continuous function $\phi_1$ satisfying the following properties:

- $\phi_1$ is constant on each of the intervals $[0/5N, 4/5N], [5/5N, 9/5N], \dots, [1-5/5N, 1-1/5N]$.
- These constant values are distinct rational numbers.
- $\|\psi_1 - \phi_1\| < \epsilon$.

This function $\phi_1$ creates a grid address system on $[0, 1]^2$, divided into streets and blocks. The blocks are of form $[0/5N, 4/5N] \times [0/5N, 4/5N], [0/5N, 4/5N] \times [5/5N, 9/5N], \dots$.

Since $f$ is continuous on $[0, 1]^2$, it is uniformly continuous. Thus, we can take $N$ large enough, so that $f$ varies by less than $1/7$ on any block.

On each block, $\phi_1(x) + t \phi_1(y)$ has a constant value. The key property is that, because $t$ is irrational, and $\phi_1$ is rational on the blocks, each block has a different value of $\phi_1(x) + t \phi_1(y)$.
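
The role of the irrationality of $t$ can be checked numerically on a small example. The sketch below counts distinct addresses $a + t b$ over a hypothetical set of four rational block values, once for a rational $t$ and once for a floating-point stand-in for $\sqrt 2$ (a float is itself rational, so this is only an illustration, not a proof).

```python
import math
from fractions import Fraction

def count_addresses(values, t, digits=9):
    """Number of distinct values of a + t*b over all pairs (a, b)."""
    return len({round(float(a) + float(t) * float(b), digits)
                for a in values for b in values})

# Four rational block values (hypothetical), giving 4 * 4 = 16 blocks.
values = [Fraction(k, 2) for k in range(4)]

n_rational = count_addresses(values, Fraction(1, 2))   # rational t: collisions
n_irrational = count_addresses(values, math.sqrt(2))   # float stand-in for sqrt(2)
```

With $t = 1/2$ several blocks share an address (for instance $0 + t \cdot 1 = 1/2 + t \cdot 0$), so `n_rational` is smaller than 16, while with $t \approx \sqrt 2$ all 16 addresses are distinct.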

So, given any 5-tuple $(\psi_1, \dots, \psi_5)$, we construct such a 5-tuple $(\phi_1, \dots, \phi_5)$. These create 5 overlapping grid systems.

Enumerate the blocks as $R_{i, r}$, where $R_{i,r}$ is the $r$-th block of the grid system created by $\phi_i$. The address of this block is $a_{i, r} := \phi_i(x) + t \phi_i(y)$, for any $(x, y) \in R_{i, r}$. By adding a small and linearly independent irrational number (the construction is similar to that of the Hamel basis) to each of $(\phi_1, \dots, \phi_5)$, we can ensure that every block has a unique address.

By plotting out the entire grid system, one can see that every point in $[0, 1]^2$ is contained in 3 to 5 blocks, and 2 to 0 streets.
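
The block count can also be verified by brute force. The sketch below assumes that the five grid systems are translates of the first one by $j/5N$ for $j = 0, \dots, 4$ (an assumption made here for illustration; the text above does not spell out the shifts) and uses exact rational arithmetic to avoid rounding at block boundaries. The function names are hypothetical.

```python
from fractions import Fraction

def in_block_1d(x, j, N):
    """True if x lies in a closed block interval of grid system j.

    Assumes system j is system 0 shifted by j/(5N); in units of 1/(5N)
    its blocks are [5m + j, 5m + j + 4] and its streets are the open gaps.
    """
    u = (x * 5 * N - j) % 5
    return u <= 4

def blocks_containing(x, y, N):
    """Number of grid systems j = 0..4 whose block contains (x, y)."""
    return sum(in_block_1d(x, j, N) and in_block_1d(y, j, N) for j in range(5))

N = 2
samples = [Fraction(k, 10 * N) for k in range(10 * N + 1)]  # hits street midpoints
counts = {blocks_containing(x, y, N) for x in samples for y in samples}
```

Under this assumption the streets of different systems are disjoint in each coordinate, so a point misses the blocks of at most two systems, and `counts` contains only the values 3, 4 and 5.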

==== Construction of g ====
For each block $R_{i, r}$, if $f > 0$ on all of $R_{i, r}$ then define $g(a_{i, r}) = +1/7$; if $f < 0$ on all of $R_{i, r}$ then define $g(a_{i, r}) = -1/7$. Now, linearly interpolate $g$ between these defined values. It remains to show this construction has the desired properties.

For any $(x, y) \in I^2$, we consider three cases.

If $f(x, y) \in [1/7, 7/7]$, then, since $f$ varies by less than $1/7$ on any block, $f > 0$ on every block $R_{i, r}$ that contains the point $(x, y)$. This means that $g = 1/7$ on the 3 to 5 blocks containing the point, and $g$ has an unknown value, of absolute value at most $1/7$, on the 2 to 0 streets. Thus, we have

$\sum_{i=1}^5 g(\phi_i(x) + t \phi_i(y)) \in [1/7, 5/7],$

giving

$\Big| f(x,y) - \sum_{i=1}^5 g(\phi_i(x) + t \phi_i(y)) \Big| \in [0, 6/7].$

The case $f(x, y) \in [-7/7, -1/7]$ is similar.

If $f(x, y) \in [-1/7, 1/7]$, then since $\|g\| \leq 1/7$, we still have

$\Big| f(x,y) - \sum_{i=1}^5 g(\phi_i(x) + t \phi_i(y)) \Big| \in [0, 6/7].$
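
The arithmetic behind the bound $6/7$ can be checked exhaustively with exact fractions. The sketch below assumes $\|f\| = 1$, as in the proof, and evaluates the worst case over the corners of the admissible ranges; the helper `worst_error` is a hypothetical name introduced for this check.

```python
from fractions import Fraction

S = Fraction(1, 7)  # value of |g| on blocks, and bound on |g| on streets

def worst_error(f_lo, f_hi, k_blocks, block_value):
    """Largest |f - sum| when k blocks contribute block_value each and the
    remaining 5 - k terms range freely over [-1/7, 1/7]."""
    free_terms = 5 - k_blocks
    sum_lo = k_blocks * block_value - free_terms * S
    sum_hi = k_blocks * block_value + free_terms * S
    # |f - s| over a rectangle of (f, s) values is maximised at a corner.
    return max(abs(f - s) for f in (f_lo, f_hi) for s in (sum_lo, sum_hi))

# Case f(x, y) in [1/7, 1]: g = +1/7 on the 3 to 5 blocks containing the point.
case_positive = max(worst_error(S, Fraction(1), k, S) for k in (3, 4, 5))

# Case f(x, y) in [-1/7, 1/7]: only |g| <= 1/7 is used, so all 5 terms are free.
case_small = worst_error(-S, S, 0, Fraction(0))
```

Both `case_positive` and `case_small` equal exactly $6/7$, attained for instance when $f(x,y) = 1$ and the point lies in three blocks and two streets with $g = -1/7$ on both streets.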

==== Baire category theorem ====
Iterating the above construction, then applying the Baire category theorem, we find that the following kind of 5-tuples are open and dense in $C[I]^5$: there exists a sequence $g_1, g_2, \dots$ such that $\|g_1\| < \frac{1.01}{7} \|f\|$, $\|g_2\| < \frac{1.01}{7} \cdot \frac{6.01}{7} \|f\|$, and so on. This allows their sum to be defined: $g := \sum_n g_n$, which is still continuous and bounded, and it satisfies

$f(x,y) = \sum_{i=1}^5 g(\phi_i(x) + t \phi_i(y)).$

Since $C[I^2]$ has a countable dense subset, we can apply the Baire category theorem again to obtain the full theorem.
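
The convergence of the series can be made explicit. Assuming, as the pattern of the first two bounds suggests, that $\|g_n\| < \frac{1.01}{7}\left(\frac{6.01}{7}\right)^{n-1}\|f\|$ for every $n$, the bounds form a geometric series:

$\sum_{n=1}^{\infty} \|g_n\| < \frac{1.01}{7}\,\|f\| \sum_{n=0}^{\infty} \left(\frac{6.01}{7}\right)^{n} = \frac{1.01}{7} \cdot \frac{1}{1-\frac{6.01}{7}}\,\|f\| = \frac{1.01}{0.99}\,\|f\| \approx 1.02\,\|f\|.$

Under this assumption the series $\sum_n g_n$ converges uniformly by the Weierstrass M-test, so its sum $g$ is continuous, and the residual after $k$ steps is smaller than $\left(\frac{6.01}{7}\right)^{k}\|f\|$, which tends to $0$.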

=== Extensions ===
The above proof generalizes to $n$ dimensions: divide the cube $[0, 1]^n$ into $(2n+1)$ interlocking grid systems, such that each point in the cube lies in $(n+1)$ to $(2n + 1)$ blocks and in $0$ to $n$ streets. Since $(n+1) > n$, the above construction works.

Indeed, $2n+1$ is the best possible value for the number of grid systems.

A relatively short proof via dimension theory is also known.

In another direction of generality, more conditions can be imposed on the Kolmogorov–Arnold tuples.

Vituškin (1954) showed that the theorem is false if all functions $f, g,\phi_i$ are required to be continuously differentiable. The theorem remains true if all $\phi_i$ are required to be 1-Lipschitz continuous.

==See also==
- Kolmogorov-Arnold Networks

==Sources==
- Andrey Kolmogorov, "On the representation of continuous functions of several variables by superpositions of continuous functions of a smaller number of variables", Proceedings of the USSR Academy of Sciences, 108 (1956), pp. 179–182; English translation: Amer. Math. Soc. Transl., "17: Twelve Papers on Algebra and Real Functions" (1961), pp. 369–373.
- Vladimir Arnold, "On functions of three variables", Proceedings of the USSR Academy of Sciences, 114 (1957), pp. 679–681; English translation: Amer. Math. Soc. Transl., "28: Sixteen Papers on Analysis" (1963), pp. 51–54.
- Vladimir Arnold, "On the representation of continuous functions of three variables as superpositions of continuous functions of two variables", Dokl. Akad. Nauk. SSSR 114:4 (1957), pp. 679–681 (in Russian).
- Andrey Kolmogorov, "On the representation of continuous functions of several variables as superpositions of continuous functions of one variable and addition" (1957); English translation: Amer. Math. Soc. Transl., "28: Sixteen Papers on Analysis" (1963).
- Vladimir Arnold, On The Representation of Continuous Functions of 3 Variables By The Superpositions of Continuous Functions of 2 Variables (1961), PhD Thesis
