Probabilistic numerics

Probabilistic numerics is a scientific field at the intersection of statistics, machine learning and applied mathematics, where tasks in numerical analysis including finding numerical solutions for integration, linear algebra, optimisation and differential equations are seen as problems of statistical, probabilistic, or Bayesian inference.[1][2][3][4]

Introduction

A numerical method is an algorithm that approximates the solution to a mathematical problem (examples below include the solution to a linear system of equations, the value of an integral, the solution of a differential equation, the minimum of a multivariate function). In a probabilistic numerical algorithm, this process of approximation is thought of as a problem of estimation, inference or learning and realised in the framework of probabilistic inference.

Formally, this means casting the setup of the computational problem in terms of a prior distribution, formulating the relationship between numbers computed by the computer (e.g. matrix-vector multiplications in linear algebra, gradients in optimization, values of the integrand or the vector field defining a differential equation) and the quantity in question (the solution of the linear problem, the minimum, the integral, the solution curve) in a likelihood function, and returning a posterior distribution as the output. In most cases, numerical algorithms also take internal adaptive decisions about which numbers to compute, which form an active learning problem.

Many of the most popular classic numerical algorithms can be re-interpreted in the probabilistic framework. This includes the method of conjugate gradients,[5][6][7] Nordsieck methods, Gaussian quadrature rules,[8] and Quasi-Newton methods.[9] In all these cases, the classic method is based on a regularized least-squares estimate that can be associated with the posterior mean arising from a Gaussian prior and likelihood. In such cases, the variance of the Gaussian posterior is then associated with a worst-case estimate for the squared error.

Probabilistic numerical methods promise several conceptual advantages over classic, point-estimate based approximation techniques:

  • They return structured error estimates (in particular, the ability to return joint posterior samples, i.e. multiple realistic hypotheses for the true unknown solution of the problem)
  • Hierarchical Bayesian inference can be used to set and control internal hyperparameters in such methods in a generic fashion, rather than having to re-invent novel methods for each parameter
  • Since they use and allow for an explicit likelihood describing the relationship between computed numbers and target quantity, probabilistic numerical methods can use the results of even highly imprecise, biased and stochastic computations.[10] Conversely, probabilistic numerical methods can also provide a likelihood in computations often considered "likelihood-free" elsewhere[11]
  • Because all probabilistic numerical methods use essentially the same data type – probability measures – to quantify uncertainty over both inputs and outputs, they can be chained together to propagate uncertainty across large-scale, composite computations
  • Information from multiple sources (e.g. algebraic, mechanistic knowledge about the form of a differential equation, and observations of the trajectory of the system collected in the physical world) can be combined naturally and inside the inner loop of the algorithm, removing otherwise necessary nested loops in computation, e.g. in inverse problems.[12]

These advantages are essentially the equivalent, in the computational domain, of the advantages that Bayesian methods enjoy over point estimates in machine learning.

Numerical Tasks

Integration

Bayesian quadrature with a Gaussian process conditioned on evaluations of the integrand (shown in black). Shaded areas in the left column illustrate the marginal variance. The right figure shows the prior and posterior Gaussian distributions over the value of the integral, as well as the true solution.

Probabilistic numerical methods have been developed for the problem of numerical integration; the most popular such method is called Bayesian quadrature.[13][14][15][16]

In numerical integration, function evaluations f(x₁), …, f(xₙ) at a number of points x₁, …, xₙ are used to estimate the integral ∫ f dν of a function f against some measure ν. Bayesian quadrature consists of specifying a prior distribution over f and conditioning this prior on f(x₁), …, f(xₙ) to obtain a posterior distribution over f, then computing the implied posterior distribution on ∫ f dν. The most common choice of prior is a Gaussian process, as this yields a closed-form posterior distribution on the integral, namely a univariate Gaussian distribution. Bayesian quadrature is particularly useful when the function f is expensive to evaluate and the dimension of the data is small to moderate.
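
Under a Gaussian process prior, the posterior over the integral is again Gaussian: its mean is a weighted sum of the function evaluations, with weights obtained from integrals of the covariance kernel against ν. The following minimal Python sketch assumes a zero-mean Gaussian process with a squared-exponential kernel and the uniform measure on [0, 1]; the closed-form kernel integrals below are specific to that choice, and the node locations, lengthscale and test integrand are illustrative assumptions, not prescribed by the article.

import numpy as np
from scipy.special import erf

def bayesian_quadrature(f, nodes, ell=0.2, jitter=1e-10):
    """Posterior mean and variance of the integral of f over [0, 1] under a GP(0, k) prior."""
    x = np.asarray(nodes, dtype=float)
    # Gram matrix of the squared-exponential kernel k(x, x') = exp(-(x - x')^2 / (2 ell^2)).
    K = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * ell ** 2))
    # Kernel mean z_i = integral of k(t, x_i) over [0, 1], in closed form via erf.
    r = ell * np.sqrt(2.0)
    z = ell * np.sqrt(np.pi / 2) * (erf((1 - x) / r) + erf(x / r))
    # Prior variance of the integral: double integral of k(s, t) over the unit square.
    c = np.sqrt(2 * np.pi) * ell * erf(1 / r) + 2 * ell ** 2 * (np.exp(-1 / (2 * ell ** 2)) - 1)
    w = np.linalg.solve(K + jitter * np.eye(len(x)), z)  # Bayesian quadrature weights
    return w @ f(x), c - z @ w                           # posterior mean and variance

mean, var = bayesian_quadrature(np.sin, np.linspace(0.05, 0.95, 8))
print(mean, var)  # mean close to 1 - cos(1) = 0.4597..., variance small but positive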

Optimization

Probabilistic approaches have also been studied for mathematical optimization. Perhaps the most notable effort in this direction is Bayesian optimization, a general approach to optimization grounded in Bayesian inference. Bayesian optimization algorithms operate by maintaining a probabilistic belief about an objective function throughout optimization; this often takes the form of a Gaussian process prior conditioned on observations obtained during optimization. This belief then guides the algorithm in obtaining observations that are likely to advance the optimization process. One prominent approach is to model optimization via Bayesian sequential experimental design, seeking to obtain a sequence of observations yielding the most optimization progress as evaluated by an appropriate utility function.
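
As a concrete illustration, the sketch below implements one common instantiation of this loop: a Gaussian process surrogate with a fixed squared-exponential kernel, with the expected-improvement utility maximised over a grid. The objective, lengthscale, grid and initial design are illustrative assumptions rather than a canonical algorithm.

import numpy as np
from scipy.stats import norm

def sqexp(a, b, ell=0.15):
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * ell ** 2))

def gp_posterior(X, y, Xs, jitter=1e-8):
    """Noise-free GP regression: posterior mean and variance on the grid Xs."""
    K = sqexp(X, X) + jitter * np.eye(len(X))
    Ks = sqexp(Xs, X)
    mu = Ks @ np.linalg.solve(K, y)
    var = 1.0 - np.einsum("ij,ji->i", Ks, np.linalg.solve(K, Ks.T))
    return mu, np.clip(var, 1e-12, None)

def expected_improvement(mu, var, best):
    """E[max(best - f, 0)] under the Gaussian belief (minimisation convention)."""
    s = np.sqrt(var)
    g = (best - mu) / s
    return s * (g * norm.cdf(g) + norm.pdf(g))

objective = lambda x: np.sin(3 * x) + x ** 2 - 0.7 * x   # toy objective to minimise
grid = np.linspace(0.0, 2.0, 400)
X = np.array([0.3, 1.7]); y = objective(X)               # small initial design
for _ in range(10):
    mu, var = gp_posterior(X, y, grid)                   # update the belief
    x_next = grid[np.argmax(expected_improvement(mu, var, y.min()))]
    X = np.append(X, x_next); y = np.append(y, objective(x_next))
print(X[np.argmin(y)], y.min())                          # approximate minimiser and value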

Linear Algebra

Probabilistic numerical methods for linear algebra have primarily focused on the solution of systems of linear equations of the form Ax = b.[5][6][7][17][18][19]
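
To illustrate the probabilistic viewpoint, the following toy sketch places a Gaussian prior on the solution x and conditions it on a few projected observations SᵀAx = Sᵀb. It is a schematic instance of the ideas in the references above, not any one published solver; the prior covariance and the random choice of directions S are illustrative assumptions.

import numpy as np

def probabilistic_linear_solve(A, b, S, Sigma=None):
    """Condition a Gaussian prior x ~ N(0, Sigma) on the projections S^T A x = S^T b."""
    n = len(b)
    Sigma = np.eye(n) if Sigma is None else Sigma
    M = A.T @ S                          # each column of S yields one linear observation
    G = M.T @ Sigma @ M                  # Gram matrix of the observation operator
    mean = Sigma @ M @ np.linalg.solve(G, S.T @ b)             # posterior mean for x
    cov = Sigma - Sigma @ M @ np.linalg.solve(G, M.T @ Sigma)  # remaining uncertainty
    return mean, cov

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 5)); A = A @ A.T + 5 * np.eye(5)   # SPD test matrix
x_true = rng.normal(size=5); b = A @ x_true
mean, cov = probabilistic_linear_solve(A, b, S=rng.normal(size=(5, 3)))
print(np.linalg.norm(mean - x_true), np.trace(cov))  # both shrink as S gains columns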

Ordinary Differential Equations

Probabilistic numerical methods for ordinary differential equations ḟ(t) = w(t, f(t)) have been developed for initial and boundary value problems. The many methods that have been proposed can broadly be grouped into two categories:

  • Randomisation-based methods: These algorithms perturb standard deterministic numerical methods in a random fashion, for example by adding Gaussian perturbations to the solution of one-step integrators[20] or by randomly perturbing their time-steps.[21] This defines a probability model for the solution of the differential equation (see the first sketch below).
  • Gaussian process regression methods: These algorithms pose the solution of the differential equation as a Gaussian process regression problem, in a similar fashion to Bayesian cubature, but with different and often nonlinear observation models. In its infancy, this approach was based on naive Gaussian process regression. It was later abandoned in favour of Gauss–Markov models, in which the state X(t) is a vector containing, say, the first q derivatives of f, so that inference can be implemented efficiently with Kalman filtering based methods (see the second sketch below).

The boundary between these two categories is not sharp; indeed, a Gaussian process regression approach based on randomised data has also been developed. These methods have been applied to problems in computational Riemannian geometry, inverse problems, and latent force models.
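
As a sketch of the first category, the following perturbs each explicit Euler step with Gaussian noise whose standard deviation shrinks with the step size, so that repeated runs produce an ensemble of plausible solutions. The perturbation scale and the test problem are illustrative assumptions in the spirit of the cited schemes, not a verbatim reproduction of them.

import numpy as np

def perturbed_euler(w, f0, t0, t1, h, scale=0.1, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    n = round((t1 - t0) / h)
    fs = [f0]
    for i in range(n):
        t = t0 + i * h
        step = fs[-1] + h * w(t, fs[-1])                 # deterministic Euler step
        step += scale * h ** 1.5 * rng.standard_normal() # noise matched to local error order
        fs.append(step)
    return np.array(fs)

w = lambda t, f: -f                                      # toy ODE, solution exp(-t)
runs = np.array([perturbed_euler(w, 1.0, 0.0, 2.0, 0.05) for _ in range(200)])
print(runs[:, -1].mean(), runs[:, -1].std())             # ensemble near exp(-2), with spread

A correspondingly minimal sketch of the second category is an ODE filter: under a once-integrated Wiener process prior, the state X(t) = (f(t), ḟ(t)) evolves linearly between grid points, and at each point a Kalman update conditions on the pseudo-observation ḟ(t) − w(t, f(t)) = 0 (a zeroth-order linearisation). Prior, step size and linearisation are again illustrative choices.

import numpy as np

def ode_filter(w, f0, t0, t1, h, sigma=1.0):
    A = np.array([[1.0, h], [0.0, 1.0]])                 # state transition over one step
    Q = sigma ** 2 * np.array([[h ** 3 / 3, h ** 2 / 2],
                               [h ** 2 / 2, h]])         # process noise of the prior
    H = np.array([[0.0, 1.0]])                           # observe the derivative component
    m = np.array([f0, w(t0, f0)])                        # initialise state on the ODE
    P = np.zeros((2, 2))
    n = round((t1 - t0) / h)
    means, stds = [m.copy()], [np.sqrt(P[0, 0])]
    for i in range(n):
        t = t0 + (i + 1) * h
        m, P = A @ m, A @ P @ A.T + Q                    # predict
        z = m[1] - w(t, m[0])                            # residual of f'(t) = w(t, f(t))
        S = (H @ P @ H.T)[0, 0]
        K = (P @ H.T / S).ravel()                        # Kalman gain
        m, P = m - K * z, P - np.outer(K, H @ P)         # update on the zero observation
        means.append(m.copy()); stds.append(np.sqrt(P[0, 0]))
    return np.array(means), np.array(stds)

w = lambda t, f: -f                                      # toy ODE, solution exp(-t)
m, s = ode_filter(w, 1.0, 0.0, 2.0, 0.05)
print(m[-1, 0], s[-1])                                   # near exp(-2), with an error bar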

Partial Differential Equations

Probabilistic numerical methods have also been developed for partial differential equations.[3][22][23][4]

History and Related Fields

The interplay between numerical analysis and probability is touched upon by a number of other areas of mathematics, including average-case analysis of numerical methods, information-based complexity, game theory, and statistical decision theory. Precursors to what is now being called “probabilistic numerics” can be found as early as the late 19th and early 20th century.

The origins of probabilistic numerics can be traced to a discussion of probabilistic approaches to polynomial interpolation by Henri Poincaré in his Calcul des Probabilités:[24] in modern terminology, Poincaré considered a Gaussian prior distribution on a function f, expressed as a formal power series with random coefficients, and asked for “probable values” of f(x) given this prior and observations f(aᵢ) = Bᵢ for i = 1, …, n.

A later seminal contribution to the interplay of numerical analysis and probability was provided by Al′bert Sul′din in the context of univariate quadrature.[25] The statistical problem considered by Sul′din was the approximation of the definite integral ∫ u(t) dt of a function u, under a Brownian motion prior on u, given access to pointwise evaluation of u at nodes t₁, …, tₙ. Sul′din showed that, for given quadrature nodes, the quadrature rule with minimal mean squared error is the trapezoidal rule; furthermore, this minimal error is proportional to the sum of cubes of the inter-node spacings. As a result, the trapezoidal rule with equally spaced nodes can be seen as statistically optimal in some sense, an early example of the average-case analysis of a numerical method.

As discussed in [4] and [26], interplays between numerical approximation and statistical inference can also be traced back to Palasti and Renyi,[27] Sard,[28] Kimeldorf and Wahba[29] (on the correspondence between Bayesian estimation and spline smoothing/interpolation) and Larkin[30] (on the correspondence between Gaussian process regression and numerical approximation). Although the approach of modelling a perfectly known function as a sample from a random process may seem counterintuitive, a natural framework for understanding it can be found in information-based complexity (IBC),[31] the branch of computational complexity founded on the observation that numerical implementation requires computation with partial information and limited resources. In IBC, the performance of an algorithm operating on incomplete information can be analyzed in the worst-case or the average-case (randomized) setting with respect to the missing information. Moreover, as observed by Packel,[32] the average-case setting can be interpreted as a mixed strategy in an adversarial game obtained by lifting a (worst-case) minmax problem to a minmax problem over mixed (randomized) strategies. This observation establishes[22][4] a natural connection between numerical approximation and Wald's decision theory, evidently influenced by von Neumann's theory of games. To describe this connection, consider the optimal recovery setting of Micchelli and Rivlin,[33] in which one tries to approximate an unknown function from a finite number of linear measurements of that function. Interpreting this optimal recovery problem as a zero-sum game, where Player I selects the unknown function and Player II selects its approximation, and using relative errors in a quadratic norm to define losses, Gaussian priors emerge[4] as optimal mixed strategies for such games, and the covariance operator of the optimal Gaussian prior is determined by the quadratic norm used to define the relative error of the recovery.

Software

  • ProbNum: Probabilistic Numerics in Python.
  • ProbNumDiffEq.jl: Probabilistic numerical ODE solvers based on filtering implemented in Julia.

References

  1. ^ Hennig, P.; Osborne, M. A.; Girolami, M. (2015). "Probabilistic numerics and uncertainty in computations". Proc. A. 471 (2179): 20150142, 17. doi:10.1098/rspa.2015.0142.
  2. ^ Oates, C. J.; Sullivan, T. J. (2019). "A modern retrospective on probabilistic numerics". Stat. Comput. 29 (6): 1335–1351. doi:10.1007/s11222-019-09902-z.
  3. ^ a b Owhadi, Houman (2015). "Bayesian Numerical Homogenization". Multiscale Modeling & Simulation. 13 (3): 812–828. doi:10.1137/140974596. ISSN 1540-3459.
  4. ^ a b c d e Owhadi, Houman; Scovel, Clint (2019). Operator-Adapted Wavelets, Fast Solvers, and Numerical Homogenization: From a Game Theoretic Approach to Numerical Approximation and Algorithm Design. Cambridge Monographs on Applied and Computational Mathematics. Cambridge: Cambridge University Press. ISBN 978-1-108-48436-7.
  5. ^ a b Hennig, P. (2015). "Probabilistic interpretation of linear solvers". SIAM Journal on Optimization. 25 (1): 234–260. doi:10.1137/140955501.
  6. ^ a b Cockayne, J.; Oates, C.; Ipsen, I.; Girolami, M. (2019). "A Bayesian conjugate gradient method". Bayesian Analysis. 14 (3). International Society for Bayesian Analysis: 937–1012. doi:10.1214/19-BA1145.
  7. ^ a b Wenger, J.; Hennig, P. (2020). Probabilistic Linear Solvers for Machine Learning (PDF). Advances in Neural Information Processing Systems. Vol. 33. pp. 6731–6742.
  8. ^ Karvonen, Toni; Särkkä, Simo (2017). Classical quadrature rules via Gaussian processes. 2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP).
  9. ^ Hennig, Philipp; Kiefel, Martin (2013). "Quasi-Newton methods: A new direction". Journal of Machine Learning Research (JMLR). 14 (1): 843–865.
  10. ^ Maren Mahsereci; Philipp Hennig (2015). Probabilistic line searches for stochastic optimization. Advances in Neural Information Processing Systems (NeurIPS).
  11. ^ Hans Kersting; Nicholas Krämer; Martin Schiegg; Christian Daniel; Michael Tiemann; Philipp Hennig (2020). Differentiable Likelihoods for Fast Inversion of ’Likelihood-Free’ Dynamical Systems. International Conference on Machine Learning (ICML).
  12. ^ Schmidt, Jonathan; Krämer, Peter Nicholas; Hennig, Philipp (2021). A Probabilistic State Space Model for Joint Inference from Differential Equations and Data. Advances in Neural Information Processing Systems (NeurIPS).
  13. ^ Diaconis, P. (1988). "Bayesian Numerical Analysis". Statistical Decision Theory and Related Topics IV: 163–175.
  14. ^ O’Hagan, A. (1991). "Bayes–Hermite quadrature". Journal of Statistical Planning and Inference. 29 (3): 245–260. doi:10.1016/0378-3758(91)90002-V.
  15. ^ Rasmussen, C.; Ghahramani, Z. (2002). "Bayesian Monte Carlo". Neural Information Processing Systems: 489–496.
  16. ^ Briol, F.-X.; Oates, C. J.; Girolami, M.; Osborne, M. A.; Sejdinovic, D. (2019). "Probabilistic integration: A role in statistical computation? (with discussion and rejoinder)". Statistical Science. 34 (1): 1–22.
  17. ^ Bartels, S.; Cockayne, J.; Ipsen, I.; Hennig, P. (2019). "Probabilistic linear solvers: a unifying view". Statistics and Computing. 29 (6). Springer: 1249–1263. doi:10.1007/s11222-019-09897-7.
  18. ^ Cockayne, J.; Ipsen, I.; Oates, C.; Reid, T. (2021). "Probabilistic iterative methods for linear systems" (PDF). Journal of Machine Learning Research. 22 (232): 1–34.
  19. ^ Schäfer, Florian; Katzfuss, Matthias; Owhadi, Houman (2021). "Sparse Cholesky Factorization by Kullback–Leibler Minimization". SIAM Journal on Scientific Computing. 43 (3): A2019–A2046. doi:10.1137/20M1336254. ISSN 1064-8275.
  20. ^ Conrad, P.R.; Girolami, M.; Särkkä, S.; Stuart, A.M.; Zygalakis, K. (2017). "Statistical analysis of differential equations: introducing probability measures on numerical solutions". Stat. Comput. 27: 1065–1082. doi:10.1007/s11222-016-9671-0.
  21. ^ Abdulle, A.; Garegnani, G. (2020). "Random time step probabilistic methods for uncertainty quantification in chaotic and geometric numerical integration". Stat. Comput. 30 (4): 907–932. doi:10.1007/s11222-020-09926-w.
  22. ^ a b Owhadi, H. (2017). "Multigrid with rough coefficients and multiresolution operator decomposition from hierarchical information games". SIAM Review. 59 (1): 99–149. doi:10.1137/15M1013894.
  23. ^ Schäfer, Florian; Sullivan, T. J.; Owhadi, Houman (2021). "Compression, Inversion, and Approximate PCA of Dense Kernel Matrices at Near-Linear Computational Complexity". Multiscale Modeling & Simulation. 19 (2): 688–730. doi:10.1137/19M129526X. ISSN 1540-3459.
  24. ^ Poincaré, Henri (1912). Calcul des Probabilités (second ed.). Gauthier-Villars.
  25. ^ Sul′din, A. V. (1959). "Wiener measure and its applications to approximation methods. I". Izv. Vysš. Učebn. Zaved. Matematika. 6 (13): 145–158.
  26. ^ Owhadi, Houman; Scovel, Clint; Schäfer, Florian (2019). "Statistical Numerical Approximation". Notices of the American Mathematical Society. 66 (10): 1608–1617. doi:10.1090/noti1963.
  27. ^ Palasti, I.; Renyi, A. (1956). "On interpolation theory and the theory of games". MTA Mat. Kut. Int. Kozl. 1: 529–540.
  28. ^ Sard, A. (1963). Linear Approximation. American Mathematical Society.
  29. ^ Kimeldorf, George S.; Wahba, Grace (1970). "A correspondence between Bayesian estimation on stochastic processes and smoothing by splines". Ann. Math. Statist. 41: 495–502. doi:10.1214/aoms/1177697089.
  30. ^ Larkin, F. M. (1972). "Gaussian measure in Hilbert space and applications in numerical analysis". Rocky Mountain J. Math. 2 (3): 379–421. doi:10.1216/RMJ-1972-2-3-379.
  31. ^ Traub, J. F.; Wasilkowski, G. W.; Woźniakowski, H. (1988). Information-Based Complexity. Computer Science and Scientific Computing. Boston, MA: Academic Press, Inc. ISBN 0-12-697545-0.
  32. ^ Packel, Edward W. (1987). "The algorithm designer versus nature: a game-theoretic approach to information-based complexity". J. Complexity. 3 (3): 244–257. doi:10.1016/0885-064X(87)90014-8.
  33. ^ Micchelli, C. A.; Rivlin, T. J. (1977). "A survey of optimal recovery". Optimal Estimation in Approximation Theory (Proc. Internat. Sympos., Freudenstadt, 1976). pp. 1–54.