Hamilton–Jacobi–Bellman equation

From Wikipedia, the free encyclopedia

In optimal control theory, the Hamilton–Jacobi–Bellman (HJB) equation gives a necessary and sufficient condition for optimality of a control with respect to a loss function.[1] It is, in general, a nonlinear partial differential equation in the value function, which means its solution is the value function itself. Once the solution is known, it can be used to obtain the optimal control by taking the maximizer (or minimizer) of the Hamiltonian involved in the HJB equation.[2][3]

The equation is a result of the theory of dynamic programming, which was pioneered in the 1950s by Richard Bellman and coworkers.[4][5][6] The connection to the Hamilton–Jacobi equation from classical physics was first drawn by Rudolf Kálmán.[7] In discrete-time problems, the equation is usually referred to as the Bellman equation.

Classical variational problems, such as the brachistochrone problem, can be solved using the Hamilton–Jacobi–Bellman equation,[8] but the method applies to a much broader class of problems. It can also be generalized to stochastic systems, in which case the HJB equation is a second-order partial differential equation.[9] A major drawback, however, is that the HJB equation admits classical solutions only when the value function is sufficiently smooth, which is not guaranteed in most situations. Instead, the notion of a viscosity solution is required, in which conventional derivatives are replaced by (set-valued) subderivatives.[10]

Optimal control problems

Consider the following problem in deterministic optimal control over the time period $[0, T]$:

$$V(x(0), 0) = \min_u \left\{ \int_0^T C[x(t), u(t)]\,dt + D[x(T)] \right\}$$

where $C[\cdot]$ is the scalar cost rate function and $D[\cdot]$ is a function that gives the economic value or utility at the final state, $x(t)$ is the system state vector, $x(0)$ is assumed given, and $u(t)$ for $0 \le t \le T$ is the control vector that we are trying to find.

The system must also be subject to

$$\dot{x}(t) = F[x(t), u(t)]$$

where $F[\cdot]$ gives the vector determining physical evolution of the state vector over time.
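For a given open-loop control, the cost functional can be evaluated by integrating the dynamics forward in time. The sketch below is minimal and uses forward-Euler integration. The scalar test problem ($F = u$, $C = u^2/2$, $D = x^2/2$, $x(0) = 1$, $T = 1$) and all function names are illustrative assumptions and do not come from the article.

```python
# Illustrative scalar problem (an assumption, not from the article):
# F[x, u] = u, C[x, u] = u^2 / 2, D[x] = x^2 / 2, with x(0) = 1 and T = 1.
def dynamics(x: float, u: float) -> float:
    return u

def running_cost(x: float, u: float) -> float:
    return 0.5 * u * u

def terminal_cost(x: float) -> float:
    return 0.5 * x * x

def evaluate_cost(control, x0: float = 1.0, T: float = 1.0, n_steps: int = 1000) -> float:
    """Approximate the cost of an open-loop control u(t) using forward Euler."""
    dt = T / n_steps
    x, total = x0, 0.0
    for k in range(n_steps):
        u = control(k * dt)
        total += running_cost(x, u) * dt   # left Riemann sum of C[x(t), u(t)]
        x += dynamics(x, u) * dt           # Euler step of dx/dt = F[x, u]
    return total + terminal_cost(x)        # add D[x(T)]

print(evaluate_cost(lambda t: 0.0))    # 0.5: no running cost, and x(T) = 1
print(evaluate_cost(lambda t: -0.5))   # approximately 0.25
```

In this particular test problem, the constant control $u = -0.5$ happens to be optimal. Its cost of about 0.25 is therefore the minimum value $V(x(0), 0)$, and other choices such as $u = 0$ or $u = -1$ cost more.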

The partial differential equation

For this simple system, the Hamilton–Jacobi–Bellman partial differential equation is

$$\dot{V}(x, t) + \min_u \left\{ \nabla V(x, t) \cdot F(x, u) + C(x, u) \right\} = 0$$

subject to the terminal condition

$$V(x, T) = D(x),$$

where $\dot{V}(x, t)$ denotes the partial derivative of $V$ with respect to the time variable $t$. Here $a \cdot b$ denotes the dot product of the vectors $a$ and $b$, and $\nabla V(x, t)$ the gradient of $V$ with respect to the variables $x$.

The unknown scalar $V(x, t)$ in the above partial differential equation is the Bellman value function, which represents the cost incurred from starting in state $x$ at time $t$ and controlling the system optimally from then until time $T$.
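A numerical check can confirm that a candidate value function satisfies the PDE and its terminal condition. This minimal sketch assumes the illustrative scalar problem $F(x,u) = u$, $C(x,u) = u^2/2$, $D(x) = x^2/2$, $T = 1$. For that problem, the candidate $V(x,t) = x^2 / (2(1 + T - t))$ can also be verified by hand. Central differences approximate the derivatives, and the minimum over $u$ uses the closed-form minimizer $u^* = -\partial V/\partial x$ of this quadratic expression.

```python
import math

T = 1.0  # horizon of the illustrative problem (assumption)

def value(x: float, t: float) -> float:
    """Candidate V(x, t) = x^2 / (2 (1 + T - t)) for F = u, C = u^2/2, D = x^2/2."""
    return x * x / (2.0 * (1.0 + T - t))

def hjb_residual(x: float, t: float, h: float = 1e-5) -> float:
    """dV/dt + min_u { dV/dx * F(x, u) + C(x, u) }, using central differences."""
    v_t = (value(x, t + h) - value(x, t - h)) / (2.0 * h)
    v_x = (value(x + h, t) - value(x - h, t)) / (2.0 * h)
    u_star = -v_x  # minimizer of v_x * u + u^2 / 2
    return v_t + v_x * u_star + 0.5 * u_star * u_star

for x, t in [(0.5, 0.0), (-1.3, 0.4), (2.0, 0.9)]:
    print(f"x = {x:+.1f}, t = {t:.1f}: residual = {hjb_residual(x, t):.1e}")

# Terminal condition V(x, T) = D(x) = x^2 / 2
print(all(math.isclose(value(x, T), 0.5 * x * x) for x in (-2.0, 0.3, 1.7)))  # True
```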

Deriving the equation

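Intuitively, the HJB equation can be derived as follows. If $V(x(t), t)$ is the optimal cost-to-go function (also called the 'value function'), then by Richard Bellman's principle of optimality, going from time $t$ to $t + dt$, we have

$$V(x(t), t) = \min_u \left\{ V(x(t + dt), t + dt) + \int_t^{t + dt} C(x(s), u(s))\,ds \right\}.$$

Note that the Taylor expansion of the first term on the right-hand side is

$$V(x(t + dt), t + dt) = V(x(t), t) + \dot{V}(x(t), t)\,dt + \nabla V(x(t), t) \cdot \dot{x}(t)\,dt + o(dt),$$

where $o(dt)$ denotes the terms in the Taylor expansion of higher order than one in little-o notation, and $\dot{x}(t) = F(x(t), u(t))$ by the system dynamics. Then if we subtract $V(x(t), t)$ from both sides, divide by $dt$, and take the limit as $dt$ approaches zero, we obtain the HJB equation defined above.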
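The limiting argument in the last step can be spelled out as follows. This formal sketch assumes that $V$ is continuously differentiable and that the cost rate along the trajectory is continuous in time, so the running cost over $[t, t + dt]$ equals $C(x(t), u(t))\,dt + o(dt)$. It also substitutes $\dot{x}(t) = F(x(t), u(t))$.

```latex
\begin{aligned}
V(x(t),t) &= \min_u \Big\{ C(x(t),u(t))\,dt + V(x(t),t) + \dot{V}(x(t),t)\,dt
            + \nabla V(x(t),t)\cdot F(x(t),u(t))\,dt + o(dt) \Big\} \\
0 &= \dot{V}(x(t),t) + \min_u \Big\{ C(x(t),u(t)) + \nabla V(x(t),t)\cdot F(x(t),u(t)) \Big\}
     + \frac{o(dt)}{dt} \\
0 &= \dot{V}(x,t) + \min_u \Big\{ \nabla V(x,t)\cdot F(x,u) + C(x,u) \Big\}
     \qquad \text{as } dt \to 0 .
\end{aligned}
```

The second line relies on two facts. First, $V(x(t), t)$ and $\dot{V}(x(t), t)$ do not depend on $u$ and can be moved outside the minimum. Second, dividing by $dt > 0$ does not change the minimizer.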

Solving the equation

The HJB equation is usually solved backwards in time, starting from $t = T$ and ending at $t = 0$.[citation needed]
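One common way to carry out the backward sweep numerically is a semi-Lagrangian dynamic-programming scheme:

- discretize $x$ and $u$ on grids;
- start from $V(x, T) = D(x)$;
- repeatedly apply the one-step Bellman recursion $V(x, t) \approx \min_u \{ C(x,u)\,\Delta t + V(x + F(x,u)\,\Delta t,\ t + \Delta t) \}$, with linear interpolation between grid points.

This is a generic illustration rather than a method from the cited sources. The grids, step counts, and scalar test problem ($F = u$, $C = u^2/2$, $D = x^2/2$, $T = 1$, with exact value $x^2/4$ at $t = 0$) are assumptions chosen for the demonstration.

```python
# Illustrative scalar problem (assumption): F(x, u) = u, C(x, u) = u^2/2, D(x) = x^2/2, T = 1.
def dynamics(x, u):
    return u

def running_cost(x, u):
    return 0.5 * u * u

def terminal_cost(x):
    return 0.5 * x * x

def solve_backward(T=1.0, x_min=-2.0, x_max=2.0, nx=201,
                   u_min=-2.0, u_max=2.0, nu=81, nt=20):
    """Semi-Lagrangian backward sweep; returns the x grid and V(., 0) on it."""
    dx = (x_max - x_min) / (nx - 1)
    dt = T / nt
    xs = [x_min + i * dx for i in range(nx)]
    us = [u_min + j * (u_max - u_min) / (nu - 1) for j in range(nu)]

    def interp(values, y):
        # Piecewise-linear interpolation, held constant outside [x_min, x_max].
        s = (y - x_min) / dx
        if s <= 0.0:
            return values[0]
        if s >= nx - 1:
            return values[-1]
        i = int(s)
        w = s - i
        return (1.0 - w) * values[i] + w * values[i + 1]

    V = [terminal_cost(x) for x in xs]  # start at t = T with V(x, T) = D(x)
    for _ in range(nt):                  # step from t + dt back to t
        V = [min(running_cost(x, u) * dt + interp(V, x + dynamics(x, u) * dt)
                 for u in us)
             for x in xs]
    return xs, V

xs, V0 = solve_backward()
k = min(range(len(xs)), key=lambda i: abs(xs[i] - 0.5))  # grid point nearest x = 0.5
print(xs[k], V0[k], "exact:", 0.5 ** 2 / 4)
```

Most of the remaining error comes from the linear interpolation, so refining the $x$ grid brings the computed value closer to the exact $0.0625$. The constant extrapolation at the grid edges only matters for states whose optimal trajectories approach the boundary.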

When the HJB equation is solved over the whole state space and $V$ is continuously differentiable, it is a necessary and sufficient condition for an optimum when the terminal state is unconstrained.[11] If we can solve for $V$, we can then find from it a control $u$ that achieves the minimum cost by choosing, at each $(x, t)$, a $u$ that attains the minimum in the HJB equation.
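Once $V$ is known, a feedback control can be read off pointwise by minimizing $\nabla V \cdot F(x,u) + C(x,u)$ over $u$. That control is then applied along a simulated trajectory. This minimal sketch assumes the illustrative scalar problem $F = u$, $C = u^2/2$, $D = x^2/2$, $T = 1$, whose value function $V(x,t) = x^2 / (2(1 + T - t))$ is known in closed form. The minimization is a brute-force search over a hypothetical control grid, and the closed-loop cost should come out close to $V(x_0, 0)$.

```python
T = 1.0  # horizon of the illustrative problem (assumption)

def grad_value(x, t):
    """dV/dx for the known value function V(x, t) = x^2 / (2 (1 + T - t))."""
    return x / (1.0 + T - t)

def dynamics(x, u):      # F(x, u) = u
    return u

def running_cost(x, u):  # C(x, u) = u^2 / 2
    return 0.5 * u * u

U_GRID = [-2.0 + 0.01 * j for j in range(401)]  # hypothetical candidate controls in [-2, 2]

def feedback(x, t):
    """Pick the u in U_GRID minimizing dV/dx * F(x, u) + C(x, u)."""
    p = grad_value(x, t)
    return min(U_GRID, key=lambda u: p * dynamics(x, u) + running_cost(x, u))

def closed_loop_cost(x0=1.0, n_steps=200):
    """Simulate dx/dt = F(x, u*(x, t)) with forward Euler and accumulate the cost."""
    dt = T / n_steps
    x, total = x0, 0.0
    for k in range(n_steps):
        u = feedback(x, k * dt)
        total += running_cost(x, u) * dt
        x += dynamics(x, u) * dt
    return total + 0.5 * x * x  # terminal cost D(x) = x^2 / 2

print(closed_loop_cost(), "vs V(1, 0) =", 1.0 / (2.0 * (1.0 + T)))
```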

In the general case, the HJB equation does not have a classical (smooth) solution. Several notions of generalized solutions have been developed to cover such situations, including viscosity solutions (Pierre-Louis Lions and Michael Crandall),[12] minimax solutions (Andrei Izmailovich Subbotin), and others.
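A small example shows how smoothness can fail. Take, as an illustrative assumption, $\dot{x} = u$ with $|u| \le 1$, zero running cost, and terminal cost $D(x) = -|x|$ on $[0, T]$. Moving away from the origin at full speed is optimal, so $V(x,t) = -|x| - (T - t)$. This $V$ satisfies $V_t + \min_{|u| \le 1} V_x u = 0$ wherever it is differentiable. However, it has a kink at $x = 0$ for every $t < T$, so it is not a classical solution there. The sketch checks both facts with finite differences. It does not implement a viscosity-solution test.

```python
T = 1.0  # horizon (assumption)

def value(x: float, t: float) -> float:
    """V(x, t) = -|x| - (T - t) for dx/dt = u, |u| <= 1, C = 0, D(x) = -|x|."""
    return -abs(x) - (T - t)

def hjb_residual(x: float, t: float, h: float = 1e-6) -> float:
    """V_t + min_{|u| <= 1} V_x * u; a linear function of u is minimized at u = -1 or u = 1."""
    v_t = (value(x, t + h) - value(x, t - h)) / (2.0 * h)
    v_x = (value(x + h, t) - value(x - h, t)) / (2.0 * h)
    return v_t + min(v_x * u for u in (-1.0, 1.0))

def one_sided_slopes(x: float, t: float, h: float = 1e-6):
    """Left and right difference quotients of V in x."""
    left = (value(x, t) - value(x - h, t)) / h
    right = (value(x + h, t) - value(x, t)) / h
    return left, right

# Away from x = 0 the equation holds up to rounding error ...
print([abs(hjb_residual(x, 0.3)) < 1e-6 for x in (-1.0, -0.2, 0.4, 1.5)])  # [True, True, True, True]
# ... but at x = 0 the left and right slopes are about +1 and -1, so V_x does not exist there.
print(one_sided_slopes(0.0, 0.3))
```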

Extension to stochastic problems

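The idea of solving a control problem by applying Bellman's principle of optimality and then working backwards in time to find an optimizing strategy can be generalized to stochastic control problems. Consider a problem similar to the one above,

$$\min_u \mathbb{E}\left\{ \int_0^T C(t, X_t, u_t)\,dt + D(X_T) \right\},$$

now with $(X_t)_{t \in [0,T]}$ the stochastic process to optimize and $(u_t)_{t \in [0,T]}$ the steering. By first using Bellman's principle and then expanding $V(X_t, t)$ with Itô's rule, one finds the stochastic HJB equation

$$\min_u \left\{ \mathcal{A} V(x, t) + C(t, x, u) \right\} = 0,$$

where $\mathcal{A}$ represents the stochastic differentiation operator, subject to the terminal condition

$$V(x, T) = D(x).$$

For a controlled diffusion $dX_t = \mu(t, X_t, u_t)\,dt + \sigma(t, X_t, u_t)\,dW_t$, the operator is $\mathcal{A}V = \partial V/\partial t + \mu \cdot \nabla V + \tfrac{1}{2}\operatorname{tr}(\sigma\sigma^{\top}\nabla^2 V)$. It depends on $u$ through $\mu$ and $\sigma$.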

Note that the randomness has disappeared. In this case a solution $V$ of the latter does not necessarily solve the primal problem; it is only a candidate, and a further verification argument is required. This technique is widely used in financial mathematics to determine optimal investment strategies in the market (see for example Merton's portfolio problem).
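A candidate can also be checked by simulation, though this complements rather than replaces a verification argument. The idea is to apply the feedback control suggested by the candidate $V$ and compare the Monte Carlo average cost with $V(x_0, 0)$. This minimal sketch assumes the following illustrative scalar problem:

- dynamics $dX_t = u_t\,dt + \sigma\,dW_t$;
- costs $C = u^2/2$ and $D(x) = x^2/2$, with $T = 1$.

For this problem, the stochastic HJB equation yields the candidate $V(x,t) = \tfrac12 P(t)x^2 + \tfrac{\sigma^2}{2}\ln(1 + T - t)$ with $P(t) = 1/(1 + T - t)$, and the feedback $u = -P(t)x$. Paths are simulated with the Euler–Maruyama scheme. The sample size, step count, and seed are arbitrary choices.

```python
import math
import random

T, SIGMA, X0 = 1.0, 0.5, 1.0  # illustrative parameters (assumptions)

def P(t):
    return 1.0 / (1.0 + T - t)

def candidate_value(x, t):
    """Candidate solution of the stochastic HJB equation for this test problem."""
    return 0.5 * P(t) * x * x + 0.5 * SIGMA ** 2 * math.log(1.0 + T - t)

def simulate_cost(n_paths=5000, n_steps=100, seed=0):
    """Average realized cost under the feedback u = -P(t) x, via Euler-Maruyama."""
    rng = random.Random(seed)
    dt = T / n_steps
    sqrt_dt = math.sqrt(dt)
    total = 0.0
    for _ in range(n_paths):
        x, cost = X0, 0.0
        for k in range(n_steps):
            u = -P(k * dt) * x
            cost += 0.5 * u * u * dt                                  # running cost C = u^2 / 2
            x += u * dt + SIGMA * sqrt_dt * rng.gauss(0.0, 1.0)       # dX = u dt + sigma dW
        total += cost + 0.5 * x * x                                    # terminal cost D = x^2 / 2
    return total / n_paths

print(simulate_cost(), "vs candidate V(x0, 0) =", candidate_value(X0, 0.0))
```

Agreement up to sampling and time-discretization error supports the candidate, but it is not a proof of optimality.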

Application to LQG control

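As an example, we can look at a system with linear stochastic dynamics and quadratic cost. If the system dynamics are given by

$$dx_t = (a x_t + b u_t)\,dt + \sigma\,dw_t,$$

and the cost accumulates at rate $C(x_t, u_t) = r(t) u_t^2/2 + q(t) x_t^2/2$, the HJB equation is given by

$$-\frac{\partial V(x,t)}{\partial t} = \frac{1}{2} q(t) x^2 + \frac{\partial V(x,t)}{\partial x} a x - \frac{b^2}{2 r(t)} \left( \frac{\partial V(x,t)}{\partial x} \right)^2 + \frac{\sigma^2}{2} \frac{\partial^2 V(x,t)}{\partial x^2},$$

with optimal action given by

$$u_t = -\frac{b}{r(t)} \frac{\partial V(x,t)}{\partial x}.$$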

Assuming a quadratic form for the value function, we obtain a Riccati equation for the Hessian of the value function, as is usual for linear–quadratic–Gaussian control.
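Substitute the ansatz $V(x,t) = \tfrac12 P(t)x^2 + s(t)$ into the scalar LQG HJB equation and match the coefficients of $x^2$ and of the constant term. This gives the Riccati equation $-\dot{P} = q + 2aP - b^2P^2/r$ together with $-\dot{s} = \sigma^2 P/2$. The optimal action then becomes the linear feedback $u_t = -(b/r)P(t)x_t$.

The minimal sketch below integrates the Riccati equation backwards in time with a classical fourth-order Runge–Kutta step. It assumes constant coefficients $a, b, q, r$ and a terminal cost $D(x) = p_T x^2/2$, so that $P(T) = p_T$. The function names are illustrative. Two cases with known closed-form solutions serve as checks.

```python
import math

def riccati_rhs(P, a, b, q, r):
    """dP/dt from the scalar Riccati equation -dP/dt = q + 2 a P - b^2 P^2 / r."""
    return -(q + 2.0 * a * P - b * b * P * P / r)

def solve_riccati(a, b, q, r, p_T, T=1.0, n_steps=1000):
    """Integrate P from P(T) = p_T back to t = 0 with classical RK4; return [(t, P(t)), ...]."""
    h = -T / n_steps  # negative step: march backwards in time
    t, P = T, p_T
    path = [(t, P)]
    for _ in range(n_steps):
        k1 = riccati_rhs(P, a, b, q, r)
        k2 = riccati_rhs(P + 0.5 * h * k1, a, b, q, r)
        k3 = riccati_rhs(P + 0.5 * h * k2, a, b, q, r)
        k4 = riccati_rhs(P + h * k3, a, b, q, r)
        P += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        t += h
        path.append((t, P))
    return path

# Check 1: a = q = 0, b = r = p_T = 1 has exact solution P(t) = 1 / (1 + T - t).
_, P0 = solve_riccati(a=0.0, b=1.0, q=0.0, r=1.0, p_T=1.0)[-1]
print(P0, "exact:", 0.5)
# Check 2: a = 0, b = q = r = 1, p_T = 0 has exact solution P(t) = tanh(T - t).
_, P0 = solve_riccati(a=0.0, b=1.0, q=1.0, r=1.0, p_T=0.0)[-1]
print(P0, "exact:", math.tanh(1.0))
# The optimal action at time 0 is then the linear feedback u = -(b / r) * P(0) * x.
```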

See also

  • Bellman equation, the discrete-time counterpart of the Hamilton–Jacobi–Bellman equation.
  • Pontryagin's maximum principle, a necessary but not sufficient condition for an optimum, obtained by maximizing a Hamiltonian; it has the advantage over HJB of only needing to be satisfied along the single trajectory being considered.

References

  1. ^ Kirk, Donald E. (1970). Optimal Control Theory: An Introduction. Englewood Cliffs, NJ: Prentice-Hall. pp. 86–90. ISBN 0-13-638098-0.
  2. ^ Yong, Jiongmin; Zhou, Xun Yu (1999). "Dynamic Programming and HJB Equations". Stochastic Controls: Hamiltonian Systems and HJB Equations. Springer. pp. 157–215 [p. 163]. ISBN 0-387-98723-1.
  3. ^ Naidu, Desineni Subbaram (2003). "The Hamilton–Jacobi–Bellman Equation". Optimal Control Systems. Boca Raton: CRC Press. pp. 277–283 [p. 280]. ISBN 0-8493-0892-5.
  4. ^ Bellman, R. E. (1954). "Dynamic Programming and a new formalism in the calculus of variations". Proc. Natl. Acad. Sci. 40 (4): 231–235. Bibcode:1954PNAS...40..231B. doi:10.1073/pnas.40.4.231. PMC 527981. PMID 16589462.
  5. ^ Bellman, R. E. (1957). Dynamic Programming. Princeton, NJ.
  6. ^ Bellman, R.; Dreyfus, S. (1959). "An Application of Dynamic Programming to the Determination of Optimal Satellite Trajectories". J. Br. Interplanet. Soc. 17: 78–83.
  7. ^ Kálmán, Rudolf E. (1963). "The Theory of Optimal Control and the Calculus of Variations". In Bellman, Richard (ed.). Mathematical Optimization Techniques. Berkeley: University of California Press. pp. 309–331.
  8. ^ Kemajou-Brown, Isabelle (2016). "Brief History of Optimal Control Theory and Some Recent Developments". In Budzban, Gregory; Hughes, Harry Randolph; Schurz, Henri (eds.). Probability on Algebraic and Geometric Structures. Contemporary Mathematics. 668. pp. 119–130. doi:10.1090/conm/668/13400. ISBN 9781470419455.
  9. ^ Chang, Fwu-Ranq (2004). Stochastic Optimization in Continuous Time. Cambridge, UK: Cambridge University Press. pp. 113–168. ISBN 0-521-83406-6.
  10. ^ Bardi, Martino; Capuzzo-Dolcetta, Italo (1997). Optimal Control and Viscosity Solutions of Hamilton–Jacobi–Bellman Equations. Boston: Birkhäuser. ISBN 0-8176-3640-4.
  11. ^ Bertsekas, Dimitri P. (2005). Dynamic Programming and Optimal Control. Athena Scientific.
  12. ^ Bardi, Martino; Capuzzo-Dolcetta, Italo (1997). Optimal Control and Viscosity Solutions of Hamilton–Jacobi–Bellman Equations. Boston: Birkhäuser. ISBN 0-8176-3640-4.

Further reading