# Parareal

Parareal is a parallel algorithm from numerical analysis and used for the solution of initial value problems. It was introduced in 2001 by Lions, Maday and Turinici. Since then, it has become one of the most widely studied parallel-in-time integration methods.[citation needed]

## Parallel-in-time integration methods

In contrast to e.g. Runge-Kutta or multi-step methods, some of the computations in Parareal can be performed in parallel and Parareal is therefore one example of a parallel-in-time integration method. While historically most efforts to parallelize the numerical solution of partial differential equations focussed on the spatial discretization, in view of the challenges from exascale computing, parallel methods for temporal discretization have been identified as a possible way to increase concurrency in numerical software. Because Parareal computes the numerical solution for multiple time steps in parallel, it is categorized as a parallel across the steps method. This is in contrast to approaches using parallelism across the method like parallel Runge-Kutta or extrapolation methods, where independent stages can be computed in parallel or parallel across the system methods like waveform relaxation.

## History

Parareal can be derived as both a multigrid method in time method or as multiple shooting along the time axis. Both ideas, multigrid in time as well as adopting multiple shooting for time integration, go back to the 1980s and 1990s. Parareal is a widely studied method and has been used and modified for a range of different applications. Ideas to parallelize the solution of initial value problems go back even further: the first paper proposing a parallel-in-time integration method appeared in 1964.

## Algorithm Play media
Visualization of the Parareal algorithm. The coarse propagator here is labelled ${\bar {\varphi }}$ whereas the fine propagator is labelled $\varphi$ .

Parareal solves an initial value problem of the form

${\dot {y}}(t)=f(y(t),t),\quad y(t_{0})=y_{0}\quad {\text{with}}\quad t_{0}\leq t\leq T.$ Here, the right hand side $f$ can correspond to the spatial discretization of a partial differential equation in a method of lines approach.

Parareal now requires a decomposition of the time interval $[t_{0},T]$ into $P$ so-called time slices $[t_{j},t_{j+1}]$ such that

$[t_{0},T]=[t_{0},t_{1}]\cup [t_{1},t_{2}]\cup \ldots \cup [t_{P-1},t_{P}].$ Each time slice is assigned to one processing unit when parallelizing the algorithm, so that $P$ is equal to the number of processing units used for Parareal: in an MPI based code for example, this would be the number of processes, while in an OpenMP based code, $P$ would be equal to the number of threads.

Parareal is based on the iterative application of two methods for integration of ordinary differential equations. One, commonly labelled ${\mathcal {F}}$ , should be of high accuracy and computational cost while the other, typically labelled ${\mathcal {G}}$ , must be computationally cheap but can be much less accurate. Typically, some form of Runge-Kutta method is chosen for both coarse and fine integrator, where ${\mathcal {G}}$ might be of lower order and use a larger time step than ${\mathcal {F}}$ . If the initial value problem stems from the discretization of a PDE, ${\mathcal {G}}$ can also use a coarser spatial discretization, but this can negatively impact convergence unless high order interpolation is used. The result of numerical integration with one of these methods over a time slice $[t_{j},t_{j+1}]$ for some starting value $y_{j}$ given at $t_{j}$ is then written as

$y={\mathcal {F}}(y_{j},t_{j},t_{j+1})$ or $y={\mathcal {G}}(y_{j},t_{j},t_{j+1})$ .

Serial time integration with the fine method would then correspond to a step-by-step computation of

$y_{j+1}={\mathcal {F}}(y_{j},t_{j},t_{j+1}),\quad j=0,\ldots ,P-1.$ Parareal instead uses the following iteration

$y_{j+1}^{k+1}={\mathcal {G}}(y_{j}^{k+1},t_{j},t_{j+1})+{\mathcal {F}}(y_{j}^{k},t_{j},t_{j+1})-{\mathcal {G}}(y_{j}^{k},t_{j},t_{j+1}),\quad j=0,\ldots ,P-1,\quad k=0,\ldots ,K-1,$ where $k$ is the iteration counter. As the iteration converges and $y_{j}^{k+1}-y_{j}^{k}\to 0$ , the terms from the coarse method cancel out and Parareal reproduces the solution that is obtained by the serial execution of the fine method only. It can be shown that Parareal converges after a maximum of $P$ iterations. For Parareal to provide speedup, however, it has to converge in a number of iterations significantly smaller than the number of time slices, that is $K\ll P$ .

In the Parareal iteration, the computationally expensive evaluation of ${\mathcal {F}}(y_{j}^{k},t_{j},t_{j+1})$ can be performed in parallel on $P$ processing units. By contrast, the dependency of $y_{j+1}^{k+1}$ on ${\mathcal {G}}(y_{j}^{k+1},t_{j},t_{j+1})$ means that the coarse correction has to be computed in serial order.

### Speedup

Under some assumptions, a simple theoretical model for the speedup of Parareal can be derived. Although in applications these assumptions can be too restrictive, the model still is useful to illustrate the trade offs that are involved in obtaining speedup with Parareal.

First, assume that every time slice $[t_{j},t_{j+1}]$ consists of exactly $N_{f}$ steps of the fine integrator and of $N_{c}$ steps of the coarse integrator. This includes in particular the assumption that all time slices are of identical length and that both coarse and fine integrator use a constant step size over the full simulation. Second, denote by $\tau _{f}$ and $\tau _{c}$ the computing time required for a single step of the fine and coarse methods, respectively, and assume that both are constant. This is typically not exactly true when an implicit method is used, because then runtimes vary depending on the number of iterations required by the iterative solver.

Under these two assumptions, the runtime for the fine method integrating over $P$ time slices can be modelled as

$c_{\text{fine}}=PN_{f}\tau _{f}.$ The runtime of Parareal using $P$ processing units and performing $K$ iterations is

$c_{\text{parareal}}=(K+1)PN_{c}\tau _{c}+KN_{f}\tau _{f}.$ Speedup of Parareal then is

$S_{p}={\frac {c_{\text{fine}}}{c_{\text{parareal}}}}={\frac {1}{(K+1){\frac {N_{c}}{N_{f}}}{\frac {\tau _{c}}{\tau _{f}}}+{\frac {K}{P}}}}\leq \min \left\{{\frac {N_{f}\tau _{f}}{N_{c}\tau _{c}}},{\frac {P}{K}}\right\}.$ These two bounds illustrate the trade off that has to be made in choosing the coarse method: on the one hand, it has to be cheap and/or use a much larger time step to make the first bound as large as possible, on the other hand the number of iterations $K$ has to be kept low to keep the second bound large. In particular, Parareal's parallel efficiency is bounded by

$E_{p}={\frac {S_{p}}{P}}\leq {\frac {1}{K}},$ that is by the inverse of the number of required iterations.

### Instability for imaginary eigenvalues

The vanilla version of Parareal has issues for problems with imaginary eigenvalues. It typically only converges toward the very last iterations, that is as $k$ approaches $P$ , and the speedup $S_{p}$ is always going to be smaller than one. So either the number of iterations is small and Parareal is unstable or, if $k$ is large enough to make Parareal stable, no speedup is possible. This also means that Parareal is typically unstable for hyperbolic equations. Even though the formal analysis by Gander and Vandewalle covers only linear problems with constant coefficients, the problem also arises when Parareal is applied to the nonlinear Navier–Stokes equations when the viscosity coefficient becomes too small and the Reynolds number too large. Different approaches exist to stabilise Parareal, one being Krylov-subspace enhanced Parareal.

## Variants

There are multiple algorithms that are directly based or at least inspired by the original Parareal algorithm.

### Krylov-subspace enhanced Parareal

Early on it was recognised that for linear problems information generated by the fine method ${\mathcal {F}}_{\delta t}$ can be used to improve the accuracy of the coarse method ${\mathcal {G}}_{\Delta t}$ . Originally, the idea was formulated for the parallel implicit time-integrator PITA, a method closely related to Parareal but with small differences in how the correction is done. In every iteration $k$ the result ${\mathcal {F}}_{\delta t}(y_{j}^{k})$ is computed for values $u_{j}^{k}\in \mathbb {R} ^{d}$ for $j=0,\ldots ,P-1$ . Based on this information, the subspace

$S_{k}:=\left\{y_{j}^{k'}:0\leq k'\leq k,j=0,\ldots ,P-1\right\}$ is defined and updated after every Parareal iteration. Denote as $P_{k}$ the orthogonal projection from $\mathbb {R} ^{d}$ to $S_{k}$ . Then, replace the coarse method with the improved integrator ${\mathcal {K}}_{\Delta t}(y)={\mathcal {F}}_{\delta t}(P_{k}y)+{\mathcal {G}}_{\Delta t}((I-P_{k})y)$ .

As the number of iterations increases, the space $S_{k}$ will grow and the modified propagator ${\mathcal {K}}_{\Delta t}$ will become more accurate. This will lead to faster convergence. This version of Parareal can also stably integrate linear hyperbolic partial differential equations. An extension to nonlinear problems based on the reduced basis method exists as well.

### Hybrid Parareal spectral deferred corrections

A method with improved parallel efficiency based on a combination of Parareal with spectral deferred corrections (SDC)  has been proposed by M. Minion. It limits the choice for coarse and fine integrator to SDC, sacrificing flexibility for improved parallel efficiency. Instead of the limit of $1/K$ , the bound on parallel efficiency in the hybrid method becomes

$E_{p}\leq {\frac {K_{s}}{K_{p}}}$ with $K_{s}$ being the number of iterations of the serial SDC base method and $K_{p}$ the typically greater number of iterations of the parallel hybrid method. The Parareal-SDC hybrid has been further improved by addition of a full approximation scheme as used in nonlinear multigrid. This led to the development of the parallel full approximation scheme in space and time (PFASST). Performance of PFASST has been studied for PEPC, a Barnes-Hut tree code based particle solver developed at Juelich Supercomputing Centre. Simulations using all 262,144 cores on the IBM BlueGene/P system JUGENE showed that PFASST could produce additional speedup beyond saturation of the spatial tree parallelisation.

### Multigrid reduction in time (MGRIT)

The multigrid reduction in time method (MGRIT) generalises the interpretation of Parareal as a multigrid-in-time algorithms to multiple levels using different smoothers. It is a more general approach but for a specific choice of parameters it is equivalent to Parareal. The XBraid library implementing MGRIT is being developed by Lawrence Livermore National Laboratory.

### ParaExp

ParaExp uses exponential integrators within Parareal. While limited to linear problems, it can produce almost optimal parallel speedup.