# Functional data analysis

Functional data analysis (FDA) is a branch of statistics that analyzes data providing information about curves, surfaces or anything else varying over a continuum. In its most general form, under an FDA framework each sample element is considered to be a function. The physical continuum over which these functions are defined is often time, but may also be spatial location, wavelength, probability, etc.

## History

Functional data analysis has roots going back to work by Grenander and Karhunen in the 1940s and 1950s. They considered the decomposition of square-integrable continuous-time stochastic processes into eigencomponents, now known as the Karhunen-Loève decomposition. A rigorous analysis of functional principal component analysis was carried out in the 1970s by Kleffe, Dauxois and Pousse, including results about the asymptotic distribution of the eigenvalues. More recently, in the 1990s and 2000s, the field has focused more on applications and on understanding the effects of dense and sparse observation schemes. Core contributions in this era were made by James O. Ramsay (who coined the term "functional data analysis" in this period), Bernard Silverman and John Rice.

## Mathematical formalism

Random functions can be viewed as random elements taking values in a Hilbert space, or as a stochastic process. The former is mathematically convenient, whereas the latter is somewhat more suitable from an applied perspective. These two approaches coincide if the random functions are continuous and a condition called mean-squared continuity is satisfied. For more on the probabilistic underpinnings of functional data analysis, see Chapter 7.

### Hilbertian random variables

In the Hilbert space viewpoint, one considers an $H$ -valued random element $X$ , where $H$ is a separable Hilbert space such as the space of square-integrable functions $L^{2}[0,1]$ . Under the integrability condition that $\mathbb {E} \|X\|$ is finite, one can define the mean of $X$ as the unique element $\mu \in H$ satisfying

$\mathbb {E} \langle X,h\rangle =\langle \mu ,h\rangle ,\qquad h\in H.$ This formulation defines the mean as a Pettis integral, but the mean can also be defined as $\mu =\mathbb {E} X$ in the Bochner sense. Under the stronger integrability condition that $\mathbb {E} \|X\|^{2}$ is finite, the covariance operator of $X$ is a linear operator ${\mathcal {C}}:H\to H$ that is uniquely defined by the relation

${\mathcal {C}}h=\mathbb {E} [\langle h,X-\mu \rangle (X-\mu )],\qquad h\in H,$ or, in tensor form, ${\mathcal {C}}=\mathbb {E} [(X-\mu )\otimes (X-\mu )]$ . The spectral theorem allows one to decompose $X$ via the Karhunen-Loève decomposition

$X=\mu +\sum _{i=1}^{\infty }\langle X-\mu ,\varphi _{i}\rangle \varphi _{i},$ where $\varphi _{i}$ are the eigenvectors of ${\mathcal {C}}$ , corresponding to the nonnegative eigenvalues of ${\mathcal {C}}$ arranged in nonincreasing order. Truncating this infinite series to a finite order underpins functional principal component analysis.
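As a concrete illustration, the truncated Karhunen-Loève expansion can be computed empirically from curves observed on a common grid, by an eigendecomposition of the discretised sample covariance. The following is a minimal NumPy sketch on simulated data; the two-component construction and all variable names are hypothetical, not a reference implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate n curves observed on a common grid of m points (hypothetical data):
# two known components with variances 4 and 0.25, plus small noise.
n, m = 200, 101
t = np.linspace(0.0, 1.0, m)
scores = rng.normal(size=(n, 2)) * np.array([2.0, 0.5])
basis = np.vstack([np.sqrt(2) * np.sin(np.pi * t),
                   np.sqrt(2) * np.sin(2 * np.pi * t)])
X = scores @ basis + 0.05 * rng.normal(size=(n, m))

# Empirical mean and covariance function on the grid.
mu = X.mean(axis=0)
Xc = X - mu
C = Xc.T @ Xc / n                       # m x m discretised covariance

# Eigendecomposition of C * h approximates the Karhunen-Loeve eigenpairs;
# the grid spacing h rescales everything to the L^2[0,1] inner product.
h = t[1] - t[0]
evals, evecs = np.linalg.eigh(C * h)
order = np.argsort(evals)[::-1]
evals = evals[order]
evecs = evecs[:, order] / np.sqrt(h)    # L^2-normalised eigenfunctions

# Truncating to the first k components gives the FPCA approximation of X.
k = 2
xi = Xc @ evecs[:, :k] * h              # estimated scores <X - mu, phi_i>
X_hat = mu + xi @ evecs[:, :k].T
```

With two dominant components, the rank-2 truncation reconstructs the curves up to the noise level, and the leading eigenvalues track the component variances.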

### Stochastic processes

The Hilbertian point of view is mathematically convenient, but abstract; the above considerations do not necessarily even view $X$ as a function at all, since common choices of $H$ like $L^{2}[0,1]$ and Sobolev spaces consist of equivalence classes, not functions. The stochastic process perspective views $X$ as a collection of random variables

$\{X(t)\}_{t\in [0,1]}$ indexed by the unit interval (or more generally some compact metric space $K$ ). The mean and covariance functions are defined in a pointwise manner as

$\mu (t)=\mathbb {E} X(t),\qquad c(s,t)=\operatorname {Cov} [X(s),X(t)],\qquad s,t\in [0,1]$ (provided $\mathbb {E} [X(t)^{2}]<\infty$ for all $t\in [0,1]$ ). We can hope to view $X$ as a random element of the Hilbert function space $H=L^{2}[0,1]$ . However, additional conditions are required for this to be fruitful: if we let $X(t)$ be Gaussian white noise, i.e. $X(t)$ is standard Gaussian and independent of $X(s)$ for all $s\neq t$ , then $X$ clearly cannot be viewed as a square-integrable function.

A convenient sufficient condition is mean square continuity, stipulating that $\mu$ and $c$ are continuous functions. In this case $c$ defines a covariance operator ${\mathcal {C}}:H\to H$ by

$({\mathcal {C}}f)(t)=\int _{K}c(s,t)f(s)\,\mathrm {d} s.$ The spectral theorem applies to ${\mathcal {C}}$ , yielding eigenpairs $(\lambda _{j},\varphi _{j})$ , so that in tensor product notation ${\mathcal {C}}$ can be written

${\mathcal {C}}=\sum _{j=1}^{\infty }\lambda _{j}\varphi _{j}\otimes \varphi _{j}.$ Moreover, since ${\mathcal {C}}f$ is continuous for all $f\in H$ , all the $\varphi _{j}$ 's are continuous. Mercer's theorem then states that the covariance function $c$ admits an analogous decomposition

$\sup _{s,t\in [0,1]}\left|c(s,t)-\sum _{j=1}^{N}\lambda _{j}\varphi _{j}(s)\varphi _{j}(t)\right|\to 0,\qquad N\to \infty .$ Finally, under the extra assumption that $X$ has continuous sample paths, namely that with probability one the random function $X:[0,1]\to \mathbb {R} $ is continuous, the Karhunen-Loève expansion above holds for $X$ and the Hilbert space machinery can subsequently be applied. Continuity of sample paths can be shown using the Kolmogorov continuity theorem.
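For Brownian motion on $[0,1]$, whose covariance is $c(s,t)=\min(s,t)$, the Mercer eigenpairs are known in closed form ($\lambda _{j}=1/((j-{\tfrac {1}{2}})^{2}\pi ^{2}}$ and $\varphi _{j}(t)={\sqrt {2}}\sin((j-{\tfrac {1}{2}})\pi t)$), so the uniform convergence above can be checked numerically. A small sketch, assuming only NumPy:

```python
import numpy as np

# Brownian motion on [0,1] has covariance c(s,t) = min(s,t); its Mercer
# eigenpairs are known in closed form, which makes a convenient check.
t = np.linspace(0.0, 1.0, 201)
S, T = np.meshgrid(t, t, indexing="ij")
c = np.minimum(S, T)

def eigenpair(j, t):
    # lambda_j and phi_j(t) for the Brownian covariance (j = 1, 2, ...).
    lam = 1.0 / ((j - 0.5) ** 2 * np.pi ** 2)
    phi = np.sqrt(2.0) * np.sin((j - 0.5) * np.pi * t)
    return lam, phi

# Partial sums of the Mercer series converge uniformly to c:
# track the sup-norm error as terms are added.
approx = np.zeros_like(c)
sup_errors = []
for j in range(1, 51):
    lam, phi = eigenpair(j, t)
    approx += lam * np.outer(phi, phi)
    sup_errors.append(float(np.max(np.abs(c - approx))))
```

The recorded sup-norm errors shrink towards zero as $N$ grows, illustrating Mercer's theorem on this example.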

## Regression methods for functional data

Several regression methods have been developed for functional data.

### Scalar-on-function regression

A well-studied model for scalar-on-function regression is a generalisation of linear regression. Classical linear regression assumes that a scalar variable of interest $Y$ is related to a $p$ -dimensional covariate vector $X$ through the equation

$Y=\langle X,\beta \rangle +\varepsilon =X_{1}\beta _{1}+\dots +X_{p}\beta _{p}+\varepsilon$ for a $p$ -dimensional vector of coefficients $\beta$ and a scalar noise variable $\varepsilon$ , where $\langle X,\beta \rangle$ denotes the standard inner product on $\mathbb {R} ^{p}$ . If we instead observe a functional variable $X$ , which we assume is an element of the space of square-integrable functions on the unit interval, we can consider the same linear regression model as above using the $L^{2}[0,1]$ inner product. In other words, we consider the model

$Y=\int _{0}^{1}X(t)\beta (t)\,\mathrm {d} t+\varepsilon$ for a square-integrable coefficient function $\beta$ and $\varepsilon$ as before (see Chapter 13).
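On a grid, this functional linear model can be fitted by principal-components regression: project the covariate curves onto their leading empirical eigenfunctions and regress $Y$ on the resulting scores. The sketch below uses simulated Brownian covariate curves; the data-generating choices, truncation level, and variable names are all hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated scalar-on-function data on a grid (hypothetical example).
n, m = 300, 101
t = np.linspace(0.0, 1.0, m)
h = t[1] - t[0]
X = np.cumsum(rng.normal(size=(n, m)) * np.sqrt(h), axis=1)  # Brownian paths
beta_true = np.sin(np.pi * t)
Y = X @ beta_true * h + 0.1 * rng.normal(size=n)   # discretised integral + noise

# Principal-components regression: project X onto its leading
# eigenfunctions, then regress Y on the resulting scores.
Xc = X - X.mean(axis=0)
evals, evecs = np.linalg.eigh(Xc.T @ Xc / n * h)
phi = evecs[:, ::-1][:, :5] / np.sqrt(h)   # top 5 L^2-normalised eigenfunctions
scores = Xc @ phi * h                      # <X_i - mean, phi_j>
design = np.column_stack([np.ones(n), scores])
coef, *_ = np.linalg.lstsq(design, Y, rcond=None)
beta_hat = phi @ coef[1:]                  # estimated coefficient function
```

Truncating to a few components regularises the otherwise ill-posed inversion; the estimated coefficient function then approximates $\beta$ in the $L^{2}$ sense.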

### Function-on-scalar regression

Analogously to the scalar-on-function regression model, we can consider a functional $Y$ and $p$ -dimensional covariate vector $X$ and again draw inspiration from the usual linear regression model by modelling $Y$ as a linear combination of $p$ functions. In other words, we assume that the relationship between $Y$ and $X$ is

$Y(t)=X_{1}\beta _{1}(t)+\dots +X_{p}\beta _{p}(t)+\varepsilon (t)$ for functions $\beta _{1},\dots ,\beta _{p}$ and a functional error term $\varepsilon$ .
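When the response curves are observed on a common grid, this model reduces to ordinary least squares carried out at every grid point simultaneously, since NumPy's least-squares solver accepts a matrix of responses. A small simulated sketch (data and names hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)

# Function-on-scalar regression on a grid: n response curves at m points,
# driven by p scalar covariates (hypothetical simulated data).
n, p, m = 150, 2, 101
t = np.linspace(0.0, 1.0, m)
X = rng.normal(size=(n, p))                       # scalar covariates
B_true = np.vstack([np.cos(np.pi * t),            # beta_1(t)
                    t * (1 - t)])                 # beta_2(t)
Y = X @ B_true + 0.1 * rng.normal(size=(n, m))    # n response curves

# One lstsq call fits all p coefficient functions at once:
# row k of B_hat is the estimate of beta_k evaluated on the grid.
B_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)     # shape (p, m)
```

In practice one would typically also smooth the fitted coefficient curves across $t$; the pointwise fit above is the simplest starting point.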

### Function-on-function regression

Both of the previous regression models can be viewed as instances of a general linear model between Hilbert spaces. Assuming that $X$ and $Y$ are elements of Hilbert spaces $H_{X}$ and $H_{Y}$ , the Hilbertian linear model assumes that

$Y={\mathcal {S}}X+\varepsilon$ for a Hilbert-Schmidt operator ${\mathcal {S}}:H_{X}\to H_{Y}$ and a noise variable $\varepsilon$ taking values in $H_{Y}$ . If $H_{X}=L^{2}[0,1]$ and $H_{Y}=\mathbb {R}$ , we get the scalar-on-function regression model above. Similarly, if $H_{X}=\mathbb {R} ^{p}$ and $H_{Y}=L^{2}[0,1]$ then we get the function-on-scalar regression model above. If we let $H_{X}=H_{Y}=L^{2}[0,1]$ , we get the function-on-function linear regression model which can equivalently be written

$Y(t)=\int _{0}^{1}X(s)\beta (s,t)\,\mathrm {d} s+\varepsilon (t)$ for a square-integrable coefficient function $\beta$ and functional noise variable $\varepsilon$ .
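One simple estimation strategy for this model expands $X$ in its leading empirical eigenfunctions and regresses each $Y(t)$ on the resulting scores; stacking the fitted slopes yields a truncated estimate of the surface $\beta (s,t)$. The sketch below applies this to simulated data; the truncation level, data-generating surface, and names are all hypothetical:

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated function-on-function data on a common grid (hypothetical).
n, m = 300, 81
s = np.linspace(0.0, 1.0, m)
h = s[1] - s[0]
X = np.cumsum(rng.normal(size=(n, m)) * np.sqrt(h), axis=1)  # Brownian paths
beta_true = np.outer(np.sin(np.pi * s), np.cos(np.pi * s))   # beta(s, t)
Y = X @ beta_true * h + 0.1 * rng.normal(size=(n, m))

# Expand X in its leading eigenfunctions, then regress each Y(t) on the
# scores; the fitted slopes b_j(t) give a truncated beta(s, t).
Xc = X - X.mean(axis=0)
evals, evecs = np.linalg.eigh(Xc.T @ Xc / n * h)
phi = evecs[:, ::-1][:, :4] / np.sqrt(h)      # top 4 eigenfunctions of X
Z = np.column_stack([np.ones(n), Xc @ phi * h])
coef, *_ = np.linalg.lstsq(Z, Y, rcond=None)  # slopes b_j(t), shape (5, m)
beta_hat = phi @ coef[1:]                     # beta_hat(s, t) on the grid
```

As in the scalar-on-function case, the truncation acts as regularisation, and the estimated surface approximates $\beta$ in the $L^{2}$ sense.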

### Practical considerations

While the presentation of the models above assumes fully observed functions, software is available for fitting them to discretely observed functions. Packages for R include refund and FDboost, which utilise reformulations of the functional models as generalized additive models and boosted models, respectively.