Functional principal component analysis

From Wikipedia, the free encyclopedia

Revision as of 22:31, 3 July 2015

Functional principal component analysis (FPCA) is a statistical method for investigating the dominant modes of variation of functional data. Using this method, a random function is represented in the eigenbasis, which is an orthonormal basis of the Hilbert space L² that consists of the eigenfunctions of the autocovariance operator. FPCA represents functional data in the most parsimonious way, in the sense that when using a fixed number of basis functions, the eigenfunction basis explains more variation than any other basis expansion. FPCA can be applied for representing random functions,[1] and for functional regression[2] and classification.

Formulation

For a square-integrable stochastic process X(t), t ∈ 𝒯, let

    μ(t) = E(X(t))

and

    G(s, t) = Cov(X(s), X(t)) = Σk λk φk(s) φk(t),

where λ1 ≥ λ2 ≥ ··· ≥ 0 are the eigenvalues and φ1, φ2, ... are the orthonormal eigenfunctions of the linear Hilbert–Schmidt operator

    G: L²(𝒯) → L²(𝒯),   G(f) = ∫𝒯 G(s, t) f(s) ds.

By the Karhunen–Loève theorem, one can express the centered process in the eigenbasis,

    X(t) − μ(t) = Σk≥1 ξk φk(t),

where

    ξk = ∫𝒯 (X(t) − μ(t)) φk(t) dt

is the principal component associated with the k-th eigenfunction φk, with the properties

    E(ξk) = 0,   Var(ξk) = λk,   E(ξk ξl) = 0 for k ≠ l.

The centered process is then equivalent to ξ1, ξ2, .... A common assumption is that X can be represented by only the first few eigenfunctions (after subtracting the mean function), i.e.

    X(t) ≈ XK(t) = μ(t) + Σk=1..K ξk φk(t),

where

    E[ ∫𝒯 (X(t) − XK(t))² dt ] → 0 as K → ∞.
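As a concrete illustration of this truncated representation, the following sketch simulates a curve from a K = 3 expansion. The mean function, Fourier-type eigenfunctions, and eigenvalue sequence are illustrative assumptions, not taken from any particular dataset:

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 101)          # dense grid on the domain T = [0, 1]
K = 3                               # number of retained components

mu = np.sin(2 * np.pi * t)          # assumed mean function (illustrative)
lam = np.array([1.0, 0.5, 0.25])    # eigenvalues lambda_1 >= lambda_2 >= ... >= 0
# Orthonormal eigenfunctions on [0, 1] (a standard Fourier basis, for illustration)
phi = np.stack([np.sqrt(2) * np.sin((k + 1) * np.pi * t) for k in range(K)])

# Principal component scores: mean 0, variance lambda_k, uncorrelated
xi = rng.normal(0.0, np.sqrt(lam))

# Truncated Karhunen-Loeve representation X_K(t) = mu(t) + sum_k xi_k phi_k(t)
X_K = mu + xi @ phi
```

Because Var(ξk) = λk decays, each successive component contributes less variation, which is what makes the truncation parsimonious.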

Interpretation of eigenfunctions

The first eigenfunction φ1 depicts the dominant mode of variation of X:

    φ1 = argmax{‖φ‖=1} Var(∫𝒯 (X(t) − μ(t)) φ(t) dt),

where

    ‖φ‖ = (∫𝒯 φ(t)² dt)^(1/2).

The k-th eigenfunction φk is the dominant mode of variation orthogonal to φ1, φ2, ..., φk−1:

    φk = argmax{‖φ‖=1, ⟨φ, φj⟩=0 for j<k} Var(∫𝒯 (X(t) − μ(t)) φ(t) dt),

where

    ⟨φ, φj⟩ = ∫𝒯 φ(t) φj(t) dt,   j = 1, ..., k − 1.

Estimation

Let Yij = Xi(tij) + εij be the observations made at locations (usually time points) tij, where Xi is the i-th realization of the smooth stochastic process that generates the data, and the εij are independent and identically distributed normal random variables with mean 0 and variance σ², j = 1, 2, ..., mi. To obtain an estimate of the mean function μ(tij), if a dense sample on a regular grid is available, one may take the average at each location tij:

    μ̂(tij) = (1/n) Σi=1..n Yij.

If the observations are sparse, one needs to smooth the data pooled from all observations to obtain the mean estimate,[3] using smoothing methods like local linear smoothing or spline smoothing.
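For the dense, regular-grid case, the cross-sectional average above can be sketched as follows; the simulated mean function and noise level are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 200, 51                       # n curves, observed on a common grid of m points
t = np.linspace(0, 1, m)

mu = np.cos(2 * np.pi * t)           # true mean function (illustrative assumption)
noise = rng.normal(0.0, 0.1, size=(n, m))
Y = mu + noise                       # Y[i, j] = X_i(t_j) + eps_ij

# Dense, regular design: estimate mu by averaging across curves at each t_j
mu_hat = Y.mean(axis=0)
```

In the sparse case this simple average is unavailable, which is why the pooled observations must be smoothed instead.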

Then the estimate of the covariance function is obtained by averaging (in the dense case) or smoothing (in the sparse case) the raw covariances

    Gi(tij, til) = (Yij − μ̂(tij)) (Yil − μ̂(til)),   j, l = 1, ..., mi.

Note that the diagonal elements of Gi should be removed because they contain measurement error.[4]

In practice, the covariance estimate Ĝ(s, t) is discretized to an equally spaced dense grid, and the estimation of the eigenvalues λk and eigenvectors vk is carried out by numerical linear algebra.[5] The eigenfunction estimates φ̂k can then be obtained by interpolating the eigenvectors v̂k.

The fitted covariance should be positive definite and symmetric and is then obtained as

    G̃(s, t) = Σ{λk>0} λ̂k φ̂k(s) φ̂k(t).
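The discretize-then-eigendecompose step can be sketched as below. The two-component simulated process is an illustrative assumption; the dt rescalings convert the matrix eigenproblem into estimates for the integral operator:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 300, 60
t = np.linspace(0, 1, m)
dt = t[1] - t[0]

# Simulate centered curves with two known modes of variation (illustrative)
phi1 = np.sqrt(2) * np.sin(np.pi * t)
phi2 = np.sqrt(2) * np.sin(2 * np.pi * t)
X = rng.normal(0, 1.0, n)[:, None] * phi1 + rng.normal(0, 0.5, n)[:, None] * phi2

# Sample covariance on the grid, symmetrized
G_hat = X.T @ X / n
G_hat = (G_hat + G_hat.T) / 2

# Numerical eigendecomposition; rescale so eigenvectors approximate
# L2-orthonormal eigenfunctions (sum_j phi(t_j)^2 * dt = 1)
evals, evecs = np.linalg.eigh(G_hat)
order = np.argsort(evals)[::-1]
lam_hat = evals[order] * dt                 # eigenvalues of the integral operator
phi_hat = evecs[:, order].T / np.sqrt(dt)   # rows are eigenfunction estimates

# Keep only positive eigenvalues for a positive semi-definite fitted covariance
K = int(np.sum(lam_hat > 1e-10))
G_fit = (phi_hat[:K].T * lam_hat[:K]) @ phi_hat[:K]
```

Discarding the non-positive eigenvalues is what makes the fitted covariance positive semi-definite by construction.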

Let V̂(t) be a smoothed version of the diagonal elements Gi(tij, tij) of the raw covariance matrices. Then V̂(t) is an estimate of G(t, t) + σ². An estimate of σ² is obtained by

    σ̂² = (2/|𝒯|) ∫𝒯 (V̂(t) − G̃(t, t)) dt   if the right-hand side is positive, and σ̂² = 0 otherwise.

If the observations Xij = Xi(tij), j = 1, 2, ..., mi, are dense in 𝒯, then the k-th FPC ξik can be estimated by numerical integration, implementing

    ξ̂ik = ⟨Xi − μ̂, φ̂k⟩ = ∫𝒯 (Xi(t) − μ̂(t)) φ̂k(t) dt.
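In the dense case the integral above reduces to a Riemann sum over the grid; a minimal sketch with one noiseless centered curve and an assumed eigenfunction:

```python
import numpy as np

m = 101
t = np.linspace(0, 1, m)
dt = t[1] - t[0]

# One densely observed, already-centered curve with known score xi_1 = 2
# on an assumed (illustrative) first eigenfunction phi_1
phi1 = np.sqrt(2) * np.sin(np.pi * t)
xi_true = 2.0
Xc = xi_true * phi1                      # X_i(t) - mu(t)

# k-th score by numerical integration: xi_k = int (X_i(t) - mu(t)) phi_k(t) dt
xi_hat = np.sum(Xc * phi1) * dt          # Riemann-sum approximation
```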
However, if the observations are sparse, this method will not work. Instead, one can use best linear unbiased predictors,[3] yielding

    ξ̂ik = λ̂k φ̂ikᵀ Σ̂Yi⁻¹ (Yi − μ̂i),

where

    Σ̂Yi = G̃i + σ̂² Imi,   φ̂ik = (φ̂k(ti1), ..., φ̂k(timi))ᵀ,

and G̃i is G̃ evaluated at the grid points generated by tij, j = 1, 2, ..., mi. The algorithm, PACE (Principal Analysis by Conditional Expectation), has an available Matlab package.[6]
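The BLUP formula above can be sketched for a single sparsely observed subject. The one-component model, time points, and variance values are illustrative assumptions; a real implementation such as PACE plugs in the smoothed estimates instead:

```python
import numpy as np

rng = np.random.default_rng(4)

# Sparse design: one subject observed at a few irregular time points
t_i = np.array([0.1, 0.35, 0.7, 0.9])
m_i = len(t_i)
sigma2 = 0.04                           # assumed measurement-error variance
lam1 = 1.0                              # assumed first eigenvalue

# Assumed first eigenfunction evaluated at this subject's time points
phi_ik = np.sqrt(2) * np.sin(np.pi * t_i)
mu_i = np.zeros(m_i)                    # assumed mean at these points

# Simulate sparse observations Y_i = mu + xi_1 * phi_1 + measurement noise
xi_true = rng.normal(0.0, np.sqrt(lam1))
Y_i = mu_i + xi_true * phi_ik + rng.normal(0.0, np.sqrt(sigma2), m_i)

# Best linear unbiased predictor of the first score:
# xi_hat = lam_k * phi_ik^T * Sigma_Yi^{-1} * (Y_i - mu_i)
G_i = lam1 * np.outer(phi_ik, phi_ik)   # covariance of X_i at the time points
Sigma_Yi = G_i + sigma2 * np.eye(m_i)
xi_hat = lam1 * phi_ik @ np.linalg.solve(Sigma_Yi, Y_i - mu_i)
```

Unlike numerical integration, this conditional-expectation form shrinks the score toward zero when the subject has few or noisy observations.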

Asymptotic convergence properties of these estimates have been investigated.[3][7][8]

Applications

FPCA can be applied for displaying the modes of functional variation,[1][9] in scatterplots of FPCs against each other or of responses against FPCs, for modeling sparse longitudinal data,[3] or for functional regression and classification, e.g., functional linear regression.[2] Scree plots and other methods can be used to determine the number of included components.

Connection with principal component analysis

The following table shows a comparison of various elements of principal component analysis (PCA) and FPCA. The two methods are both used for dimensionality reduction. In implementations, FPCA uses a PCA step.

However, PCA and FPCA differ in some critical aspects. First, the order of multivariate data in PCA can be permuted, which has no effect on the analysis, but the order of functional data carries time or space information and cannot be reordered. Second, the spacing of observations in FPCA matters, while there is no spacing issue in PCA. Third, regular PCA does not work for high-dimensional data without regularization, while FPCA has a built-in regularization due to the smoothness of the functional data and the truncation to a finite number of included components.
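The first point can be checked directly: permuting multivariate coordinates conjugates the covariance matrix by a permutation matrix and therefore leaves the PCA eigenvalues unchanged (a minimal sketch on simulated data):

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(100, 6))            # multivariate data: 100 samples, 6 coordinates

# PCA eigenvalues of the sample covariance, before and after permuting columns
C = np.cov(X, rowvar=False)
perm = rng.permutation(6)
C_perm = np.cov(X[:, perm], rowvar=False)

ev = np.sort(np.linalg.eigvalsh(C))
ev_perm = np.sort(np.linalg.eigvalsh(C_perm))
# The two spectra coincide: reordering coordinates does not change the analysis.
# Reordering the sampling points of a curve, by contrast, destroys its smoothness,
# and with it the built-in regularization that FPCA relies on.
```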

Element                       In PCA                            In FPCA
Data                          X ∈ ℝᵖ                            X ∈ L²(𝒯)
Dimension                     p < ∞                             ∞
Mean                          μ = E(X)                          μ(t) = E(X(t))
Covariance                    Cov(X) = C (a p × p matrix)       Cov(X(s), X(t)) = G(s, t)
Eigenvalues                   λ1, λ2, ..., λp                   λ1, λ2, ...
Eigenvectors/Eigenfunctions   v1, v2, ..., vp                   φ1(t), φ2(t), ...
Inner product                 ⟨X, Y⟩ = Σj Xj Yj                 ⟨X, Y⟩ = ∫𝒯 X(t) Y(t) dt
Principal components          zk = ⟨X − μ, vk⟩                  ξk = ⟨X − μ, φk⟩

See also

Notes

  1. ^ a b Jones, M. C.; Rice, J. A. (1992). "Displaying the Important Features of Large Collections of Similar Curves". The American Statistician. 46 (2): 140–145. doi:10.1080/00031305.1992.10475870.
  2. ^ a b Yao, F.; Müller, H. G.; Wang, J. L. (2005). "Functional linear regression analysis for longitudinal data". The Annals of Statistics. 33 (6): 2873–2903. doi:10.1214/009053605000000660.
  3. ^ a b c d Yao, F.; Müller, H. G.; Wang, J. L. (2005). "Functional Data Analysis for Sparse Longitudinal Data". Journal of the American Statistical Association. 100 (470): 577–590. doi:10.1198/016214504000001745.
  4. ^ Staniswalis, J. G.; Lee, J. J. (1998). "Nonparametric Regression Analysis of Longitudinal Data". Journal of the American Statistical Association. 93 (444): 1403–1418. doi:10.1080/01621459.1998.10473801.
  5. ^ Rice, John; Silverman, B. (1991). "Estimating the Mean and Covariance Structure Nonparametrically When the Data are Curves". Journal of the Royal Statistical Society. Series B (Methodological). 53 (1). Wiley: 233–243.
  6. ^ "PACE: Principal Analysis by Conditional Expectation".
  7. ^ Hall, P.; Müller, H. G.; Wang, J. L. (2006). "Properties of principal component methods for functional and longitudinal data analysis". The Annals of Statistics. 34 (3): 1493–1517. doi:10.1214/009053606000000272.
  8. ^ Li, Y.; Hsing, T. (2010). "Uniform convergence rates for nonparametric regression and principal component analysis in functional/longitudinal data". The Annals of Statistics. 38 (6): 3321–3351. doi:10.1214/10-AOS813.
  9. ^ Madrigal, P.; Krajewski, P. (2015). "Uncovering correlated variability in epigenomic datasets using the Karhunen–Loève transform". BioData Mining. 8: 20. doi:10.1186/s13040-015-0051-7.

References