# Semiparametric model

In statistics, a semiparametric model is a statistical model that has both parametric and nonparametric components.

A statistical model is a collection of distributions $\{P_\theta: \theta \in \Theta\}$ indexed by a parameter $\theta$.

• A parametric model is one in which the indexing parameter is a finite-dimensional vector (in $k$-dimensional Euclidean space for some positive integer $k$); that is, the set of possible values for $\theta$ satisfies $\Theta \subset \mathbb{R}^k$. In this case we say that $\theta$ is finite-dimensional.
• In nonparametric models, the set of possible values of the parameter $\theta$ is a subset of some space that is not necessarily finite-dimensional. For example, we might consider the set of all distributions with mean 0. The ambient space is typically a vector space with topological structure, but it need not be finite-dimensional as a vector space. Thus, $\Theta \subset \mathbb{F}$ for some possibly infinite-dimensional space $\mathbb{F}$.
• In semiparametric models, the parameter has both a finite-dimensional component and an infinite-dimensional component (often a real-valued function defined on the real line). Thus the parameter space $\Theta$ in a semiparametric model satisfies $\Theta \subset \mathbb{R}^k \times \mathbb{F}$, where $\mathbb{F}$ is an infinite-dimensional space.

It may appear at first that semiparametric models include nonparametric models, since they have an infinite-dimensional as well as a finite-dimensional component. However, a semiparametric model is considered to be "smaller" than a completely nonparametric model because we are often interested only in the finite-dimensional component of $\theta$. That is, we are not interested in estimating the infinite-dimensional component. In nonparametric models, by contrast, the primary interest is in estimating the infinite-dimensional parameter. Thus the estimation task is statistically harder in nonparametric models.

Estimation of the infinite-dimensional component in these models often relies on smoothing techniques, such as kernel methods.
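As an illustration of the kind of kernel smoothing mentioned above, the following is a minimal sketch of a Nadaraya-Watson kernel smoother, which estimates an unknown function by a locally weighted average. The function names, the Gaussian kernel, the bandwidth, and the data are all assumptions chosen for illustration; this is one common smoother, not a method prescribed by any particular semiparametric model.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used here as the smoothing kernel (an assumed choice)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kernel_smooth(x0, xs, ys, bandwidth):
    """Nadaraya-Watson estimate at x0: a kernel-weighted average of the observed ys."""
    weights = [gaussian_kernel((x0 - x) / bandwidth) for x in xs]
    total = sum(weights)
    return sum(w * y for w, y in zip(weights, ys)) / total


# Hypothetical noise-free observations of y = x^2 on an even grid over [-2, 2].
xs = [i / 10.0 for i in range(-20, 21)]
ys = [x * x for x in xs]

# Smoothed value at x0 = 0.5; a larger bandwidth averages over a wider neighbourhood.
estimate = kernel_smooth(0.5, xs, ys, bandwidth=0.2)
```

The bandwidth controls the trade-off between bias and variance: here the smoothed value at 0.5 lies slightly above the true value 0.25, because the average pulls in neighbouring points of a convex function.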

## Example

A well-known example of a semiparametric model is the Cox proportional hazards model.[1] If we are interested in studying the time $T$ to an event such as death due to cancer or failure of a light bulb, the Cox model specifies the following distribution function for $T$:

$F(t) = 1 - \exp\left(-\int_0^t \lambda_0(u) e^{\beta'x} du\right),$

where $x$ is the covariate vector, and $\beta$ and $\lambda_0(u)$ are unknown parameters, so that $\theta = (\beta, \lambda_0(u))$. Here $\beta$ is finite-dimensional and is the parameter of interest; $\lambda_0(u)$ is an unknown non-negative function of time (known as the baseline hazard function) and is often a nuisance parameter. The set of possible candidates for $\lambda_0(u)$ is infinite-dimensional.
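To make the formula concrete, the following sketch evaluates the Cox distribution function $F(t)$ numerically for given values of $\beta$, $x$, and a baseline hazard. The function name `cox_cdf`, the midpoint-rule integration, and the example values of $\beta$, $x$, and $\lambda_0$ are assumptions made for illustration only; in practice $\beta$ and $\lambda_0$ are unknown and must be estimated from data.

```python
import math


def cox_cdf(t, x, beta, baseline_hazard, steps=10_000):
    """Evaluate F(t) = 1 - exp(-exp(beta'x) * integral_0^t lambda_0(u) du)."""
    # Linear predictor beta'x; it does not depend on u, so it factors out of the integral.
    linear_predictor = sum(b * xi for b, xi in zip(beta, x))
    # Midpoint rule for the cumulative baseline hazard on [0, t].
    h = t / steps
    cumulative_hazard = sum(baseline_hazard((i + 0.5) * h) for i in range(steps)) * h
    return 1.0 - math.exp(-cumulative_hazard * math.exp(linear_predictor))


# Hypothetical parameter values, for illustration only.
beta = [0.5, -1.0]
x = [1.0, 0.3]


def constant_hazard(u):
    """An assumed baseline hazard lambda_0(u) = 0.2 for all u."""
    return 0.2


value = cox_cdf(2.0, x, beta, constant_hazard)
```

Any non-negative function of time can be passed as `baseline_hazard`, which reflects the infinite-dimensional part of the model, while `beta` is the finite-dimensional part.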