# Observed information

In statistics, the observed information, or observed Fisher information, is the negative of the second derivative (the Hessian matrix) of the log-likelihood function (the logarithm of the likelihood). It is a sample-based version of the Fisher information.

## Definition

Suppose we observe random variables ${\displaystyle X_{1},\ldots ,X_{n}}$, independent and identically distributed with density ${\displaystyle f(x|\theta )}$, where the parameter ${\displaystyle \theta }$ is a (possibly unknown) vector. Then the log-likelihood of the parameter ${\displaystyle \theta }$ given the data ${\displaystyle X_{1},\ldots ,X_{n}}$ is

${\displaystyle \ell (\theta |X_{1},\ldots ,X_{n})={\frac {1}{n}}\sum _{i=1}^{n}\log f(X_{i}|\theta )}$.

We define the observed information matrix at ${\displaystyle \theta ^{*}}$ as

${\displaystyle {\mathcal {J}}(\theta ^{*})=-\left.\nabla \nabla ^{\top }\ell (\theta )\right|_{\theta =\theta ^{*}}}$
${\displaystyle =-\left.\left({\begin{array}{cccc}{\tfrac {\partial ^{2}}{\partial \theta _{1}^{2}}}&{\tfrac {\partial ^{2}}{\partial \theta _{1}\partial \theta _{2}}}&\cdots &{\tfrac {\partial ^{2}}{\partial \theta _{1}\partial \theta _{p}}}\\{\tfrac {\partial ^{2}}{\partial \theta _{2}\partial \theta _{1}}}&{\tfrac {\partial ^{2}}{\partial \theta _{2}^{2}}}&\cdots &{\tfrac {\partial ^{2}}{\partial \theta _{2}\partial \theta _{p}}}\\\vdots &\vdots &\ddots &\vdots \\{\tfrac {\partial ^{2}}{\partial \theta _{p}\partial \theta _{1}}}&{\tfrac {\partial ^{2}}{\partial \theta _{p}\partial \theta _{2}}}&\cdots &{\tfrac {\partial ^{2}}{\partial \theta _{p}^{2}}}\\\end{array}}\right)\ell (\theta )\right|_{\theta =\theta ^{*}}}$

In many instances, the observed information is evaluated at the maximum-likelihood estimate.[1]
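As a concrete numerical sketch (not from the source), the definition above can be checked for an assumed exponential model with rate ${\displaystyle \lambda }$, where the averaged log-likelihood is ${\displaystyle \ell (\lambda )=\log \lambda -\lambda {\bar {x}}}$ and the observed information is known in closed form to be ${\displaystyle 1/\lambda ^{2}}$. The sample size, seed, and finite-difference step `h` below are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=1 / 2.0, size=1000)  # simulated data, true rate 2.0

def avg_loglik(lam, x):
    # averaged log-likelihood (1/n convention, matching the definition above)
    return np.mean(np.log(lam) - lam * x)

lam_hat = 1.0 / x.mean()  # maximum-likelihood estimate of the rate

# observed information: negative second derivative, by central differences
h = 1e-4
J = -(avg_loglik(lam_hat + h, x)
      - 2 * avg_loglik(lam_hat, x)
      + avg_loglik(lam_hat - h, x)) / h**2

# analytic value for this model: J(lambda) = 1 / lambda^2
print(J, 1.0 / lam_hat**2)
```

The numerical Hessian agrees with the analytic value ${\displaystyle 1/{\hat {\lambda }}^{2}}$ up to finite-difference error; in a multiparameter model the same recipe yields the full matrix ${\displaystyle {\mathcal {J}}(\theta ^{*})}$ entry by entry.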

## Fisher information

The Fisher information ${\displaystyle {\mathcal {I}}(\theta )}$ is the expected value of the observed information given a single observation ${\displaystyle X}$ distributed according to the hypothetical model with parameter ${\displaystyle \theta }$:

${\displaystyle {\mathcal {I}}(\theta )=\mathrm {E} ({\mathcal {J}}(\theta ))}$.
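This expectation can be illustrated by Monte Carlo (an assumed Bernoulli model, not from the source): for a single observation ${\displaystyle x}$ the observed information at ${\displaystyle p}$ is ${\displaystyle x/p^{2}+(1-x)/(1-p)^{2}}$, and averaging it over many draws recovers the known Fisher information ${\displaystyle 1/(p(1-p))}$:

```python
import numpy as np

rng = np.random.default_rng(1)
p = 0.3  # assumed Bernoulli parameter, for illustration only

# observed information of one Bernoulli observation x at parameter p:
# -d^2/dp^2 [x log p + (1-x) log(1-p)] = x/p^2 + (1-x)/(1-p)^2
x = rng.binomial(1, p, size=200_000)
J_single = x / p**2 + (1 - x) / (1 - p)**2

fisher_mc = J_single.mean()         # E[J(p)] estimated by Monte Carlo
fisher_exact = 1.0 / (p * (1 - p))  # closed-form Fisher information
print(fisher_mc, fisher_exact)
```

Unlike the Fisher information, the observed information depends on the particular sample drawn; the expectation removes that dependence.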

## Applications

In a notable article, Bradley Efron and David V. Hinkley [2] argued that the observed information should be used in preference to the expected information when employing normal approximations to the distribution of the maximum-likelihood estimate, on the grounds that it conditions on the data actually observed.
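A simulation sketch of this use of the observed information, under an assumed exponential model (the model, sample size, and replication count are illustrative): with the averaged log-likelihood convention above, ${\displaystyle {\mathcal {J}}({\hat {\lambda }})=1/{\hat {\lambda }}^{2}}$, so the normal approximation gives ${\displaystyle \operatorname {Var} ({\hat {\lambda }})\approx (n{\mathcal {J}}({\hat {\lambda }}))^{-1}={\hat {\lambda }}^{2}/n}$. The implied standard error tracks the empirical spread of the estimator across repeated samples:

```python
import numpy as np

rng = np.random.default_rng(2)
n, lam_true = 500, 2.0

# repeated samples: MLE of the exponential rate is 1 / sample mean
lam_hats = np.array(
    [1.0 / rng.exponential(1 / lam_true, n).mean() for _ in range(2000)]
)

# standard error from the observed information:
# Var(lam_hat) ~ (n * J(lam_hat))^{-1} = lam_hat^2 / n
se_obs = lam_hats / np.sqrt(n)  # per-replicate standard-error estimates

# empirical spread of the MLE vs. the observed-information standard error
print(lam_hats.std(), se_obs.mean())
```

The two numbers agree closely, which is the sense in which the observed information calibrates the normal approximation for this sample.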