Repeated median regression

In robust statistics, repeated median regression, also known as the repeated median estimator, is a robust linear regression algorithm.

The estimator has a breakdown point of 50%.[1] Although it is equivariant under scaling, or under linear transformations of either its explanatory variable or its response variable, it is not under affine transformations that combine both variables.[1] It can be calculated in ${\displaystyle O(n^{2})}$ time by brute force, in ${\displaystyle O(n\log ^{2}n)}$ time using more sophisticated techniques,[2] or in ${\displaystyle O(n\log n)}$ randomized expected time.[3] It may also be calculated using an on-line algorithm with ${\displaystyle O(n)}$ update time.[4]

Method

The repeated median method estimates the slope of the regression line ${\displaystyle y=A+Bx}$ for a set of points ${\displaystyle (X_{i},Y_{i})}$ as

${\displaystyle {\widehat {B}}={\underset {i}{\operatorname {median} }}\ {\underset {j\,\neq \,i}{\operatorname {median} }}\ \operatorname {slope} (i,j)}$

where ${\displaystyle \operatorname {slope} (i,j)}$ is defined as ${\displaystyle (Y_{j}-Y_{i})/(X_{j}-X_{i})}$.[5]

The estimated Y-axis intercept is defined as

${\displaystyle {\widehat {A}}={\underset {i}{\operatorname {median} }}\ {\underset {j\,\neq \,i}{\operatorname {median} }}\ \operatorname {intercept} (i,j)}$

where ${\displaystyle \operatorname {intercept} (i,j)}$ is defined as ${\displaystyle (X_{j}Y_{i}-X_{i}Y_{j})/(X_{j}-X_{i})}$.[5]