# Otsu's method

In computer vision and image processing, Otsu's method, named after Nobuyuki Otsu (大津展之, Ōtsu Nobuyuki), is used to perform automatic image thresholding. In the simplest form, the algorithm returns a single intensity threshold that separate pixels into two classes, foreground and background. This threshold is determined by minimizing intra-class intensity variance, or equivalently, by maximizing inter-class variance. Otsu's method is a one-dimensional discrete analog of Fisher's Discriminant Analysis, is related to Jenks optimization method, and is equivalent to a globally optimal k-means performed on the intensity histogram. The extension to multi-level thresholding was described in the original paper, and computationally efficient implementations have since been proposed.

## Otsu's method

The algorithm exhaustively searches for the threshold that minimizes the intra-class variance, defined as a weighted sum of variances of the two classes:

$\sigma _{w}^{2}(t)=\omega _{0}(t)\sigma _{0}^{2}(t)+\omega _{1}(t)\sigma _{1}^{2}(t)$ Weights $\omega _{0}$ and $\omega _{1}$ are the probabilities of the two classes separated by a threshold $t$ ,and $\sigma _{0}^{2}$ and $\sigma _{1}^{2}$ are variances of these two classes.

The class probability $\omega _{0,1}(t)$ is computed from the $L$ bins of the histogram:

{\begin{aligned}\omega _{0}(t)&=\sum _{i=0}^{t-1}p(i)\\[4pt]\omega _{1}(t)&=\sum _{i=t}^{L-1}p(i)\end{aligned}} For 2 classes, minimizing the intra-class variance is equivalent to maximizing inter-class variance:

{\begin{aligned}\sigma _{b}^{2}(t)&=\sigma ^{2}-\sigma _{w}^{2}(t)=\omega _{0}(\mu _{0}-\mu _{T})^{2}+\omega _{1}(\mu _{1}-\mu _{T})^{2}\\&=\omega _{0}(t)\omega _{1}(t)\left[\mu _{0}(t)-\mu _{1}(t)\right]^{2}\end{aligned}} which is expressed in terms of class probabilities $\omega$ and class means $\mu$ , where the class means $\mu _{0}(t)$ , $\mu _{1}(t)$ and $\mu _{T}$ are:

{\begin{aligned}\mu _{0}(t)&={\frac {\sum _{i=0}^{t-1}ip(i)}{\omega _{0}(t)}}\\[4pt]\mu _{1}(t)&={\frac {\sum _{i=t}^{L-1}ip(i)}{\omega _{1}(t)}}\\\mu _{T}&=\sum _{i=0}^{L-1}ip(i)\end{aligned}} The following relations can be easily verified:

{\begin{aligned}\omega _{0}\mu _{0}+\omega _{1}\mu _{1}&=\mu _{T}\\\omega _{0}+\omega _{1}&=1\end{aligned}} The class probabilities and class means can be computed iteratively. This idea yields an effective algorithm.

### Algorithm

1. Compute histogram and probabilities of each intensity level
2. Set up initial $\omega _{i}(0)$ and $\mu _{i}(0)$ 3. Step through all possible thresholds $t=1,\ldots$ maximum intensity
1. Update $\omega _{i}$ and $\mu _{i}$ 2. Compute $\sigma _{b}^{2}(t)$ 4. Desired threshold corresponds to the maximum $\sigma _{b}^{2}(t)$ ### MATLAB or Octave implementation

histogramCounts is a 256-element histogram of a grayscale image different gray-levels (typical for 8-bit images). level is the threshold for the image (double).

function level = otsu(histogramCounts)
total = sum(histogramCounts); % total number of pixels in the image
%% OTSU automatic thresholding
top = 256;
sumB = 0;
wB = 0;
maximum = 0.0;
sum1 = dot(0:top-1, histogramCounts);
for ii = 1:top
wF = total - wB;
if wB > 0 && wF > 0
mF = (sum1 - sumB) / wF;
val = wB * wF * ((sumB / wB) - mF) * ((sumB / wB) - mF);
if ( val >= maximum )
level = ii;
maximum = val;
end
end
wB = wB + histogramCounts(ii);
sumB = sumB + (ii-1) * histogramCounts(ii);
end
end


Matlab has built-in functions graythresh() and multithresh() in the Image Processing Toolbox which are implemented with Otsu's method and Multi Otsu's method, respectively.

## Limitations

Otsu's method exhibits the relatively good performance if the histogram can be assumed to have bimodal distribution and assumed to possess a deep and sharp valley between two peaks. But if the object area is small compared with the background area, the histogram no longer exhibits bimodality. And if the variances of the object and the background intensities are large compared to the mean difference, or the image is severely corrupted by additive noise, the sharp valley of the gray level histogram is degraded. Then the possibly incorrect threshold determined by Otsu's method results in the segmentation error. (Here we define the object size to be the ratio of the object area to the entire image area and the mean difference to be the difference of the average intensities of the object and the background)

Empirical results show that the performance of global thresholding techniques used for object segmentation (including Otsu's method) are limited by small object size, the small mean difference between foreground and background pixels, large variances of the pixels that belong to the object and those that belong to the background, the large amount of noise, etc.

## Improvements

Various extensions have been developed to address limitations of Otsu's method. One popular extension is the two-dimensional Otsu's method, which performs better for the object segmentation task in noisy images. Here, the intensity value of a given pixel is compared with the average intensity of its immediate neighborhood to improve segmentation results.

At each pixel, the average gray-level value of the neighborhood is calculated. Let the gray level of the given pixel be divided into $L$ discrete values and the average gray level is also divided into the same $L$ values. Then a pair is formed: the pixel gray level and the average of the neighborhood $(i,j)$ . Each pair belongs to one of the $L\times L$ possible 2-dimensional bins. The total number of occurrences (frequency), $f_{ij}$ , of a pair $(i,j)$ , divided by the total number of pixels in the image $N$ , defines the joint probability mass function in a 2-dimensional histogram:

$P_{ij}={\frac {f_{ij}}{N}},\qquad \sum _{i=0}^{L-1}\sum _{j=0}^{L-1}P_{ij}=1$ And the 2-dimensional Otsu's method is developed based on the 2-dimensional histogram as follows.

The probabilities of two classes can be denoted as:

{\begin{aligned}\omega _{0}&=\sum _{i=0}^{s-1}\sum _{j=0}^{t-1}P_{ij}\\\omega _{1}&=\sum _{i=s}^{L-1}\sum _{j=t}^{L-1}P_{ij}\end{aligned}} The intensity mean value vectors of two classes and total mean vector can be expressed as follows:

{\begin{aligned}\mu _{0}&=[\mu _{0i},\mu _{0j}]^{T}=\left[\sum _{i=0}^{s-1}\sum _{j=0}^{t-1}i{\frac {P_{ij}}{\omega _{0}}},\sum _{i=0}^{s-1}\sum _{j=0}^{t-1}j{\frac {P_{ij}}{\omega _{0}}}\right]^{T}\\\mu _{1}&=[\mu _{1i},\mu _{1j}]^{T}=\left[\sum _{i=s}^{L-1}\sum _{j=t}^{L-1}i{\frac {P_{ij}}{\omega _{1}}},\sum _{i=s}^{L-1}\sum _{j=t}^{L-1}j{\frac {P_{ij}}{\omega _{1}}}\right]^{T}\\\mu _{T}&=[\mu _{Ti},\mu _{Tj}]^{T}=\left[\sum _{i=0}^{L-1}\sum _{j=0}^{L-1}iP_{ij},\sum _{i=0}^{L-1}\sum _{j=0}^{L-1}jP_{ij}\right]^{T}\end{aligned}} In most cases the probability off-diagonal will be negligible, so it is easy to verify:

$\omega _{0}+\omega _{1}\cong 1$ $\omega _{0}\mu _{0}+\omega _{1}\mu _{1}\cong \mu _{T}$ The inter-class discrete matrix is defined as

$S_{b}=\sum _{k=0}^{1}\omega _{k}[(\mu _{k}-\mu _{T})(\mu _{k}-\mu _{T})^{T}]$ The trace of the discrete matrix can be expressed as

{\begin{aligned}&\operatorname {tr} (S_{b})\\[4pt]={}&\omega _{0}[(\mu _{0i}-\mu _{Ti})^{2}+(\mu _{0j}-\mu _{Tj})^{2}]+\omega _{1}[(\mu _{1i}-\mu _{Ti})^{2}+(\mu _{1j}-\mu _{Tj})^{2}]\\[4pt]={}&{\frac {(\mu _{Ti}\omega _{0}-\mu _{i})^{2}+(\mu _{Tj}\omega _{0}-\mu _{j})^{2}}{\omega _{0}(1-\omega _{0})}}\end{aligned}} where

$\mu _{i}=\sum _{i=0}^{s-1}\sum _{j=0}^{t-1}iP_{ij}$ $\mu _{j}=\sum _{i=0}^{s-1}\sum _{j=0}^{t-1}jP_{ij}$ Similar to one-dimensional Otsu's method, the optimal threshold $(s,t)$ is obtained by maximizing $\operatorname {tr} (S_{b})$ .

### Algorithm

The $s$ and $t$ is obtained iteratively which is similar with one-dimensional Otsu's method. The values of $s$ and $t$ are changed till we obtain the maximum of $\operatorname {tr} (S_{b})$ , that is

max,s,t = 0;
for ss: 0 to L-1 do
for tt: 0 to L-1 do
evaluate tr(S_b);
if tr(S_b) > max
max = tr(S,b);
s = ss;
t = tt;
end if
end for
end for
return s,t;


Notice that for evaluating $\operatorname {tr} (S_{b})$ , we can use a fast recursive dynamic programming algorithm to improve time performance. However, even with the dynamic programming approach, 2d Otsu's method still has large time complexity. Therefore, much research has been done to reduce the computation cost.

If summed area tables are used to build the 3 tables, sum over $P_{ij}$ , sum over $i*P_{ij}$ , and sum over $j*P_{ij}$ , then the runtime complexity is the maximum of (O(N_pixels), O(N_bins*N_bins)). Note that if only coarse resolution is needed in terms of threshold, N_bins can be reduced.

### Matlab implementation

function inputs and output:

hists is a $256\times 256$ 2D-histogram of grayscale value and neighborhood average grayscale value pair.

total is the number of pairs in the given image.it is determined by the number of the bins of 2D-histogram at each direction.

threshold is the threshold obtained.

function threshold = otsu_2D(hists, total)
maximum = 0.0;
threshold = 0;
helperVec = 0:255;
mu_t0 = sum(sum(repmat(helperVec',1,256).*hists));
mu_t1 = sum(sum(repmat(helperVec,256,1).*hists));
p_0 = zeros(256);
mu_i = p_0;
mu_j = p_0;
for ii = 1:256
for jj = 1:256
if jj == 1
if ii == 1
p_0(1,1) = hists(1,1);
else
p_0(ii,1) = p_0(ii-1,1) + hists(ii,1);
mu_i(ii,1) = mu_i(ii-1,1)+(ii-1)*hists(ii,1);
mu_j(ii,1) = mu_j(ii-1,1);
end
else
p_0(ii,jj) = p_0(ii,jj-1)+p_0(ii-1,jj)-p_0(ii-1,jj-1)+hists(ii,jj);
mu_i(ii,jj) = mu_i(ii,jj-1)+mu_i(ii-1,jj)-mu_i(ii-1,jj-1)+(ii-1)*hists(ii,jj);
mu_j(ii,jj) = mu_j(ii,jj-1)+mu_j(ii-1,jj)-mu_j(ii-1,jj-1)+(jj-1)*hists(ii,jj);
end

if (p_0(ii,jj) == 0)
continue;
end
if (p_0(ii,jj) == total)
break;
end
tr = ((mu_i(ii,jj)-p_0(ii,jj)*mu_t0)^2 + (mu_j(ii,jj)-p_0(ii,jj)*mu_t1)^2)/(p_0(ii,jj)*(1-p_0(ii,jj)));

if ( tr >= maximum )
threshold = ii;
maximum = tr;
end
end
end
end