Gain (information retrieval)


The gain, also called improvement over random, can be specified for a classifier and is a measure of its performance relative to a random classifier.


In the following, a random classifier is defined as one that assigns class labels at random while predicting each class in the same proportion in which it occurs in the dataset.

The gain can be defined with respect to precision or with respect to overall accuracy:

Gain in Precision

The random precision of a classifier is defined as

r = \frac{TP+FN}{TP+TN+FP+FN} = \frac{\textit{Positives}}{N}

where TP, TN, FP and FN are the numbers of true positives, true negatives, false positives and false negatives, respectively, Positives is the number of positive instances in the target dataset, and N is the size of the dataset.

The random precision is the baseline for a classifier: it is the precision expected when predictions are made at random.

The gain is then defined as

G = \frac{\textit{precision}}{r}

which gives the factor by which a classifier outperforms its random counterpart. A gain of 1 indicates a classifier that is no better than random. The larger the gain, the better the classifier.
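
As an illustration, the two formulas above can be combined in a few lines of Python. This is a minimal sketch rather than a reference implementation: the function names and the confusion-matrix counts are hypothetical, and precision is taken in its usual sense, TP / (TP + FP), which the text does not spell out.

```python
def random_precision(tp, tn, fp, fn):
    """Random precision r = (TP + FN) / N, the fraction of positives."""
    n = tp + tn + fp + fn
    if n == 0:
        raise ValueError("empty dataset")
    return (tp + fn) / n


def precision(tp, fp):
    """Usual precision TP / (TP + FP) (assumed definition)."""
    if tp + fp == 0:
        raise ValueError("no positive predictions")
    return tp / (tp + fp)


def precision_gain(tp, tn, fp, fn):
    """Gain G = precision / r."""
    r = random_precision(tp, tn, fp, fn)
    if r == 0:
        raise ValueError("no positive instances in the dataset")
    return precision(tp, fp) / r


# Hypothetical counts: 1000 instances, 130 of them positive.
# precision = 30 / 50 = 0.6 and r = 130 / 1000 = 0.13
print(precision_gain(tp=30, tn=850, fp=20, fn=100))  # about 4.62

# A classifier whose precision equals the positive fraction has gain 1.
print(precision_gain(tp=13, tn=783, fp=87, fn=117))
```

In the first call the classifier is about 4.6 times as precise as random guessing; in the second, 13 of the 100 positive predictions are correct, which matches the 13% of positives in the data, so the gain is 1.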

Gain in Overall Accuracy

The accuracy of a classifier in general is defined as

Acc = \frac{TP+TN}{TP+TN+FP+FN} = \frac{\textit{Corrects}}{N}

Here, the random accuracy of a classifier can be defined as

r = \left ( \frac{\textit{Positives}}{N} \right ) ^2+ \left ( \frac{\textit{Negatives}}{N} \right ) ^2=f(\textit{Positives})^2 + f(\textit{Negatives})^2

where f(Positives) and f(Negatives) are the fractions of positive and negative instances in the dataset.
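
One way to motivate this expression is sketched below. It assumes that the random classifier labels each instance positive with probability f(Positives) and negative with probability f(Negatives), independently of the true class; the symbols y (true class) and \hat{y} (predicted class) are introduced here only for illustration. A prediction is correct when the predicted and the true class coincide, so

r = \Pr(\hat{y}=+)\,\Pr(y=+) + \Pr(\hat{y}=-)\,\Pr(y=-) = f(\textit{Positives})^2 + f(\textit{Negatives})^2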

As before, the gain is

G = \frac{\textit{Acc}}{r}

This time the gain is measured not only with respect to the prediction of the so-called positive class, but with respect to the classifier's overall ability to distinguish the two equally important classes.
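
The accuracy-based gain can be computed directly from the four confusion-matrix counts. The following Python sketch is only an illustration under the definitions above; the function name and the example counts are hypothetical.

```python
def accuracy_gain(tp, tn, fp, fn):
    """Gain G = Acc / r, with r = f(Positives)^2 + f(Negatives)^2."""
    n = tp + tn + fp + fn
    if n == 0:
        raise ValueError("empty dataset")
    acc = (tp + tn) / n      # fraction of correct predictions
    f_pos = (tp + fn) / n    # fraction of positive instances
    f_neg = (tn + fp) / n    # fraction of negative instances
    r = f_pos ** 2 + f_neg ** 2  # random accuracy
    return acc / r


# Balanced hypothetical dataset: Acc = 0.8, r = 0.5
print(accuracy_gain(tp=40, tn=40, fp=10, fn=10))  # about 1.6

# Imbalanced hypothetical dataset: Acc = 0.88, r = 0.7738
print(accuracy_gain(tp=30, tn=850, fp=20, fn=100))  # about 1.14
```

Since accuracy cannot exceed 1, the gain is bounded above by 1/r. On the imbalanced example the random accuracy is already about 0.77, so even a perfect classifier would reach a gain of only about 1.29.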


In bioinformatics, for example, the gain is used to evaluate methods that predict residue contacts in proteins.

See also