# Information gain ratio

In decision tree learning, information gain ratio is the ratio of information gain to intrinsic information. It was proposed by Ross Quinlan to reduce the bias towards multi-valued attributes by taking the number and size of branches into account when choosing an attribute.

In this context, information gain is also known as mutual information.

## Information gain calculation

Let $Attr$ be the set of all attributes and $Ex$ the set of all training examples. For $x\in Ex$, $value(x,a)$ denotes the value of example $x$ for attribute $a\in Attr$, and $H$ denotes the entropy. The function $values(a)$ denotes the set of all possible values of attribute $a\in Attr$. The information gain for an attribute $a\in Attr$ is defined as follows:

$IG(Ex,a)=H(Ex)-\sum _{v\in values(a)}\left({\frac {|\{x\in Ex\mid value(x,a)=v\}|}{|Ex|}}\cdot H(\{x\in Ex\mid value(x,a)=v\})\right)$

The information gain equals the total entropy $H(Ex)$ if, for each value of the attribute, a unique classification can be made for the target attribute. In that case, each of the conditional entropies subtracted from the total entropy is 0.
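The definition above can be sketched in Python. The dict-per-example representation and the function names are assumptions made for illustration; the toy data shows the special case described above, where an attribute perfectly separates the labels and the gain equals the total entropy.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy H of a sequence of class labels, in bits."""
    total = len(labels)
    return -sum((n / total) * log2(n / total)
                for n in Counter(labels).values())

def information_gain(examples, attribute, label):
    """IG(Ex, a): entropy of the label column minus the weighted
    entropy of each subset induced by the values of `attribute`.
    `examples` is a list of dicts keyed by attribute name
    (an assumed representation, not part of the definition)."""
    total = len(examples)
    gain = entropy([x[label] for x in examples])
    for v in {x[attribute] for x in examples}:
        subset = [x[label] for x in examples if x[attribute] == v]
        gain -= (len(subset) / total) * entropy(subset)
    return gain

# Toy data: "outlook" perfectly separates the "play" labels, so each
# subset has entropy 0 and IG equals the total entropy H(Ex) = 1 bit.
data = [
    {"outlook": "sunny", "play": "no"},
    {"outlook": "sunny", "play": "no"},
    {"outlook": "rain",  "play": "yes"},
    {"outlook": "rain",  "play": "yes"},
]
# information_gain(data, "outlook", "play") → 1.0
```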

## Intrinsic value calculation

The intrinsic value (also called split information) for a test is defined as follows:

$IV(Ex,a)=-\sum _{v\in values(a)}{\frac {|\{x\in Ex\mid value(x,a)=v\}|}{|Ex|}}\cdot \log _{2}\left({\frac {|\{x\in Ex\mid value(x,a)=v\}|}{|Ex|}}\right)$

## Information gain ratio calculation

The information gain ratio is the ratio between the information gain and the intrinsic value:

$IGR(Ex,a)={\frac {IG(Ex,a)}{IV(Ex,a)}}$
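Putting the pieces together, a minimal Python sketch (the dict-per-example representation, data, and function names are assumptions for illustration) shows why the ratio reduces the bias towards multi-valued attributes: a unique ID attribute achieves maximal information gain, but its intrinsic value $\log_2 |Ex|$ is also maximal, so its gain ratio is penalized.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy H of a sequence of class labels, in bits."""
    total = len(labels)
    return -sum((n / total) * log2(n / total)
                for n in Counter(labels).values())

def information_gain(examples, attribute, label):
    """IG(Ex, a) as defined above."""
    total = len(examples)
    gain = entropy([x[label] for x in examples])
    for v in {x[attribute] for x in examples}:
        subset = [x[label] for x in examples if x[attribute] == v]
        gain -= (len(subset) / total) * entropy(subset)
    return gain

def intrinsic_value(examples, attribute):
    """IV(Ex, a): entropy of the partition induced by `attribute`,
    computed over branch sizes rather than class labels."""
    total = len(examples)
    counts = Counter(x[attribute] for x in examples)
    return -sum((n / total) * log2(n / total) for n in counts.values())

def gain_ratio(examples, attribute, label):
    """IGR(Ex, a) = IG(Ex, a) / IV(Ex, a).
    IV is 0 for a single-valued attribute; such a split is useless,
    and here the ZeroDivisionError is simply allowed to surface."""
    return (information_gain(examples, attribute, label)
            / intrinsic_value(examples, attribute))

# "id" is unique per example, so it perfectly separates the labels
# (IG = 1 bit), but its intrinsic value is log2(4) = 2 bits.
data = [
    {"id": 1, "outlook": "sunny", "play": "no"},
    {"id": 2, "outlook": "sunny", "play": "no"},
    {"id": 3, "outlook": "rain",  "play": "yes"},
    {"id": 4, "outlook": "rain",  "play": "yes"},
]
# gain_ratio(data, "outlook", "play") → 1.0 (IG = 1, IV = 1)
# gain_ratio(data, "id", "play")      → 0.5 (IG = 1, IV = 2)
```

Plain information gain scores both attributes equally here, while the gain ratio prefers "outlook" over the many-valued "id", which is exactly the bias correction the ratio was designed for.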