The Codon Adaptation Index (CAI)[1] is the most widespread technique for analyzing codon usage bias. Unlike other measures of codon usage bias, such as the 'effective number of codons' (Nc), which measure deviation from uniform codon usage (the null hypothesis), CAI measures the deviation of a given protein-coding gene sequence with respect to a reference set of genes.

Rationale

Ideally, the reference set in CAI is composed of highly expressed genes, so that CAI provides an indication of gene expression level under the assumption that there is translational selection to optimize gene sequences according to their expression levels. The rationale for this is twofold: highly expressed genes need to compete for resources (i.e. ribosomes) in fast-growing organisms, and it makes sense for them to be translated more accurately as well. Both hypotheses lead to highly expressed genes using mostly codons for tRNA species that are abundant in the cell.

Implementation

CAI is defined as the geometric mean of the weights associated with the codons of the gene sequence, where L is the length of the sequence measured in codons.

${\displaystyle {\text{CAI}}=\left(\prod _{i=1}^{L}w_{i}\right)^{\frac {1}{L}}}$
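
When the gene is long, a product of many weights smaller than 1 can underflow in floating-point arithmetic. As long as every weight is strictly positive, the geometric mean can be rewritten in the equivalent logarithmic form below, which is a standard identity rather than a separate definition:

${\displaystyle {\text{CAI}}=\exp \left({\frac {1}{L}}\sum _{i=1}^{L}\ln w_{i}\right)\qquad w_{i}>0}$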

For each amino acid, the weight of each of its codons is computed from the reference sequence set as the ratio between the observed frequency of that codon, ${\displaystyle f_{i}}$, and the frequency of the most frequent synonymous codon for the same amino acid, ${\displaystyle \max(f_{j})}$.

${\displaystyle w_{i}={\frac {f_{i}}{\max(f_{j})}}\qquad i,j\in [{\text{synonymous codons for amino acid}}]}$
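
The two formulas above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation. The function names (`split_codons`, `codon_weights`, `cai`) are hypothetical, and several choices are assumptions that the definition above does not settle: the standard genetic code is used, stop codons and unrecognized triplets are skipped, and a gene containing a codon of weight 0 is given a CAI of 0, as the product formula implies.

```python
import math
from collections import Counter

# Standard genetic code in the conventional TCAG ordering ('*' marks stop codons).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def split_codons(seq):
    """Split a coding sequence into codons, dropping any trailing partial codon."""
    seq = seq.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def codon_weights(reference_seqs):
    """Compute w_i = f_i / max(f_j) within each family of synonymous codons."""
    counts = Counter(
        codon
        for seq in reference_seqs
        for codon in split_codons(seq)
        if codon in CODON_TABLE
    )
    weights = {}
    for amino_acid in set(CODON_TABLE.values()) - {"*"}:  # assumption: skip stops
        family = [c for c, aa in CODON_TABLE.items() if aa == amino_acid]
        top = max(counts[c] for c in family)
        if top == 0:
            continue  # amino acid never seen in the reference set: no weights
        for codon in family:
            weights[codon] = counts[codon] / top
    return weights


def cai(seq, weights):
    """Geometric mean of the codon weights, evaluated through logarithms."""
    used = [weights[c] for c in split_codons(seq) if c in weights]
    if not used:
        return float("nan")  # no scorable codons
    if min(used) == 0:
        return 0.0  # a zero weight makes the whole product zero
    return math.exp(sum(math.log(w) for w in used) / len(used))


# Toy reference set (made-up sequences, for illustration only).
reference = ["CTGCTGCTGCTA", "AAAAAAAAG"]
w = codon_weights(reference)
print(w["CTG"], w["CTA"], w["AAA"], w["AAG"])
print(cai("CTGAAA", w), cai("CTAAAG", w))
```

In the toy reference set, CTG is the most frequent leucine codon and AAA the most frequent lysine codon, so both receive weight 1, while CTA and AAG receive weights 1/3 and 1/2. A gene made only of the preferred codons therefore scores 1, and one made of CTA and AAG scores the square root of 1/6.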