UniFrac is a method to calculate a distance measure between organismal communities using phylogenetic information, and is widely used in metagenomics. The method was devised by Catherine Lozupone and Rob Knight of the University of Colorado at Boulder in 2005.
The distance is calculated between pairs of samples (each sample represents an organismal community). All taxa found in one or both samples are placed on a phylogenetic tree. A branch leading to taxa from both samples is marked as "shared", and a branch leading to taxa that appear in only one sample is marked as "unshared". The distance between the two samples is then calculated as (the sum of "unshared" branch lengths)/(the sum of all tree branch lengths (= shared+unshared)), i.e. the fraction of total branch length that is unshared. This definition satisfies the requirements of a distance metric: it is non-negative, zero only when the entities are identical, symmetric, and conformant to the triangle inequality.
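The calculation can be illustrated with a minimal sketch in Python. The tree encoding (each node mapped to its parent and the length of the branch above it), the node names, the branch lengths and the function names are all assumptions made for this example; they are not taken from any UniFrac software.

```python
# Toy rooted tree: each node maps to (parent, length of the branch above it).
# Node names and branch lengths are made up for illustration.
TREE = {
    "t1": ("n1", 1.0),
    "t2": ("n1", 1.0),
    "t3": ("n2", 2.0),
    "t4": ("n2", 2.0),
    "n1": ("root", 0.5),
    "n2": ("root", 0.5),
}


def branches_for(sample, tree):
    """Return the branches (named by their child node) that lie on the
    paths from the root to the taxa in `sample`."""
    covered = set()
    for taxon in sample:
        node = taxon
        while node in tree:  # the root has no entry, so the walk stops there
            covered.add(node)
            node = tree[node][0]
    return covered


def unweighted_unifrac(sample_a, sample_b, tree):
    """Fraction of branch length that leads to taxa from only one sample."""
    in_a = branches_for(sample_a, tree)
    in_b = branches_for(sample_b, tree)
    unshared = sum(tree[b][1] for b in in_a ^ in_b)  # in exactly one sample
    total = sum(tree[b][1] for b in in_a | in_b)     # shared + unshared
    return unshared / total if total else 0.0


a = {"t1", "t2"}
b = {"t2", "t3"}
print(unweighted_unifrac(a, b, TREE))
```

For these two toy samples the unshared branches are those above t1, t3 and n2 (3.5 units) out of 5.0 units in total, so the distance is 0.7. Branches that lead to neither sample (here the one above t4) are ignored, which corresponds to building the tree only from taxa found in one or both samples.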
If there are several samples, a distance matrix can be created by making a tree for each pair of samples and calculating their UniFrac measure. Standard multivariate statistical methods, such as data clustering and principal co-ordinates analysis, can then be applied to the matrix.
One can determine the statistical significance of the UniFrac distance between two samples using Monte Carlo simulations. By repeatedly randomizing the sample classification of each taxon on the tree (leaving the branch structure unchanged) and recalculating the distance each time, one can obtain a null distribution of UniFrac values. From this, a p-value can be assigned to the actual distance between the samples.
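The randomization procedure can be sketched as follows. This is an illustrative example only: it assumes that every tip of the tree belongs to exactly one of two samples, labelled "A" and "B", and the tree, the tip names, the number of permutations and the function names are hypothetical.

```python
import random

# Toy rooted tree: node -> (parent, length of the branch above the node).
TREE = {
    "s1": ("n1", 1.0), "s2": ("n1", 1.0), "s3": ("n1", 1.0),
    "s4": ("n2", 1.0), "s5": ("n2", 1.0), "s6": ("n2", 1.0),
    "n1": ("root", 3.0), "n2": ("root", 3.0),
}


def covered(tips, tree):
    """Branches on the root-to-tip paths of the given tips."""
    found = set()
    for node in tips:
        while node in tree:
            found.add(node)
            node = tree[node][0]
    return found


def unifrac(labels, tree):
    """Unweighted UniFrac for a dict mapping each tip to sample 'A' or 'B'."""
    in_a = covered([t for t, s in labels.items() if s == "A"], tree)
    in_b = covered([t for t, s in labels.items() if s == "B"], tree)
    total = sum(tree[b][1] for b in in_a | in_b)
    return sum(tree[b][1] for b in in_a ^ in_b) / total


def permutation_p_value(labels, tree, n_perm=999, seed=0):
    """Share of label shufflings whose distance is at least the observed one."""
    rng = random.Random(seed)
    observed = unifrac(labels, tree)
    tips = list(labels)
    values = list(labels.values())
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(values)  # reassign samples to tips; the tree is untouched
        if unifrac(dict(zip(tips, values)), tree) >= observed:
            hits += 1
    # The observed labelling is counted as one of the permutations.
    return observed, (hits + 1) / (n_perm + 1)


labels = {"s1": "A", "s2": "A", "s3": "A", "s4": "B", "s5": "B", "s6": "B"}
observed, p = permutation_p_value(labels, TREE)
print(observed, p)
```

In this toy case the two samples occupy separate clades, so the observed distance is 1.0. Only 2 of the 20 possible ways of assigning three tips to each sample reproduce that separation, so the estimated p-value should be close to 0.1.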
Additionally, there is a weighted version of the UniFrac metric which accounts for the relative abundance of each of the taxa within the communities. This is commonly used in metagenomic studies, where the number of metagenomic reads can be in the tens of thousands, and it is appropriate to 'bin' these reads into operational taxonomic units, or OTUs, which can then be dealt with as taxa within the UniFrac framework.
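As an illustration, the sketch below computes the raw (unnormalized) form of the weighted measure, in which each branch length is multiplied by the absolute difference between the fractions of each sample's reads that descend from that branch. The OTU names, read counts, tree and function names are invented for the example, and normalized variants of the measure are not shown.

```python
# Toy rooted tree: node -> (parent, length of the branch above the node).
TREE = {
    "otu1": ("n1", 1.0),
    "otu2": ("n1", 1.0),
    "otu3": ("root", 2.0),
    "n1": ("root", 0.5),
}


def branch_proportions(counts, tree):
    """For each branch, the fraction of a sample's reads descending from it."""
    total = sum(counts.values())
    props = dict.fromkeys(tree, 0.0)
    for otu, n in counts.items():
        node = otu
        while node in tree:  # walk up to the root, crediting each branch
            props[node] += n / total
            node = tree[node][0]
    return props


def weighted_unifrac(counts_a, counts_b, tree):
    """Raw weighted UniFrac: sum over branches of length * |p_A - p_B|."""
    pa = branch_proportions(counts_a, tree)
    pb = branch_proportions(counts_b, tree)
    return sum(length * abs(pa[node] - pb[node])
               for node, (_, length) in tree.items())


a = {"otu1": 80, "otu2": 20}                # read counts per OTU, sample A
b = {"otu1": 20, "otu2": 20, "otu3": 60}    # read counts per OTU, sample B
print(weighted_unifrac(a, b, TREE))
```

Here the result is about 2.1. The branch above otu2 contributes nothing, because both samples have 20% of their reads in that OTU, whereas the unweighted measure would treat otu1 and otu2 identically regardless of their read counts.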
In 2012, a generalized UniFrac version, which unifies the weighted and unweighted UniFrac distances in a single framework, was proposed. The weighted and unweighted UniFrac distances place too much weight on abundant lineages and rare lineages, respectively. Their power to detect environmental influence is therefore limited in settings where the moderately abundant lineages are the ones mostly affected. The generalized UniFrac distance corrects this limitation of the weighted and unweighted UniFrac distances by down-weighting their emphasis on either abundant or rare lineages.
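The generalized distance is usually written with a tuning parameter that controls how much weight is given to abundant lineages. The expression below is a sketch of that form as commonly stated for Chen et al. (2012). The notation is assumed here for illustration: b_i is the length of branch i, p_i^A and p_i^B are the proportions of each community's reads that descend from branch i, the sums run over branches with p_i^A + p_i^B > 0, and α lies between 0 and 1.

```latex
d^{(\alpha)} =
  \frac{\displaystyle\sum_{i} b_i \,\bigl(p_i^{A} + p_i^{B}\bigr)^{\alpha}
        \left| \frac{p_i^{A} - p_i^{B}}{p_i^{A} + p_i^{B}} \right|}
       {\displaystyle\sum_{i} b_i \,\bigl(p_i^{A} + p_i^{B}\bigr)^{\alpha}}
```

With α = 1 the expression reduces to a normalized weighted distance, since each branch is then weighted in full by its abundance. Smaller values of α reduce the influence of abundant lineages, and intermediate values place more emphasis on moderately abundant lineages.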
- Lozupone, C.; Knight, R. (2005). "UniFrac: A New Phylogenetic Method for Comparing Microbial Communities". Applied and Environmental Microbiology 71 (12): 8228–8235. doi:10.1128/AEM.71.12.8228-8235.2005. PMC 1317376. PMID 16332807.
- Chen, J.; Bittinger, K.; Charlson, E. S.; Hoffmann, C.; Lewis, J.; Wu, G. D.; Collman, R. G.; Bushman, F. D.; Li, H. (2012). "Associating microbiome composition with environmental covariates using generalized UniFrac distances". Bioinformatics 28 (16): 2106–2113. doi:10.1093/bioinformatics/bts342. PMC 3413390. PMID 22711789.