In molecular biology, a pan-genome (or supra-genome) describes the full complement of genes in a species. The term is typically applied to bacteria and archaea, which can show large variation in gene content among closely related strains. The pan-genome is the union of the gene sets of all the strains of a species.[1] The concept is significant mainly in an evolutionary context, especially with relevance to metagenomics,[2] but it is also used in a broader genomics context.[3]

The pan-genome includes the "core genome", containing genes present in all strains; the "dispensable genome", containing genes present in two or more strains but not in all of them; and "unique genes", specific to single strains.[1] Note that these distinctions are not strictly biological, since they depend partly on which strains are included in the analysis[citation needed].
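The partition described above can be illustrated with ordinary set operations. The following is a minimal sketch, not a reference implementation: the function name `partition_pan_genome`, the strain names, and the gene identifiers are all hypothetical, and the dispensable genome is assumed to mean genes found in two or more strains but not in all of them.

```python
from collections import Counter


def partition_pan_genome(strains):
    """Split a pan-genome into core, dispensable and unique genes.

    `strains` maps a strain name to the set of gene identifiers it carries.
    Assumes at least two strains; with a single strain the "core" and
    "unique" categories coincide.
    """
    n_strains = len(strains)
    # Count in how many strains each gene occurs.
    counts = Counter(gene for genes in strains.values() for gene in set(genes))

    pan = set(counts)  # union of all gene sets
    core = {g for g, c in counts.items() if c == n_strains}  # in every strain
    unique = {g for g, c in counts.items() if c == 1}  # in exactly one strain
    dispensable = pan - core - unique  # in two or more strains, but not all
    return {"pan": pan, "core": core, "dispensable": dispensable, "unique": unique}


# Hypothetical toy data: three strains with invented gene names.
strains = {
    "strain_A": {"geneA", "geneB", "geneC", "geneD"},
    "strain_B": {"geneA", "geneB", "geneC", "geneE"},
    "strain_C": {"geneA", "geneB", "geneF"},
}

full = partition_pan_genome(strains)

# Re-run the analysis without strain_C to see how the categories shift.
reduced = partition_pan_genome(
    {name: genes for name, genes in strains.items() if name != "strain_C"}
)

for label, result in (("all three strains", full), ("without strain_C", reduced)):
    print(label)
    for category in ("pan", "core", "dispensable", "unique"):
        print(f"  {category}: {sorted(result[category])}")
```

In this toy data set, `geneC` is dispensable when all three strains are analysed but becomes part of the core genome once `strain_C` is left out, which shows how the categories depend on the choice of strains.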


An example can be seen in a comparison of the sizes of the core genome and the pan-genome of Prochlorococcus. The core genome is necessarily much smaller than the pan-genome, on which the different ecotypes of Prochlorococcus draw.[4]

