Protein sequences can provide data about the biological function and evolution of proteins and protein domains. Grouping and interrelating protein sequences can therefore provide information about both human biological processes and the evolutionary development of biological processes on Earth.
Such groupings, called sequence clusters, allow the effective coverage of sequence space.
Sequence clusters can reduce a large database of sequences to a smaller set of sequence representatives, each of which should represent its cluster at the sequence level.
Sequence representatives allow the effective coverage of the original database with fewer sequences. The database of sequence representatives is called non-redundant, as similar (or redundant) sequences have been removed at a certain similarity threshold.
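The reduction to a non-redundant set can be illustrated with a minimal sketch of greedy incremental clustering. This is one common approach, not necessarily the method used by any particular database. The function names `similarity` and `cluster_sequences`, the example sequences, and the threshold value are all hypothetical, and the ratio from Python's `difflib.SequenceMatcher` is used only as a simple stand-in for an alignment-based sequence identity score.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Rough similarity in [0, 1]; a stand-in for alignment-based identity."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def cluster_sequences(sequences, threshold=0.9):
    """Greedily assign each sequence to the first representative it resembles.

    Returns a dict mapping each representative to the members of its cluster.
    Sequences are processed longest first, so the representative of a cluster
    is always its longest (or first-seen, on ties) member.
    """
    clusters = {}
    for seq in sorted(sequences, key=len, reverse=True):
        for rep in clusters:
            if similarity(seq, rep) >= threshold:
                clusters[rep].append(seq)  # redundant: covered by this representative
                break
        else:
            clusters[seq] = [seq]  # no representative is similar enough: new cluster
    return clusters


# Hypothetical toy sequences, for illustration only.
sequences = ["MKTAYIAKQR", "MKTAYIAKQK", "GGGGSSSSPP"]
clusters = cluster_sequences(sequences, threshold=0.8)
representatives = list(clusters)  # the non-redundant set
```

Here the first two sequences differ in a single residue and fall into one cluster, so `representatives` holds two sequences instead of three. Raising the threshold produces more, smaller clusters; lowering it produces fewer representatives that each cover a wider region of sequence space.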