
Representative sequences

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by Citation bot at 00:26, 28 November 2018. The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Representative sequences are short regions within protein sequences that can be used to approximate the evolutionary relationships of those proteins, or of the organisms from which they come. They are contiguous subsequences (typically about 300 residues long) of ubiquitous, conserved proteins, chosen so that each orthologous family of representative sequences, taken alone, yields a distance matrix in close agreement with the consensus distance matrix.[1]
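To make the agreement criterion concrete, the following Python sketch scores each orthologous family by how closely its pairwise distance matrix matches a consensus. It is an illustration under stated assumptions, not the procedure of Bern and Goldberg: sequences are assumed to be already aligned and of equal length, distances are uncorrected p-distances (the fraction of differing positions), the consensus is assumed to be the element-wise mean of all families' matrices, and agreement is measured as the Pearson correlation of the upper-triangular entries. All function and variable names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def p_distance(a: str, b: str) -> float:
    """Fraction of aligned positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_vector(family: dict[str, str], taxa: list[str]) -> list[float]:
    """Upper triangle of the family's distance matrix, in a fixed taxon-pair order."""
    return [p_distance(family[s], family[t]) for s, t in combinations(taxa, 2)]


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("constant distance vector; correlation is undefined")
    return cov / (sx * sy)


def agreement_scores(families: list[dict[str, str]], taxa: list[str]) -> list[float]:
    """Correlate each family's distances with the element-wise mean (assumed consensus)."""
    vectors = [distance_vector(f, taxa) for f in families]
    consensus = [sum(col) / len(col) for col in zip(*vectors)]
    return [pearson(v, consensus) for v in vectors]


# Toy data: one aligned sequence per taxon in each orthologous family.
TAXA = ["A", "B", "C", "D"]
FAMILIES = [
    {"A": "AAAAAAAAAA", "B": "AAAAAAAAAC", "C": "AAAAAAAACC", "D": "AAAAACCCCC"},
    {"A": "GGGGGGGGGG", "B": "GGGGGGGGGT", "C": "GGGGGGGGTT", "D": "GGGGGTTTTT"},
    # This family implies a different tree: D, not B, is closest to A.
    {"A": "AAAAAAAAAA", "B": "AAAAACCCCC", "C": "AAAAAAAACC", "D": "AAAAAAAAAC"},
]

if __name__ == "__main__":
    for i, score in enumerate(agreement_scores(FAMILIES, TAXA)):
        print(f"family {i}: r = {score:.3f}")
```

With these 10-residue toy families (stand-ins for real regions of roughly 300 residues), the two families that imply the same tree correlate at about 0.87 with the consensus, while the conflicting family scores about 0. Because each family also contributes to the consensus it is compared against, a leave-one-out consensus would give a less biased score.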

Use

Protein sequences carry information about the biological function and evolution of proteins and protein domains. Grouping related protein sequences into clusters can therefore shed light both on human biological processes and on the evolutionary development of biological processes on Earth.

Sequence clustering can also reduce a large database to a much smaller set of sequence representatives, each of which stands in for its cluster at the sequence level, so that the original sequence space is still covered effectively with far fewer sequences. Such a database of representatives is called non-redundant, because similar (redundant) sequences have been removed above a chosen similarity threshold.
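A minimal Python sketch of how such a non-redundant set can be built by greedy incremental clustering: sequences are processed from longest to shortest, and each one either joins the first existing representative it is similar enough to or becomes a new representative. The similarity measure is an assumption: difflib's `SequenceMatcher.ratio()` serves as a crude stand-in for the alignment-based percent identity that real clustering tools compute, and the function names and the 0.9 default threshold are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Crude stand-in for alignment-based identity: 2 * matches / (len(a) + len(b))."""
    # autojunk=False stops difflib from ignoring frequent residues in long sequences.
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def non_redundant(seqs: dict[str, str], threshold: float = 0.9) -> dict[str, list[str]]:
    """Greedy incremental clustering; returns {representative id: member ids}."""
    clusters: dict[str, list[str]] = {}
    # Longest sequences first, so each representative is the longest in its cluster.
    for sid in sorted(seqs, key=lambda s: len(seqs[s]), reverse=True):
        for rep in clusters:
            if similarity(seqs[sid], seqs[rep]) >= threshold:
                clusters[rep].append(sid)  # redundant: absorbed by an existing representative
                break
        else:
            clusters[sid] = [sid]  # nothing similar enough: start a new cluster
    return clusters


if __name__ == "__main__":
    proteins = {
        "p1": "MKTAYIAKQRQISFVKSHFSRQ",
        "p2": "MKTAYIAKQRQISFVKSHFSRE",  # differs from p1 at the last residue only
        "p3": "MSDNGPQNQRNAPRITFGGPSD",
    }
    clusters = non_redundant(proteins, threshold=0.9)
    print("non-redundant set:", list(clusters))
    print("clusters:", clusters)
```

Here `p2` is absorbed into the cluster represented by `p1`, leaving `p1` and `p3` as the non-redundant set; raising the threshold to 0.99 keeps all three. Dedicated tools such as CD-HIT follow the same greedy idea but compute identity from alignments and use short-word filters to skip most comparisons, whereas this sketch compares every sequence with every representative.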

References

  1. ^ Bern, Marshall; Goldberg, David (2005). "Automatic selection of representative proteins for bacterial phylogeny". BMC Evolutionary Biology. 5: 34. doi:10.1186/1471-2148-5-34. PMC 1175084. PMID 15927057.