Sequence graph

From Wikipedia, the free encyclopedia

In comparative genomics, a sequence graph, also called an alignment graph, breakpoint graph, or adjacency graph, is a bidirected graph in which the vertices represent segments of DNA and the edges represent adjacency between segments in a genome.[1] Each segment is labeled with the DNA string it represents, and each edge connects the tail end of one segment with the head end of another segment. Each adjacency edge is labeled with a (possibly empty) string of DNA. Traversing a connected component of segments and adjacency edges (called a thread) yields a sequence, which typically represents a genome or a section of a genome. The segments can be thought of as synteny blocks: the edges dictate how to arrange these blocks in a particular genome, and the labels on the adjacency edges represent bases that are not contained in any synteny block.
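
The structure described above can be illustrated with a minimal Python sketch. The class name `SequenceGraph`, its methods, and the end names "head" and "tail" as dictionary keys are assumptions made for illustration, not an established API. The sketch also assumes that entering a segment at its head reads the label forward, that entering at its tail reads the reverse complement, and that each segment end has at most one adjacency.

```python
# Hypothetical minimal model of a sequence graph (illustrative only).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


class SequenceGraph:
    def __init__(self):
        self.segments = {}     # segment name -> DNA label
        self.adjacencies = {}  # (name, end) -> ((name, end), label read when leaving)

    def add_segment(self, name, dna):
        self.segments[name] = dna

    def add_adjacency(self, end_a, end_b, label=""):
        # Store both directions; the label is reverse-complemented
        # when the adjacency is crossed from end_b to end_a.
        self.adjacencies[end_a] = (end_b, label)
        self.adjacencies[end_b] = (end_a, reverse_complement(label))

    def traverse(self, start_end):
        """Spell the sequence of a thread, entering at start_end."""
        pieces = []
        visited = set()
        end = start_end
        while True:
            name, side = end
            visited.add(name)
            dna = self.segments[name]
            pieces.append(dna if side == "head" else reverse_complement(dna))
            exit_end = (name, "tail" if side == "head" else "head")
            if exit_end not in self.adjacencies:
                break  # reached the end of a linear thread
            next_end, label = self.adjacencies[exit_end]
            pieces.append(label)
            if next_end[0] in visited:
                break  # circular thread closed
            end = next_end
        return "".join(pieces)


graph = SequenceGraph()
graph.add_segment("A", "ACG")
graph.add_segment("B", "TTC")
graph.add_segment("C", "GA")
graph.add_adjacency(("A", "tail"), ("B", "head"))       # empty label
graph.add_adjacency(("B", "tail"), ("C", "tail"), "T")  # C is read in reverse
print(graph.traverse(("A", "head")))  # ACGTTCTTC
```

Starting the traversal from the opposite end of the thread, `("C", "head")`, spells the reverse complement of the same sequence, which reflects the bidirected nature of the graph.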


Multiple sequence alignment

Sequence graphs can be used to represent multiple sequence alignments with the addition of a new kind of edge representing homology between segments.[2] For a set of genomes, one can create an acyclic breakpoint graph with a thread for each genome. For two segments $(b_1, e_1)$ and $(b_2, e_2)$, where $b_1$, $e_1$, $b_2$, and $e_2$ represent the endpoints of the two segments, homology edges can be created from $b_1$ to $b_2$ and $e_1$ to $e_2$, or from $b_1$ to $e_2$ and $e_1$ to $b_2$, representing the two possible orientations of the homology. The advantage of representing a multiple sequence alignment this way is that it can include inversions and other structural rearrangements that a matrix representation cannot express.
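
The two orientations of a homology can be sketched in Python as follows. The `Segment` type, its field names `begin` and `end` for the two endpoints, and the function `homology_edges` are hypothetical names introduced for illustration. The sketch also assumes that homology is detected by exact string identity, which stands in for a real alignment step.

```python
from typing import List, NamedTuple, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    return dna.translate(COMPLEMENT)[::-1]


class Segment(NamedTuple):
    begin: str  # identifier of the first endpoint
    end: str    # identifier of the second endpoint
    dna: str    # DNA label of the segment


def homology_edges(s1: Segment, s2: Segment) -> List[Tuple[str, str]]:
    """Return the pair of homology edges linking s1 and s2, or [] if none.

    Assumption: two segments are homologous only when their labels are
    identical (same orientation) or reverse complements of each other
    (opposite orientation). The same-orientation case is checked first.
    """
    if s1.dna == s2.dna:
        # Same orientation: begin-to-begin and end-to-end.
        return [(s1.begin, s2.begin), (s1.end, s2.end)]
    if s1.dna == reverse_complement(s2.dna):
        # Opposite orientation (an inversion): begin-to-end and end-to-begin.
        return [(s1.begin, s2.end), (s1.end, s2.begin)]
    return []


s1 = Segment("b1", "e1", "AAC")
s2 = Segment("b2", "e2", "AAC")  # same orientation as s1
s3 = Segment("b3", "e3", "GTT")  # reverse complement of s1

print(homology_edges(s1, s2))  # [('b1', 'b2'), ('e1', 'e2')]
print(homology_edges(s1, s3))  # [('b1', 'e3'), ('e1', 'b3')]
```

The second call shows why this representation can hold an inversion: the homology edges cross over, which a column-based alignment matrix has no way to record.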

Representing variation

If there are multiple possible paths when traversing a thread in a sequence graph, the same thread can represent multiple sequences. It is therefore possible to create a sequence graph that represents a population of individuals with slightly different genomes, with each genome corresponding to one path through the graph. These graphs have been proposed as a replacement for the reference human genome.[3]
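
As a rough illustration of one thread encoding several genomes, the following Python sketch enumerates every path through a small branching graph. It is a simplification under stated assumptions: the graph is acyclic, segments are only read in the forward direction, and adjacency labels are empty. The names `labels`, `successors`, and `spell_paths` are hypothetical.

```python
def spell_paths(labels, successors, node):
    """Yield the sequence spelled by every path from node to a sink.

    Assumes an acyclic graph traversed in the forward direction only.
    """
    next_nodes = successors.get(node, [])
    if not next_nodes:
        yield labels[node]
        return
    for nxt in next_nodes:
        for suffix in spell_paths(labels, successors, nxt):
            yield labels[node] + suffix


# A single-base variant: the middle position is either T or C.
labels = {"left": "ACG", "ref": "T", "alt": "C", "right": "GGA"}
successors = {
    "left": ["ref", "alt"],
    "ref": ["right"],
    "alt": ["right"],
}

genomes = list(spell_paths(labels, successors, "left"))
print(genomes)  # ['ACGTGGA', 'ACGCGGA']
```

Each additional branch point multiplies the number of paths, so a compact graph can describe many more haplotypes than could practically be listed as separate linear sequences.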


  1. ^ Alekseyev, M. A.; Pevzner, P. A. (2009-02-13). "Breakpoint graphs and ancestral genome reconstructions". Genome Research. Cold Spring Harbor Laboratory. 19 (5): 943–957. doi:10.1101/gr.082784.108. ISSN 1088-9051. PMC 2675983. PMID 19218533.
  2. ^ Paten, Benedict; Zerbino, Daniel R; Hickey, Glenn; Haussler, David (2014-06-19). "A unifying model of genome evolution under parsimony". BMC Bioinformatics. Springer Science and Business Media LLC. 15 (1): 206. doi:10.1186/1471-2105-15-206. ISSN 1471-2105. PMC 4082375. PMID 24946830.
  3. ^ Paten, Benedict; Novak, Adam; Haussler, David (2014-04-20). "Mapping to a Reference Genome Structure". arXiv:1404.5010 [q-bio.GN].