Velvet assembler

Velvet is a set of algorithms that manipulate de Bruijn graphs for genomic and de novo transcriptomic sequence assembly.[1][2][3] It was designed for short-read sequencing technologies such as Solexa or 454 Sequencing, and was developed by Daniel Zerbino and Ewan Birney at the European Bioinformatics Institute. The tool takes in short read sequences, removes errors, and then produces high-quality unique contigs. It then uses paired-end read and long read information, when available, to resolve the repeated regions between contigs. It has also been integrated into commercial packages such as the Geneious Server.

Algorithm

The de Bruijn graph

The reads are first hashed according to a chosen k-mer length. For each k-mer observed in the set of reads (together with its reverse complement), a hash table records the ID of the first read encountered that contains the k-mer and the position of its occurrence within that read.

A second database is created with the opposite information: it records, for each read, which of its original k-mers are overlapped by subsequent reads.
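The two lookups described above can be sketched in a few lines of Python. This is an illustrative reading of the text, not Velvet's actual implementation: the function names, the use of plain dictionaries, and the choice to store a k-mer and its reverse complement under a single key (the lexicographically smaller of the two) are all assumptions made for the example.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string over A, C, G, T."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))


def index_kmers(reads, k):
    """Build the two lookup tables (hypothetical layout).

    first_seen: k-mer key -> (read_id, position) of its first occurrence
    overlapped: read_id -> positions of that read's first-seen k-mers
                that also occur in a later read
    """
    first_seen = {}
    overlapped = {read_id: set() for read_id in range(len(reads))}
    for read_id, read in enumerate(reads):
        for pos in range(len(read) - k + 1):
            kmer = read[pos:pos + k]
            # Assumption: one key covers a k-mer and its reverse complement.
            key = min(kmer, reverse_complement(kmer))
            if key not in first_seen:
                first_seen[key] = (read_id, pos)
            else:
                owner, owner_pos = first_seen[key]
                if owner != read_id:
                    # A later read overlaps a k-mer first seen in `owner`.
                    overlapped[owner].add(owner_pos)
    return first_seen, overlapped


first_seen, overlapped = index_kmers(["AAGTC", "GTCCA"], k=3)
```

In this example the only k-mer shared by the two reads is GTC, so the second table marks position 2 of read 0 as overlapped and leaves read 1 empty.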

Simplification

Whenever a node A has only one outgoing arc, and that arc points to a node B that has only one incoming arc, the two nodes are merged. The merge loses no information, because the path from A to B is unambiguous.
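A minimal sketch of this merging rule follows, under assumptions chosen for the example: the graph is a dictionary mapping each node to its list of successors (every node appears as a key), and each node carries a label holding only the sequence it contributes, so that merging two nodes is a plain concatenation. The function name `simplify` is hypothetical.

```python
from collections import Counter


def simplify(succ, label):
    """Merge A and B whenever A has one outgoing arc (to B) and B one incoming arc."""
    succ = {node: list(targets) for node, targets in succ.items()}
    label = dict(label)
    changed = True
    while changed:
        changed = False
        # Count the incoming arcs of every node in the current graph.
        incoming = Counter(t for targets in succ.values() for t in targets)
        for a in list(succ):
            if len(succ[a]) != 1:
                continue
            b = succ[a][0]
            if b != a and incoming[b] == 1:
                label[a] += label.pop(b)   # A absorbs B's sequence
                succ[a] = succ.pop(b)      # A inherits B's outgoing arcs
                changed = True
                break                      # recount incoming arcs and rescan
    return succ, label


graph = {"A": ["B"], "B": ["C"], "C": ["D", "E"], "D": [], "E": []}
labels = {"A": "a", "B": "b", "C": "c", "D": "d", "E": "e"}
merged_graph, merged_labels = simplify(graph, labels)
```

Here the chain A, B, C collapses into a single node labeled "abc", while D and E stay separate because C has two outgoing arcs. Recounting after every merge keeps the sketch short at the cost of efficiency.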

Error removal

Errors in the graph can be due either to the sequencing process or to the biological sample itself, for example polymorphisms.

  • Removing "tips": a tip is a chain of nodes that is disconnected on one end.
  • Removing bubbles with the Tour Bus algorithm.
  • Removing erroneous connections.
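The first of these steps, tip removal, can be illustrated with a simplified sketch. It assumes the graph has already been simplified, so each tip is a single dead-end node, and it only handles tips with no outgoing arcs. The function name `remove_tips`, the `length` table, and the `max_len` cutoff are hypothetical parameters introduced for the example and are not taken from Velvet's code.

```python
def remove_tips(succ, length, max_len):
    """Drop short dead-end nodes that hang off a branching parent."""
    succ = {node: list(targets) for node, targets in succ.items()}
    # Predecessor lists, computed once from the input graph.
    preds = {node: [] for node in succ}
    for node, targets in succ.items():
        for target in targets:
            preds[target].append(node)
    for node in list(succ):
        # Only short nodes with no outgoing arcs are candidate tips.
        if succ[node] or length[node] >= max_len:
            continue
        if len(preds[node]) != 1:
            continue
        parent = preds[node][0]
        # Remove the tip only if the parent keeps another way forward.
        if len(succ[parent]) > 1:
            succ[parent].remove(node)
            del succ[node]
    return succ


graph = {"A": ["B", "T"], "B": ["C"], "C": [], "T": []}
lengths = {"A": 10, "B": 10, "C": 10, "T": 2}
cleaned = remove_tips(graph, lengths, max_len=6)
```

The short branch T is removed, while C survives even though it is also a dead end, because it is longer than the cutoff. A fuller treatment would also compare arc coverage before discarding a branch.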

References