
2 base encoding

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by Hashtpa9 (talk | contribs) at 00:19, 28 February 2008. The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Introduction

The goal of re-sequencing a whole human genome in a reasonable time and at a reasonable cost (less than $1000) is coming within reach with recently developed next-generation sequencing technologies. These technologies, which generate hundreds of thousands of short sequence reads in a single run, mainly include 454 pyrosequencing (introduced 2005), the Solexa system (introduced 2006) and 2-base encoding sequencing (introduced 2007-2008). These methods have reduced the cost from almost $0.01 per base in 2004 to nearly $0.0001 per base in 2006, and increased sequencing capacity from 1,000,000 bases per machine per day in 2004 to more than 100,000,000 bases per machine per day in 2006. All of these next-generation sequencing techniques share the following general steps:

1- Random fragmentation of genomic DNA

2- Immobilization of single DNA fragments on a solid support such as a bead or a planar solid surface

3- Amplification of the DNA fragments on the solid surface by PCR, forming polymerase colonies

4- Sequencing, with in situ interrogation after each cycle by fluorescence scanning or chemiluminescence [1].


In 2005, Shendure et al. described a sequencing procedure based on multiple cycles of ligation of fluorescently labeled 9-mer probes, each of which identifies its central base. In each cycle, every fifth base of the sequence is read; the process is then repeated with different primers to read the remaining four bases in each gap [2]. The most recent next-generation sequencing technology, called 2-base encoding or SOLiD (Sequencing by Oligonucleotide Ligation and Detection), has been developed by Applied Biosystems and will be commercially available in 2008. Like the method of Shendure et al., and unlike the other two next-generation sequencing technologies, 2-base encoding relies on sequencing by ligation rather than sequencing by synthesis. Its fundamental difference from the earlier 9-mer probes, which identify a single base, is the use of fluorescently labeled 8-mer probes that identify two adjacent bases at once.

This article gives a brief overview of the 2-base encoding technology and compares it with the other next-generation sequencing technologies.

How it works

The SOLiD Sequencing System uses probes with dual base encoding: each fluorescent color reports a pair of adjacent bases rather than a single base.
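To make dual base encoding concrete, the sketch below groups all 16 possible dinucleotides by the color that reports them. It assumes the color assignment commonly described for SOLiD, in which A, C, G and T receive the 2-bit codes 0 to 3 and a base pair's color is the XOR of its two codes; this article does not state the mapping, and the names `BASE_CODE`, `dinucleotide_color` and `color_groups` are hypothetical.

```python
# Assumption: A=0, C=1, G=2, T=3, and the color of a dinucleotide is the XOR
# of its two base codes. This mapping is not given in the article itself.
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def dinucleotide_color(first: str, second: str) -> int:
    """Return the color (0-3) assigned to the adjacent base pair first+second."""
    return BASE_CODE[first] ^ BASE_CODE[second]


def color_groups() -> dict[int, list[str]]:
    """Group all 16 dinucleotides by the color that encodes them."""
    groups: dict[int, list[str]] = {color: [] for color in range(4)}
    for a in "ACGT":
        for b in "ACGT":
            groups[dinucleotide_color(a, b)].append(a + b)
    return groups


if __name__ == "__main__":
    for color, pairs in color_groups().items():
        print(color, pairs)
```

Each color covers four dinucleotides (under this assumed scheme, color 0 covers AA, CC, GG and TT), so a single color on its own never identifies the bases it reports.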

Figure 1: Schematic of the four different probes used in 2-base encoding.
Figure 2: Schematic of how the 2-base encoding system works. Each base in the sequence is read twice, which allows the system to minimize errors.

Decoding the colors requires two pieces of information: first, that each color represents two adjacent bases, and second, the identity of one base in the sequence. Starting from the known base, each color then determines the next base in turn [3].
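A minimal decoding sketch, under the same assumed XOR color scheme (A=0, C=1, G=2, T=3, not specified in this article), shows why one known base is required: because XOR is its own inverse, combining the previous base with the current color yields exactly one next base. The functions `encode` and `decode` are illustrative names, not part of any SOLiD software.

```python
# Assumed color scheme: color of a base pair = XOR of the two base codes.
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_BASE = {code: base for base, code in BASE_CODE.items()}


def encode(sequence: str) -> list[int]:
    """Convert a base sequence into colors, one per adjacent pair of bases."""
    return [BASE_CODE[a] ^ BASE_CODE[b] for a, b in zip(sequence, sequence[1:])]


def decode(first_base: str, colors: list[int]) -> str:
    """Rebuild the bases from one known base plus the color string.

    XOR is its own inverse, so next_code = previous_code XOR color.
    """
    bases = [first_base]
    for color in colors:
        bases.append(CODE_BASE[BASE_CODE[bases[-1]] ^ color])
    return "".join(bases)
```

For example, `encode("ATGGCA")` gives `[3, 1, 0, 3, 1]`; decoding those colors from the known first base A recovers `ATGGCA`, whereas starting from C instead yields the unrelated sequence `CGTTAC`.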



Advantages

- Accuracy: each base is interrogated twice in this sequencing method, which can raise the accuracy of the system to more than 99.94%, higher than that of the other systems.

- Detection of SNPs and other small changes: one of the main advantages of this system is the ease with which it detects single nucleotide polymorphisms (SNPs) and other small alterations in the template sequence. A SNP characteristically changes two adjacent colors. The detection of other alterations is summarized in Figure 2.

- Detection of errors: as discussed above, each base in this system is read by two colors, so a change in any single base alters two colors. Therefore, while a change in two or more colors indicates a real change in the sequence, a single color change indicates a measurement error and should not be interpreted as a sequence change (Figure 2).
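The SNP and error rules above can be sketched as follows, again assuming the XOR color scheme. Mismatched colors between a reference and a read are grouped into runs of adjacent positions; an isolated mismatch is labeled an error, and two adjacent mismatches count as a SNP only if both differ from the reference by the same XOR value, which is what a single base substitution produces under this scheme. That consistency check and the function names are assumptions of this sketch, not a documented SOLiD algorithm.

```python
# Assumed color scheme: color of a base pair = XOR of the two base codes.
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode(sequence: str) -> list[int]:
    """Colors for each adjacent base pair."""
    return [BASE_CODE[a] ^ BASE_CODE[b] for a, b in zip(sequence, sequence[1:])]


def classify_mismatches(reference: list[int], read: list[int]) -> list[tuple[str, list[int]]]:
    """Group color mismatches into runs of adjacent positions and label each run.

    - one isolated mismatched color -> "error" (no interior single-base change does this)
    - two adjacent mismatches that differ from the reference by the same XOR value -> "snp"
    - anything else -> "other" (e.g. multi-base changes or several errors)
    """
    diffs = [i for i, (r, q) in enumerate(zip(reference, read)) if r != q]
    runs: list[list[int]] = []
    for i in diffs:
        if runs and i == runs[-1][-1] + 1:
            runs[-1].append(i)  # extends the current run of adjacent mismatches
        else:
            runs.append([i])
    labelled = []
    for run in runs:
        if len(run) == 1:
            label = "error"
        elif len(run) == 2 and (
            reference[run[0]] ^ read[run[0]] == reference[run[1]] ^ read[run[1]]
        ):
            label = "snp"
        else:
            label = "other"
        labelled.append((label, run))
    return labelled
```

Substituting A for G at the third base of `ATGGCA` changes colors 1 and 2 and is reported as `("snp", [1, 2])`, while altering only color 3 is reported as `("error", [3])`. A substitution at the very first or last base of a read changes only one color, so this sketch would misreport it as an error.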


Limitations

Conclusion

| Next-generation sequencing method | Bases per run | Duration of each run | Read length (bp) | Cost | Accuracy |
|---|---|---|---|---|---|
| 454 | 100 million | 7.5 hours | up to 250 | 1/10 to 1/20 of Sanger sequencing | > 99.5% |
| Solexa (1G) | 1000 million | 3 days | up to 50 | 1/20 to 1/30 of 454 sequencing | > 99.93% |
| 2 base encoding | 2000 million | 4 days | up to 35 | 1/20 to 1/30 of 454 sequencing | > 99.94% |
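Dividing each run's output by its duration puts the three platforms on the same bases-per-machine-per-day footing as the capacity figures in the introduction. This sketch takes the run sizes and durations from the comparison table and assumes runs are performed back to back with no setup or turnaround time, which is an idealization; `PLATFORMS` and `bases_per_day` are illustrative names.

```python
# Run size (bases) and run duration (hours) taken from the comparison table.
PLATFORMS = {
    "454": (100e6, 7.5),
    "Solexa (1G)": (1000e6, 3 * 24),
    "2 base encoding": (2000e6, 4 * 24),
}


def bases_per_day(bases_per_run: float, hours_per_run: float) -> float:
    """Idealized throughput of one machine running back-to-back runs for 24 hours."""
    return bases_per_run / hours_per_run * 24


if __name__ == "__main__":
    for name, (bases, hours) in PLATFORMS.items():
        print(f"{name}: {bases_per_day(bases, hours) / 1e6:.0f} million bases/day")
```

Under that assumption it prints about 320 million bases per day for 454, 333 million for Solexa (1G) and 500 million for 2 base encoding.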


References

  1. Sequencing breakthroughs for genomic ecology and evolutionary biology. http://www.blackwell-synergy.com/doi/full/10.1111/j.1471-8286.2007.02019.x?cookieSet=1
  2. Accurate Multiplex Polony Sequencing of an Evolved Bacterial Genome. http://www.sciencemag.org/cgi/content/full/309/5741/1728
  3. SEQanswers forum thread. http://seqanswers.com/forums/showthread.php?t=10