DNA digital data storage

From Wikipedia, the free encyclopedia

DNA digital data storage refers to any scheme for storing digital data in the base sequence of DNA. The technology uses artificial DNA made with commercially available oligonucleotide synthesis machines for storage and DNA sequencing machines for retrieval. Such a storage system is more compact than current magnetic tape or hard drive systems because of the data density of DNA. It also offers longevity, provided the DNA is kept in cold, dry and dark conditions, as shown by the study of woolly mammoth DNA up to 60,000 years old, and resistance to obsolescence, since DNA is a universal and fundamental data storage mechanism in biology. These features have led researchers involved in its development to call this method of data storage "apocalypse-proof" because "after a hypothetical global disaster, future generations might eventually find the stores and be able to read them."[1] It is, however, a slow process, because the DNA must be sequenced to retrieve the data, so the method is intended for uses with a low access rate, such as long-term archival of large amounts of scientific data.[1][2]


The idea of recording, storing and retrieving information on DNA molecules, together with general considerations about its feasibility, was first set out by Mikhail Neiman and published in 1964–65 in the journal Radiotekhnika, USSR.[3]

On August 16, 2012, the journal Science published research by George Church and colleagues at Harvard University in which DNA was encoded with digital information that included an HTML draft of a 53,400-word book written by the lead researcher, eleven JPG images and one JavaScript program. Multiple copies were added for redundancy, and 5.5 petabits can be stored in each cubic millimeter of DNA.[4] The researchers used a simple code in which bits were mapped one-to-one to bases; its shortcoming was that it produced long runs of the same base, which are error-prone to sequence. The result showed that, besides its other functions, DNA can also serve as a storage medium, like hard drives and magnetic tape.[1]
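
The shortcoming of a one-bit-per-base code can be illustrated with a minimal Python sketch. The specific mapping used here (0 to A, 1 to G) and the helper names are assumptions chosen for illustration; the text only states that bits were mapped one-to-one to bases, not which bases were used.

```python
from itertools import groupby

# Hypothetical fixed mapping, chosen only for illustration;
# the published scheme's actual base assignment is not given in the text.
BIT_TO_BASE = {"0": "A", "1": "G"}
BASE_TO_BIT = {base: bit for bit, base in BIT_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Map every bit of `data` to one base (one bit per base)."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BIT_TO_BASE[bit] for bit in bits)


def decode(strand: str) -> bytes:
    """Invert `encode`; assumes the strand length is a multiple of 8."""
    bits = "".join(BASE_TO_BIT[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def longest_run(strand: str) -> int:
    """Length of the longest run of one repeated base."""
    return max((len(list(group)) for _, group in groupby(strand)), default=0)


if __name__ == "__main__":
    strand = encode(b"\x00\xff")
    print(strand)               # eight A's followed by eight G's
    print(longest_run(strand))  # runs as long as the runs of equal bits
```

With a fixed mapping like this, any run of identical bits in the input becomes an equally long run of identical bases, which is the pattern the text describes as error-prone to sequence.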

An improved system was reported in the journal Nature in January 2013, in an article led by researchers from the European Bioinformatics Institute (EBI) and submitted at around the same time as the paper by Church and colleagues. Over five million bits of data, consisting of text and audio files, were stored in DNA that appeared to the researchers as a speck of dust, and were then perfectly retrieved and reproduced. The encoded information consisted of all 154 of Shakespeare's sonnets, a twenty-six-second audio clip of the "I Have a Dream" speech by Martin Luther King, the well-known paper on the structure of DNA by James Watson and Francis Crick, a photograph of EBI headquarters in Hinxton, United Kingdom, and a file describing the methods used to convert the data. All the DNA files reproduced the information with between 99.99% and 100% accuracy.[2] The main innovations in this research were the use of an error-correcting encoding scheme to ensure the extremely low data-loss rate, and the idea of encoding the data in a series of overlapping short oligonucleotides identifiable through a sequence-based indexing scheme.[1] The sequences of the individual strands of DNA overlapped in such a way that each region of data was repeated four times to avoid errors. Two of these four strands were constructed backwards, also with the goal of eliminating errors.[2] The per-megabyte costs were estimated at $12,400 to encode data and $220 to retrieve it. However, it was noted that the exponential decrease in DNA synthesis and sequencing costs, if it continues, should make the technology cost-effective for long-term data storage within about ten years.[1]
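
The overlapping, indexed layout described above can be sketched in Python as follows. This is a toy illustration, not the published encoding: the fragment length, step size, plain string reversal for the "backwards" strands, and the majority-vote reassembly are all assumptions made here to show how fourfold overlap tolerates an error in one fragment.

```python
from collections import Counter

FRAG_LEN = 8  # hypothetical fragment length, in bases
STEP = 2      # hypothetical offset; FRAG_LEN / STEP = 4 gives fourfold overlap


def split_overlapping(seq: str) -> list[tuple[int, str]]:
    """Cut `seq` into indexed, overlapping fragments.

    Every other fragment is stored reversed, standing in for the
    strands that the text says were constructed backwards.
    """
    fragments = []
    for index, start in enumerate(range(0, len(seq) - FRAG_LEN + 1, STEP)):
        fragment = seq[start:start + FRAG_LEN]
        if index % 2 == 1:
            fragment = fragment[::-1]
        fragments.append((index, fragment))
    return fragments


def coverage(seq_len: int) -> list[int]:
    """Number of fragments covering each position of the sequence."""
    counts = [0] * seq_len
    for start in range(0, seq_len - FRAG_LEN + 1, STEP):
        for pos in range(start, start + FRAG_LEN):
            counts[pos] += 1
    return counts


def reassemble(fragments: list[tuple[int, str]], seq_len: int) -> str:
    """Rebuild the sequence by majority vote at each position."""
    votes = [Counter() for _ in range(seq_len)]
    for index, fragment in fragments:
        if index % 2 == 1:
            fragment = fragment[::-1]  # undo the reversal
        start = index * STEP           # the index locates the fragment
        for offset, base in enumerate(fragment):
            votes[start + offset][base] += 1
    return "".join(vote.most_common(1)[0][0] for vote in votes)


if __name__ == "__main__":
    original = "ACGTACGTTGCAAGCT"
    pieces = split_overlapping(original)
    print(coverage(len(original)))
    print(reassemble(pieces, len(original)) == original)
```

In this sketch only the interior positions reach fourfold coverage; the ends of the sequence are covered by fewer fragments, so a real layout would need extra handling at the boundaries.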

The long-term stability of data encoded in DNA was reported in February 2015, in an article by researchers from ETH Zurich. By adding redundancy via Reed–Solomon error correction coding and by encapsulating the DNA within silica glass spheres via sol-gel chemistry, the researchers predicted error-free information recovery after up to 1 million years at −18 °C and 2,000 years if stored at 10 °C.[5] Because the scheme can handle errors, the research team could choose a more error-prone DNA synthesis method and so reduce the cost of DNA synthesis to about $500/MB. In a news article in New Scientist, the team stated that if they are able to decrease the cost further, they would store an archive version of Wikipedia in DNA.
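
The cost figures quoted in the text can be put side by side with a short Python calculation. The constants come from the two studies as reported above; the helper name `archive_cost` and the 1,000 MB example size are illustrative assumptions, and the two figures may not measure exactly the same steps of the process.

```python
# Costs quoted in the text, in US dollars per megabyte.
EBI_ENCODE_COST_PER_MB = 12_400    # 2013 study: estimated cost to encode
EBI_RETRIEVE_COST_PER_MB = 220     # 2013 study: estimated cost to retrieve
ETH_SYNTHESIS_COST_PER_MB = 500    # 2015 study: approximate synthesis cost


def archive_cost(size_mb: float, cost_per_mb: float) -> float:
    """Cost of handling `size_mb` megabytes at a flat per-megabyte rate."""
    return size_mb * cost_per_mb


# How many times cheaper the 2015 synthesis figure is than the 2013 one.
reduction_factor = EBI_ENCODE_COST_PER_MB / ETH_SYNTHESIS_COST_PER_MB

if __name__ == "__main__":
    example_size_mb = 1_000  # hypothetical archive size
    print(f"reduction factor: {reduction_factor:.1f}")
    print(archive_cost(example_size_mb, EBI_ENCODE_COST_PER_MB))
    print(archive_cost(example_size_mb, ETH_SYNTHESIS_COST_PER_MB))
```

Even at the lower rate, writing costs scale linearly with archive size, which is why the text ties practical use to further cost reductions.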

References


  1. Yong, E. (2013). "Synthetic double-helix faithfully stores Shakespeare's sonnets". Nature. doi:10.1038/nature.2013.12279.
  2. Goldman, N.; Bertone, P.; Chen, S.; Dessimoz, C.; Leproust, E. M.; Sipos, B.; Birney, E. (2013). "Towards practical, high-capacity, low-maintenance information storage in synthesized DNA". Nature 494 (7435): 77–80. doi:10.1038/nature11875. PMC 3672958. PMID 23354052.
  3. https://sites.google.com/site/msneiman1905/eng
  4. Church, G. M.; Gao, Y.; Kosuri, S. (2012). "Next-Generation Digital Information Storage in DNA". Science 337 (6102): 1628. doi:10.1126/science.1226355. PMID 22903519.
  5. Grass, R. N.; Heckel, R.; Puddu, M.; Paunescu, D.; Stark, W. J. (2015). "Robust Chemical Preservation of Digital Information on DNA in Silica with Error-Correcting Codes". Angewandte Chemie International Edition 54 (8): 2552. doi:10.1002/anie.201411378.

Further reading