DNA digital data storage
DNA digital data storage refers to any scheme to store digital data in the base sequence of DNA. The technology uses artificial DNA made with commercially available oligonucleotide synthesis machines for storage and DNA sequencing machines for retrieval. Because of the high data density of DNA, such a storage system is more compact than current magnetic tape or hard drive storage systems. It also offers longevity, provided the DNA is kept in cold, dry and dark conditions, as shown by the study of woolly mammoth DNA up to 60,000 years old, and resistance to obsolescence, since DNA is a universal and fundamental data storage mechanism in biology. These features have led researchers involved in its development to call the method "apocalypse-proof" because "after a hypothetical global disaster, future generations might eventually find the stores and be able to read them." Retrieval is slow, however, because the DNA must be sequenced to read the data, so the method is intended for uses with a low access rate, such as long-term archival of large amounts of scientific data.
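As a rough illustration of what storing digital data in a base sequence means, the following Python sketch maps every two bits to one of the four bases. The mapping and the function names are assumptions chosen for simplicity. This is not the encoding used in any of the studies described in this article, which add indexing, redundancy and error correction.

```python
# Minimal sketch: two bits per base (A=00, C=01, G=10, T=11).
# The mapping is an illustrative assumption, not a published scheme.
BASES = "ACGT"


def bytes_to_dna(data: bytes) -> str:
    """Encode each byte as four bases, most significant bit pair first."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 0b11])
    return "".join(bases)


def dna_to_bytes(seq: str) -> bytes:
    """Decode a base sequence produced by bytes_to_dna back into bytes."""
    if len(seq) % 4 != 0:
        raise ValueError("sequence length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for base in seq[i:i + 4]:
            # str.index raises ValueError for anything other than A, C, G, T
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


if __name__ == "__main__":
    encoded = bytes_to_dna(b"DNA")
    print(encoded)
    print(dna_to_bytes(encoded))
```

A naive mapping like this can produce long runs of the same base, which is one reason practical schemes use more elaborate encodings.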
The idea of recording, storing and retrieving information on DNA molecules, together with general considerations about its feasibility, was first put forward by Mikhail Neiman and published in 1964–65 in the journal Radiotekhnika in the USSR.
An improved system was reported in the journal Nature in January 2013, in an article led by researchers from the European Bioinformatics Institute (EBI) and submitted at around the same time as the paper of Church and colleagues, which appeared in Science in 2012. Over five million bits of data, consisting of text files and audio files, were stored in a quantity of DNA that appeared to the researchers as a speck of dust, and were then perfectly retrieved and reproduced. The encoded information consisted of all 154 of Shakespeare's sonnets, a twenty-six-second audio clip of the "I Have a Dream" speech by Martin Luther King, the well-known paper on the structure of DNA by James Watson and Francis Crick, a photograph of EBI headquarters in Hinxton, United Kingdom, and a file describing the methods used to convert the data. All the DNA files reproduced the information with between 99.99% and 100% accuracy. The main innovations in this research were the use of an error-correcting encoding scheme to ensure the extremely low data-loss rate, and the idea of encoding the data in a series of overlapping short oligonucleotides identifiable through a sequence-based indexing scheme. The sequences of the individual strands of DNA overlapped in such a way that each region of data was repeated four times to avoid errors. Two of these four strands were constructed backwards, also with the goal of eliminating errors. The per-megabyte costs were estimated at $12,400 to encode data and $220 to retrieve it. However, the authors noted that the exponential decrease in DNA synthesis and sequencing costs, if it continues, should make the technology cost-effective for long-term data storage within about ten years.
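The overlapping layout described above can be illustrated with a short Python sketch. The fragment length of 100 bases, the step of 25 bases, the reading of "backwards" as the reverse complement, and all function names are assumptions made for illustration rather than details given in this article. The error-correcting encoding itself is not reproduced here.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def fragment(seq: str, length: int = 100, step: int = 25):
    """Split seq into indexed, overlapping fragments.

    With length = 4 * step, every base away from the two ends is covered
    by four fragments. Odd-numbered fragments are stored as their reverse
    complement (an assumed reading of "constructed backwards").
    """
    if len(seq) < length or (len(seq) - length) % step != 0:
        raise ValueError("sequence length must be length + k * step")
    fragments = []
    for index, start in enumerate(range(0, len(seq) - length + 1, step)):
        piece = seq[start:start + length]
        if index % 2 == 1:
            piece = reverse_complement(piece)
        fragments.append((index, piece))
    return fragments


def reassemble(fragments, total_length: int, step: int = 25) -> str:
    """Rebuild the sequence by majority vote over the overlapping copies."""
    votes = [Counter() for _ in range(total_length)]
    for index, piece in fragments:
        if index % 2 == 1:
            piece = reverse_complement(piece)  # undo the reversal
        start = index * step  # the index tells us where the fragment belongs
        for offset, base in enumerate(piece):
            votes[start + offset][base] += 1
    return "".join(v.most_common(1)[0][0] for v in votes)


if __name__ == "__main__":
    original = "ACGTTGCA" * 25  # 200 bases
    pieces = fragment(original)
    print(len(pieces), "fragments")
    print(reassemble(pieces, len(original)) == original)
```

In this sketch a single wrong base in one fragment is outvoted by the other three copies of the same region, provided that region lies away from the ends of the sequence, where coverage is lower.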
The long-term stability of data encoded in DNA was reported in February 2015, in an article by researchers from ETH Zurich. By adding redundancy via Reed–Solomon error correction coding and by encapsulating the DNA within silica glass spheres via sol-gel chemistry, the researchers predict error-free information recovery after up to 1 million years at −18 °C and 2,000 years if stored at 10 °C. Because the scheme can tolerate errors, the research team could reduce the cost of DNA synthesis to about $500/MB by choosing a more error-prone DNA synthesis method. In a news article in New Scientist, the team stated that if they are able to further decrease the cost, they would store an archive version of Wikipedia in DNA.
- Yong, E. (2013). "Synthetic double-helix faithfully stores Shakespeare's sonnets". Nature. doi:10.1038/nature.2013.12279.
- Goldman, N.; Bertone, P.; Chen, S.; Dessimoz, C.; Leproust, E. M.; Sipos, B.; Birney, E. (2013). "Towards practical, high-capacity, low-maintenance information storage in synthesized DNA". Nature 494 (7435): 77–80. doi:10.1038/nature11875. PMC 3672958. PMID 23354052.
- Church, G. M.; Gao, Y.; Kosuri, S. (2012). "Next-Generation Digital Information Storage in DNA". Science 337 (6102): 1628. doi:10.1126/science.1226355. PMID 22903519.
- Grass, R. N.; Heckel, R.; Puddu, M.; Paunescu, D.; Stark, W. J. (2015). "Robust Chemical Preservation of Digital Information on DNA in Silica with Error-Correcting Codes". Angewandte Chemie International Edition 54 (8): 2552. doi:10.1002/anie.201411378.
- Edwards, Lin (August 17, 2012). "DNA used to encode a book and other digital information". Phys Org. Retrieved 2013-01-28.
- Mardis, E. R. (2008). "Next-Generation DNA Sequencing Methods". Annual Review of Genomics and Human Genetics 9: 387–402. doi:10.1146/annurev.genom.9.081307.164359. PMID 18576944.
- Cole, Adam (January 24, 2013). "Shall I Encode Thee In DNA? Sonnets Stored On Double Helix?" (Download article and audio is available). National Public Radio.
- Naik, Gautam (January 24, 2013). "Storing Digital Data in DNA". The Wall Street Journal (New York City: Dow Jones & Company). Retrieved 2012-01-25.
- Wall Street Journal article. "Storing Digital Data in DNA"
- Ewan Birney's Blog. "Using DNA as a digital archive media"
- Also see "The 10,000 year archive"
- Ed Yong's National Geographic blog. "Shakespeare’s Sonnets and MLK’s Speech Stored in DNA Speck"
- DNA Sequencing Caught in Deluge of Data. The New York Times (NYTimes.com).
- Aron, Jacob (February 15, 2015). "Glassed-in DNA makes the ultimate time capsule". New Scientist. Retrieved February 19, 2015.