| Research center | University of California, Santa Cruz |
| Laboratory | Center for Biomolecular Science and Engineering |
| Authors | Brian J Raney |
| Primary citation | PMID 21037257 |
The Encyclopedia of DNA Elements (ENCODE) is a public research consortium launched by the US National Human Genome Research Institute (NHGRI) in September 2003. Its goal is to identify all functional elements in the human genome, making it one of NHGRI's most important projects since the completion of the Human Genome Project. All data generated in the course of the project are released rapidly into public databases.
On 5 September 2012, initial results of the project were released in a coordinated set of 30 papers published in the journals Nature (6 papers), Genome Biology (6 papers), and Genome Research (18 papers). Taken together, these publications show that approximately 20% of noncoding DNA in the human genome is functional, while an additional 60% is transcribed with no known function. Much of this functional noncoding DNA is involved in regulating the expression of coding genes. Furthermore, the expression of each coding gene is controlled by multiple regulatory sites located both near to and distant from the gene. These results demonstrate that gene regulation is far more complex than was previously believed.
Genome-wide association studies have determined that approximately 90% of the single-letter sequence differences associated with various diseases fall outside protein-coding regions. It was previously unclear how these sequence differences could influence disease; in many cases, the new gene regulatory sites discovered by the ENCODE project provide an explanation.
The human genome consists of just over 3 billion DNA base pairs. The Human Genome Project, completed in 2003, sequenced the entire genome for one specific person. In the years since then, the genomes of many other individual people have been sequenced, partly under the auspices of the 1000 Genomes Project. Sequencing a genome produces several gigabytes of raw data, but the sequence alone says little about how the genome works. The aim of the ENCODE project is to determine which parts of the DNA are biologically active and to make an initial assessment of their functions.
The part of the DNA that has long been best understood is the exome, consisting of around 20,000 protein-coding genes. These genes, however, make up only around 1.5% of the DNA in total, and are separated from each other by long stretches of DNA that do not code for proteins. This remaining DNA includes the so-called regulome, which comprises a variety of DNA elements that in one way or another modulate the expression of protein-coding genes. It has not been clear, though, how much of the total DNA belongs to the regulome. Until recently, the majority view was that much of the DNA is "junk": DNA that is never transcribed and has no biological function. The central goal of the ENCODE project is to map out the regulome by determining which parts of the DNA belong to it and the mechanisms by which those parts influence gene transcription.
Pilot phase 
The project was initiated with a $12 million pilot phase to evaluate a variety of methods for use in later stages. A number of then-existing techniques were used to analyse about 1% of the genome (30 million base pairs). The results of these analyses were evaluated on their ability to identify regions of DNA known or suspected to contain functional elements. Half of the sample area studied in this phase was chosen manually, and the other half was chosen at random. The manually chosen regions were picked for the presence of well-studied genes and the availability of comparative data. Methods evaluated included chromatin immunoprecipitation (ChIP) and quantitative PCR.
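The sizes quoted above follow from simple arithmetic, sketched below under the assumption of a round 3-billion-base-pair genome (the text says "just over 3 billion"). The function and variable names are illustrative only and do not come from the ENCODE project.

```python
# Back-of-the-envelope sizes for the ENCODE pilot regions.
# Assumption: the genome is rounded to 3 billion base pairs.
GENOME_BP = 3_000_000_000
PILOT_PERCENT = 1  # the pilot analysed about 1% of the genome


def pilot_region_sizes(genome_bp=GENOME_BP, percent=PILOT_PERCENT):
    """Return the total pilot size and its manual/random halves, in base pairs."""
    total = genome_bp * percent // 100   # integer arithmetic avoids float rounding
    manual = total // 2                  # half chosen manually
    random_part = total - manual         # the other half chosen at random
    return {"total": total, "manual": manual, "random": random_part}


sizes = pilot_region_sizes()
print(sizes)
```

Under these assumptions, the manually chosen and randomly chosen portions each come to about 15 million base pairs.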
The ENCODE pilot project rapidly released all of its data into public databases. The pilot phase was successfully finished and the results were published in June 2007 in Nature and in a special issue of Genome Research.
Production phase 
In September 2007, NHGRI began funding the production phase of the ENCODE project. In this phase, the goal was to analyze the entire genome and to conduct "additional pilot-scale studies".
As in the pilot project, the production effort is organized as an open consortium. In October 2007, NHGRI awarded grants totaling more than $80 million over four years. The production phase also includes a Data Coordination Center, a Data Analysis Center, and a Technology Development Effort.
By 2010, over 1,000 genome-wide data sets had been produced by the ENCODE project. Taken together, these data sets show which regions are transcribed into RNA, which regions are likely to control the genes that are used in a particular type of cell, and which regions are associated with a wide variety of proteins. The primary assays used in ENCODE are ChIP-seq, DNase I hypersensitivity, RNA-seq, and assays of DNA methylation.
In September 2012, the project released a much more extensive set of results in 30 papers published simultaneously in several journals: six in Nature, six in Genome Biology, and 18 in a special issue of Genome Research. The most striking finding was that the fraction of human DNA that is biologically active is considerably higher than even the most optimistic previous estimates. In an overview paper, the ENCODE Consortium reported that its members were able to assign biochemical functions to over 80% of the genome. Much of this was found to be involved in controlling the expression levels of coding DNA, which makes up less than 1% of the genome.
The most important new elements of the "encyclopedia" include:
- A comprehensive map of DNase I hypersensitive sites, which are markers for regulatory DNA that is typically located adjacent to genes and allows chemical factors to influence their expression. The map identified nearly 3 million sites of this type, including nearly all that were previously known and many that are novel.
- A lexicon of short DNA sequences that form recognition motifs for DNA-binding proteins. Approximately 8.4 million such sequences were found, comprising a fraction of the total DNA roughly twice the size of the exome. Thousands of transcription promoters were found to make use of a single stereotyped 50-base-pair footprint.
- A preliminary sketch of the architecture of the network of human transcription factors, that is, factors that bind to DNA in order to promote or inhibit the expression of genes. The network was found to be quite complex, with factors that operate at different levels as well as numerous feedback loops of various types.
- A measurement of the fraction of the human genome that is capable of being transcribed into RNA. This fraction was estimated to add up to more than 75% of the total DNA, a much higher value than previous estimates. The project also began to characterize the types of RNA transcripts that are generated at various locations.
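Several of the figures above are statements about what fraction of the genome is covered by a set of annotated elements. The sketch below shows one common way such a fraction can be computed: merge overlapping intervals so that no base is counted twice, then divide the covered length by the region length. It is an illustration on invented toy coordinates, not the ENCODE Consortium's actual pipeline, and the function names are hypothetical.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching half-open [start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


def covered_fraction(intervals, region_length):
    """Fraction of a region of the given length covered by at least one interval."""
    covered = sum(end - start for start, end in merge_intervals(intervals))
    return covered / region_length


# Toy data: four made-up transcript intervals on a 1,000-base region.
toy_transcripts = [(0, 300), (200, 500), (700, 900), (850, 950)]
print(merge_intervals(toy_transcripts))
print(covered_fraction(toy_transcripts, 1000))
```

Merging first matters because overlapping annotations, such as transcripts from the same locus, would otherwise inflate the covered length.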
modENCODE project 
The Model Organism ENCyclopedia Of DNA Elements (modENCODE) project is a continuation of the original ENCODE project targeting the identification of functional elements in selected model organism genomes, specifically, Drosophila melanogaster and Caenorhabditis elegans. The extension to model organisms permits biological validation of the computational and experimental findings of the ENCODE project, something that is difficult or impossible to do in humans.
In late 2010, the modENCODE consortium unveiled its first set of results, with publications in Science on the annotation and integrative analysis of the worm and fly genomes. Data from these publications are available from the modENCODE web site.
References
- Raney BJ, Cline MS, Rosenbloom KR, Dreszer TR, Learned K, Barber GP, Meyer LR, Sloan CA, Malladi VS, Roskin KM, Suh BB, Hinrichs AS, Clawson H, Zweig AS, Kirkup V, Fujita PA, Rhead B, Smith KE, Pohl A, Kuhn RM, Karolchik D, Haussler D, Kent WJ (January 2011). "ENCODE whole-genome data in the UCSC genome browser (2011 update)". Nucleic Acids Res. 39 (Database issue): D871–5. doi:10.1093/nar/gkq1017. PMC 3013645. PMID 21037257.
- Maher B (2012). "ENCODE: The human encyclopaedia". Nature 489 (7414): 46–48. doi:10.1038/489046a.
- ENCODE Project Consortium (2011). "A User's Guide to the Encyclopedia of DNA Elements (ENCODE)". In Becker PB. PLoS Biology 9 (4): e1001046. doi:10.1371/journal.pbio.1001046. PMC 3079585. PMID 21526222.
- ENCODE Project Consortium, Birney E, Stamatoyannopoulos JA, Dutta A, Guigó R, Gingeras TR, Margulies EH, Weng Z, Snyder M, Dermitzakis ET, et al. (2007). "Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project". Nature 447 (7146): 799–816. Bibcode:2007Natur.447..799B. doi:10.1038/nature05874. PMC 2212820. PMID 17571346.
- Guigó R, Flicek P, Abril JF, Reymond A, Lagarde J, Denoeud F, Antonarakis S, Ashburner M, Bajic VB, Birney E, Castelo R, Eyras E, Ucla C, Gingeras TR, Harrow J, Hubbard T, Lewis SE, Reese MG (2006). "EGASP: The human ENCODE Genome Annotation Assessment Project". Genome Biology 7: S2. doi:10.1186/gb-2006-7-s1-s2. PMC 1810551. PMID 16925836.
- "ENCODE project at UCSC". ENCODE Consortium. Retrieved 2012-09-05.
- Walsh F (2012-09-05). "Detailed map of genome function". BBC News. Archived from the original on 2012-09-06. Retrieved 2012-09-06.
- Timmer J (2012-09-10). "Most of what you read was wrong: how press releases rewrote scientific history". Staff / From the Minds of Ars. Ars Technica. Retrieved 2012-09-10.
- Bernstein BE, Birney E, Dunham I, Green ED, Gunter C, Snyder M (September 2012). "An integrated encyclopedia of DNA elements in the human genome". Nature 489 (7414): 57–74. doi:10.1038/nature11247. PMID 22955616.
- Pennisi E (September 2012). "Genomics. ENCODE project writes eulogy for junk DNA". Science 337 (6099): 1159, 1161. doi:10.1126/science.337.6099.1159. PMID 22955811.
- Saey TH (6 October 2012). "Team releases sequel to the human genome". Science News. Society for Science & the Public. Retrieved 18 October 2012.
- "ENCODE Pilot Project: Target Selection". The ENCODE Project: ENCyclopedia Of DNA Elements. United States National Human Genome Research Institute. 2011-08-01. Retrieved 2011-08-05.
- "ENCODE Project at UCSC". University of California at Santa Cruz. Retrieved 2011-08-05.
- Weinstock GM (2007). "ENCODE: More genomic empowerment". Genome Research 17 (6): 667–668. doi:10.1101/gr.6534207. PMID 17567987.
- "Genome.gov | ENCODE and modENCODE Projects". The ENCODE Project: ENCyclopedia Of DNA Elements. United States National Human Genome Research Institute. 2011-08-01. Retrieved 2011-08-05.
- "National Human Genome Research Institute - Organization". The NIH Almanac. United States National Institutes of Health. Retrieved 2011-08-05.
- "Genome.gov | ENCODE Participants and Projects". The ENCODE Project: ENCyclopedia Of DNA Elements. United States National Human Genome Research Institute. 2011-08-01. Retrieved 2011-08-05.
- Ecker JR, Bickmore WA, Barroso I, Pritchard JK, Gilad Y, Segal E (September 2012). "Genomics: ENCODE explained". Nature 489 (7414): 52–5. doi:10.1038/489052a. PMID 22955614.
- Thurman RE, Rynes E, Humbert R, Vierstra J, Maurano MT, Haugen E, Sheffield NC, Stergachis AB, Wang H, et al. (September 2012). "The accessible chromatin landscape of the human genome". Nature 489 (7414): 75–82. doi:10.1038/nature11232. PMID 22955617.
- Neph S, Vierstra J, Stergachis AB, Reynolds AP, Haugen E, Vernot B, Thurman RE, John S, Sandstrom R, et al. (September 2012). "An expansive human regulatory lexicon encoded in transcription factor footprints". Nature 489 (7414): 83–90. doi:10.1038/nature11212. PMID 22955618.
- Gerstein MB, Kundaje A, Hariharan M, Landt SG, Yan KK, Cheng C, Mu XJ, Khurana E, Rozowsky J, et al. (September 2012). "Architecture of the human regulatory network derived from ENCODE data". Nature 489 (7414): 91–100. doi:10.1038/nature11245. PMID 22955619.
- Djebali S, Davis CA, Merkel A, Dobin A, Lassmann T, Mortazavi A, Tanzer A, Lagarde J, Lin W, et al. (September 2012). "Landscape of transcription in human cells". Nature 489 (7414): 101–8. doi:10.1038/nature11233. PMID 22955620.
- "The modENCODE Project: Model Organism ENCyclopedia Of DNA Elements (modENCODE)". NHGRI website. Retrieved 2008-11-13.
- "modENCODE Participants and Projects". NHGRI website. Retrieved 2008-11-13.
- "Berkeley Lab Life Sciences Awarded NIH Grants for Fruit Fly, Nematode Studies". Lawrence Berkeley National Laboratory website. 2007-05-14. Retrieved 2008-11-13.
- Gerstein MB, Lu ZJ, Van Nostrand EL, Cheng C, Arshinoff BI, Liu T, Yip KY, Robilotto R, Rechtsteiner A, et al. (2010). "Integrative Analysis of the Caenorhabditis elegans Genome by the modENCODE Project". Science 330 (6012): 1775–1787. Bibcode:2010Sci...330.1775G. doi:10.1126/science.1196914. PMID 21177976.
- modENCODE Consortium, Roy S, Ernst J, Kharchenko PV, Kheradpour P, Negre N, Eaton ML, Landolin JM, Bristow CA, Ma L, et al. (2010). "Identification of Functional Elements and Regulatory Circuits by Drosophila modENCODE". Science 330 (6012): 1787–1797. Bibcode:2010Sci...330.1787R. doi:10.1126/science.1198374. PMID 21177974.
- "modENCODE". The National Human Genome Research Institute.