ENCODE: Difference between revisions

From Wikipedia, the free encyclopedia
ce
replaced misleading "80% biologically active" figure with "20% functional + 60% transcribed", which is more accurate (see Talk:ENCODE#We_need_a_radical_change)
Line 28:
The '''Encyclopedia of DNA Elements''' ('''ENCODE''') is a public research consortium<ref name = "Maher_2012">{{cite journal | title = ENCODE: The human encyclopaedia | journal = Nature | author = Maher B | year = 2012 | volume = 489 | issue = 7414 | pages =46–48 | doi = 10.1038/489046a }}</ref> launched by the [[USA|US]] [[National Human Genome Research Institute]] (NHGRI) in September 2003.<ref name="Raney2011">{{cite journal | author = Raney BJ, Cline MS, Rosenbloom KR, Dreszer TR, Learned K, Barber GP, Meyer LR, Sloan CA, Malladi VS, Roskin KM, Suh BB, Hinrichs AS, Clawson H, Zweig AS, Kirkup V, Fujita PA, Rhead B, Smith KE, Pohl A, Kuhn RM, Karolchik D, Haussler D, [[Jim Kent|Kent, WJ]] | title = ENCODE whole-genome data in the UCSC genome browser (2011 update) | journal = [[Nucleic Acids Research|Nucleic Acids Res.]] | volume = 39 | issue = Database issue | pages = D871–5 | year = 2011 | month = January | pmid = 21037257 | pmc = 3013645 | doi = 10.1093/nar/gkq1017 }}</ref><ref>{{cite doi|10.1371/journal.pbio.1001046}}</ref><ref name="birney">{{cite pmid|17571346}}</ref><ref>{{cite pmid|16925836}}</ref> The goal is to find all functional elements in the human [[genome]], one of the most critical projects by NHGRI after it completed the successful [[Human Genome Project]]. All data generated in the course of the project will be released rapidly into public databases.


On 5 September 2012, initial results of the project were released in a coordinated set of 30 papers published in the journals ''[[Nature (journal)|Nature]]'', ''[[Science (journal)|Science]]'', ''[[Genome Biology]]'', and ''[[Genome Research]]''.<ref>{{cite web|url=http://genome.ucsc.edu/ENCODE/|title=ENCODE project at UCSC|publisher=ENCODE Consortium|accessdate=2012-09-05}}</ref><ref>{{cite news | title = Detailed map of genome function | author = Walsh F | authorlink = Fergus Walsh | url = http://www.bbc.co.uk/news/health-19202141 | publisher = BBC News | date = 2012-09-05 | accessdate = 2012-09-06 | archiveurl = http://www.webcitation.org/6ASmAMpP9 | archivedate = 2012-09-06 }}</ref> These publications combine to show that at least 80% of [[noncoding DNA]] in the human genome is biologically active, rather than being merely "junk" as once believed. A significant fraction (~10%) of this biologically active non-coding DNA is involved in the [[gene regulation|regulation]] of the [[gene expression|expression]] of coding [[gene]]s.<ref name="pmid22955616"/> Furthermore the expression of each coding gene is controlled by multiple regulatory sites located both near and distant from the gene. These results demonstrate that gene regulation is far more complex than previously believed.<ref name="pmid22955811">{{cite journal | author = Pennisi E | title = Genomics. ENCODE project writes eulogy for junk DNA | journal = Science | volume = 337 | issue = 6099 | pages = 1159, 1161 | year = 2012 | month = September | pmid = 22955811 | doi = 10.1126/science.337.6099.1159 }}</ref>
On 5 September 2012, initial results of the project were released in a coordinated set of 30 papers published in the journals ''[[Nature (journal)|Nature]]'', ''[[Science (journal)|Science]]'', ''[[Genome Biology]]'', and ''[[Genome Research]]''.<ref>{{cite web|url=http://genome.ucsc.edu/ENCODE/|title=ENCODE project at UCSC|publisher=ENCODE Consortium|accessdate=2012-09-05}}</ref><ref>{{cite news | title = Detailed map of genome function | author = Walsh F | authorlink = Fergus Walsh | url = http://www.bbc.co.uk/news/health-19202141 | publisher = BBC News | date = 2012-09-05 | accessdate = 2012-09-06 | archiveurl = http://www.webcitation.org/6ASmAMpP9 | archivedate = 2012-09-06 }}</ref> These publications combine to show that approximately 20% of [[noncoding DNA]] in the human genome is functional while an additional 60% is transcribed with no known function. Much of this functional non-coding DNA is involved in the [[gene regulation|regulation]] of the [[gene expression|expression]] of coding [[gene]]s.<ref name="pmid22955616"/> Furthermore the expression of each coding gene is controlled by multiple regulatory sites located both near and distant from the gene. These results demonstrate that gene regulation is far more complex than previously believed.<ref name="pmid22955811">{{cite journal | author = Pennisi E | title = Genomics. ENCODE project writes eulogy for junk DNA | journal = Science | volume = 337 | issue = 6099 | pages = 1159, 1161 | year = 2012 | month = September | pmid = 22955811 | doi = 10.1126/science.337.6099.1159 }}</ref>


[[Genome-wide association studies]] have determined that approximately 90% of [[single-nucleotide polymorphisms|single-letter differences]] in sequences that are associated with various diseases fall outside of protein coding regions. Previously it was not clear how these sequence differences could influence disease however new gene regulatory sites discovered by the ENCODE project in many cases provide an explanation.<ref name = "Maher_2012"/>

Revision as of 01:51, 11 September 2012

ENCODE
Content
Description: whole-genome data
Contact
Research center: University of California Santa Cruz
Laboratory: Center for Biomolecular Science and Engineering
Authors: Brian J Raney[1]
Primary citation: PMID 21037257
Release date: 2010
Access
Website: encodeproject.org

The Encyclopedia of DNA Elements (ENCODE) is a public research consortium[2] launched by the US National Human Genome Research Institute (NHGRI) in September 2003.[1][3][4][5] Its goal is to find all functional elements in the human genome, which makes it one of NHGRI's most critical projects since the completion of the Human Genome Project. All data generated in the course of the project will be released rapidly into public databases.

On 5 September 2012, initial results of the project were released in a coordinated set of 30 papers published in the journals Nature, Science, Genome Biology, and Genome Research.[6][7] Together, these publications show that approximately 20% of noncoding DNA in the human genome is functional, while an additional 60% is transcribed with no known function. Much of this functional noncoding DNA is involved in the regulation of the expression of coding genes.[8] Furthermore, the expression of each coding gene is controlled by multiple regulatory sites located both near to and far from the gene. These results demonstrate that gene regulation is far more complex than previously believed.[9]

Genome-wide association studies have determined that approximately 90% of the single-letter sequence differences associated with various diseases fall outside protein-coding regions. Previously, it was not clear how these sequence differences could influence disease; however, new gene regulatory sites discovered by the ENCODE project provide an explanation in many cases.[2]

Background

The human genome consists of just over 3 billion DNA base pairs. The Human Genome Project, completed in 2003, produced a reference sequence of the entire genome, assembled from the DNA of a small number of anonymous donors. In the years since then, the genomes of many individual people have been sequenced, partly under the auspices of the 1000 Genomes Project. Sequencing a genome produces several gigabytes of raw data, however, and does not directly reveal how the genome works. The aim of the ENCODE project is to determine which parts of the DNA are biologically active and to make an initial assessment of their functions.

The part of the DNA that has long been best understood is the exome, consisting of around 20,000 protein-coding genes. These genes, however, make up only around 1.5% of the DNA in total, and are separated from each other by long stretches of DNA that do not code for proteins. This remaining DNA includes the so-called regulome, a variety of DNA elements that in one way or another modulate the expression of protein-coding genes. It has not been clear, though, how much of the total DNA belongs to the regulome. Until recently, the majority view has been that much of the DNA is "junk": DNA that is never transcribed and has no biological function. The central goal of the ENCODE project is to map out the regulome by determining which parts of the DNA belong to it and the mechanisms by which those parts influence gene transcription.

Pilot phase

The project was initiated with a $12 million pilot phase to evaluate a variety of methods for use in later stages. A number of then-existing techniques were used to analyse about 1% of the genome (30 million base pairs). The results of these analyses were evaluated on their ability to identify regions of DNA that were known or suspected to contain functional elements. Half of the sample area studied in this phase was chosen manually, while the other half was selected at random.[10] The manually chosen regions were picked for the presence of well-studied genes and the availability of comparative data. Methods evaluated included chromatin immunoprecipitation (ChIP) and quantitative PCR.
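The sizes quoted for the pilot regions can be checked with a short calculation. The sketch below is only illustrative: it assumes the genome is rounded to exactly 3 billion base pairs and that the manual/random split is exactly even, and the function and constant names are made up for this example.

```python
# Back-of-the-envelope sizing of the ENCODE pilot regions.
# Assumptions (not measured values): genome rounded to 3 billion bp,
# pilot fraction exactly 1%, manual/random split exactly 50/50.
GENOME_BP = 3_000_000_000
PILOT_FRACTION = 0.01
MANUAL_SHARE = 0.5


def pilot_region_sizes(genome_bp=GENOME_BP,
                       pilot_fraction=PILOT_FRACTION,
                       manual_share=MANUAL_SHARE):
    """Return the pilot target size and its manual/random split, in base pairs."""
    pilot_bp = round(genome_bp * pilot_fraction)
    manual_bp = round(pilot_bp * manual_share)
    random_bp = pilot_bp - manual_bp
    return {"pilot_bp": pilot_bp, "manual_bp": manual_bp, "random_bp": random_bp}


if __name__ == "__main__":
    for name, bp in pilot_region_sizes().items():
        print(f"{name}: {bp / 1e6:.0f} million bp")
```

Under these assumptions, 1% of the genome corresponds to the 30 million base pairs stated above, divided equally between manually chosen and randomly selected regions.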

The ENCODE pilot project rapidly released all of its data into public databases.[11] The pilot phase was completed successfully, and its results were published in June 2007 in Nature[4] and in a special issue of Genome Research.[12]

Production phase

Image of ENCODE data in the UCSC Genome Browser. This shows several tracks containing information on gene regulation. The gene on the left (ATP2B4) is transcribed in a wide variety of cells. The gene on the right is only transcribed in a few types of cells, including embryonic stem cells.

In September 2007, NHGRI began funding the production phase of the ENCODE project. In this phase, the goal was to analyze the entire genome and to conduct "additional pilot-scale studies".[13]

As in the pilot project, the production effort is organized as an open consortium. In October 2007, NHGRI awarded grants totaling more than $80 million over four years.[14] The production phase also includes a Data Coordination Center, a Data Analysis Center, and a Technology Development Effort.[15]

By 2010, over 1,000 genome-wide data sets had been produced by the ENCODE project. Taken together, these data sets show which regions are transcribed into RNA, which regions are likely to control the genes that are used in a particular type of cell, and which regions are associated with a wide variety of proteins. The primary assays used in ENCODE are ChIP-seq, DNase I hypersensitivity assays, RNA-seq, and assays of DNA methylation.

In September 2012, the project released a much more extensive set of results, in 30 papers published simultaneously in several journals, including six in Nature and a special issue of the journal Genome Research.[16] The most striking finding was that the fraction of human DNA that is biologically active is considerably higher than even the most optimistic previous estimates. In an overview paper, the ENCODE Consortium reported that its members were able to assign biochemical functions to over 80% of the genome.[8] Much of this was found to be involved in controlling the expression levels of coding DNA, which makes up less than 1% of the genome.

The most important new elements of the "encyclopedia" include:

  • A comprehensive map of DNase I hypersensitive sites, which are markers for regulatory DNA that is typically located adjacent to genes and allows chemical factors to influence their expression. The map identified nearly 3 million sites of this type, including nearly all that were previously known and many that are novel.[17]
  • A lexicon of short DNA sequences that form recognition motifs for DNA-binding proteins. Approximately 8.4 million such sequences were found, comprising a fraction of the total DNA roughly twice the size of the exome. Thousands of transcription promoters were found to make use of a single stereotyped 50-base-pair footprint.[18]
  • A preliminary sketch of the architecture of the network of human transcription factors, that is, factors that bind to DNA in order to promote or inhibit the expression of genes. The network was found to be quite complex, with factors that operate at different levels as well as numerous feedback loops of various types.[19]
  • A measurement of the fraction of the human genome that is capable of being transcribed into RNA. This fraction was estimated to add up to more than 75% of the total DNA, a much higher value than previous estimates. The project also began to characterize the types of RNA transcripts that are generated at various locations.[20]
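To give a sense of scale, the percentages quoted in this article can be converted into approximate base-pair counts. The following is a rough sketch, not a result from the ENCODE papers: it assumes a genome of exactly 3 billion base pairs, uses the 1.5% exome figure from the Background section, treats the "more than" figures as lower bounds, and uses hypothetical function names. The motif-length estimate additionally assumes that the recognition sequences do not overlap and that "roughly twice the size of the exome" means exactly twice.

```python
# Convert the genome fractions quoted in the article into approximate sizes.
# Assumption: genome rounded to 3 billion bp; all figures are rough.
GENOME_BP = 3_000_000_000

FRACTIONS = {
    "protein-coding exome (about 1.5%)": 0.015,
    "transcribed into RNA (more than 75%)": 0.75,
    "assigned a biochemical function (more than 80%)": 0.80,
}


def to_megabases(fraction, genome_bp=GENOME_BP):
    """Size in millions of base pairs (Mb) of a given fraction of the genome."""
    return fraction * genome_bp / 1_000_000


def implied_motif_length(n_motifs=8_400_000,
                         exome_fraction=0.015,
                         multiple_of_exome=2.0,
                         genome_bp=GENOME_BP):
    """Average length per recognition sequence, in bp, if the sequences
    together cover `multiple_of_exome` times the exome without overlapping."""
    covered_bp = multiple_of_exome * exome_fraction * genome_bp
    return covered_bp / n_motifs


if __name__ == "__main__":
    for label, fraction in FRACTIONS.items():
        print(f"{label}: ~{to_megabases(fraction):,.0f} Mb")
    print(f"implied average motif length: ~{implied_motif_length():.1f} bp")
```

Because the inputs are rounded and the non-overlap assumption is unlikely to hold exactly, the outputs should be read as order-of-magnitude estimates only.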

modENCODE project

The Model Organism ENCyclopedia Of DNA Elements (modENCODE) project is a continuation of the original ENCODE project targeting the identification of functional elements in selected model organism genomes, specifically, Drosophila melanogaster and Caenorhabditis elegans.[21] The extension to model organisms permits biological validation of the computational and experimental findings of the ENCODE project, something that is difficult or impossible to do in humans.[21]

Funding for the modENCODE project was announced by the National Institutes of Health (NIH) in 2007 and included several different research institutions in the US.[22][23]

In late 2010, the modENCODE consortium unveiled its first set of results with publications on annotation and integrative analysis of the worm and fly genomes in Science.[24][25] Data from these publications is available from the modENCODE web site.[26]

FactorBook

An analysis of transcription factor binding data generated by the ENCODE project is available in the web-accessible repository FactorBook.[27]

References

  1. Raney BJ, Cline MS, Rosenbloom KR, Dreszer TR, Learned K, Barber GP, Meyer LR, Sloan CA, Malladi VS, Roskin KM, Suh BB, Hinrichs AS, Clawson H, Zweig AS, Kirkup V, Fujita PA, Rhead B, Smith KE, Pohl A, Kuhn RM, Karolchik D, Haussler D, Kent WJ (2011). "ENCODE whole-genome data in the UCSC genome browser (2011 update)". Nucleic Acids Res. 39 (Database issue): D871–5. doi:10.1093/nar/gkq1017. PMC 3013645. PMID 21037257.
  2. Maher B (2012). "ENCODE: The human encyclopaedia". Nature. 489 (7414): 46–48. doi:10.1038/489046a.
  3. doi:10.1371/journal.pbio.1001046.
  4. PMID 17571346.
  5. PMID 16925836.
  6. "ENCODE project at UCSC". ENCODE Consortium. Retrieved 2012-09-05.
  7. Walsh F (2012-09-05). "Detailed map of genome function". BBC News. Archived from the original on 2012-09-06. Retrieved 2012-09-06.
  8. Bernstein BE, Birney E, Dunham I, Green ED, Gunter C, Snyder M (2012). "An integrated encyclopedia of DNA elements in the human genome". Nature. 489 (7414): 57–74. doi:10.1038/nature11247. PMID 22955616.
  9. Pennisi E (2012). "Genomics. ENCODE project writes eulogy for junk DNA". Science. 337 (6099): 1159, 1161. doi:10.1126/science.337.6099.1159. PMID 22955811.
  10. "ENCODE Pilot Project: Target Selection". The ENCODE Project: ENCyclopedia Of DNA Elements. United States National Human Genome Research Institute. 2011-08-01. Retrieved 2011-08-05.
  11. "ENCODE Project at UCSC". University of California at Santa Cruz. Retrieved 2011-08-05.
  12. doi:10.1101/gr.6534207.
  13. "Genome.gov | ENCODE and modENCODE Projects". The ENCODE Project: ENCyclopedia Of DNA Elements. United States National Human Genome Research Institute. 2011-08-01. Retrieved 2011-08-05.
  14. "National Human Genome Research Institute - Organization". The NIH Almanac. United States National Institutes of Health. Retrieved 2011-08-05.
  15. "Genome.gov | ENCODE Participants and Projects". The ENCODE Project: ENCyclopedia Of DNA Elements. United States National Human Genome Research Institute. 2011-08-01. Retrieved 2011-08-05.
  16. Ecker JR, Bickmore WA, Barroso I, Pritchard JK, Gilad Y, Segal E (2012). "Genomics: ENCODE explained". Nature. 489 (7414): 52–5. doi:10.1038/489052a. PMID 22955614.
  17. Thurman RE, Rynes E, Humbert R, Vierstra J, Maurano MT, Haugen E, Sheffield NC, Stergachis AB, Wang H, et al. (2012). "The accessible chromatin landscape of the human genome". Nature. 489 (7414): 75–82. doi:10.1038/nature11232. PMID 22955617.
  18. Neph S, Vierstra J, Stergachis AB, Reynolds AP, Haugen E, Vernot B, Thurman RE, John S, Sandstrom R, et al. (2012). "An expansive human regulatory lexicon encoded in transcription factor footprints". Nature. 489 (7414): 83–90. doi:10.1038/nature11212. PMID 22955618.
  19. Gerstein MB, Kundaje A, Hariharan M, Landt SG, Yan KK, Cheng C, Mu XJ, Khurana E, Rozowsky J, et al. (2012). "Architecture of the human regulatory network derived from ENCODE data". Nature. 489 (7414): 91–100. doi:10.1038/nature11245. PMID 22955619.
  20. Djebali S, Davis CA, Merkel A, Dobin A, Lassmann T, Mortazavi A, Tanzer A, Lagarde J, Lin W, et al. (2012). "Landscape of transcription in human cells". Nature. 489 (7414): 101–8. doi:10.1038/nature11233. PMID 22955620.
  21. "The modENCODE Project: Model Organism ENCyclopedia Of DNA Elements (modENCODE)". NHGRI website. Retrieved 2008-11-13.
  22. "modENCODE Participants and Projects". NHGRI website. Retrieved 2008-11-13.
  23. "Berkeley Lab Life Sciences Awarded NIH Grants for Fruit Fly, Nematode Studies". Lawrence Berkeley National Laboratory website. 2007-05-14. Retrieved 2008-11-13.
  24. doi:10.1126/science.1196914.
  25. doi:10.1126/science.1198374.
  26. "modENCODE". The National Human Genome Research Institute.
  27. FactorBook