European Nucleotide Archive: Difference between revisions
Revision as of 19:32, 6 January 2013
Content | |
---|---|
Description | Nucleotide sequences from all publicly available sources with supporting bibliographic and biological annotation. |
Data types captured | Nucleotide sequence |
Organisms | all |
Contact | |
Research center | European Bioinformatics Institute |
Primary citation | PMID 20972220 |
Release date | 1982 |
Access | |
Data format | XML, ASN.1, EMBL-Bank format |
Website | [1] |
Download URL | [2] |
Tools | |
Web | BLAST |
Standalone | BLAST |
Miscellaneous | |
License | Public domain |
The European Nucleotide Archive (ENA) is an open access, annotated collection of all publicly available nucleotide sequences.[1] The collection is composed of three main databases: the Sequence Read Archive (SRA), the Trace Archive and EMBL-Bank. The ENA is produced and maintained by the European Bioinformatics Institute and is a member of the International Nucleotide Sequence Database Collaboration (INSDC) along with the DNA Data Bank of Japan and GenBank.
Sequence data from the ENA and its INSDC partners are used in biological and medical research around the world, and the data are accessed millions of times every month. As of early 2012, the ENA contained complete genomes of 5,682 organisms and sequence data for almost 700,000.[2] Further, the data are increasing exponentially, with a doubling time of approximately 10 months.[3]
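As a rough illustration of what a 10-month doubling time implies, the following Python sketch computes the multiplicative growth over a given period. It assumes the doubling time stays constant, which is a simplification; the function name `growth_factor` is chosen here for illustration and does not come from the cited sources.

```python
def growth_factor(months: float, doubling_time_months: float = 10.0) -> float:
    """Return the multiplicative growth over `months`.

    Assumes pure exponential growth with a constant doubling time
    (10 months by default, the approximate figure quoted in the text).
    """
    return 2.0 ** (months / doubling_time_months)


if __name__ == "__main__":
    # Growth after one doubling period, one year, and five years.
    for months in (10, 12, 60):
        print(f"{months:>3} months: x{growth_factor(months):.2f}")
```

Under this assumption the data volume more than doubles every year and grows 64-fold in five years.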
History
The EMBL Data Library was established in 1982 at the European Molecular Biology Laboratory (EMBL) Heidelberg and was later renamed the EMBL Nucleotide Sequence Database.
With the advancement of Sanger sequencing, the Wellcome Trust Sanger Institute (then known as The Sanger Centre) began cataloguing sequence reads along with quality information in a database called the Trace Archive.[1] In 2008, the European Bioinformatics Institute (EBI) combined the Trace Archive, the EMBL Nucleotide Sequence Database and the new Short Read Archive (SRA) to form the ENA, with the aim of providing a comprehensive nucleotide sequence archive.[1]
![](http://upload.wikimedia.org/wikipedia/commons/thumb/d/d1/European_Bioinformatics_Institute%2C_Hinxton_2.jpg/300px-European_Bioinformatics_Institute%2C_Hinxton_2.jpg)
EMBL-Bank
EMBL-Bank is the part of the ENA dedicated to annotated and assembled nucleotide sequence entries.
Release 112, on 31 May 2012, contained 247,335,689 sequence entries comprising 429,512,389,024 nucleotides, and these totals were increasing rapidly.
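The Release 112 figures give a sense of the typical entry size. The short Python sketch below divides the total nucleotide count by the number of entries; it yields only a mean, and says nothing about how entry lengths are distributed. The function name `mean_entry_length` is illustrative.

```python
def mean_entry_length(nucleotides: int, entries: int) -> float:
    """Return the mean number of nucleotides per sequence entry."""
    if entries <= 0:
        raise ValueError("entries must be positive")
    return nucleotides / entries


if __name__ == "__main__":
    # Figures quoted in the text for EMBL-Bank Release 112 (31 May 2012).
    release_112_entries = 247_335_689
    release_112_nucleotides = 429_512_389_024
    mean = mean_entry_length(release_112_nucleotides, release_112_entries)
    print(f"Mean entry length: {mean:.0f} nucleotides")
```

The result is a mean of roughly 1,700 nucleotides per entry.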
Sequence Read Archive
The ENA operates an instance of the Sequence Read Archive (SRA), also known as the Short Read Archive. The SRA is an archival repository of sequence reads and analyses that are intended for public release.[4] The archive accepts both sequence read data and analysis data (e.g. BAM alignment and VCF variation) generated by next-generation sequencing platforms such as 454, Illumina Genome Analyzer and ABI SOLiD. The SRA operates under the guidance of the International Nucleotide Sequence Database Collaboration.[4]
In 2010 the SRA made up approximately 95% of the base pair data available through the ENA,[1] encompassing over 500 billion sequence reads made up of over 60 trillion base pairs.[4] Almost half of this data was deposited in relation to the 1000 Genomes Project.[4]
Storage
The ENA handles large volumes of data, which pose a significant storage challenge.[3][5] To cope with these storage requirements, the ENA discards less-valuable sequencing platform data and implements advanced compression strategies.[4] The CRAM reference-based compression toolkit was developed to help reduce ENA storage requirements.[3][6]
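The general idea behind reference-based compression can be sketched in a few lines of Python: a read that has been aligned to a reference sequence is stored as its alignment position, its length, and only the bases that differ from the reference. This is a toy illustration under simplifying assumptions (ungapped alignment, no quality scores, no entropy coding); it is not the CRAM format or the toolkit's API, and the names `encode_read` and `decode_read` are hypothetical.

```python
from typing import List, Tuple

# (alignment position, read length, list of (offset, substituted base))
Encoded = Tuple[int, int, List[Tuple[int, str]]]


def encode_read(reference: str, position: int, read: str) -> Encoded:
    """Encode a read as its differences from the reference (toy model)."""
    segment = reference[position:position + len(read)]
    if len(segment) != len(read):
        raise ValueError("read extends past the end of the reference")
    # Keep only the offsets where the read disagrees with the reference.
    diffs = [
        (offset, base)
        for offset, (base, ref_base) in enumerate(zip(read, segment))
        if base != ref_base
    ]
    return position, len(read), diffs


def decode_read(reference: str, encoded: Encoded) -> str:
    """Rebuild the original read from the reference and the stored diffs."""
    position, length, diffs = encoded
    bases = list(reference[position:position + length])
    for offset, base in diffs:
        bases[offset] = base
    return "".join(bases)


if __name__ == "__main__":
    reference = "ACGTACGTACGT"   # made-up reference sequence
    read = "GTACCTAC"            # made-up read with one mismatch
    encoded = encode_read(reference, 2, read)
    print(encoded)
    print(decode_read(reference, encoded) == read)
```

Because most reads match the reference closely, the list of differences is usually short, which is where the storage saving comes from in this simplified picture.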
References
1. Leinonen R, Akhtar R, Birney E; et al. (2011). "The European Nucleotide Archive". Nucleic Acids Res. 39 (Database issue): D28–31. doi:10.1093/nar/gkq967. PMC 3013801. PMID 20972220.
2. Cochrane, Guy; Cook, Charles E; Birney, Ewan (2012). "The future of DNA sequence archiving". GigaScience. 1 (1): 2. doi:10.1186/2047-217X-1-2. ISSN 2047-217X.
3. Cochrane, G.; Alako, B.; Amid, C.; Bower, L.; Cerdeno-Tarraga, A.; Cleland, I.; Gibson, R.; Goodgame, N.; Jang, M.; Kay, S.; Leinonen, R.; Lin, X.; Lopez, R.; McWilliam, H.; Oisel, A.; Pakseresht, N.; Pallreddy, S.; Park, Y.; Plaister, S.; Radhakrishnan, R.; Riviere, S.; Rossello, M.; Senf, A.; Silvester, N.; Smirnov, D.; ten Hoopen, P.; Toribio, A.; Vaughan, D.; Zalunin, V. (2012). "Facing growth in the European Nucleotide Archive". Nucleic Acids Research. 41 (D1): D30–D35. doi:10.1093/nar/gks1175. ISSN 0305-1048.
4. Leinonen R, Sugawara H, Shumway M (2011). "The sequence read archive". Nucleic Acids Res. 39 (Database issue): D19–21. doi:10.1093/nar/gkq1019. PMC 3013647. PMID 21062823.
5. Cochrane, G.; Akhtar, R.; Bonfield, J.; Bower, L.; Demiralp, F.; Faruque, N.; Gibson, R.; Hoad, G.; Hubbard, T.; Hunter, C.; Jang, M.; Juhos, S.; Leinonen, R.; Leonard, S.; Lin, Q.; Lopez, R.; Lorenc, D.; McWilliam, H.; Mukherjee, G.; Plaister, S.; Radhakrishnan, R.; Robinson, S.; Sobhany, S.; Hoopen, P. T.; Vaughan, R.; Zalunin, V.; Birney, E. (2009). "Petabyte-scale innovations at the European Nucleotide Archive". Nucleic Acids Research. 37 (Database): D19–D25. doi:10.1093/nar/gkn765. ISSN 0305-1048.
6. Hsi-Yang Fritz, M.; Leinonen, R.; Cochrane, G.; Birney, E. (2011). "Efficient storage of high throughput DNA sequencing data using reference-based compression". Genome Research. 21 (5): 734–740. doi:10.1101/gr.114819.110. ISSN 1088-9051.