Earth Microbiome Project: Difference between revisions

From Wikipedia, the free encyclopedia
m removed copyedit tag. Other tags (more footnotes, wikify) are still relevant.
Substituted more recent reference

Revision as of 05:15, 9 March 2012

The Earth Microbiome Project (EMP) is an initiative to collect natural samples and to analyze microbial communities around the globe.

Microbes are highly abundant and diverse, and they play an important role in ecosystems. There are an estimated 1.3 × 10^28 archaeal cells, 3.1 × 10^28 bacterial cells and 1 × 10^30 virus particles in the ocean.[1][2] Bacteria are estimated to number about 160 per ml in ocean water, 6,400–38,000 per g in soil, and 70 per ml in sewage works.[2] Yet as of 2010, it was estimated that the total global environmental DNA sequencing effort had produced less than 1 percent of the total DNA found in a liter of seawater or a gram of soil,[3] and the specific interactions between microbes are largely unknown.

The EMP aims to process as many as 200,000 samples from different biomes, generating a complete database of microbes on Earth to characterize environments and ecosystems by microbial composition and interaction. Using these data, new ecological and evolutionary theories can be proposed and tested.[4]

Goals

The primary goal of the EMP is to survey microbial composition in many environments across the planet, in time as well as space, using a standard set of protocols. The development of standardized protocols is vital, because variations in sample extraction, amplification, sequencing and analysis would introduce biases that invalidate comparisons of microbial community structure.[5]

Another important goal is to determine how reconstruction of microbial communities is affected by analytic biases. The rate of technological advance is rapid, and it is necessary to understand how data using updated protocols will compare with data collected using earlier techniques. Data from this project will be archived in a database to facilitate analysis. Other outputs will include a global atlas of protein function and a catalog of reassembled genomes classified by their taxonomic distributions.[5]

Challenges

The large amounts of sequence data generated from analyzing diverse microbial communities are a challenge to store, organize and analyze. The problem is exacerbated by the short reads provided by the high-throughput sequencing platform that will be the standard instrument used in the EMP. Improved algorithms, improved analysis tools, huge amounts of computer storage, and access to many thousands of hours of supercomputer time will be necessary.[6]

Another challenge will be the large number of sequencing errors that are expected. Next-generation sequencing technologies provide enormous throughput but lower accuracies than older sequencing methods. When sequencing a single genome, the intrinsic lower accuracy of these methods is far more than compensated for by the ability to cover the entire genome multiple times in opposite directions from multiple start points, but this capability provides no improvement in accuracy when sequencing a diverse mixture of genomes. The question will be, how can sequencing errors be distinguished from actual diversity in the collected microbial samples?[6]
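The contrast between single-genome and mixed-sample sequencing can be illustrated with a minimal sketch. It assumes the reads are already aligned and of equal length, and it uses a simple per-column majority vote; the read data and the function names `consensus` and `variable_positions` are hypothetical and are not part of any EMP protocol.

```python
from collections import Counter


def consensus(aligned_reads):
    """Majority-vote base at each column of equal-length, pre-aligned reads."""
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*aligned_reads)
    )


def variable_positions(aligned_reads):
    """Indices of columns where the reads disagree with each other."""
    return [
        i for i, column in enumerate(zip(*aligned_reads)) if len(set(column)) > 1
    ]


# Hypothetical reads covering the same region of ONE genome.
# Each erroneous read carries a single miscalled base.
single_genome_reads = [
    "ACGTACGT",
    "ACGAACGT",  # miscall at position 3
    "ACGTACGT",
    "CCGTACGT",  # miscall at position 0
    "ACGTACGG",  # miscall at position 7
]

print(consensus(single_genome_reads))
print(variable_positions(single_genome_reads))
```

With reads from a single genome, the majority vote recovers the underlying sequence because each error is outvoted. In a mixed community, the same minority base at a variable position could equally come from a second, rarer organism, so a majority vote would discard real diversity. That ambiguity is what the paragraph above describes.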

Despite the issuance of standard protocols, systematic biases from lab to lab are expected. The need to amplify DNA from samples with low biomass will introduce additional distortions of the data. Assembly of genomes of even the dominant organisms in a diverse sample of organisms requires gigabytes of sequence data.[6]

The EMP must avoid a problem that has become prevalent in the public sequence databases. With advances in high-throughput sequencing technologies, many sequences are entering public databases with no experimentally determined function, annotated only on the basis of observed homology with a known sequence. A known sequence is used to annotate a first unknown sequence, that first unknown sequence is then used to annotate a second unknown sequence, and so on, so that each step is further removed from experimental evidence. Homology is only a modestly reliable predictor of function.[7]

Methods

Standard protocols for sampling, DNA extraction, 16S rRNA amplification, 18S rRNA amplification, and "shotgun" metagenomics have been developed or are under development.[8]

Sample collection

Samples will be collected using appropriate methods from various environments including deep ocean, fresh water lakes, desert sand, and soil. Standardized collection protocols will be used when possible, so that the results are comparable. Microbes from natural samples cannot always be cultured. Because of this, metagenomic methods will be employed to sequence all the DNA or RNA in a sample in a culture-independent fashion.

Wet lab

[Figure: Common protocol used to filter protists, bacteria and viruses from water samples]

The wet lab usually needs to perform a series of procedures to select and purify the microbial portion of the samples. The purification process may differ considerably according to the type of sample. DNA may be extracted from soil particles, or microbes may be concentrated using a series of filtration techniques. In addition, various amplification techniques may be used to increase DNA yield. For example, non-PCR-based multiple displacement amplification is preferred by some researchers. DNA extraction, the use of primers, and PCR are all steps that must follow carefully standardized protocols in order to avoid bias.

Sequencing

Depending on the biological question, researchers can choose between two main approaches to sequencing a metagenomic sample. If the question is what types of organisms are present and in what abundance, the preferred approach is to target and amplify a specific gene that is highly conserved among the species of interest. The 16S ribosomal RNA gene for bacteria and the 18S ribosomal RNA gene for protists are often used as target genes for this purpose. The advantage of targeting a specific gene is that the gene can be amplified and sequenced at very high coverage. This approach, called "deep sequencing", allows rare species to be identified in a sample. However, it does not enable assembly of whole genomes, nor does it provide information on how organisms may interact with each other.

The second approach is called shotgun metagenomics, in which all the DNA in the sample is sheared and the random fragments are sequenced. In principle, this approach allows for the assembly of whole microbial genomes, and it allows inference of metabolic relationships. However, if most of the microbes in a given environment are uncharacterized, de novo assembly will be computationally expensive.
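For the targeted-gene approach, the question of which organisms are present and in what abundance reduces to counting reads per taxon once each read has been assigned a label. The following is a minimal sketch under that assumption; the taxon labels and the function name `relative_abundance` are hypothetical, and real pipelines would also have to correct for factors such as gene copy number and amplification bias.

```python
from collections import Counter


def relative_abundance(assignments):
    """Fraction of reads assigned to each taxon label."""
    counts = Counter(assignments)
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}


# Hypothetical per-read taxon labels from a marker-gene survey.
read_labels = ["Taxon A", "Taxon A", "Taxon B", "Taxon C"]

for taxon, fraction in sorted(relative_abundance(read_labels).items()):
    print(f"{taxon}: {fraction:.2f}")
```

The deeper the sequencing of the marker gene, the more reads are available per sample, and the more likely it is that a rare taxon contributes at least one read to the count.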

Data analysis

Similar to the manner in which wet lab procedures must be standardized, EMP proposes to standardize the bioinformatics aspects of sample processing.[5]

Data analysis usually includes the following steps: 1) Data clean-up. A preprocessing step removes reads with low quality scores, as well as any sequences containing "N" or other ambiguous nucleotides. 2) Assignment of taxonomy to the sequences. This step is usually done using tools such as BLAST[9] or RDP.[10] Very often, novel sequences are discovered that cannot be mapped to the existing taxonomy. In this case, a phylogenetic tree is created from the novel sequences and a pool of closely related known sequences, and the taxonomy of the novel sequences is then derived from their placement in the tree.
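The clean-up step described above can be sketched as a simple filter. This is an illustration only: it assumes each read is a pair of a sequence string and a list of integer Phred-style quality scores, and the mean-quality criterion, the threshold of 20, and the function names are assumptions made for the example rather than EMP settings.

```python
def is_clean(sequence, qualities, min_mean_quality=20):
    """True if the read has only A/C/G/T bases and an acceptable mean quality."""
    if not sequence or len(sequence) != len(qualities):
        return False
    # Reject reads containing "N" or any other ambiguous nucleotide code.
    if set(sequence.upper()) - set("ACGT"):
        return False
    return sum(qualities) / len(qualities) >= min_mean_quality


def clean_reads(reads, min_mean_quality=20):
    """Keep only the (sequence, qualities) pairs that pass is_clean."""
    return [
        (seq, quals) for seq, quals in reads if is_clean(seq, quals, min_mean_quality)
    ]


# Hypothetical reads: one good, one with an ambiguous base, one of low quality.
reads = [
    ("ACGT", [30, 30, 30, 30]),
    ("ACNT", [30, 30, 30, 30]),
    ("ACGT", [10, 10, 10, 10]),
]

print(clean_reads(reads))
```

Only the first read survives the filter. The surviving reads would then go on to the taxonomy assignment step.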

Depending on the sequencing technology and the underlying biological question, additional methods may be employed. For example, if the sequenced reads are too short to infer any useful information, an assembly will be required. An assembly can also be used to construct whole genomes, which will provide useful information on the species. Furthermore, if the metabolic relationships within a microbial metagenome are to be understood, the DNA sequences need to be translated into amino acid sequences using gene prediction tools such as GeneMark[11] or FragGeneScan.[12]
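The translation of DNA into amino acid sequences can be illustrated with a short sketch that uses the standard genetic code. It shows only the translation step, in all six reading frames; it does not predict where genes begin or end, which is the task that tools such as GeneMark and FragGeneScan perform. The function names and the example sequence are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons ordered by base T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(dna):
    """Reverse complement of a DNA string; non-ACGT characters are kept as-is."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna, frame=0):
    """Translate one reading frame; '*' marks a stop codon, 'X' an unknown codon."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame(dna):
    """Three forward-strand frames followed by three reverse-strand frames."""
    reverse = reverse_complement(dna)
    return [translate(dna, f) for f in range(3)] + [
        translate(reverse, f) for f in range(3)
    ]


print(six_frame("ATGGCCTAA"))
```

Translating all six frames is a common way to handle fragments whose strand and reading frame are unknown, which is typically the case for short metagenomic reads.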

Project output

Four key outputs from the EMP have been defined:[13]

  • Ultimately, all primary data generated by the Earth Microbiome Project, regardless of their degree of conclusiveness, will be stored in a centralized database called the "Gene Atlas" (GA). The GA will hold sequence data, annotations and environmental metadata. Both known and unknown sequences, the latter referred to as "Dark Matter", will be included in the hope that, given time, the unknown sequences may eventually be characterized.
  • Assembled genomes, annotated using an automated pipeline, will be stored in "Earth Microbiome Assembled Genomes" (EM-AG) in public repositories. These will enable comparative genomic analysis.
  • Interactive visualizations of the data will be provided through the "Earth Microbiome Visualization Portal" (EM-VIP), which will allow the relationship between microbial makeup, environmental parameters, and genomic function to be viewed.
  • Reconstructed metabolic profiles will be offered through "Earth Microbiome Metabolic Reconstruction" (EMMR).

Notes

  1. ^ PMID 17853907.
  2. ^ a b PMID 12097644.
  3. ^ doi:10.4056/sigs.1433550.
  4. ^ doi:10.1186/2042-5783-1-5.
  5. ^ a b c "Modeling the Earth Microbiome". Microbe Magazine. 7 (2): 64–69. 2012. Retrieved 2012-03-06.
  6. ^ a b c Jansson, Janet (2011). "Towards "Tera-Terra": Terabase Sequencing of Terrestrial Metagenomes". Microbe Magazine. 6 (7): 309–15. Retrieved 2012-03-07.
  7. ^ doi:10.1146/annurev-marine-120709-142811.
  8. ^ "Earth Microbiome Project / Standard Protocols". Retrieved 2012-03-07.
  9. ^ "BLAST: Basic Local Alignment Search Tool".
  10. ^ "Ribosomal Database Project". Retrieved 2012-03-06.
  11. ^ "GeneMark - Free gene prediction software". Retrieved 2012-03-06.
  12. ^ "FragGeneScan". Retrieved 2012-03-06.
  13. ^ "Earth Microbiome Project / Defining the Tasks". Retrieved 2012-03-07.
