DNA annotation: Difference between revisions

From Wikipedia, the free encyclopedia
APashkov (talk | contribs)
Add, modify and update several inner and outer links. Minor grammar fixes and restructuring, particularly in annotation steps and types.
SoffGonza (talk | contribs)
No edit summary

Revision as of 00:18, 14 April 2023

DNA annotation or genome annotation is the process of identifying the locations of genes and all of the coding regions in a genome and determining what those genes do. An annotation (irrespective of the context) is a note added by way of explanation or commentary. Once a genome is sequenced, it needs to be annotated to make sense of it.[1] Genes in a eukaryotic genome can be annotated using various annotation tools[2] such as FINDER.[3] A modern annotation pipeline, such as MOSGA, can offer a user-friendly web interface and software containerization.[4][5] Modern annotation pipelines for prokaryotic genomes include Bakta,[6] Prokka[7] and PGAP.[8]

For DNA annotation, a previously unknown sequence representation of genetic material is enriched with information relating genomic position to intron-exon boundaries, regulatory sequences, repeats, gene names and protein products. This annotation is stored in genomic databases such as Mouse Genome Informatics, FlyBase, and WormBase. Educational materials on some aspects of biological annotation from the 2006 Gene Ontology annotation camp and similar events are available at the Gene Ontology website.[9]
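As a rough illustration of what "relating genomic position" to features can mean in practice, the sketch below stores annotations as simple records and looks up the features covering a given position. The class name, field names and coordinates are hypothetical and chosen only for this example (1-based, inclusive coordinates are an assumption, loosely following common tabular feature formats); it is not the schema of any of the databases named above.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Feature:
    """One annotated element on a sequence (hypothetical record layout)."""
    seqid: str                 # sequence or chromosome name
    start: int                 # 1-based, inclusive (assumption for this sketch)
    end: int                   # 1-based, inclusive
    strand: str                # "+" or "-"
    kind: str                  # e.g. "gene", "exon", "repeat"
    attributes: Dict[str, str] = field(default_factory=dict)


def features_at(features: List[Feature], seqid: str, position: int) -> List[Feature]:
    """Return all features on `seqid` whose interval covers `position`."""
    return [
        f for f in features
        if f.seqid == seqid and f.start <= position <= f.end
    ]


if __name__ == "__main__":
    # Made-up coordinates for a gene with two exons separated by an intron.
    annotation = [
        Feature("chr1", 100, 500, "+", "gene", {"name": "exampleGene"}),
        Feature("chr1", 100, 200, "+", "exon"),
        Feature("chr1", 300, 500, "+", "exon"),
    ]
    for pos in (150, 250):
        print(pos, [f.kind for f in features_at(annotation, "chr1", pos)])
```

A linear scan is enough for a handful of records; a real annotation store would more likely use an interval index.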

The National Center for Biomedical Ontology develops tools for automated annotation[10] of database records based on the textual descriptions of those records.

As a general method, dcGO[11] has an automated procedure for statistically inferring associations between ontology terms and protein domains or combinations of domains from the existing gene/protein-level annotations.
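The idea of statistically inferring domain-term associations from protein-level annotations can be sketched with a generic over-representation test. The example below uses a one-sided hypergeometric test, which is an assumption made for illustration and should not be read as the exact dcGO procedure; the function names and the toy data are hypothetical, and no multiple-testing correction is applied.

```python
from math import comb
from typing import Dict, Set, Tuple


def hypergeom_tail(N: int, K: int, n: int, k: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N proteins, K with the term, n with the domain)."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def domain_term_associations(
    protein_domains: Dict[str, Set[str]],
    protein_terms: Dict[str, Set[str]],
    alpha: float = 0.05,
) -> Dict[Tuple[str, str], float]:
    """Return {(domain, term): p-value} for pairs that co-occur more often than expected."""
    proteins = set(protein_domains) & set(protein_terms)
    N = len(proteins)
    all_domains = set().union(*(protein_domains[p] for p in proteins))
    all_terms = set().union(*(protein_terms[p] for p in proteins))
    results: Dict[Tuple[str, str], float] = {}
    for d in all_domains:
        with_d = {p for p in proteins if d in protein_domains[p]}
        for t in all_terms:
            with_t = {p for p in proteins if t in protein_terms[p]}
            k = len(with_d & with_t)
            if k == 0:
                continue  # never observed together, nothing to infer
            p_value = hypergeom_tail(N, len(with_t), len(with_d), k)
            if p_value < alpha:
                results[(d, t)] = p_value
    return results


if __name__ == "__main__":
    # Toy data: four proteins carry domain D1 and term T1, six carry D2 and T2.
    domains = {f"p{i}": ({"D1"} if i <= 4 else {"D2"}) for i in range(1, 11)}
    terms = {f"p{i}": ({"T1"} if i <= 4 else {"T2"}) for i in range(1, 11)}
    for pair, p in sorted(domain_term_associations(domains, terms).items()):
        print(pair, round(p, 5))
```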

Process

Genome annotation consists of three main steps:[12]

  1. Identify portions of the genome that do not code for proteins.
  2. Identify elements on the genome, a process called gene prediction.
  3. Attach biological information to these elements.

Automatic annotation tools attempt to perform these steps via computer analysis, as opposed to manual annotation (a.k.a. curation) which involves human expertise. Ideally, these approaches co-exist and complement each other in the same annotation pipeline.

A simple method of gene annotation relies on homology-based search tools, like BLAST, to search for homologous genes in specific databases; the resulting information is then used to annotate genes and genomes.[13] However, as information is added to the annotation platform, manual annotators become able to resolve discrepancies between genes that are given the same annotation. Some databases use genome context information, similarity scores, experimental data, and integrations of other resources to provide genome annotations through their subsystems approach. Other databases (e.g. Ensembl) rely on curated data sources as well as a range of different software tools in their automated genome annotation pipeline.[14]
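The homology-based approach can be illustrated with a toy annotation-transfer routine: find the most similar reference sequence and copy its annotation if the similarity is high enough. The sketch below deliberately swaps in k-mer Jaccard similarity for a real alignment search, so it is not BLAST and does not reproduce BLAST scores; the function names, the threshold and the reference entries are all hypothetical.

```python
from typing import Dict, Optional, Set, Tuple


def kmers(seq: str, k: int = 3) -> Set[str]:
    """Set of all overlapping substrings of length k."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Shared k-mers divided by total distinct k-mers (0.0 if either set is empty)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def transfer_annotation(
    query: str,
    reference: Dict[str, Tuple[str, str]],
    k: int = 3,
    threshold: float = 0.5,
) -> Optional[Tuple[str, float]]:
    """Return (annotation, score) of the best reference hit, or None below threshold.

    `reference` maps an entry name to (sequence, annotation).
    """
    query_kmers = kmers(query, k)
    best_annotation, best_score = None, 0.0
    for _name, (seq, annotation) in reference.items():
        score = jaccard(query_kmers, kmers(seq, k))
        if score > best_score:
            best_annotation, best_score = annotation, score
    if best_annotation is None or best_score < threshold:
        return None
    return best_annotation, best_score


if __name__ == "__main__":
    # Made-up reference entries for illustration only.
    reference_db = {
        "refA": ("MKTAYIAKQR", "example kinase"),
        "refB": ("GGGGSGGGGS", "example linker"),
    }
    print(transfer_annotation("MKTAYIAKQR", reference_db))
    print(transfer_annotation("WWWWWW", reference_db))
```

Returning `None` below the threshold leaves the query unannotated rather than forcing a weak match, which is the kind of case that manual curation would then pick up.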

There are two types of DNA annotation:

  • Structural annotation consists of the identification of genomic elements. Finding the locations of ORFs, coding regions and regulatory motifs, as well as determining the gene structure, are examples of structural annotation.
  • Functional annotation involves attaching biological information to genomic elements, by determining which biochemical and biological functions they have, the regulatory and interaction networks they participate in, and their expression.
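Finding ORFs, one of the structural annotation tasks listed above, can be sketched in a few lines. This is a minimal example under simplifying assumptions: it scans only the forward strand, uses the standard start codon ATG and stop codons TAA, TAG and TGA, and ignores introns, alternative start codons and the reverse strand. The function name and the `min_codons` parameter are hypothetical.

```python
from typing import List, Tuple

START_CODON = "ATG"
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 3) -> List[Tuple[int, int, int]]:
    """Return (start, end, frame) for forward-strand ORFs.

    Coordinates are 0-based; `end` is exclusive and includes the stop codon.
    `min_codons` counts every codon in the ORF, start and stop included.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Walk the sequence codon by codon in this reading frame.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START_CODON:
                start = i  # first ATG opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # close the ORF and look for the next start
    return orfs


if __name__ == "__main__":
    print(find_orfs("CCATGAAATAGCC"))
```

Gene prediction tools go well beyond this, for instance by modelling splice sites and codon usage, but the frame-by-frame scan shows the basic idea.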

These steps may involve both biological experiments and in silico analysis. Proteogenomics-based approaches utilize information from expressed proteins, often derived from mass spectrometry, to improve genomic annotations.[15]

A variety of software tools, such as MAKER, have been developed that allow scientists to view and share genome annotations.

Genome annotation remains a major challenge for scientists investigating the human genome, now that the genome sequences of more than a thousand human individuals (the 1000 Genomes Project) and several model organisms are largely complete.[16][17] Identifying the locations of genes and other genetic control elements is often described as defining the biological "parts list" for the assembly and normal operation of an organism.[13] Scientists are still at an early stage in the process of delineating this parts list and in understanding how all the parts "fit together".[18]

Genome annotation is an active area of investigation and involves a number of different organizations in the life science community which publish the results of their efforts in publicly available biological databases accessible via the web and other electronic means. Here is an alphabetical listing of on-going projects relevant to genome annotation:

At Wikipedia, genome annotation has started to become automated under the auspices of the Gene Wiki portal which operates a bot that harvests gene data from research databases and creates gene stubs on that basis.[19]

Applications

Disease diagnosis

Gene Ontology (GO) is used by researchers to establish disease-gene relationships, as it helps in identifying novel genes and the alterations in their expression, distribution and function under different sets of conditions, such as diseased versus healthy.[20] Databases of these disease-gene relationships have been created for different organisms, such as Plant-Pathogen Ontology,[21] Plant-Associated Microbe Gene Ontology[22] and DisGeNET.[23]

References

  1. ^ "Definition of genome annotation".
  2. ^ GAAS, NBIS -- National Bioinformatics Infrastructure Sweden, 13 April 2022, retrieved 25 April 2022
  3. ^ Banerjee S, Bhandary P, Woodhouse M, Sen TZ, Wise RP, Andorf CM (April 2021). "FINDER: an automated software package to annotate eukaryotic genes from RNA-Seq data and associated protein sequences". BMC Bioinformatics. 44 (9): e89. doi:10.1186/s12859-021-04120-9. PMC 8056616. PMID 33879057.
  4. ^ Martin, Roman; Hackl, Thomas; Hattab, Georges; Fischer, Matthias G; Heider, Dominik (1 April 2021). Birol, Inanc (ed.). "MOSGA: Modular Open-Source Genome Annotator". Bioinformatics. 36 (22–23): 5514–5515. doi:10.1093/bioinformatics/btaa1003. hdl:21.11116/0000-0006-FED4-D. ISSN 1367-4803. PMID 33258916.
  5. ^ Martin, Roman. "MOSGA". mosga.mathematik.uni-marburg.de. Retrieved 25 April 2022.
  6. ^ Schwengers, Oliver; Jelonek, Lukas; Dieckmann, Marius Alfred; Beyvers, Sebastian; Blom, Jochen; Goesmann, Alexander (5 November 2021). "Bakta: rapid and standardized annotation of bacterial genomes via alignment-free sequence identification". Microbial Genomics. 7 (11). doi:10.1099/mgen.0.000685. PMC 8743544. PMID 34739369.
  7. ^ Seemann, Torsten (15 July 2014). "Prokka: rapid prokaryotic genome annotation". Bioinformatics. 30 (14): 2068–2069. doi:10.1093/bioinformatics/btu153. PMID 24642063.
  8. ^ Li, Wenjun; O’Neill, Kathleen R; Haft, Daniel H; DiCuccio, Michael; Chetvernin, Vyacheslav; Badretdin, Azat; Coulouris, George; Chitsaz, Farideh; Derbyshire, Myra K.; Durkin, A Scott; Gonzales, Noreen R; Gwadz, Marc; Lanczycki, Christopher J.; Song, James S; Thanki, Narmada; Wang, Jiyao; Yamashita, Roxanne A.; Yang, Mingzhang; Zheng, Chanjuan; Marchler-Bauer, Aron; Thibaud-Nissen, Françoise (8 January 2021). "RefSeq: expanding the Prokaryotic Genome Annotation Pipeline reach with protein family model curation". Nucleic Acids Research. 49 (D1): D1020–D1028. doi:10.1093/nar/gkaa1105. PMC 7779008. PMID 33270901.
  9. ^ "GO Teaching Resources". Archived from the original on 10 October 2006. Retrieved 21 September 2006.
  10. ^ "NCBO Annotator | bioontology.org". ncbo.bioontology.org. Retrieved 8 February 2023.
  11. ^ Fang, H; Gough, J (2013). "DcGO: Database of domain-centric ontologies on functions, phenotypes, diseases and more". Nucleic Acids Research. 41 (Database issue): D536–44. doi:10.1093/nar/gks1080. PMC 3531119. PMID 23161684.
  12. ^ Stein, L. (2001). "Genome annotation: from sequence to biology". Nature Reviews Genetics. 2 (7): 493–503. doi:10.1038/35080529. PMID 11433356. S2CID 12044602.
  13. ^ a b Pevsner, Jonathan (2009). Bioinformatics and functional genomics (2nd ed.). Hoboken, N.J: Wiley-Blackwell. ISBN 9780470085851.
  14. ^ "Ensembl's genome annotation pipeline online documentation". Archived from the original on 5 March 2016.
  15. ^ Gupta, Nitin; Stephen Tanner; Navdeep Jaitly; Joshua N Adkins; Mary Lipton; Robert Edwards; Margaret Romine; Andrei Osterman; Vineet Bafna; Richard D Smith; Pavel A Pevzner (September 2007). "Whole proteome analysis of post-translational modifications: applications of mass-spectrometry for proteogenomic annotation". Genome Research. 17 (9): 1362–1377. doi:10.1101/gr.6427907. ISSN 1088-9051. PMC 1950905. PMID 17690205.
  16. ^ ENCODE Project Consortium (2011). Becker PB (ed.). "A User's Guide to the Encyclopedia of DNA Elements (ENCODE)". PLOS Biology. 9 (4): e1001046. doi:10.1371/journal.pbio.1001046. PMC 3079585. PMID 21526222.
  17. ^ McVean, G. A.; Abecasis, D. M.; Auton, R. M.; Brooks, G. A. R.; Depristo, D. R.; Durbin, A.; Handsaker, A. G.; Kang, P.; Marth, E. E.; McVean, P.; Gabriel, S. B.; Gibbs, R. A.; Green, E. D.; Hurles, M. E.; Knoppers, B. M.; Korbel, J. O.; Lander, E. S.; Lee, C.; Lehrach, H.; Mardis, E. R.; Marth, G. T.; McVean, G. A.; Nickerson, D. A.; Schmidt, J. P.; Sherry, S. T.; Wang, J.; Wilson, R. K.; Gibbs (Principal Investigator), R. A.; Dinh, H.; et al. (2012). "An integrated map of genetic variation from 1,092 human genomes". Nature. 491 (7422): 56–65. Bibcode:2012Natur.491...56T. doi:10.1038/nature11632. PMC 3498066. PMID 23128226.
  18. ^ Dunham, I.; Bernstein, A.; Birney, S. F.; Dunham, P. J.; Green, C. A.; Gunter, F.; Snyder, C. B.; Frietze, S.; Harrow, J.; Kaul, R.; Khatun, J.; Lajoie, B. R.; Landt, S. G.; Lee, B. K.; Pauli, F.; Rosenbloom, K. R.; Sabo, P.; Safi, A.; Sanyal, A.; Shoresh, N.; Simon, J. M.; Song, L.; Trinklein, N. D.; Altshuler, R. C.; Birney, E.; Brown, J. B.; Cheng, C.; Djebali, S.; Dong, X.; et al. (2012). "An integrated encyclopedia of DNA elements in the human genome". Nature. 489 (7414): 57–74. Bibcode:2012Natur.489...57T. doi:10.1038/nature11247. PMC 3439153. PMID 22955616.
  19. ^ Huss, Jon W.; Orozco, C; Goodale, J; Wu, C; Batalov, S; Vickers, TJ; Valafar, F; Su, AI (2008). "A Gene Wiki for Community Annotation of Gene Function". PLOS Biology. 6 (7): e175. doi:10.1371/journal.pbio.0060175. PMC 2443188. PMID 18613750.
  20. ^ Saxena, R.; Bishnoi, R.; Singla, D. (2021). Bioinformatics : methods and applications. London: Academic Press. pp. 145–157. ISBN 978-0-323-89775-4. Retrieved 13 April 2023.
  21. ^ Cooper, L.; Jaiswal, P. (2016). Plant bioinformatics : methods and protocols (2nd ed.). Totowa, N.J.: Humana Press. pp. 89–114. ISBN 978-1-4939-3167-5.
  22. ^ Torto-Alalibo, Trudy; Collmer, Candace W; Gwinn-Giglio, Michelle (2009). "The Plant-Associated Microbe Gene Ontology (PAMGO) Consortium: community development of new Gene Ontology terms describing biological processes involved in microbe-host interactions". BMC Microbiology. 9 (Suppl 1): S1. doi:10.1186/1471-2180-9-S1-S1.
  23. ^ Piñero, J.; Ramírez-Anguita, J.M.; Saüch-Pitarch, J.; Ronzano, F.; Centeno, E.; Sanz, F.; Furlong, L.I. (2020). "The DisGeNET knowledge platform for disease genomics: 2019 update". Nucleic Acids Research. 48 (D1): D845–D855. doi:10.1093/nar/gkz1021.