BioCyc database collection: Difference between revisions

From Wikipedia, the free encyclopedia
Laurakreed (talk | contribs)
mNo edit summary
give external links to all refs-bot next
Line 1: Line 1:
The [http://biocyc.org/ BioCyc] [[database]] collection is an assortment of organism specific Pathway/ [[Genome]] Databases (PGDBs). They provide reference to genome and metabolic pathways of few thousand organisms.<ref name = "BioCyc1">Caspi R, Altman T, Dreher K, Fulcher CA, Subhraveti P, Keseler IM, Kothari A, Krummenacker M, Latendresse M, Mueller LA,Ong Q, Paley S, Pujar A, Shearer AG, Travers M, Weerasinghe D, Zhang P,Karp PD.
The [http://biocyc.org/ BioCyc] [[database]] collection is an assortment of organism specific Pathway/ [[Genome]] Databases (PGDBs). They provide reference to genome and metabolic pathways of few thousand organisms.<ref name = "BioCyc1">{{Cite journal|doi=10.1093/nar/gkr1014}}</ref> As of June 23, 2014, there are 3563 databases within BioCyc. The list of databases can be found [http://biocyc.org/biocyc-pgdb-list.shtml here]. [http://www.sri.com/ SRI International], based in Menlo Park, California, maintains the BioCyc database family.
[http://nar.oxfordjournals.org/content/40/D1/D742.long "The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of pathway/genome databases,"]
Nucleic Acids Research 40:D742-53 2011</ref> As of June 23, 2014, there are 3563 databases within BioCyc. The list of databases can be found [http://biocyc.org/biocyc-pgdb-list.shtml here]. [http://www.sri.com/ SRI International], based in Menlo Park, California, maintains the BioCyc database family.


<big>'''Categories of Databases within BioCyc:'''</big>
<big>'''Categories of Databases within BioCyc:'''</big>
Line 7: Line 5:
Based on the manual curation done, BioCyc database family is divided into 3 tiers:
Based on the manual curation done, BioCyc database family is divided into 3 tiers:


'''Tier 1:''' Databases which have received at least one year literature based manual curation. Currently there are seven databases in Tier 1. Out of the seven, [[MetaCyc]] is a major database that contains metabolic pathways for 2063 organisms.<ref name="BioCyc1"/><ref name="BioCyc2">Karp, PD, and Caspi, R. [http://link.springer.com/article/10.1007%2Fs00204-011-0705-2 "A survey of metabolic databases emphasizing the MetaCyc family"] Archives of Toxicology 85:1015-33 2011</ref> The other important Tier 1 database is [http://humancyc.org/ HumanCyc] which contains around 250 metabolic pathways found in humans.<ref name="BioCyc3">P. Romero, J. Wagg, M.L. Green, D. Kaiser, M. Krummenacker, and P.D. Karp
'''Tier 1:''' Databases which have received at least one year literature based manual curation. Currently there are seven databases in Tier 1. Out of the seven, [[MetaCyc]] is a major database that contains metabolic pathways for 2063 organisms.<ref name="BioCyc1"/><ref name="BioCyc2">{{Cite journal|doi=10.1007/s00204-011-0705-2}}</ref> The other important Tier 1 database is [http://humancyc.org/ HumanCyc] which contains around 250 metabolic pathways found in humans.<ref name="BioCyc3">{{Cite journal|doi=10.1186/gb-2004-6-1-r2}}</ref> The remaining five databases include, EcoCyc (''E. coli''),<ref name="BioCyc4">{{Cite journal|doi=10.1093/nar/gkq1143}}</ref> AraCyc (''Arabidopsis thaliana''), YeastCyc (''Saccharomyces cerevisiae''), LeishCyc (''Leishmania major Friedlin'') and TrypanoCyc (''Trypanosoma brucei'').
[http://genomebiology.com/2004/6/1/R2 "Computational prediction of human metabolic pathways from the complete human genome"],
Genome Biology 6:R2 R2.1-17 2004</ref> The remaining five databases include, EcoCyc (''E. coli''),<ref name="BioCyc4">Keseler, I.M., Collado-Vides, J., Santos-Zavaleta, A., Peralta-Gil, M., Gama-Castro, S., Muniz-Rascado, L., Bonavides-Martinez, C., Paley, S., Krummenacker, M., Altman, T., Kaipa, P., Spaulding, A., Pacheco, J., Latendresse, M., Fulcher, C., Sarker, M., Shearer, A.G., Mackie, A., Paulsen, I., Gunsalus, R.P., and Karp, P.D.
[http://nar.oxfordjournals.org/content/39/suppl_1/D583.long "EcoCyc: a comprehensive database of Escherichia coli biology"]
Nucleic Acids Research 39:D583-590 2011</ref> AraCyc (''Arabidopsis thaliana''), YeastCyc (''Saccharomyces cerevisiae''), LeishCyc (''Leishmania major Friedlin'') and TrypanoCyc (''Trypanosoma brucei'').


'''Tier 2:''' Databases which are computationally predicted by PathoLogic but have received moderate manual curation (most with 1–4 months curation). Tier 2 Databases are available for manual curation by scientists who are interested in any particular organism. Tier 2 databases currently contain 36 different organism databases.
'''Tier 2:''' Databases which are computationally predicted by PathoLogic but have received moderate manual curation (most with 1–4 months curation). Tier 2 Databases are available for manual curation by scientists who are interested in any particular organism. Tier 2 databases currently contain 36 different organism databases.
Line 23: Line 17:
<big>'''Pathway Tools Software:'''</big>
<big>'''Pathway Tools Software:'''</big>


The Pathway Tools is a comprehensive systems biology software that allows:<ref name = "BioCyc5">P.D. Karp, S.M. Paley, M. Krummenacker, et al.
The Pathway Tools is a comprehensive systems biology software that allows:<ref name = "BioCyc5">{{Cite journal|pmid=19955237}}</ref><ref name = "BioCyc6">{{Cite journal|pmid=12169551}}</ref><ref name = "BioCyc7">{{Cite journal|pmid=8697237}}</ref>
"Pathway Tools version 13.0: Integrated Software for Pathway/Genome Informatics and Systems Biology"
Briefings in Bioinformatics 11:40-79 2010</ref><ref name = "BioCyc6">Peter D. Karp, Suzanne Paley and Pedro Romero.
"The pathway tools software"
Bioinformatics Vol. 18 Suppl. 1 2002</ref><ref name = "BioCyc7">P. Karp and S. Paley.
"Integrated access to metabolic and genomic data"
Journal of Computational Biology, 3(1):191-212 1996</ref>
* Development of Organism specific databases
* Development of Organism specific databases
* Scientific Visualization, web publishing, and dissemination of organism-specific databases
* Scientific Visualization, web publishing, and dissemination of organism-specific databases
Line 38: Line 26:
* Analysis of biological networks
* Analysis of biological networks


The Pathway Tools software had four main components:<ref name = "BioCyc8">M. Krummenacker, S. Paley, L. Mueller, T. Yan, and P.D. Karp.
The Pathway Tools software had four main components:<ref name = "BioCyc8">{{Cite journal|pmc=1450015}}</ref>
"Querying and Computing with BioCyc Databases"
Bioinformatics 21:3454-5 2005</ref>
# PathoLogic: Algorithm that takes Genbank entry as input and creates the new PGDB containing the predicted metabolic pathways of an organism.
# PathoLogic: Algorithm that takes Genbank entry as input and creates the new PGDB containing the predicted metabolic pathways of an organism.
# Pathway/Genome Navigator: Allows query, visualization, and analysis of PGDBs.
# Pathway/Genome Navigator: Allows query, visualization, and analysis of PGDBs.
Line 56: Line 42:
'''AlgaGEM'''
'''AlgaGEM'''


AlgaGEM is a genome scale metabolic network model for a compartmentalized algae cell developed by Gomes de Oliveira Dal’Molin et al.<ref name = "BioCyc9">Cristiana Gomes de Oliveira Dal’Molin, Lake-Ee Quek, Robin W Palfreyman, Lars K Nielsen.
AlgaGEM is a genome scale metabolic network model for a compartmentalized algae cell developed by Gomes de Oliveira Dal’Molin et al.<ref name = "BioCyc9">{{Cite journal|pmid=22369158}}</ref> based on Chlamydomonas reinhardtii genome. It has 866 unique ORFs, 1862 metabolites, 2499 gene-enzyme-reaction-association entries, and 1725 unique reactions. One of the Pathway databases used for reconstruction is MetaCyc.
"AlgaGEM – a genome-scale metabolic
reconstruction of algae based on the
Chlamydomonas reinhardtii genome"
BMC Genomics 12(Suppl 4):S5 2011</ref> based on Chlamydomonas reinhardtii genome. It has 866 unique ORFs, 1862 metabolites, 2499 gene-enzyme-reaction-association entries, and 1725 unique reactions. One of the Pathway databases used for reconstruction is MetaCyc.


'''SNPs'''
'''SNPs'''


The study by Shimul Chowdhury et al.<ref name = "BioCyc10">{{cite journal|pmid=23059056}}</ref> showed association differed between maternal SNPs and metabolites involved in homocysteine, folate, and transsulfuration pathways in cases with Congenital Heart Defects (CHDs) as opposed to controls. The study used HumanCyc to select candidate genes and SNPs.
The study by Shimul Chowdhury et al.<ref name = "BioCyc10">Shimul Chowdhury, Charlotte A. Hobbs, Stewart L. MacLeod, Mario A. Cleves, Stepan
Melnyk, S. Jill James, Ping Hu, and Stephen W. Erickson.
"Associations between maternal genotypes and metabolites
implicated in congenital heart defects"
Molecular Genetics and Metabolism 107(3): 596–604 2012</ref> showed association differed between maternal SNPs and metabolites involved in homocysteine, folate, and transsulfuration pathways in cases with Congenital Heart Defects (CHDs) as opposed to controls. The study used HumanCyc to select candidate genes and SNPs.


== References ==
== References ==

Revision as of 21:36, 27 April 2015

The BioCyc database collection is an assortment of organism-specific Pathway/Genome Databases (PGDBs) that provide reference information on the genomes and metabolic pathways of a few thousand organisms.[1] As of June 23, 2014, there were 3563 databases within BioCyc; the full list is available on the BioCyc website (http://biocyc.org/biocyc-pgdb-list.shtml). SRI International, based in Menlo Park, California, maintains the BioCyc database family.

Categories of Databases within BioCyc:

The BioCyc database family is divided into three tiers according to the amount of manual curation each database has received:

Tier 1: Databases that have received at least one year of literature-based manual curation. There are currently seven databases in Tier 1. Of these, MetaCyc is a major database that contains metabolic pathways for 2063 organisms.[1][2] Another important Tier 1 database is HumanCyc, which contains around 250 metabolic pathways found in humans.[3] The remaining five databases are EcoCyc (E. coli),[4] AraCyc (Arabidopsis thaliana), YeastCyc (Saccharomyces cerevisiae), LeishCyc (Leishmania major Friedlin) and TrypanoCyc (Trypanosoma brucei).

Tier 2: Databases that were computationally predicted by PathoLogic and have received moderate manual curation (most with 1–4 months of curation). Tier 2 databases are available for further manual curation by scientists interested in a particular organism. Tier 2 currently contains 36 organism databases.

Tier 3: Databases that were computationally predicted by PathoLogic and have received no manual curation. As with Tier 2, Tier 3 databases are available for curation by interested scientists.
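
The tier rule described above can be sketched as a small function. This is only an illustration: the function name biocyc_tier is hypothetical, and reducing curation effort to a single number of months (12 or more for Tier 1, more than zero for Tier 2, zero for Tier 3) is an assumption made for the example. In practice tiers are assigned by the BioCyc maintainers, not computed by a formula.

```python
def biocyc_tier(curation_months: float) -> int:
    """Return the tier (1, 2 or 3) for a given amount of manual curation.

    Assumption: curation effort is summarised as a number of months.
    """
    if curation_months < 0:
        raise ValueError("curation_months must be non-negative")
    if curation_months >= 12:
        # At least one year of literature-based manual curation.
        return 1
    if curation_months > 0:
        # Computationally predicted, with some manual curation.
        return 2
    # Computationally predicted, with no manual curation.
    return 3


for months in (24, 3, 0):
    print(months, "months of curation -> Tier", biocyc_tier(months))
```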

Software Tools within BioCyc:

The BioCyc website contains a variety of software tools for searching, visualizing, comparing, and analyzing genome and pathway information. It includes a genome browser, and browsers for metabolic and regulatory networks. The website also includes tools for painting large-scale ("omics") datasets onto metabolic and regulatory networks, and onto the genome.

Pathway Tools Software:

Pathway Tools is a comprehensive systems biology software package that supports:[5][6][7]

  • Development of organism-specific databases
  • Scientific visualization, web publishing, and dissemination of organism-specific databases
  • Development of metabolic-flux models
  • Visual analysis of omics datasets
  • Computational inferences
  • Comparative analyses of organism-specific databases
  • Analysis of biological networks

The Pathway Tools software has four main components:[8]

  1. PathoLogic: An algorithm that takes a GenBank entry as input and creates a new PGDB containing the predicted metabolic pathways of the organism.
  2. Pathway/Genome Navigator: Allows query, visualization, and analysis of PGDBs.
  3. MetaFlux: Allows development of metabolic flux models.
  4. Pathway/Genome Editors: Provides interactive editing capabilities for PGDBs.

BioCyc databases rely on a software system called Pathway Tools for their initial generation, subsequent updating, and for querying their content. The databases can also be installed locally.

All BioCyc databases share the same database schema, which facilitates comparisons across the databases.
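
As a rough illustration of why a shared schema helps, the sketch below compares two databases once their pathways are keyed by the same kind of identifier. It does not use the Pathway Tools API: the function compare_pathways and the identifiers "PWY-A" to "PWY-D" are made up for the example, and each database is assumed to have been exported as a plain set of pathway identifiers.

```python
def compare_pathways(pgdb_a: set[str], pgdb_b: set[str]) -> dict[str, set[str]]:
    """Split the pathways of two databases into shared and unique groups."""
    return {
        "shared": pgdb_a & pgdb_b,   # present in both databases
        "only_a": pgdb_a - pgdb_b,   # present only in the first
        "only_b": pgdb_b - pgdb_a,   # present only in the second
    }


# Toy data: placeholder identifiers, not real BioCyc pathway IDs.
organism_a = {"PWY-A", "PWY-B", "PWY-C"}
organism_b = {"PWY-B", "PWY-C", "PWY-D"}

result = compare_pathways(organism_a, organism_b)
for group, pathways in result.items():
    print(group, sorted(pathways))
```

Because both sets use the same identifier scheme, the comparison reduces to ordinary set operations; without a common schema, an identifier-mapping step would be needed first.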

Use of BioCyc Database Collection in Research:

Because the BioCyc database family comprises a long list of organism-specific databases, with data at different systems levels of a living system, it has been used in research in a wide variety of contexts. Two studies are highlighted here to show two different kinds of use: one at the genome scale, and the other identifying specific single nucleotide polymorphisms (SNPs) within a genome.

AlgaGEM

AlgaGEM is a genome-scale metabolic network model of a compartmentalized algal cell, developed by Gomes de Oliveira Dal’Molin et al.[9] based on the Chlamydomonas reinhardtii genome. It has 866 unique ORFs, 1862 metabolites, 2499 gene-enzyme-reaction-association entries, and 1725 unique reactions. MetaCyc was one of the pathway databases used for the reconstruction.

SNPs

A study by Shimul Chowdhury et al.[10] showed that associations between maternal SNPs and metabolites involved in the homocysteine, folate, and transsulfuration pathways differed between cases with congenital heart defects (CHDs) and controls. The study used HumanCyc to select candidate genes and SNPs.

References

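  1. ^ a b Caspi R, Altman T, Dreher K, Fulcher CA, Subhraveti P, Keseler IM, Kothari A, Krummenacker M, Latendresse M, Mueller LA, Ong Q, Paley S, Pujar A, Shearer AG, Travers M, Weerasinghe D, Zhang P, Karp PD. "The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of pathway/genome databases". Nucleic Acids Research 40:D742-53 2011. doi:10.1093/nar/gkr1014.
  2. ^ Karp PD, Caspi R. "A survey of metabolic databases emphasizing the MetaCyc family". Archives of Toxicology 85:1015-33 2011. doi:10.1007/s00204-011-0705-2.
  3. ^ Romero P, Wagg J, Green ML, Kaiser D, Krummenacker M, Karp PD. "Computational prediction of human metabolic pathways from the complete human genome". Genome Biology 6:R2 R2.1-17 2004. doi:10.1186/gb-2004-6-1-r2.
  4. ^ Keseler IM, Collado-Vides J, Santos-Zavaleta A, Peralta-Gil M, Gama-Castro S, Muniz-Rascado L, Bonavides-Martinez C, Paley S, Krummenacker M, Altman T, Kaipa P, Spaulding A, Pacheco J, Latendresse M, Fulcher C, Sarker M, Shearer AG, Mackie A, Paulsen I, Gunsalus RP, Karp PD. "EcoCyc: a comprehensive database of Escherichia coli biology". Nucleic Acids Research 39:D583-590 2011. doi:10.1093/nar/gkq1143.
  5. ^ Karp PD, Paley SM, Krummenacker M, et al. "Pathway Tools version 13.0: Integrated Software for Pathway/Genome Informatics and Systems Biology". Briefings in Bioinformatics 11:40-79 2010. PMID 19955237.
  6. ^ Karp PD, Paley S, Romero P. "The pathway tools software". Bioinformatics Vol. 18 Suppl. 1 2002. PMID 12169551.
  7. ^ Karp P, Paley S. "Integrated access to metabolic and genomic data". Journal of Computational Biology 3(1):191-212 1996. PMID 8697237.
  8. ^ Krummenacker M, Paley S, Mueller L, Yan T, Karp PD. "Querying and Computing with BioCyc Databases". Bioinformatics 21:3454-5 2005. PMC 1450015 https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1450015.
  9. ^ Gomes de Oliveira Dal’Molin C, Quek LE, Palfreyman RW, Nielsen LK. "AlgaGEM – a genome-scale metabolic reconstruction of algae based on the Chlamydomonas reinhardtii genome". BMC Genomics 12(Suppl 4):S5 2011. PMID 22369158.
  10. ^ Chowdhury S, Hobbs CA, MacLeod SL, Cleves MA, Melnyk S, James SJ, Hu P, Erickson SW. "Associations between maternal genotypes and metabolites implicated in congenital heart defects". Molecular Genetics and Metabolism 107(3): 596–604 2012. PMID 23059056.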

External links