Bioinformatics workflow management system

Revision as of 08:05, 14 August 2014

A bioinformatics workflow management system is a specialized form of workflow management system designed specifically to compose and execute a series of computational or data manipulation steps, or a workflow, that relate to bioinformatics.

There are currently many different workflow systems. Some have been developed more generally as scientific workflow systems for use by scientists from many different disciplines, such as astronomy and earth science. All such systems are based on an abstract representation of how a computation proceeds in the form of a directed graph, where each node represents a task to be executed and each edge represents either data flow or an execution dependency between tasks. Each system typically provides a visual front-end that allows the user to build and modify complex applications with little or no programming expertise.[1][2][3]
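The directed-graph representation described above can be illustrated with a minimal Python sketch. It is not taken from any of the systems listed in this article: the `run_workflow` helper, the task names and the sample sequence are hypothetical, and the ordering relies on the standard-library `graphlib` module (Python 3.9 or later). Each node is a task, and each edge is both an execution dependency and a data flow from the upstream task's result.

```python
from graphlib import TopologicalSorter


def run_workflow(tasks, dependencies):
    """Run every task once, after all tasks it depends on have finished.

    tasks: maps a task name to a function that takes a dict of upstream results.
    dependencies: maps a task name to the set of task names it depends on.
    """
    results = {}
    # static_order() yields each node only after all of its predecessors,
    # and raises graphlib.CycleError if the graph contains a cycle.
    for name in TopologicalSorter(dependencies).static_order():
        upstream = {dep: results[dep] for dep in dependencies.get(name, ())}
        results[name] = tasks[name](upstream)
    return results


# Hypothetical workflow: load a sequence, compute two statistics, merge them.
tasks = {
    "load": lambda inputs: "ATGCGC",
    "length": lambda inputs: len(inputs["load"]),
    "gc_content": lambda inputs: (
        sum(base in "GC" for base in inputs["load"]) / len(inputs["load"])
    ),
    "report": lambda inputs: {
        "length": inputs["length"],
        "gc_content": inputs["gc_content"],
    },
}
dependencies = {
    "load": set(),
    "length": {"load"},
    "gc_content": {"load"},
    "report": {"length", "gc_content"},
}

results = run_workflow(tasks, dependencies)
```

In this sketch the `length` and `gc_content` tasks do not depend on each other, so a real workflow engine would be free to run them in parallel; the sketch simply runs them one after the other in a valid order.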

Examples

In alphabetical order, some examples of bioinformatics workflow management systems include:

  • Anduril: bioinformatics and image analysis
  • Anvaya: a software application that provides an interface to bioinformatics tools and databases in a workflow environment, so that a set of analysis tools can be executed in series or in parallel.[4]
  • BioBIKE: a web-based, programmable, integrated biological knowledge base[5]
  • BioExtract: a web-based system for querying biomolecular sequence data, executing analytic tools on the resulting extracts, and constructing workflows composed of such queries and tools.
  • Chipster: user-friendly analysis software for microarray data[7]
  • Discovery Net: one of the earliest examples of a scientific workflow system, later commercialized as InforSense, which was then acquired by IDBS.
  • Galaxy: initially targeted at genomics[8]
  • GeneProf: a web-based system for functional genomics experiments, e.g. RNA-seq or ChIP-seq[9]
  • KNIME: the Konstanz Information Miner[10]
  • OnlineHPC: an online workflow designer based on Taverna
  • Tavaxy:[11] a cloud-based bioinformatics workflow system that integrates features from both Taverna and Galaxy for NGS data analysis.
  • Taverna workbench:[12][13] an early domain-independent system widely used in bioinformatics and other areas of e-Science
  • UGENE: provides a workflow management system that is installed on a local computer[6]
  • VisTrails[14]

Comparisons between workflow systems

With a large number of bioinformatics workflow systems to choose from, it is difficult to understand and compare their features. Little work has been done to evaluate and compare the systems from a bioinformatician's perspective, especially with regard to the data types they can handle, the built-in functionality they provide to the user, or their performance and usability. Examples of existing comparisons include:

  • The paper "Scientific workflow systems - can one size fit all?",[15] which provides a high-level framework for comparing workflow systems based on their control flow and data flow properties. The systems compared include Discovery Net, Taverna, Triana and Kepler, as well as YAWL and BPEL.
  • The paper "Meta-workflows: pattern-based interoperability between Galaxy and Taverna",[16] which provides a more user-oriented comparison between Taverna and Galaxy in the context of enabling interoperability between the two systems.
  • The infrastructure paper "Delivering ICT Infrastructure for Biomedical Research",[17] which compares two workflow systems, Anduril and Chipster,[7] in terms of infrastructure requirements in a cloud-delivery model.


References

  1. ^ doi:10.1002/cpe.993
  2. ^ doi:10.1145/1084805.1084814
  3. ^ Curcin, V; Ghanem, M (2008), Scientific workflow systems - can one size fit all?, Biomedical Engineering Conference, 2008. CIBEC 2008, IEEE, doi:10.1109/CIBEC.2008.4786077
  4. ^ PMID 22809419
  5. ^ PMID 19433511
  6. ^ PMID 22368248
  7. ^ a b PMID 21999641
  8. ^ PMID 20738864
  9. ^ PMID 22205509
  10. ^ doi:10.1016/j.compbiolchem.2007.08.009
  11. ^ doi:10.1186/1471-2105-13-77
  12. ^ doi:10.1093/bioinformatics/bth361
  13. ^ doi:10.1093/nar/gkl320
  14. ^ doi:10.1109/VISUAL.2005.1532788
  15. ^ Curcin, V; Ghanem, M (2008), Scientific workflow systems - can one size fit all?, Biomedical Engineering Conference, 2008. CIBEC 2008, IEEE, doi:10.1109/CIBEC.2008.4786077
  16. ^ Abouelhoda, M; Ghanem, M; Alaa, S (2010), Meta-workflows: pattern-based interoperability between Galaxy and Taverna, Wands '10 Proceedings of the 1st International Workshop on Workflow Approaches to New Data-centric Science, ACM, doi:10.1145/1833398.1833400
  17. ^ Nyrönen, TH; Laitinen, J; et al. (2012), Delivering ICT infrastructure for biomedical research, Proceedings of the WICSA/ECSA 2012 Companion Volume (WICSA/ECSA '12), ACM, doi:10.1145/2361999.2362006