SHIWA (SHaring Interoperable Workflows for large-scale scientific simulations on Available DCIs) was a project led by the LPDS (Laboratory of Parallel and Distributed Systems) of the MTA Computer and Automation Research Institute. Coordinated by Prof. Dr. Peter Kacsuk, it started on 1 July 2010 and lasted two years. SHIWA was supported by a grant from the European Commission's FP7 INFRASTRUCTURES-2010-2 call under grant agreement n°261585.
The SHIWA project developed and deployed the SHIWA Simulation Platform (SSP) to enable infrastructure and workflow interoperability at two levels:
- coarse-grained interoperability, referring to the nesting of different workflow systems in order to achieve interoperability of execution frameworks; and
- fine-grained interoperability, referring to the transformation of workflow representations in order to migrate workflows from one system to another.
After the project ended, the SHIWA technology was taken over by the ER-flow support action project to provide sustainability and to extend the user community base.
Background and motivations
Scientists of all disciplines have invested a tremendous effort in the exploitation of Distributed Computing Infrastructures (DCIs) for their ability to support compute-intensive in-silico experiments and virtual organizations. Many DCIs with large user communities have emerged during the last decade, such as the Distributed European Infrastructure for Supercomputing Applications (DEISA) [Niederberger and Mextorf 2005], EGEE Grid (Enabling Grids for e-Science) [EGEE n.d.], the German D-Grid initiative (D-Grid) [Gentzsch 2006], the UK National Grid Service (NGS) [NGS n.d.] and the North American TeraGrid (TG) [TeraGrid n.d.]. They are based on different middleware stacks that provide an abstraction layer between computer resources and applications. For example, NGS and TeraGrid are built on the Globus Toolkit [Foster 2006] and EGEE on gLite [gLite n.d.]; DEISA relies on both the Globus Toolkit and Unicore [Erwin and Snelling 2002], while D-Grid runs on gLite, the Globus Toolkit and Unicore. In Europe, this momentum culminated in 2010 with the emergence of the European Grid Initiative (EGI), intended to federate all major European organizations related to distributed computing and National Grid Initiatives (NGIs). In its effort to create the next generation of pan-European DCI, EGI faced unprecedented challenges related to the heterogeneity of national grid infrastructures, resources and operating middleware. Production DCIs are commonly built on a large number of components, such as data resources, metadata catalogues, authentication and authorisation methods, and software repositories. Managing the execution of applications on DCIs is consequently a complex task. Moreover, solutions developed for one particular Grid are difficult to port to other infrastructures.
In order to shield researchers from this complexity and to facilitate the design of in-silico experiments, workflow systems are widely used as a virtualization layer on top of the underlying infrastructures. They have become essential for integrating expertise about both the application (user domain) and the DCI (infrastructure domain) in order to optimize and support the research of the scientific computing community. In a multi-DCI landscape, users need to access different infrastructures in order to widen the variety of usable resources, as well as to share and reuse domain-specific resources. Interoperability among DCIs is, however, rarely achieved at the middleware level. SHIWA considered the EGI production infrastructure a major DCI of great interest to European scientists for designing and simulating experiments in silico. It directly addressed the challenges related to (i) the design of scientific experiments through the description of simulation workflows and (ii) the middleware heterogeneity encountered among the many existing DCIs, through workflow interoperability techniques.
Concepts and project objectives
SHIWA aimed to improve the experience of Virtual Research Communities that rely heavily on DCIs for their scientific experimentation. With the multiplication of efforts dedicated to e-infrastructures, scientific simulation can benefit from the availability of massive computing and data storage facilities to sustain multi-disciplinary scientific challenges. As a side effect, a variety of non-interoperable technologies coexist to enable the exploitation of computing infrastructures for in-silico experiments. In Europe, this momentum culminated with the emergence of the EGI, intended to federate all major European organizations related to distributed computing and NGIs. Consequently, European research on simulation was hampered by several interoperability issues that reduced its efficiency by limiting the sharing of knowledge and expertise among scientific communities. SHIWA was designed as a user-centred project aiming to lower barriers among scientific communities by providing services that tackle interoperability issues. In particular, SHIWA's work program focused on improving the efficiency of workflow-based in-silico experiments by targeting the following objectives:
- Objective 1: develop workflow and expertise sharing among Virtual Research Communities.
- Objective 2: enable cross-system management of simulation workflows in Scientific Gateways.
- Objective 3: support Virtual Research Communities in their design and realization of in-silico experiments.
- Objective 4: improve interoperability among DCIs.
- Objective 5: simplify access to multiple DCIs for Virtual Research Communities.
- Objective 6: promote the use of European e-Infrastructure among simulation users from various disciplines.
Workflow interoperability enables the execution of workflows of different workflow systems that may span multiple heterogeneous infrastructures (DCIs). It can facilitate application migration made necessary by the evolution of infrastructures, services and workflow systems. Workflow interoperability allows workflow sharing, which supports and fosters the adoption of common research methodologies, improves the efficiency and reliability of research by reusing these methodologies, increases the lifetime of workflows and reduces the development time for new workflows. Interoperability among workflow systems not only permits the development and enactment of large-scale and comprehensive workflows, but also reduces the existing gap between different DCIs, and consequently promotes cooperation among the research communities exploiting these DCIs. As workflow systems enable researchers to build comprehensive workflow applications for DCIs, the project consortium identified workflow interoperability as the most promising approach to bridging the existing gaps among DCIs. Workflow and DCI interoperability is of paramount importance for advancing the quality and impact of scientific applications that target DCIs. It enables advanced features that were previously unavailable:
- Enabling exploitation of specific features of workflow systems considering applications’ requirements and DCIs’ capabilities.
- Sharing workflows published by research communities to support collaboration, reuse of validated methodologies and knowledge transfer.
- Running workflow applications on multiple heterogeneous DCIs.
- Facilitating the migration and maintenance of workflow-based applications.
- Optimizing experiments by using the most appropriate workflow system and/or DCIs.
SHIWA developed workflow interoperability solutions for several workflow systems, namely ASKALON [Fahringer, et al. 2005], MOTEUR [Glatard, et al. 2008], Pegasus [Deelman 2005], P-GRADE [Kacsuk, et al. 2003], Galaxy, GWES, Kepler, LONI Pipeline, Taverna, ProActive and Triana [Majithia et al. 2004]. In so doing, it aimed to provide access to Grids built on gLite and Globus middleware and to create production-level services for running workflow-based large-scale simulations. The targeted middleware and workflow systems are depicted by components with bold borders in Figure 1.1.1. The project planned to use existing Grid middleware interoperability solutions enabling access to gLite- and Globus-based Grids such as the Austrian Grid, D-Grid, EGEE and NGS. The project consortium also planned to consider support for the EMI-supported Nordugrid Advanced Resource Connector (ARC) [M.Ellert 2007] and Unicore.
Project partners
- Computer and Automation Research Institute, Hungarian Academy of Sciences
- University of Innsbruck
- Charité - Universitätsmedizin Berlin
- French National Centre for Scientific Research
- University of Westminster
- Cardiff University
- Academic Medical Centre of the University of Amsterdam
- University of Southern California
- ActiveEon SAS
- MAAT France
- Correlation Systems Ltd
- ETH Zurich, Institute of Molecular Systems Biology
- National Research Council, Institute for Biomedical Technologies