E-Science (or eScience) is computationally intensive science that is carried out in highly distributed network environments, or science that uses immense data sets that require grid computing; the term sometimes includes technologies that enable distributed collaboration, such as the Access Grid. The term was coined in 1999 by John Taylor, Director General of the Research Councils at the United Kingdom's Office of Science and Technology, and was used to describe a large funding initiative starting in November 2000. E-science has since been interpreted more broadly as "the application of computer technology to the undertaking of modern scientific investigation, including the preparation, experimentation, data collection, results dissemination, and long-term storage and accessibility of all materials generated through the scientific process. These may include data modeling and analysis, electronic/digitized laboratory notebooks, raw and fitted data sets, manuscript production and draft versions, pre-prints, and print and/or electronic publications." In February 2013 the White House Office of Science and Technology Policy issued a memorandum on preserving and providing access to the results of federally funded scientific research, which made some, but not all, of these e-Science outputs subject to preservation and access requirements. Fields that make extensive use of e-Science include particle physics, earth sciences and social simulations. Particle physics has a particularly well-developed e-Science infrastructure because it needs adequate computing facilities to analyze results and store the data originating from the CERN Large Hadron Collider, which started taking data in 2009.
E-science encompasses "what is often referred to as 'big data' [which] has revolutionized science...As of 2013, the Large Hadron Collider (LHC) at CERN...generates around 780 terabytes per year, [and] the Sloan Digital Sky Survey...recently released 60 terabytes,...[and other] highly data intensive modern fields of science...that generate large amounts of E-science data include: computational biology, bioinformatics, and genomics."
Characteristics and examples
Most research activity in e-Science has focused on developing new computational tools and infrastructures to support scientific discovery. Because of the complexity of the software and the back-end infrastructure requirements, e-Science projects usually involve large teams run by research laboratories, large universities or governments. There has been a particularly strong focus on e-Science in the United Kingdom, where the UK e-Science programme provided significant funding. In Europe, the development of computing capabilities to support the CERN Large Hadron Collider has led to e-Science and Grid infrastructures that are also used by other disciplines.
Example e-Science infrastructures include the Worldwide LHC Computing Grid, a federation with various partners including the European Grid Infrastructure, the Open Science Grid and the Nordic DataGrid Facility.
To support e-Science applications, Open Science Grid combines interfaces to more than 100 nationwide clusters, 50 interfaces to geographically distributed storage caches, and 8 campus grids (Purdue, Wisconsin-Madison, Clemson, Nebraska-Lincoln, FermiGrid at FNAL, SUNY-Buffalo, and Oklahoma in the United States; and UNESP in Brazil). Areas of science benefiting from Open Science Grid include:
- Astrophysics, Gravitational Physics, High-energy Physics, Neutrino Physics, and Nuclear Physics.
- Structural Biology, Computational Biology, Genomics, Proteomics, and Medicine.
- Molecular Dynamics, Materials Science and Engineering, Computer Science and Engineering, and Nanotechnology.
After his appointment as Director General of the Research Councils in 1999, John Taylor, with the support of Science Minister David Sainsbury and Chancellor of the Exchequer Gordon Brown, bid to HM Treasury to fund a programme of e-infrastructure development for science. The programme was intended to provide the foundation for UK science and industry to become world leaders in the knowledge economy, the goal behind the Lisbon Strategy for sustainable economic growth to which the UK government committed in March 2000.
In November 2000 John Taylor announced £98 million for a national UK e-Science programme. UK industry was expected to contribute a further £20 million as matching funds for the projects it participated in. From this budget of £120 million over three years, £75 million was to be spent on grid application pilots in all areas of science, administered by the Research Council responsible for each area, while £35 million was to be administered by the EPSRC as a Core Programme to develop "industrial strength" Grid middleware. Phase 2 of the programme (2004–2006) was supported by a further £96 million for application projects and £27 million for the EPSRC core programme. Phase 3 (2007–2009) was supported by a further £14 million for the EPSRC core programme and a further sum for applications. Additional funding for UK e-Science activities came from the European Union, from the university funding councils' Science Research Investment Fund (SRIF) for hardware, and from Jisc for networking and other infrastructure.
The UK e-Science programme comprised a wide range of resources, centres and people, including the National e-Science Centre (NeSC), which was managed by the Universities of Glasgow and Edinburgh and had facilities in both cities. Tony Hey led the core programme from 2001 to 2005.
Within the UK, regional e-Science centres supported their local universities and projects. These included:
- White Rose Grid e-Science Centre (WRGeSC)
- Belfast e-Science Centre (BeSC)
- Cambridge e-Science Centre (CeSC)
- STFC e-Science Centre (STFCeSC)
- e-Science North West (eSNW)
- National Grid Service (NGS)
- Lancaster University Centre for e-Science
- London e-Science Centre (LeSC)
- North East Regional e-Science Centre (NEReSC)
- Oxford e-Science Centre (OeSC)
- Southampton e-Science Centre (SeSC)
- Welsh e-Science Centre (WeSC)
- Midlands e-Science Centre (MeSC)
There were also various centres of excellence and research centres.
In addition to centres, the grid application pilot projects were funded by the Research Council responsible for each area of UK science funding.
The EPSRC funded 11 pilot e-Science projects in three phases (for about £3 million each in the first phase):
- First phase (2001–2005): CombEchem, DAME, Discovery Net, GEODISE, myGrid and RealityGrid
- Second phase (2004–2008): GOLD and Integrative Biology
- Third phase (2005–2010): PMSEG (MESSAGE), CARMEN and NanoCMOS
The PPARC/STFC funded two projects: GridPP (phase 1 for £17 million, phase 2 for £5.9 million, phase 3 for £30 million and a 4th phase running from 2011 to 2014) and Astrogrid (£14 million over 3 phases).
The remaining £23 million of phase one funding was divided between the application projects funded by BBSRC, MRC and NERC:
- BBSRC: Biomolecular Grid, Proteome Annotation Pipeline, High-Throughput Structural Biology, Global Biodiversity
- MRC: Biology of Ageing, Sequence and Structure Data, Molecular Genetics, Cancer Management, Clinical e-Science Framework, Neuroinformatics Modeling Tools
- NERC: Climateprediction.com, Oceanographic Grid, Molecular Environmental Grid, NERC DataGrid
The funded UK e-Science programme was reviewed on its completion in 2009 by an international panel led by Daniel Atkins, director of the Office of Cyberinfrastructure of the US NSF. The report concluded that the programme had developed a skilled pool of expertise and some services, and had fostered cooperation between academia and industry. However, these achievements remained at the project level. The programme had not generated infrastructure or transformed disciplines to adopt e-Science as a normal method of work, and its outcomes were not self-sustaining without further investment.
In the United States, where the term cyberinfrastructure is typically used for e-Science projects, initiatives are primarily funded by the National Science Foundation's Office of Cyberinfrastructure (NSF OCI) and the Department of Energy (in particular its Office of Science).
Traditional Science vs. e-Science
Traditional science represents two distinct philosophical traditions within the history of science, but it has been argued that e-Science requires a paradigm shift and the addition of a third branch of the sciences. "The idea of open data is not a new one; indeed, when studying the history and philosophy of science, Robert Boyle is credited with stressing the concepts of skepticism, transparency, and reproducibility for independent verification in scholarly publishing in the 1660s. The scientific method later was divided into two major branches, deductive and empirical approaches...Today, a theoretical revision in the scientific method should include a new branch, Victoria Stodden advocate[s], that of the computational approach, where like the other two methods, all of the computational steps by which scientists draw conclusions are revealed. This is because within the last 20 years, people have been grappling with how to handle changes in high performance computing and simulation." Conceptually, e-Science revolves around developing new methods that support scientists in making new discoveries by analyzing vast amounts of data, accessible over the internet, with large-scale computational resources. However, discoveries of value cannot be made simply by providing computational tools or a cyberinfrastructure, or by performing a predefined set of steps to produce a result. Rather, the activity needs an original, creative aspect that by its nature cannot be automated.
This has led to research that attempts to define the properties e-Science platforms should provide to support a new paradigm of doing science. It has also led to new rules for preserving computational results and making them available so that they are reproducible in traceable, logical steps. Such reproducibility is treated as an intrinsic requirement for maintaining modern scientific integrity, and it allows an extension of "Boyle's tradition in the computational age."
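The idea of results that are "reproducible in traceable, logical steps" can be illustrated with a minimal sketch in Python. Each computational step is logged with its name, its parameters (including any random seed) and fingerprints of its input and output. A second party can then re-execute the log and check every fingerprint. The step functions (`add_noise`, `mean`), the JSON/SHA-256 fingerprinting scheme and the log format are all hypothetical choices made for illustration. They are not the mechanism of any particular e-Science platform.

```python
import hashlib
import json
import random
from typing import Any, Callable, Dict, List, Tuple

def digest(value: Any) -> str:
    """SHA-256 fingerprint of a JSON-serialisable value."""
    payload = json.dumps(value, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

# Hypothetical analysis steps: each output depends only on its input and parameters.
def add_noise(data: List[float], seed: int, scale: float) -> List[float]:
    rng = random.Random(seed)  # explicit seed, so re-execution gives identical values
    return [x + rng.gauss(0.0, scale) for x in data]

def mean(data: List[float]) -> float:
    return sum(data) / len(data)

STEPS: Dict[str, Callable[..., Any]] = {"add_noise": add_noise, "mean": mean}

def run(data: Any, plan: List[Tuple[str, Dict[str, Any]]]):
    """Execute (step name, parameters) pairs in order, logging each step."""
    log, current = [], data
    for name, params in plan:
        entry = {"step": name, "params": params, "input": digest(current)}
        current = STEPS[name](current, **params)
        entry["output"] = digest(current)
        log.append(entry)
    return current, log

def verify(data: Any, log: List[Dict[str, Any]]) -> bool:
    """Independently re-execute a logged run and check every fingerprint."""
    current = data
    for entry in log:
        if digest(current) != entry["input"]:
            return False  # the supplied data is not what the original run used
        current = STEPS[entry["step"]](current, **entry["params"])
        if digest(current) != entry["output"]:
            return False  # this step no longer reproduces its recorded result
    return True

if __name__ == "__main__":
    raw = [1.0, 2.0, 3.0, 4.0]
    plan = [("add_noise", {"seed": 42, "scale": 0.1}), ("mean", {})]
    result, log = run(raw, plan)
    print(json.dumps(log, indent=2))
    print("reproducible:", verify(raw, log))
```

The seed is stored with the other parameters, so the stochastic step is deterministic on re-execution. If the raw data or a recorded parameter such as the seed is altered, `verify` returns `False`.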
Modelling e-Science Processes
One view argues that because a modern discovery process instance serves a similar purpose to a mathematical proof, it should have similar properties. Its results should be deterministically reproducible when the process is re-executed, and its intermediate results should be viewable to aid examination and comprehension. In this case, simply modelling the provenance of data is not sufficient. One also has to model the provenance of the hypotheses and results generated from analyzing the data, so as to provide evidence that supports new discoveries. Scientific workflows have thus been proposed and developed to help scientists track the evolution of their data, intermediate results and final results. In this way the workflows document how discoveries evolve within a piece of scientific research.
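Tracking the provenance of hypotheses and results, not just of data, can be sketched as a small graph. In it, each claim is recorded together with the intermediate results and raw data it was derived from and the step that produced it. The following is a minimal Python illustration under assumed names: `ProvenanceGraph`, `record`, `evidence`, the node labels and the example artifacts are all hypothetical. Production workflow systems typically use richer provenance models, such as W3C PROV.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Dict, Iterable, List

@dataclass
class Artifact:
    """One node in a provenance graph."""
    name: str
    kind: str  # illustrative labels: "data", "intermediate" or "hypothesis"
    derived_from: List[str] = field(default_factory=list)
    activity: str = ""  # the analysis step that produced this node, if any

class ProvenanceGraph:
    """Records what each result or hypothesis was derived from, and by which step."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Artifact] = {}

    def record(self, name: str, kind: str,
               derived_from: Iterable[str] = (), activity: str = "") -> Artifact:
        sources = list(derived_from)
        missing = [s for s in sources if s not in self.nodes]
        if missing:
            # Refuse claims whose supporting inputs were never recorded.
            raise KeyError(f"unrecorded inputs: {missing}")
        self.nodes[name] = Artifact(name, kind, sources, activity)
        return self.nodes[name]

    def evidence(self, name: str) -> List[str]:
        """All upstream artifacts supporting `name`, nearest first (breadth-first)."""
        seen, order = set(), []
        queue = deque(self.nodes[name].derived_from)
        while queue:
            current = queue.popleft()
            if current in seen:
                continue
            seen.add(current)
            order.append(current)
            queue.extend(self.nodes[current].derived_from)
        return order

if __name__ == "__main__":
    g = ProvenanceGraph()
    g.record("raw_counts", "data")
    g.record("background", "data")
    g.record("subtracted", "intermediate", ["raw_counts", "background"], "subtract_background")
    g.record("peak_fit", "intermediate", ["subtracted"], "fit_peak")
    g.record("H1", "hypothesis", ["peak_fit"], "interpretation")
    for source in g.evidence("H1"):
        node = g.nodes[source]
        print(source, node.kind, node.activity or "(input data)")
```

Inputs must be recorded before anything derived from them, so every hypothesis can be traced back through its intermediate results to the raw data. This gives the kind of evidence chain the view above calls for.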
Other views include Science 2.0, in which e-Science is considered a shift away from the publication of final results by well-defined collaborative groups. It moves towards a more open approach that includes the public sharing of raw data, preliminary experimental results and related information. To facilitate this shift, the Science 2.0 view focuses on providing tools that simplify communication, cooperation and collaboration between interested parties. Such an approach has the potential to speed up the process of scientific discovery, overcome problems associated with academic publishing and peer review, and remove time and cost barriers limiting the process of generating new knowledge.
See also
- e-Social Science
- Grid computing
- Distributed computing
- Citizen science
- e-Science librarianship
- Scientific workflow system
- Science 2.0
References
- Bohle, S. "What is E-science and How Should it Be Managed?" Nature.com, Spektrum der Wissenschaft (Scientific American), http://www.scilogs.com/scientific_and_medical_libraries/what-is-e-science-and-how-should-it-be-managed/.
- Executive Office of the President, Office of Science and Technology Policy, "Memorandum for the Heads of Executive Departments and Agencies: Increasing Access to the Results of Federally Funded Scientific Research." February 22, 2013, accessed July 7, 2013, http://www.whitehouse.gov/sites/default/files/microsites/ostp/ostp_public_access_memo_2013.pdf.
- "National e-Science Centre". official website. Retrieved 29 September 2011.
- Richard Poynder (12 December 2006). "A Conversation with Microsoft's Tony Hey". Open and Shut? blog. Retrieved 20 September 2011. "It just happens that in the US they chose another name. Personally, I think e-Science is a much better name than cyberinfrastructure." Full transcript updated 15 December 2006.
- "Office of Cyberinfrastructure (OCI)". Retrieved 19 September 2011.
- Syed, J.; Ghanem, M.; Guo, Y. (2007). "Supporting scientific discovery processes in Discovery Net". Concurrency and Computation: Practice and Experience 19 (2): 167. doi:10.1002/cpe.1049.
External links
- DOE and NSF Open Science Grid
- European Grid Infrastructure
- Nordic DataGrid Facility
- The eScience Institute at the University of Washington
- The Dutch Virtual Laboratory for e-science (VL-e) project
- UK Research Council's e-Science program
- UK National e-Science Centre
- UK National Centre for e-Social Science and its wiki on e-Social Science
- Large Hadron Collider
- Worldwide LHC Computing Grid
- NSF TeraGrid Project
- Arts and Humanities E-Science Support Centre (AHESSC)
- OMII-UK (formerly the Open Middleware Infrastructure Institute UK)
- E-Science and Data Services Collaborative (EDSC)
- The European Commission's e-Infrastructures activity
- Swedish e-Science Research Centre