Hybrid Data Infrastructure

From Wikipedia, the free encyclopedia

A Hybrid Data Infrastructure is a new type of data infrastructure conceived specifically to deal with data-intensive science[1] (see also e-Science). In this domain, potentially large-scale datasets come in all shapes and sizes, from huge international experiments to cross-laboratory or single-laboratory studies, or even from a multitude of individual observations. Managing and processing such datasets is beyond the capacity of traditional technological approaches based on local, specialized data facilities. Such data are characterized by the well-known three V's:[2] (i) Volume: the size of the data in bytes is huge; (ii) Velocity: data collection, processing, and consumption are demanding in terms of speed; and (iii) Variety: data heterogeneity, in terms of data types and of the data sources requiring integration, is high.

A Hybrid Data Infrastructure (HDI)[3] is an approach based on the assumption that several technologies, including Grid and private and public Cloud, can be integrated to provide elastic access to, and usage of, data and data-management capabilities. Moreover, it must be equipped with a rich array of mediator services for interfacing with existing data sources and repositories. Overall, its goal is to enable a delivery model for data-management capabilities in which computing, storage, data, and software are made available by the infrastructure as a service. It may also be equipped with a service supporting the dynamic creation of Virtual Research Environments, which can be seen conceptually as applications tailored to serve a specific need, whose constituents are acquired by the HDI.
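
The role of the mediator services can be illustrated with a minimal sketch. It is not drawn from any actual Hybrid Data Infrastructure implementation: the class names (Mediator, GridRepositoryMediator, CloudStoreMediator, HybridDataInfrastructure), the record layout, and the in-memory stand-ins for Grid and Cloud back ends are all hypothetical, chosen only to show how heterogeneous sources might be exposed through one common interface.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterator, List, Tuple


class Mediator(ABC):
    """Hypothetical adapter exposing one data source through a shared interface."""

    @abstractmethod
    def fetch(self, query: str) -> Iterator[dict]:
        """Yield records matching the query as {"id": ..., "title": ...} dicts."""


class GridRepositoryMediator(Mediator):
    """In-memory stand-in for a Grid-hosted repository (illustrative only)."""

    def __init__(self, records: List[dict]) -> None:
        self._records = records

    def fetch(self, query: str) -> Iterator[dict]:
        # Records are already in the shared shape, so only filtering is needed.
        return (dict(r) for r in self._records if query in r["title"])


class CloudStoreMediator(Mediator):
    """In-memory stand-in for a Cloud store with a different native layout."""

    def __init__(self, rows: List[Tuple[int, str]]) -> None:
        self._rows = rows  # native layout: (id, name) tuples

    def fetch(self, query: str) -> Iterator[dict]:
        # Normalise the native tuples into the shared record shape.
        return ({"id": i, "title": name} for i, name in self._rows if query in name)


class HybridDataInfrastructure:
    """Toy facade that offers a single search call over all registered mediators."""

    def __init__(self) -> None:
        self._mediators: Dict[str, Mediator] = {}

    def register(self, name: str, mediator: Mediator) -> None:
        self._mediators[name] = mediator

    def search(self, query: str) -> List[dict]:
        results = []
        for name, mediator in self._mediators.items():
            for record in mediator.fetch(query):
                # Tag each record with the source it came from.
                results.append({**record, "source": name})
        return results
```

In this sketch, a caller registers one mediator per source and then issues a single search, without needing to know each source's native layout. A real infrastructure would additionally have to address authentication, provisioning, and elasticity, none of which is modelled here.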

Related Projects


  1. T. Hey, S. Tansley, and K. Tolle (Eds.). The Fourth Paradigm: Data-Intensive Scientific Discovery. Microsoft Research, 2009. http://research.microsoft.com/en-us/collaboration/fourthparadigm/
  2. L. K. Stapleton. Taming Big Data. IBM Data Management Magazine, 16(2):12-18, 2011.
  3. L. Candela, D. Castelli, and P. Pagano. Managing Big Data through Hybrid Data Infrastructures. ERCIM News, Issue 89, April 2012. http://ercim-news.ercim.eu/en89/special/managing-big-data-through-hybrid-data-infrastructures