Data hub

From Wikipedia, the free encyclopedia

A data hub (data management system, or DMS) is software for collaborating on gathering, sharing and using data.[1]

The term usually refers to the newer, web-based generation of such products. They can be either general platforms that handle many different kinds of data, or vertical products that specialise in one particular field.

Features

At its core, a DMS is a list of datasets with diverse schemas.

Beyond that core, users expect the following features, or tight integration with tools that provide them:[2]

  • Load and update data from any source (ETL)
  • Store datasets and index them for querying
  • View, analyze and update data in a tabular interface (spreadsheet)
  • Visualise data, for example with charts or maps
  • Analyze data, for example with statistics and machine learning
  • Organize many people to enter or correct data (crowd-sourcing)
  • Measure and ensure the quality of data, and its provenance
  • Set permissions, so that data can be open, private or shared
  • Find datasets, and organize them to help others find them
  • Sell data, sharing processing costs between users
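
As an illustration only, the core idea (a list of datasets with differing schemas) and three of the features above (loading, permissions and finding) can be sketched in a few lines of Python. The names Dataset, DataHub, load, readable_by and find are hypothetical and chosen for this example; they are not taken from any particular data hub product, and a real system would add storage, indexing and the remaining features.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """One entry in the hub. Each dataset carries its own schema."""
    name: str
    schema: dict                      # column name -> expected Python type
    owner: str = ""
    visibility: str = "private"       # "open", "private" or "shared"
    shared_with: set = field(default_factory=set)
    rows: list = field(default_factory=list)

    def load(self, records):
        """Append records, rejecting any that do not match the schema."""
        for record in records:
            if set(record) != set(self.schema):
                raise ValueError(f"unexpected columns: {sorted(record)}")
            for column, expected in self.schema.items():
                if not isinstance(record[column], expected):
                    raise TypeError(f"{column!r} should be {expected.__name__}")
            self.rows.append(dict(record))

    def readable_by(self, user):
        """Open data is readable by anyone; shared data by named users."""
        if self.visibility == "open" or user == self.owner:
            return True
        return self.visibility == "shared" and user in self.shared_with


class DataHub:
    """Hypothetical hub: just a list of datasets plus a keyword search."""

    def __init__(self):
        self.datasets = []

    def add(self, dataset):
        self.datasets.append(dataset)

    def find(self, keyword, user):
        """Return names of datasets matching keyword that user may read."""
        keyword = keyword.lower()
        return [d.name for d in self.datasets
                if keyword in d.name.lower() and d.readable_by(user)]


hub = DataHub()

# Two datasets with unrelated schemas living in the same hub.
rain = Dataset("rainfall", {"station": str, "mm": float},
               owner="alice", visibility="open")
rain.load([{"station": "Leeds", "mm": 3.2}])

budget = Dataset("city budget", {"dept": str, "amount": int},
                 owner="alice", visibility="shared", shared_with={"bob"})
budget.load([{"dept": "parks", "amount": 120000}])

hub.add(rain)
hub.add(budget)

print(hub.find("", "carol"))      # ['rainfall']
print(hub.find("budget", "bob"))  # ['city budget']
```

In this sketch the schema is checked when data is loaded, which is one simple way to approach the data-quality feature; permissions are applied at search time so that users only find datasets they are allowed to read.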

List of data hubs

A desktop operating system (e.g. Unix, OS X, Windows) has been described as the legacy DMS: it is what people currently use for tasks that a good DMS would do better.[2]

References

  1. "Data Hubs, Data Management Systems and CKAN | OKFN Notebook". Open Knowledge Foundation. 2011-04-27. Retrieved 2012-03-08.
  2. "From CMS to DMS: C is for Content, D is for Data". ScraperWiki. 2012-03-09. Retrieved 2012-03-12.