
Data Commons

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by DanBri (talk | contribs) at 20:07, 14 October 2020 (datacommons.org). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.


datacommons.org

datacommons.org is an open knowledge repository hosted by Google that provides a unified view across multiple public datasets, combining economic, scientific and other open datasets into an integrated data graph.

The datacommons.org project was an outcome of the Open Knowledge Network initiative.[1] The project's site launched in May 2018 with an initial dataset of fact-checking data published in the Schema.org "ClaimReview" format by several fact checkers from the International Fact-Checking Network ("Fact Checks". datacommons.org. 29 March 2019. Retrieved 14 October 2020). During 2019 the service expanded to include an RDF-style knowledge graph populated from a number of largely statistical open datasets, and it was announced to a wider audience the same year.[2] In 2020 the service improved its coverage of non-US datasets and expanded to cover bioinformatics and coronavirus topics in more depth.[3]

Features

Compared with most Linked Data and Knowledge Graph initiatives, datacommons.org places more emphasis on statistical data. It centers on the entity-oriented integration of statistical observations from a variety of public datasets. As such, although it supports a subset of the W3C SPARQL query language (https://docs.datacommons.org/api/python/query.html), its APIs (https://docs.datacommons.org/api/) also include tools, such as a Pandas dataframe interface, oriented towards data science, statistics and data visualization.
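
The query style described above can be illustrated with a small, self-contained sketch. It does not call the Data Commons API; it only shows, under simplifying assumptions, how a SPARQL-like set of triple patterns can be matched against a graph and returned as rows suitable for a dataframe. The triples, the `toy/...` identifiers and the helper names `extend_binding` and `select` are all hypothetical.

```python
# Toy graph stored as (subject, predicate, object) triples.
# Every identifier and value here is a made-up placeholder.
TRIPLES = [
    ("toy/regionX", "typeOf", "Place"),
    ("toy/regionX", "name", "Region X"),
    ("toy/regionY", "typeOf", "Place"),
    ("toy/regionY", "name", "Region Y"),
    ("toy/datasetD", "typeOf", "Dataset"),
    ("toy/datasetD", "name", "Dataset D"),
]


def extend_binding(pattern, triple, binding):
    """Return `binding` extended so that `pattern` matches `triple`, or None."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the value it was bound to earlier.
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            # A constant must match the triple exactly.
            return None
    return extended


def select(patterns, triples):
    """Join all patterns, in the spirit of a SPARQL basic graph pattern."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for triple in triples
            if (extended := extend_binding(pattern, triple, binding)) is not None
        ]
    return bindings


# Roughly: SELECT ?place ?name WHERE { ?place typeOf Place . ?place name ?name }
rows = select(
    [("?place", "typeOf", "Place"), ("?place", "name", "?name")],
    TRIPLES,
)
```

Each result row is a plain dictionary of variable bindings, so a list of rows could be handed to `pandas.DataFrame(rows)` to obtain the kind of tabular view that data-science tooling expects.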

The most important feature of datacommons.org is that it is integrative. Rather than only providing a hosting platform for diverse datasets, it also attempts to consolidate much of the information those datasets provide into a single data graph.
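
As a rough illustration of what consolidation into a single graph involves, the sketch below merges two hypothetical source tables that identify the same places in different ways (one by code, one by name). The tables, the identifier mappings and the `integrate` function are assumptions made for this example and do not reflect how datacommons.org is actually implemented.

```python
# Two hypothetical source datasets describing the same two places.
# All names and numbers are illustrative placeholders.
SOURCE_A = [
    {"code": "X1", "population": 1000},
    {"code": "Y1", "population": 2000},
]
SOURCE_B = [
    {"name": "Region X", "unemployment_rate": 4.0},
    {"name": "Region Y", "unemployment_rate": 6.5},
]

# Entity resolution tables: map each source's local key to one shared ID.
ID_BY_CODE = {"X1": "toy/regionX", "Y1": "toy/regionY"}
ID_BY_NAME = {"Region X": "toy/regionX", "Region Y": "toy/regionY"}


def integrate(source_a, source_b):
    """Merge both sources into one mapping keyed by shared entity ID."""
    graph = {}
    for row in source_a:
        entity = ID_BY_CODE[row["code"]]
        graph.setdefault(entity, {})["population"] = row["population"]
    for row in source_b:
        entity = ID_BY_NAME[row["name"]]
        graph.setdefault(entity, {})["unemployment_rate"] = row["unemployment_rate"]
    return graph


graph = integrate(SOURCE_A, SOURCE_B)
```

Once both sources are keyed by the same entity ID, a single lookup such as `graph["toy/regionX"]` returns facts that originally lived in separate datasets.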


Technology

The datacommons.org approach is built on a graph data model. The graph can be accessed through several APIs and is expanded by loading data (typically CSV files and MCF-based templates).[4] The data vocabulary used to define the datacommons.org graph is based on Schema.org. In particular, the terms http://schema.org/StatisticalPopulation and https://schema.org/Observation were proposed to Schema.org to support datacommons-like use cases.[5]
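
The following sketch suggests how rows of a CSV file might be turned into graph nodes typed as StatisticalPopulation and Observation. It is a simplified stand-in for the template-based loading mentioned above and does not use the MCF format. The property names `populationType`, `observedNode`, `measuredProperty`, `measuredValue` and `observationDate` follow one reading of the Schema.org terms and should be checked against the schema; the `location` property, the node IDs, the CSV columns and the function `load_observations` are hypothetical.

```python
import csv
import io

# Hypothetical input file; places and counts are illustrative placeholders.
CSV_TEXT = """place,year,count
toy/regionX,2019,1000
toy/regionY,2019,2000
"""


def load_observations(csv_text, population_type, measured_property):
    """Emit one StatisticalPopulation node and one Observation node per row."""
    nodes = []
    for index, row in enumerate(csv.DictReader(io.StringIO(csv_text))):
        population_id = f"pop/{index}"
        nodes.append({
            "id": population_id,
            "typeOf": "StatisticalPopulation",
            "populationType": population_type,
            "location": row["place"],  # assumed property name
        })
        nodes.append({
            "id": f"obs/{index}",
            "typeOf": "Observation",
            "observedNode": population_id,  # edge back to the population
            "measuredProperty": measured_property,
            "measuredValue": float(row["count"]),
            "observationDate": row["year"],
        })
    return nodes


nodes = load_observations(CSV_TEXT, "Person", "count")
```

Keeping the population (what is being counted, and where) separate from the observation (a value at a date) lets several observations from different years or sources attach to the same population node.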

Software from the project is available on GitHub under the Apache 2.0 license.[6]



Category:Google Category:Open_data Category:Knowledge_graphs

  1. ^ "Open Knowledge Network - Enabling the Community to Build the Network". 4 October 2017.
  2. ^ "Doing our part to share open data responsibly". The Keyword. Google. Retrieved 14 October 2020.
  3. ^ Ramasubramanian, Sowmya (21 September 2020). "Google's open source data to study impact of COVID-19". The Hindu. Retrieved 14 October 2020.
  4. ^ "Contributing to Data Commons - Adding datasets". datacommons.org. Data Commons.
  5. ^ "Proposal for representing Aggregate Statistical Data". GitHub - Schema.org repository. 25 June 2019. Retrieved 14 October 2020.
  6. ^ "datacommons.org GitHub".