
Data Commons: Difference between revisions

From Wikipedia, the free encyclopedia
DanBri
m (minor edit), no edit summary
DanBri
Tag: Reverted



Revision as of 20:07, 14 October 2020


datacommons.org

datacommons.org is an open knowledge repository hosted by Google that provides a unified view across multiple public datasets, combining economic, scientific and other open datasets into an integrated data graph.

The datacommons.org project was an outcome of the Open Knowledge Network initiative.[1] The project's site was launched in May 2018 with an initial dataset of fact-checking data published in the Schema.org "ClaimReview" format by several fact checkers from the International Fact-Checking Network ("Fact Checks". datacommons.org. 29 March 2019. Retrieved 14 October 2020.). During 2019 the service expanded to include an RDF-style knowledge graph populated from a number of largely statistical open datasets, and it was announced to a wider audience that year.[2] In 2020 the service improved its coverage of non-US datasets and expanded to cover bioinformatics and coronavirus (COVID-19) topics in more depth.[3]
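To show what a fact check "published in Schema.org ClaimReview format" looks like as data, the following is a minimal sketch that builds one such record as a JSON-LD-style dictionary. The property names (`claimReviewed`, `reviewRating`, `author`, `datePublished`) come from the public schema.org ClaimReview vocabulary; the helper `make_claim_review` and every value passed to it are hypothetical placeholders, not records taken from datacommons.org.

```python
import json


def make_claim_review(claim, rating_label, rating_value, reviewer, url, date):
    """Build a minimal ClaimReview record as a JSON-LD-style dict.

    Hypothetical helper: property names follow the schema.org ClaimReview
    vocabulary, but the values supplied below are invented placeholders.
    """
    return {
        "@context": "https://schema.org",
        "@type": "ClaimReview",
        "url": url,
        "datePublished": date,
        "claimReviewed": claim,
        "author": {"@type": "Organization", "name": reviewer},
        "reviewRating": {
            "@type": "Rating",
            "ratingValue": rating_value,
            "bestRating": 5,
            "worstRating": 1,
            "alternateName": rating_label,  # human-readable verdict
        },
    }


review = make_claim_review(
    claim="Example claim text",
    rating_label="False",
    rating_value=1,
    reviewer="Example Fact Checker",
    url="https://example.org/fact-check/123",
    date="2018-05-01",
)
print(json.dumps(review, indent=2))
```

Because each record is plain JSON-LD, many fact checkers can publish in the same shape and an aggregator can collect the records without per-publisher parsing.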

Features

datacommons.org places more emphasis on statistical data than is common among Linked Data and knowledge graph initiatives. It centers on the entity-oriented integration of statistical observations drawn from a variety of public datasets. Accordingly, although it supports a subset of the W3C SPARQL query language (https://docs.datacommons.org/api/python/query.html), its APIs (https://docs.datacommons.org/api/) also include tools aimed at data science, statistics and data visualization, such as a Pandas dataframe interface.
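The core idea behind SPARQL-style querying of an entity graph is matching a "basic graph pattern": a conjunction of triple patterns whose `?variables` must bind consistently across all patterns. The sketch below illustrates that idea over a handful of in-memory triples. It is not the Data Commons query engine or client library; the triples, the identifiers (`geoId/06`, `geoId/21`, `country/USA`) and the `query` helper are hypothetical and chosen only for illustration.

```python
# Hypothetical (subject, predicate, object) triples; identifiers are illustrative only.
TRIPLES = [
    ("geoId/06", "typeOf", "State"),
    ("geoId/06", "name", "California"),
    ("geoId/21", "typeOf", "State"),
    ("geoId/21", "name", "Kentucky"),
    ("country/USA", "typeOf", "Country"),
    ("country/USA", "name", "United States"),
]


def is_var(term):
    """Terms starting with '?' are query variables, as in SPARQL."""
    return isinstance(term, str) and term.startswith("?")


def match_pattern(pattern, triple, binding):
    """Extend `binding` so `pattern` matches `triple`; return None on conflict."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if p in new and new[p] != t:
                return None  # variable already bound to a different value
            new[p] = t
        elif p != t:
            return None  # constant term does not match
    return new


def query(patterns, triples):
    """Evaluate a conjunction of triple patterns (a basic graph pattern)."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for b in bindings
            for t in triples
            if (extended := match_pattern(pattern, t, b)) is not None
        ]
    return bindings


# Roughly: SELECT ?place ?name WHERE { ?place typeOf State . ?place name ?name }
rows = query([("?place", "typeOf", "State"), ("?place", "name", "?name")], TRIPLES)
for row in rows:
    print(row["?place"], row["?name"])
```

Since each result row is a dict, a list of rows can be passed directly to `pandas.DataFrame(rows)` for tabular analysis. That mirrors the spirit of a dataframe-oriented interface but is not the Data Commons Pandas API itself.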

The most important feature of datacommons.org is that it is integrative. Rather than merely providing a hosting platform for diverse datasets, it attempts to consolidate much of the information those datasets provide into a single data graph.
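The difference between hosting datasets side by side and integrating them can be made concrete. In the sketch below, two hypothetical sources describe the same places under different local keys (a numeric code in one, a name in the other). Reconciliation tables map both keys to one shared entity identifier, so their observations end up attached to the same graph node. The datasets, the figures, the mapping tables and the `integrate` helper are all invented for illustration and do not describe how datacommons.org performs reconciliation internally.

```python
from collections import defaultdict

# Two hypothetical source datasets describing the same places under different keys.
# All figures are invented for illustration.
census_like = [
    {"fips": "06", "year": 2019, "population": 1000},
    {"fips": "21", "year": 2019, "population": 400},
]
health_like = [
    {"state_name": "California", "year": 2019, "clinics": 12},
    {"state_name": "Kentucky", "year": 2019, "clinics": 5},
]

# Hypothetical reconciliation tables: each source's local key -> one shared entity id.
FIPS_TO_ID = {"06": "geoId/06", "21": "geoId/21"}
NAME_TO_ID = {"California": "geoId/06", "Kentucky": "geoId/21"}


def integrate(graph, records, key, key_map, source):
    """Attach each record's measurements to the shared entity node it resolves to."""
    for rec in records:
        entity = key_map[rec[key]]
        for prop, value in rec.items():
            if prop in (key, "year"):
                continue  # the key and date are metadata, not measurements
            # Each observation keeps its date and provenance alongside the value.
            graph[entity].append(
                {"property": prop, "date": rec["year"], "value": value, "source": source}
            )
    return graph


graph = defaultdict(list)
integrate(graph, census_like, "fips", FIPS_TO_ID, "census_like")
integrate(graph, health_like, "state_name", NAME_TO_ID, "health_like")

for obs in graph["geoId/06"]:
    print(obs)
```

Keeping the source name on every observation preserves provenance after the merge, so a consumer can still tell which dataset contributed each value to the shared node.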


Technology

The datacommons.org approach is built on a graph data model. The graph can be accessed through several APIs and is extended by loading data, typically CSV files combined with MCF-based templates.[4] The vocabulary used to define the datacommons.org graph is based on Schema.org. In particular, the Schema.org terms https://schema.org/StatisticalPopulation and https://schema.org/Observation were proposed to Schema.org to support Data Commons-like use cases.[5]
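Loading "CSV plus a template" means that each CSV row is stamped through a template to produce a graph node, which is then flattened into triples. The sketch below shows that general pattern under explicit assumptions. The `TEMPLATE` dictionary is a simplified, hypothetical stand-in and not the real MCF template syntax. Its property names are loosely modeled on the schema.org Observation vocabulary, and the CSV contents, node identifiers and helper functions are all invented for illustration.

```python
import csv
import io

# A small hypothetical CSV; column names and values are invented.
CSV_TEXT = """place,year,count
geoId/06,2019,1000
geoId/21,2019,400
"""

# Hypothetical stand-in for a template (NOT real MCF template syntax):
# constant entries are copied as-is, {"column": ...} entries are filled per row.
TEMPLATE = {
    "typeOf": "Observation",
    "measuredProperty": "count",
    "observationAbout": {"column": "place"},
    "observationDate": {"column": "year"},
    "measuredValue": {"column": "count"},
}


def apply_template(template, row):
    """Produce one node (a property -> value dict) from one CSV row."""
    node = {}
    for prop, spec in template.items():
        if isinstance(spec, dict):
            node[prop] = row[spec["column"]]  # value taken from the CSV column
        else:
            node[prop] = spec  # constant value shared by every node
    return node


def load(csv_text, template):
    """Apply the template to every row of the CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [apply_template(template, row) for row in reader]


def to_triples(nodes):
    """Flatten each node into (node_id, property, value) triples for a graph store."""
    triples = []
    for i, node in enumerate(nodes):
        node_id = f"obs/{i}"  # hypothetical local identifier
        for prop, value in node.items():
            triples.append((node_id, prop, value))
    return triples


nodes = load(CSV_TEXT, TEMPLATE)
for triple in to_triples(nodes):
    print(triple)
```

Separating the template from the data lets one short template describe thousands of rows. Note that `csv.DictReader` yields strings, so a real loader would also need to convert dates and numeric values to typed literals.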

Software from the project is available on GitHub under the Apache 2.0 license.[6]



Category:Google Category:Open_data Category:Knowledge_graphs

  1. ^ "Open Knowledge Network - Enabling the Community to Build the Network". 4 October 2017.
  2. ^ "Doing our part to share open data responsibly". The Keyword. Google. Retrieved 14 October 2020.
  3. ^ Ramasubramanian, Sowmya (21 September 2020). "Google's open source data to study impact of COVID-19". The Hindu. Retrieved 14 October 2020.
  4. ^ "Contributing to Data Commons - Adding datasets". datacommons.org. Data Commons.
  5. ^ "Proposal for representing Aggregate Statistical Data". Schema.org repository, GitHub. 25 June 2019. Retrieved 14 October 2020.
  6. ^ "datacommons.org". GitHub.