Linked data

From Wikipedia, the free encyclopedia
A network of over a thousand circles clustered into groups and linked with lines into a web.
Wikidata in the Linked Open Data Cloud. Databases are shown as circles (Wikidata is labelled ‘WD’), and grey lines link databases in the network whose data is aligned. Generated from https://lod-cloud.net/datasets.
DBpedia as the most interlinked LOD dataset and crystallization point of the Linked Open Data Cloud since 2008, image from 2021, generated from https://lod-cloud.net.

In computing, linked data is structured data which is interlinked with other data so it becomes more useful through semantic queries. It builds upon standard Web technologies such as HTTP, RDF and URIs, but rather than using them to serve web pages only for human readers, it extends them to share information in a way that can be read automatically by computers. Part of the vision of linked data is for the Internet to become a global database.[1]

Tim Berners-Lee, director of the World Wide Web Consortium (W3C), coined the term in a 2006 design note about the Semantic Web project.[2]

Linked data may also be open data, in which case it is usually described as Linked Open Data.[3]

Principles

In his 2006 "Linked Data" note, Tim Berners-Lee outlined four principles of linked data, paraphrased along the following lines:[2]

  1. Uniform Resource Identifiers (URIs) should be used to name and identify individual things.
  2. HTTP URIs should be used to allow these things to be looked up, interpreted, and subsequently "dereferenced".
  3. Useful information about what a name identifies should be provided through open standards such as RDF, SPARQL, etc.
  4. When publishing data on the Web, other things should be referred to using their HTTP URI-based names.
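
The four principles can be illustrated with a minimal Python sketch. The `example.org` identifiers and vocabulary terms below are hypothetical, chosen only for illustration, and the request is prepared but never sent. Asking for `text/turtle` is one possible choice of RDF serialization, not something the design note prescribes.

```python
from urllib.request import Request

# Hypothetical identifiers under example.org, used only for illustration.
ALICE = "http://example.org/people/alice"
BOB = "http://example.org/people/bob"
NAME = "http://example.org/vocab/name"
KNOWS = "http://example.org/vocab/knows"

# Principle 1: things are named with URIs.
# Principle 4: statements refer to other things by their HTTP URIs.
triples = [
    (ALICE, NAME, "Alice"),
    (ALICE, KNOWS, BOB),
]


def build_lookup_request(uri: str) -> Request:
    """Principles 2 and 3: prepare an HTTP GET that asks for RDF data.

    The request is only constructed here; nothing is sent over the network.
    """
    return Request(uri, headers={"Accept": "text/turtle"})


request = build_lookup_request(ALICE)
print(request.get_method(), request.full_url, request.get_header("Accept"))
```

A real client would pass the prepared request to `urllib.request.urlopen` and parse the returned RDF with a suitable library.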

Tim Berners-Lee later restated these principles at a 2009 TED conference, again paraphrased along the following lines:[4]

  1. All conceptual things should have a name starting with HTTP.
  2. Looking up an HTTP name should return useful data about the thing in question in a standard format.
  3. Anything else that that same thing has a relationship with through its data should also be given a name beginning with HTTP.
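
The restated principles suggest a simple way of exploring data: look up a name, then look up every HTTP name found in the result. The sketch below simulates this with an in-memory dictionary standing in for the Web. All names in `FAKE_WEB` are hypothetical, and `look_up` is an assumed helper that a real client would replace with an HTTP request.

```python
from collections import deque

# In-memory stand-in for the Web: each hypothetical HTTP name maps to the
# data a lookup would return, as (predicate, object) pairs.
FAKE_WEB = {
    "http://example.org/id/alice": [
        ("http://example.org/vocab/name", "Alice"),
        ("http://example.org/vocab/knows", "http://example.org/id/bob"),
    ],
    "http://example.org/id/bob": [
        ("http://example.org/vocab/name", "Bob"),
        ("http://example.org/vocab/livesIn", "http://example.org/id/berlin"),
    ],
    "http://example.org/id/berlin": [
        ("http://example.org/vocab/name", "Berlin"),
    ],
}


def look_up(name):
    """Simulate looking up an HTTP name; a real client would issue an HTTP GET."""
    return FAKE_WEB.get(name, [])


def follow_links(start):
    """Breadth-first walk: look up a name, then every HTTP name it points to."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        name = queue.popleft()
        order.append(name)
        for _predicate, obj in look_up(name):
            # Only objects that are themselves HTTP names can be looked up.
            if obj.startswith("http://") and obj not in seen:
                seen.add(obj)
                queue.append(obj)
    return order


visited = follow_links("http://example.org/id/alice")
print(visited)
```

Starting from one name, the walk reaches every thing connected to it, which is the practical consequence of giving related things HTTP names of their own.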

Components

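Taken together, these principles point to the components essential to a global Linked Data system as envisioned, and to any actual Linked Data subset within it: URIs to name things, HTTP to look those names up, and structured data published in open standard formats such as RDF.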

Linked open data

Linked open data is linked data that is also open data.[5][6][7] Tim Berners-Lee gives the clearest definition of linked open data, as distinct from linked data:

Linked Open Data (LOD) is Linked Data which is released under an open license, which does not impede its reuse for free.

— Tim Berners-Lee, Linked Data[2][8]

Large linked open data sets include DBpedia, Wikibase, Wikidata and Open ICEcat.

5-star linked open data

Deployment scheme for Linked Open Data[9]

In 2010, Tim Berners-Lee suggested a 5-star scheme for grading the quality of open data on the web, for which the highest ranking is Linked Open Data:[10]

  • 1 star: data is openly available in some format.
  • 2 stars: data is available in a structured format, such as the Microsoft Excel file format (.xls).
  • 3 stars: data is available in a non-proprietary structured format, such as comma-separated values (.csv).
  • 4 stars: data follows W3C standards, like using RDF and employing URIs.
  • 5 stars: all of the others, plus links to other Linked Open Data sources.
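
Because each star builds on the ones below it, the scheme can be read as a cumulative checklist. The following sketch is one possible encoding, not an official tool. The `Dataset` fields and the `star_rating` function are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical description of a published dataset."""
    openly_available: bool    # 1 star: openly available in some format
    structured: bool          # 2 stars: structured format, e.g. .xls
    non_proprietary: bool     # 3 stars: non-proprietary format, e.g. .csv
    uses_rdf_and_uris: bool   # 4 stars: follows W3C standards (RDF, URIs)
    links_to_other_lod: bool  # 5 stars: links to other Linked Open Data


def star_rating(dataset: Dataset) -> int:
    """Count stars cumulatively: a level counts only if all lower levels are met."""
    criteria = [
        dataset.openly_available,
        dataset.structured,
        dataset.non_proprietary,
        dataset.uses_rdf_and_uris,
        dataset.links_to_other_lod,
    ]
    stars = 0
    for met in criteria:
        if not met:
            break
        stars += 1
    return stars


spreadsheet = Dataset(True, True, False, False, False)
linked = Dataset(True, True, True, True, True)
print(star_rating(spreadsheet), star_rating(linked))
```

Stopping at the first unmet criterion reflects the "all of the others, plus" wording of the fifth star: RDF data that is not openly available earns no stars under this reading.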

History

The term "linked open data" has been in use since at least February 2007, when the "Linking Open Data" mailing list[11] was created.[12] The mailing list was initially hosted by the SIMILE project[13] at the Massachusetts Institute of Technology.

Linking Open Data community project

Diagram of the Linking Open Data datasets and the connections between them as of August 2014, produced by the Linked Open Data Cloud project, which was started in 2007. Some sets may include copyrighted data which is freely available.[14]
The same diagram for February 2017, showing the growth in just two and a half years.

The goal of the W3C Semantic Web Education and Outreach group's Linking Open Data community project is to extend the Web with a data commons by publishing various open datasets as RDF on the Web and by setting RDF links between data items from different data sources. In October 2007, the datasets comprised over two billion RDF triples, which were interlinked by over two million RDF links.[15][16] By September 2011 this had grown to 31 billion RDF triples, interlinked by around 504 million RDF links. A detailed statistical breakdown was published in 2014.[17]
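
The distinction between RDF triples and RDF links can be made concrete with a small sketch. It assumes that an RDF link is a triple whose subject and object belong to different datasets, and that dataset membership can be decided by URI prefix. Both are simplifications, and the two datasets shown are hypothetical. Only the `owl:sameAs` property URI is a real identifier.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Hypothetical datasets, each identified by the prefix of its resource URIs.
DATASETS = {
    "cities": "http://cities.example/id/",
    "places": "http://places.example/id/",
}

triples = [
    ("http://cities.example/id/berlin", "http://cities.example/vocab/label", "Berlin"),
    ("http://cities.example/id/berlin", OWL_SAME_AS, "http://places.example/id/42"),
    ("http://places.example/id/42", "http://places.example/vocab/partOf", "http://places.example/id/7"),
    ("http://cities.example/id/hamburg", OWL_SAME_AS, "http://places.example/id/43"),
]


def dataset_of(term):
    """Return the name of the dataset a URI belongs to, or None (e.g. for literals)."""
    for name, prefix in DATASETS.items():
        if term.startswith(prefix):
            return name
    return None


def count_rdf_links(statements):
    """Count triples whose subject and object come from different datasets."""
    links = 0
    for subject, _predicate, obj in statements:
        source, target = dataset_of(subject), dataset_of(obj)
        if source is not None and target is not None and source != target:
            links += 1
    return links


print(len(triples), "triples,", count_rdf_links(triples), "RDF links")
```

Only the two `owl:sameAs` statements cross dataset boundaries here. The label and the `partOf` statement are ordinary triples inside a single dataset.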

European Union projects

There are a number of European Union projects involving linked data. These include the linked open data around the clock (LATC) project,[18] the AKN4EU project for machine-readable legislative data,[19] the PlanetData project,[20] the DaPaaS (Data-and-Platform-as-a-Service) project,[21] and the Linked Open Data 2 (LOD2) project.[22][23][24] Data linking is one of the main goals of the EU Open Data Portal, which makes available thousands of datasets for anyone to reuse and link.

Ontologies

Ontologies are formal descriptions of data structures. Some of the better-known ontologies are:

  • FOAF – an ontology describing persons, their properties and relationships
  • UMBEL – a lightweight reference structure of 20,000 subject concept classes and their relationships derived from OpenCyc, which can act as binding classes to external data; also has links to 1.5 million named entities from DBpedia and YAGO
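
As an illustration of how an ontology such as FOAF is used, the sketch below describes two people with FOAF terms and serializes the statements as N-Triples, a line-based RDF syntax. The person URIs are hypothetical. The FOAF and RDF namespace URIs are the published ones, and the serializer is a deliberately minimal one that handles only IRIs and plain string literals.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical person identifiers, used only for illustration.
ALICE = "http://example.org/people/alice#me"
BOB = "http://example.org/people/bob#me"


def iri(value):
    """Wrap a URI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(value):
    """Quote a plain string literal, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


statements = [
    (iri(ALICE), iri(RDF_TYPE), iri(FOAF + "Person")),
    (iri(ALICE), iri(FOAF + "name"), literal("Alice")),
    (iri(ALICE), iri(FOAF + "knows"), iri(BOB)),
    (iri(BOB), iri(FOAF + "name"), literal("Bob")),
]


def to_ntriples(rows):
    """Serialize one statement per line, each terminated by ' .'."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in rows)


document = to_ntriples(statements)
print(document)
```

Because `foaf:Person`, `foaf:name` and `foaf:knows` are shared terms, any consumer that understands FOAF can interpret these statements without knowing anything else about the publisher.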

Datasets

  • DBpedia – a dataset containing extracted data from Wikipedia; it contains about 3.4 million concepts described by 1 billion triples, including abstracts in 11 different languages
  • GeoNames – provides RDF descriptions of more than 7,500,000 geographical features worldwide
  • Wikidata – a collaboratively-created linked dataset that acts as central storage for the structured data of its Wikimedia Foundation sibling projects
  • Global Research Identifier Database (GRID) – an international database of 89,506 institutions engaged in academic research, with 14,401 relationships. GRID models two types of relationships: a parent-child relationship that defines a subordinate association, and a related relationship that describes other associations[25][26]
  • KnowWhereGraph[27] – an integrated knowledge graph of 12 billion triples across 30 data layers at the intersection between humans and their environment, built using Semantic Web and Linked Data technologies[28]
  • Open ICEcat – a multilingual open catalogue containing product datasheets, related digital assets and usage statistics
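
The two relationship types that GRID models, as described in the list above, could be represented along the following lines. This is a sketch, not GRID's actual schema: the institution identifiers are hypothetical, and treating the "related" relationship as symmetric is an assumption made here.

```python
from enum import Enum
from typing import NamedTuple


class RelationType(Enum):
    PARENT_CHILD = "parent-child"  # subordinate association
    RELATED = "related"            # any other association


class Relationship(NamedTuple):
    source: str
    target: str
    kind: RelationType


# Hypothetical institution identifiers, used only for illustration.
relationships = [
    Relationship("inst-university", "inst-medical-school", RelationType.PARENT_CHILD),
    Relationship("inst-university", "inst-research-centre", RelationType.PARENT_CHILD),
    Relationship("inst-university", "inst-teaching-hospital", RelationType.RELATED),
]


def children_of(parent, rels):
    """Institutions directly subordinate to the given parent."""
    return [r.target for r in rels
            if r.source == parent and r.kind is RelationType.PARENT_CHILD]


def related_to(institution, rels):
    """Institutions associated in either direction (assumed symmetric)."""
    found = []
    for r in rels:
        if r.kind is not RelationType.RELATED:
            continue
        if r.source == institution:
            found.append(r.target)
        elif r.target == institution:
            found.append(r.source)
    return found


print(children_of("inst-university", relationships))
print(related_to("inst-teaching-hospital", relationships))
```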

Dataset instance and class relationships

Clickable diagrams that show the individual datasets and their relationships within the DBpedia-spawned LOD cloud (as depicted in the LOD cloud diagrams) are available.[29][30]

References

  1. "Linked Data as JSON". Linked Data as JSON. Retrieved 2020-12-04.
  2. Tim Berners-Lee (2006-07-27). "Linked Data". Design Issues. W3C. Retrieved 2010-12-18.
  3. "What are Linked Data and Linked Open Data?". Ontotext. Retrieved 2019-05-08.
  4. "Tim Berners-Lee on the next Web". Archived from the original on 2011-04-10. Retrieved 2009-03-15.
  5. "Frequently Asked Questions (FAQs) - Linked Data - Connect Distributed Data across the Web". Archived from the original on 2015-11-18. Retrieved 2014-12-29.
  6. "COAR » 7 things you should know about…Linked Data". Archived from the original on 2015-11-18. Retrieved 2015-12-29.
  7. "Linked Data Basics for Techies". Archived from the original on 2021-05-05. Retrieved 2015-12-29.
  8. "5 Star Open Data".
  9. "5-star Open Data". 5stardata.info. Retrieved 2021-03-07.
  10. "What is 5 Star Linked Data? | Webize Everything Community Group". www.w3.org. Retrieved 2021-03-07.
  11. "public-lod@w3.org Mail Archives".
  12. "SweoIG/TaskForces/CommunityProjects/LinkingOpenData/NewsArchive".
  13. "SIMILE Project - Mailing Lists".
  14. Linking open data cloud diagram 2014, by Max Schmachtenberg, Christian Bizer, Anja Jentzsch and Richard Cyganiak. http://lod-cloud.net/
  15. "SweoIG/TaskForces/CommunityProjects/LinkingOpenData - W3C Wiki". esw.w3.org. Retrieved 22 March 2018.
  16. Fensel, Dieter; Facca, Federico Michele; Simperl, Elena; Ioan, Toma (2011). Semantic Web Services. Springer. p. 99. ISBN 978-3642191923.
  17. Max. "State of the LOD Cloud". linkeddatacatalog.dws.informatik.uni-mannheim.de. Retrieved 22 March 2018.
  18. "Linked open data around the clock (LATC)". latc-project.eu. Archived from the original on 19 September 2018. Retrieved 22 March 2018.
  19. Flatt, Amelie; Langner, Arne; Leps, Olof (2022), "Model-Driven Development of AKN Application Profiles: Background and Requirements", Model-Driven Development of Akoma Ntoso Application Profiles, Cham: Springer International Publishing, pp. 5–12, doi:10.1007/978-3-031-14132-4_2, ISBN 978-3-031-14131-7, retrieved 2023-01-07
  20. "Welcome to PlanetData! - PlanetData". planet-data.eu. Archived from the original on 21 April 2021. Retrieved 22 March 2018.
  21. "DaPaaS". project.dapaas.eu. Archived from the original on 18 December 2020. Retrieved 22 March 2018.
  22. Linking Open Data 2 (LOD2)
  23. "CORDIS FP7 ICT Projects – LOD2". European Commission. 2010-04-20.
  24. "LOD2 Project Fact Sheet – Project Summary" (PDF). 2010-09-01. Archived from the original (PDF) on 2011-07-20. Retrieved 2010-12-18.
  25. "GRID Statistics". grid.ac/stats. Retrieved 2018-10-26.
  26. "GRID Policies". grid.ac. Retrieved 2018-10-26.
  27. "KnowWhereGraph". knowwheregraph.org. Retrieved 2022-05-16.
  28. Krzysztof Janowicz; Pascal Hitzler; Wenwen Li; Dean Rehberger; Mark Schildhauer; Rui Zhu; Cogan Shimizu; Colby K. Fisher; Ling Cai; Gengchen Mai; Joseph Zalewski; Lu Zhou; Shirly Stephen; Seila Gonzalez Estrecha; Bryce D. Mecum; Anna Lopez-Carr; Andrew Schroeder; Dave Smith; Dawn J. Wright; Sizhe Wang; Yuanyuan Tian; Zilong Liu; Meilin Shi; Anthony D'Onofrio; Zhining G; Kitty Currier (2022). "Know, Know Where, Knowwheregraph: A Densely Connected, Cross-Domain Knowledge Graph and Geo-Enrichment Service Stack for Applications in Environmental Intelligence". AI Magazine. 43 (1): 30–39. doi:10.1609/aimag.v43i1.19120. hdl:1983/be176aba-9dec-456c-9615-01a0e8556b7b.
  29. "Instance relationships amongst datasets". fu-berlin.de. Archived from the original on 2012-10-17. Retrieved 22 March 2018.
  30. "Class relationships amongst datasets". Archived from the original on 28 August 2011. Retrieved 22 March 2018.
