Open data: Difference between revisions

From Wikipedia, the free encyclopedia
Giovanisp (talk | contribs)
Tidied some references using Template:Cite doi
Line 4: Line 4:
[[File:Open Data stickers.jpg|thumb|Clear labeling of the licensing terms is a key component of Open data, and icons like the one pictured here are being used for that purpose.]]
[[File:Open Data stickers.jpg|thumb|Clear labeling of the licensing terms is a key component of Open data, and icons like the one pictured here are being used for that purpose.]]


'''Open data''' is the idea that certain [[data]] should be freely available to everyone to use and republish as they wish, without restrictions from [[copyright]], [[patent]]s or other mechanisms of control. The goals of the open data movement are similar to those of other "Open" movements such as [[open source]], [[open hardware]], [[open content]], and [[Open access (publishing)|open access]]. The philosophy behind open data has been long established (for example in the [[Merton thesis|Mertonian tradition of science]]), but the term "open data" itself is recent, gaining popularity with the rise of the [[Internet]] and [[World Wide Web]] and, especially, with the launch of open-data government initiatives such as [[Data.gov]].
'''Open data''' is the idea that certain [[data]] should be freely available to everyone to use and republish as they wish, without restrictions from [[copyright]], [[patent]]s or other mechanisms of control.<ref>{{cite doi|10.1007/978-3-540-76298-0_52}}</ref> The goals of the open data movement are similar to those of other "Open" movements such as [[open source]], [[open hardware]], [[open content]], and [[Open access (publishing)|open access]]. The philosophy behind open data has been long established (for example in the [[Merton thesis|Mertonian tradition of science]]), but the term "open data" itself is recent, gaining popularity with the rise of the [[Internet]] and [[World Wide Web]] and, especially, with the launch of open-data government initiatives such as [[Data.gov]].


== Overview ==
== Overview ==
Line 53: Line 53:
* [https://opendata.go.ke/ opendata.go.ke] - Kenyan government open-data website. Launched in Jul 2011.
* [https://opendata.go.ke/ opendata.go.ke] - Kenyan government open-data website. Launched in Jul 2011.
* [http://data.overheid.nl/ data.overheid.nl] - Dutch government open-data website. Launched in Oct 2011.
* [http://data.overheid.nl/ data.overheid.nl] - Dutch government open-data website. Launched in Oct 2011.
* [http://www.rotterdamopendata.nl] - Rotterdam municipal open-data website, Launched in Aug 2012.
* [http://www.rotterdamopendata.nl rotterdamopendata.nl] - Rotterdam municipal open-data website, Launched in Aug 2012.
* [http://datos.gob.cl/ datos.gob.cl] - Chilean government open-data website. Launched in Sept 2011.
* [http://datos.gob.cl/ datos.gob.cl] - Chilean government open-data website. Launched in Sept 2011.
* [http://data.gov.it data.gov.it] - Italian government open-data website. Launched in October 2011.<ref>{{cite news|title=Wikitalia ovvero la partecipazione civica dopo e oltre i referendum|url=http://www.libertiamo.it/2011/10/21/wikitalia-ovvero-la-partecipazione-civica-dopo-e-oltre-i-referendum/|accessdate=7 November 2011|newspaper=Libertiamo|date=21 October 2011}}</ref>
* [http://data.gov.it data.gov.it] - Italian government open-data website. Launched in October 2011.<ref>{{cite news|title=Wikitalia ovvero la partecipazione civica dopo e oltre i referendum|url=http://www.libertiamo.it/2011/10/21/wikitalia-ovvero-la-partecipazione-civica-dopo-e-oltre-i-referendum/|accessdate=7 November 2011|newspaper=Libertiamo|date=21 October 2011}}</ref>
Line 80: Line 80:
Arguments made on behalf of Open Data include the following:
Arguments made on behalf of Open Data include the following:


* "Data belong to the [[human race]]". Typical examples are [[genome]]s, data on organisms, medical science, [[environmental data]].<ref>http://en.wikipedia.org/wiki/Aarhus_Convention</ref>
* "Data belong to the [[human race]]". Typical examples are [[genome]]s, data on organisms, medical science, [[environmental data]] following the [[Aarhus Convention]]
* [[Public money]] was used to fund the work and so it should be universally available.<ref>[http://www.publictechnology.net/sector/central-gov/dispatch-box-road-open-data On the road to open data, by Ian Manocha]</ref>
* [[Public money]] was used to fund the work and so it should be universally available.<ref>[http://www.publictechnology.net/sector/central-gov/dispatch-box-road-open-data On the road to open data, by Ian Manocha]</ref>
* It was created by or at a government institution (this is common in US National Laboratories and government agencies)
* It was created by or at a government institution (this is common in US National Laboratories and government agencies)
Line 112: Line 112:
* [[Open content]] is concerned with making resources aimed at a human audience (such as prose, photos, or videos) freely available.
* [[Open content]] is concerned with making resources aimed at a human audience (such as prose, photos, or videos) freely available.
* [[Open notebook science]] refers to the application of the Open Data concept to as much of the scientific process as possible, including failed experiments and raw experimental data.<ref>http://drexel-coas-elearning.blogspot.com/2006/09/open-notebook-science.html creation of term</ref>
* [[Open notebook science]] refers to the application of the Open Data concept to as much of the scientific process as possible, including failed experiments and raw experimental data.<ref>http://drexel-coas-elearning.blogspot.com/2006/09/open-notebook-science.html creation of term</ref>
* [[Open research]]/[[Open science]]/[[Open science data]] ([[Linked open science]]) means an approach to open and interconnect scientific assets like data, methods and tools with [[Linked Data]] techniques to enable transparent, reproducible and transdisciplinary research.<ref name="kauppinen">{{cite doi|10.1016/j.procs.2011.04.076}}</ref>
* [[Open research]]/[[Open science]]/[[Open science data]] (Linked open science<ref name="linkedopenscience") means an approach to open and interconnect scientific assets like data, methods and tools with [[Linked Data]] techniques to enable transparent, reproducible and transdisciplinary research.<ref name="kauppinen">{{cite doi|10.1016/j.procs.2011.04.076}}</ref>
* [[Open knowledge]]. The [[Open Knowledge Foundation]] argues for Openness in a range of issues including, but not limited to, those of Open Data. It covers (a) scientific, historical, geographic or otherwise (b) Content such as music, films, books (c) Government and other administrative information. Open data is included within the scope of the [[Open Knowledge Definition]], which is alluded to in [[Science Commons]]' Protocol for Implementing Open Access Data.<ref>[http://sciencecommons.org/projects/publishing/open-access-data-protocol/ Protocol for Implementing Open Access Data]</ref>
* [[Open knowledge]]. The [[Open Knowledge Foundation]] argues for Openness in a range of issues including, but not limited to, those of Open Data. It covers (a) scientific, historical, geographic or otherwise (b) Content such as music, films, books (c) Government and other administrative information. Open data is included within the scope of the Open Knowledge Definition, which is alluded to in [[Science Commons]]' Protocol for Implementing Open Access Data.<ref>[http://sciencecommons.org/projects/publishing/open-access-data-protocol/ Protocol for Implementing Open Access Data]</ref>
* [[Open source]] (software) is concerned with the licenses under which computer programs can be distributed and is not normally concerned primarily with data.
* [[Open source]] (software) is concerned with the licenses under which computer programs can be distributed and is not normally concerned primarily with data.


Line 138: Line 138:
* aggregating factual data into "databases" which may be covered by "database rights" or "database directives" (e.g. [[Directive on the legal protection of databases]])
* aggregating factual data into "databases" which may be covered by "database rights" or "database directives" (e.g. [[Directive on the legal protection of databases]])
* time-limited access to resources such as e-journals (which on traditional print were available to the purchaser indefinitely)
* time-limited access to resources such as e-journals (which on traditional print were available to the purchaser indefinitely)
* [[webstacle]]s, or the provision of single [[data point]]s as opposed to [[tabular]] [[Information retrieval|queries]] or bulk [[download]]s of [[data set]]s.
* webstacles, or the provision of single [[data point]]s as opposed to [[tabular]] [[Information retrieval|queries]] or bulk [[download]]s of [[data set]]s.
* political, commercial or legal pressure on the activity of organisations providing Open Data (for example the [[American Chemical Society]] lobbied the US Congress to limit funding to the [[National Institutes of Health]] for its Open [[PubChem]] data.<ref name="uoc">[http://osc.universityofcalifornia.edu/news/acs_pubchem.html Review of history and positions by the University of California]
* political, commercial or legal pressure on the activity of organisations providing Open Data (for example the [[American Chemical Society]] lobbied the US Congress to limit funding to the [[National Institutes of Health]] for its Open [[PubChem]] data.<ref name="uoc">[http://osc.universityofcalifornia.edu/news/acs_pubchem.html Review of history and positions by the University of California]
</ref>
</ref>
Line 145: Line 145:
* The [[Open Knowledge Foundation]]
* The [[Open Knowledge Foundation]]
* [[Scholarly Publishing and Academic Resources Coalition]]
* [[Scholarly Publishing and Academic Resources Coalition]]
* [[freeourdata.org.uk]] <ref>[http://www.freeourdata.org.uk/index.php "Free our data"] ([[The Guardian]] technology section)</ref>
* freeourdata.org.uk <ref>[http://www.freeourdata.org.uk/index.php "Free our data"] ([[The Guardian]] technology section)</ref>
* [[Open Data Institute]]
* [[Open Data Institute]]
* [[Sunlight Foundation]]
* [[Sunlight Foundation]]
* [[LinkedScience.org]]<ref>[http://linkedscience.org/about]</ref>
* LinkedScience.org<ref>http://linkedscience.org/about</ref>
* [[Talis Group|Talis]]
* [[Talis Group|Talis]]
* [[w3.org]] <ref>[http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData Linking Open Data on the Semantic Web]</ref>
* [[w3.org]] <ref>[http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData Linking Open Data on the Semantic Web]</ref>

Revision as of 15:14, 25 March 2013

An introductory overview of Linked Open Data in the context of cultural institutions.
Linking Open Data project in September 2007
Linked Open Data Cloud in September 2011
Clear labeling of the licensing terms is a key component of Open data, and icons like the one pictured here are being used for that purpose.

Open data is the idea that certain data should be freely available to everyone to use and republish as they wish, without restrictions from copyright, patents or other mechanisms of control.[1] The goals of the open data movement are similar to those of other "Open" movements such as open source, open hardware, open content, and open access. The philosophy behind open data has been long established (for example in the Mertonian tradition of science), but the term "open data" itself is recent, gaining popularity with the rise of the Internet and World Wide Web and, especially, with the launch of open-data government initiatives such as Data.gov.

Overview

The concept of open data is not new; but a formalized definition is relatively new—the primary such formalization being that in the Open Definition which can be summarized in the statement that "A piece of data is open if anyone is free to use, reuse, and redistribute it — subject only, at most, to the requirement to attribute and/or share-alike."[2]
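
The summarized test above can be read as a simple rule: a dataset counts as open only if the conditions attached to its use, reuse and redistribution go no further than attribution and share-alike. The following is a minimal sketch of that reading, not an official implementation of the Open Definition; the condition labels and the function name `is_open` are assumptions made for illustration.

```python
from typing import Iterable

# Conditions the summarized definition tolerates (labels are hypothetical).
PERMITTED_CONDITIONS = frozenset({"attribution", "share-alike"})


def is_open(conditions: Iterable[str]) -> bool:
    """Return True if every attached condition is one the summary allows."""
    return set(conditions) <= PERMITTED_CONDITIONS


if __name__ == "__main__":
    # A dataset with no conditions at all.
    print(is_open([]))
    # A dataset that additionally forbids commercial use.
    print(is_open(["attribution", "non-commercial"]))
```

Under this sketch, any condition outside the two permitted ones (a fee, a registration requirement, a non-commercial clause) makes the check fail.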

Open data is often focused on non-textual material such as maps, genomes, connectomes, chemical compounds, mathematical and scientific formulae, medical data and practice, bioscience and biodiversity. Problems often arise because these are commercially valuable or can be aggregated into works of value. Access to, or re-use of, the data is controlled by organisations, both public and private. Control may be through access restrictions, licenses, copyright, patents and charges for access or re-use. Advocates of open data argue that these restrictions are against the communal good and that these data should be made available without restriction or fee. In addition, it is important that the data are re-usable without requiring further permission, though the types of re-use (such as the creation of derivative works) may be controlled by license.

A typical depiction of the need for open data:

Numerous scientists have pointed out the irony that right at the historical moment when we have the technologies to permit worldwide availability and distributed process of scientific data, broadening collaboration and accelerating the pace and depth of discovery ... we are busy locking up that data and preventing the use of correspondingly advanced technologies on knowledge.

[3] John Wilbanks, VP Science, Creative Commons

Creators of data often do not consider the need to state the conditions of ownership, licensing and re-use. For example, many scientists do not regard the published data arising from their work to be theirs to control, and they treat the act of publication in a journal as an implicit release of the data into the commons. However, the lack of a license makes it difficult to determine the status of a data set and may restrict the use of data offered in an Open spirit. Because of this uncertainty it is also possible for public or private organizations such as IEEE to aggregate such data, protect it with copyright and then resell it. Indigenous knowledge (IK) poses a great challenge in terms of capture, storage and distribution. Many societies in third world countries lack the technical processes for managing IK.

Under "Toward Open Data" Connolly (2005, v.i.) gives two quotations:

  • I want my data back. (Jon Bosak circa 1997)
  • I've long believed that customers of any application own the data they enter into it.[4] (This quote refers to Veen's own heart-rate data.)

Major sources of open data

Open data can come from any source. This section lists some of the fields that publish (or at least discuss publishing) a large amount of open data.

Open data in science

The concept of open access to scientific data was institutionally established with the formation of the World Data Center system, in preparation for the International Geophysical Year of 1957-1958.[5] The International Council of Scientific Unions (now the International Council for Science) established several World Data Centers to minimize the risk of data loss and to maximize data accessibility, further recommending in 1955 that data be made available in machine-readable form.[6]

While the open-science-data movement long predates the Internet, the availability of fast, ubiquitous networking has significantly changed the context of Open science data, since publishing or obtaining data has become much less expensive and time-consuming.

In 2004, the Science Ministers of all nations of the OECD (Organisation for Economic Co-operation and Development), which includes most developed countries of the world, signed a declaration which essentially states that all publicly funded archive data should be made publicly available.[7] Following a request and an intense discussion with data-producing institutions in member states, the OECD published in 2007 the OECD Principles and Guidelines for Access to Research Data from Public Funding as a soft-law recommendation.[8]

Examples of open data in science:

Open data in government

Several national governments have created web sites to distribute a portion of the data they collect. Open government data is also pursued as a collaborative project in municipal government, aimed at creating and organizing a culture of open data. A list of over 200 local, regional and national open data catalogues is available on the open source datacatalogs.org project, which aims to be a comprehensive list of data catalogues from around the world. Prominent examples include:

  • Data.gov - U.S. government open-data website. Launched in May 2009.
  • Data.gov.uk - U.K. government open-data website. Launched in September 2009.
  • data.govt.nz - New Zealand Government initiative to publish Government Data under Creative Commons licences, defined further at NZ GOAL. Launched in Nov 2009.
  • data.norge.no - Norwegian government open-data website. Launched in April 2010.
  • geodata.gov.gr - Greece's open government geospatial data website. Launched 21 July 2010, as a state initiative.[9]
  • opengovdata.ru - OpenGovData Russia Catalog. Launched in 2010, private initiative.[10]
  • Data.gov.au - Australian government open-data website. Launched in March 2011.
  • Data.gov.ma - Moroccan government open-data website. Launched in April 2011.
  • Data.gc.ca - Canadian government open-data website. Launched in March 2011.
  • data.belgium.be - Belgian government open-data website. Still in beta, but usable.
  • opendata.go.ke - Kenyan government open-data website. Launched in Jul 2011.
  • data.overheid.nl - Dutch government open-data website. Launched in Oct 2011.
  • rotterdamopendata.nl - Rotterdam municipal open-data website. Launched in Aug 2012.
  • datos.gob.cl - Chilean government open-data website. Launched in Sept 2011.
  • data.gov.it - Italian government open-data website. Launched in October 2011.[11]
  • datos.gob.es - Spanish government open-data website. Launched in October 2011.
  • datos.gub.uy - Uruguayan government open-data website. Launched in November 2011.
  • data.gouv.fr - French government open-data website. Launched in December 2011.
  • dati.gov.it - Italian government open-data website. Launched in October 2011.
  • dados.gov.br - Brazilian government open-data website. Launched in December 2011.
  • www.opendata.ee - Estonian government open-data website.
  • dados.gov.pt - Portuguese government open-data website.
  • date.gov.md - Moldavian government open-data website.
  • data.gov.in - Indian government open-data website. Launched in 2012.
  • data.gv.at - Austrian Government open-data website.
  • daten-deutschland.de - German Government open-data website. Launched in February 2013.
  • open-data.europa.eu - European Commission Data Portal.
  • satupemerintah.net - Indonesian Government open-data website.

Additionally, other levels of government have established open data websites. There are many government entities pursuing Open Data in Canada. Data.gov lists the sites of a total of 31 U.S. states, 13 cities, and more than 150 agencies and subagencies providing open data, e.g. the state of California, USA [1].

The United Nations has an open data website that publishes statistical data from Member States and UN Agencies: [2].

Arguments for and against open data

The debate on Open Data is still evolving. The best open government applications seek to empower consumers, to help small businesses, or to create value in some other positive, constructive way. Open government data is only a way-point on the road to improving education, improving government, and building tools to solve other real world problems. While many arguments have been made categorically, the following discussion of arguments for and against open data highlights that these arguments often depend highly on the type of data and its potential uses.

Arguments made on behalf of Open Data include the following:

It is generally held that factual data cannot be copyrighted.[15] However, publishers frequently add copyright statements (often forbidding re-use) to scientific data accompanying publications. It may be unclear whether the factual data embedded in full text are part of the copyright.

While the human abstraction of facts from paper publications is normally accepted as legal, there is often an implied restriction on machine extraction by robots.

Unlike Open Access, where groups of publishers have stated their concerns, Open Data is normally challenged by individual institutions. Their arguments have been discussed less in public discourse and there are fewer quotes to rely on at this time.

Arguments against making all data available as Open Data include the following:

  • Government funding may not be used to duplicate or challenge the activities of the private sector (e.g. PubChem).
  • Governments have to be accountable for the efficient use of taxpayers' money: if public funds are used to aggregate the data and if the data will bring commercial (private) benefits to only a small number of users, the users should reimburse governments for the cost of providing the data.
  • The revenue earned by publishing data permits non-profit organisations to fund other activities (e.g. learned society publishing supports the society).
  • The government gives specific legitimacy for certain organisations to recover costs (NIST in US, Ordnance Survey in UK).
  • Privacy concerns may require that access to data is limited to specific users or to sub-sets of the data.
  • Collecting, 'cleaning', managing and disseminating data are typically labour- and/or cost-intensive processes - whoever provides these services should receive fair remuneration for providing those services.
  • Sponsors do not get full value unless their data is used appropriately - sometimes this requires quality management, dissemination and branding efforts that can best be achieved by charging fees to users.
  • Often, targeted end-users cannot use the data without additional processing (analysis, apps etc.). If anyone has access to the data, none may have an incentive to invest in the processing required to make the data useful (typical examples include biological, medical, and environmental data).

Relation to other open activities

The goals of the Open Data movement are similar to those of other "Open" movements.

  • Open access is concerned with making scholarly publications freely available on the internet. In some cases, these articles include open datasets as well.
  • Open content is concerned with making resources aimed at a human audience (such as prose, photos, or videos) freely available.
  • Open notebook science refers to the application of the Open Data concept to as much of the scientific process as possible, including failed experiments and raw experimental data.[16]
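  • Open research/Open science/Open science data (Linked open science) means an approach to open and interconnect scientific assets like data, methods and tools with Linked Data techniques to enable transparent, reproducible and transdisciplinary research.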
  • Open knowledge. The Open Knowledge Foundation argues for Openness in a range of issues including, but not limited to, those of Open Data. It covers (a) scientific, historical, geographic or otherwise (b) Content such as music, films, books (c) Government and other administrative information. Open data is included within the scope of the Open Knowledge Definition, which is alluded to in Science Commons' Protocol for Implementing Open Access Data.[17]
  • Open source (software) is concerned with the licenses under which computer programs can be distributed and is not normally concerned primarily with data.

Funders' mandates

Several funding bodies which mandate Open Access also mandate Open Data. A good expression of requirements (truncated in places) is given by the Canadian Institutes of Health Research (CIHR):[18]

  • to deposit bioinformatics, atomic and molecular coordinate data, experimental data into the appropriate public database immediately upon publication of research results.
  • to retain original data sets for a minimum of five years after the grant. This applies to all data, whether published or not.

Note the fundamental requirement to be able to replicate the experiment.

Other bodies active in promoting the deposition of data as well as fulltext include the Wellcome Trust.

Closed data

Several mechanisms restrict access to or reuse of data. They include:

  • making data available for a charge.
  • compilation in databases or websites to which only registered members or customers can have access.
  • use of a proprietary or closed technology or encryption which creates a barrier for access.
  • copyright forbidding (or obfuscating) re-use of the data.
  • license forbidding (or obfuscating) re-use of the data (such as share-alike[citation needed] or non-commercial)
  • patent forbidding re-use of the data (for example the 3-dimensional coordinates of some experimental protein structures have been patented)
  • restriction of robots to websites, with preference to certain search engines
  • aggregating factual data into "databases" which may be covered by "database rights" or "database directives" (e.g. Directive on the legal protection of databases)
  • time-limited access to resources such as e-journals (which on traditional print were available to the purchaser indefinitely)
  • webstacles, or the provision of single data points as opposed to tabular queries or bulk downloads of data sets.
  • political, commercial or legal pressure on the activity of organisations providing Open Data (for example, the American Chemical Society lobbied the US Congress to limit funding to the National Institutes of Health for its Open PubChem data).[19]
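
One of the mechanisms listed above, restricting robots with preference to certain crawlers, is commonly expressed through a site's robots.txt file. The following is a minimal sketch of how a data consumer might check such rules using Python's standard `urllib.robotparser` module. The robots.txt content, the crawler names `FavouredBot` and `OtherBot`, and the example.org URLs are hypothetical and not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that admits one crawler everywhere
# but keeps all other robots out of the /data/ area.
ROBOTS_TXT = """\
User-agent: FavouredBot
Disallow:

User-agent: *
Disallow: /data/
"""


def allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file's lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    dataset = "https://example.org/data/set.csv"
    for agent in ("FavouredBot", "OtherBot"):
        print(agent, allowed(agent, dataset))
```

In this sketch the favoured crawler may fetch the dataset while every other robot is refused, even though a human visitor could open the same page in a browser.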

Organisations promoting open data

See also

References

  1. ^ doi:10.1007/978-3-540-76298-0_52
  2. ^ See Open Definition home page and the full Open Definition
  3. ^ Science Commons
  4. ^ Jeffrey Veen
  5. ^ Committee on Scientific Accomplishments of Earth Observations from Space, National Research Council (2008). Earth Observations from Space: The First 50 Years of Scientific Achievements. The National Academies Press. p. 6. ISBN 0-309-11095-5. Retrieved 2010-11-24.
  6. ^ World Data Center System (2009-09-18). "About the World Data Center System". NOAA, National Geophysical Data Center. Retrieved 2010-11-24.
  7. ^ OECD Declaration on Open Access to publicly funded data
  8. ^ OECD Principles and Guidelines for Access to Research Data from Public Funding
  9. ^ "Open Government Data Catalogues".
  10. ^ "Open Government Data Catalogues".
  11. ^ "Wikitalia ovvero la partecipazione civica dopo e oltre i referendum". Libertiamo. 21 October 2011. Retrieved 7 November 2011.
  12. ^ On the road to open data, by Ian Manocha
  13. ^ "Big Data for Development: From Information- to Knowledge Societies", Martin Hilbert (2013). SSRN Scholarly Paper No. ID 2205145. Rochester, NY: Social Science Research Network; http://papers.ssrn.com/abstract=2205145
  14. ^ How to Make the Dream Come True argues in one research area (Astronomy) that access to open data increases the rate of scientific discovery.
  15. ^ Towards a Science Commons includes an overview of the basis of Openness in science data.
  16. ^ http://drexel-coas-elearning.blogspot.com/2006/09/open-notebook-science.html creation of term
  17. ^ Protocol for Implementing Open Access Data
  18. ^ SPARC-OpenData@arl.org Mailing List Archive
  19. ^ Review of history and positions by the University of California
  20. ^ "Free our data" (The Guardian technology section)
  21. ^ http://linkedscience.org/about
  22. ^ Linking Open Data on the Semantic Web

External links