Wikipedia:WikiProject Open Access/Signalling OA-ness

9-minute video explaining what open access is about

This page is about how Wikipedia pages could signal to readers whether a particular reference is open access, as outlined in this Signpost op-ed. The main purpose of such signalling is to spare readers the disappointment of clicking through to a resource only to find that they do not have access rights to read it. The scheme is also useful for Wikipedia editors, who can see at a glance whether a given reference is licensed in a way that allows its images, media or even text to be reused in Wikipedia articles.

Project summary

Some automated tools that work with open access articles have already been created. They impose nothing upon anyone who does not wish to use them. For those who do, they automate some parts of the citation process and produce an unusual, Wikipedia-specific citation which, contrary to academic tradition, notes whether a work is free to read rather than subscription-only. The tools also extract everything usable from open access works, including the text of the article, the pictures or media used, and some metadata, and then place this content in multiple Wikimedia projects, including Wikimedia Commons, Wikisource, and Wikidata, as well as generating the citation on Wikipedia.

In practice, someone making a citation uses the "Signalling OA" tool. The citation is generated as usual, but if the paper is open access, it is also mirrored on Wikisource, its images are uploaded to Wikimedia Commons, and metadata about the paper goes to Wikidata. From the user's perspective, they have just made a citation; with this tool, however, making a citation can also automatically trigger the collection of any content that is free to take.

In a nutshell

This image of Xanthichthys ringens is sourced from an open-access scholarly article licensed for re-use.
How can we make that reusability explicit when citing this source in Wikipedia articles?[1]
For further details, see this Signpost op-ed.

Citations used in Wikimedia projects should signal whether a person may actually read the work cited rather than encounter a paywall. Furthermore, if the source is open access, it should be mirrored on Wikisource (by means of a robot which already exists and works). This would have many effects in every language of any Wikimedia project that uses citations. On English Wikipedia, it would allow (but not force) writers to make "open access-signalling" citations, track the use of citations, give readers access to the works being cited, and extract non-text media from open access sources and queue it for human review ahead of automated upload into Commons. All of the tools to automate this work are in place - see examples of imported works on Wikisource. It just remains to get community support for actually adding the entirety of open access content to Wikisource, Wikimedia Commons, Wikidata, and the rest, and then to direct Wikipedia's citation system to this infrastructure. This inherently affects every language, as it implies massive potential for translating and suggesting sources across languages. All of this could be ignored and forces no changes on anyone, but for people who would like to use any of these tools, they and a lot of mirrored content on Wikimedia projects would be available. To typical readers, the only obvious change would be that citations look like the following when the writer chooses to use the Signalling OA tool suite.

  1. ^ Williams, J. T.; Carpenter, K. E.; Van Tassell, J. L.; Hoetjes, P.; Toller, W.; Etnoyer, P.; Smith, M. (2010). "Biodiversity Assessment of the Fishes of Saba Bank Atoll, Netherlands Antilles". In Gratwicke, Brian. PLoS ONE 5 (5): e10676. doi:10.1371/journal.pone.0010676. PMC 2873961. PMID 20505760.  CC0 full text media metadata

Longer summary

"Open access" is a term to describe academic publications (research articles) which can be read and remixed for free. There are social movements especially since the early 2000s which have said that there should be more access to these publications. This "Signalling OA-ness" project might be the first and only viable proposal to catalog every academic publication which exists and provide access to every open access publication which exists.

This proposal is more viable than others because the Wikimedia projects have as strong a claim as any other platform to being the best host for a list of every article ever published: anyone can add content to Wikimedia projects, and their user base has been doing so since the projects were founded. The following points support that claim:

  • No system exists anywhere in the world to catalog all academic papers and signal which ones are open access.
  • Wikimedia projects already have the world's largest base of volunteers citing academic papers in a single platform.
  • Wikimedia content on any given subject is almost always either the world's most popular content on that subject or among the most popular content. This is because Internet search directs people who ask for information to check Wikimedia projects.
  • Since Wikipedia guidelines say that information in Wikipedia should come from reliable sources, and since academic papers have been recognized as ideal reliable sources from Wikipedia's founding, Wikimedia projects have already established a need for, and a culture of practice around, citing academic papers.
  • MediaWiki software is an ideal technical platform for hosting a catalog of citations and the text of open access sources, and for granting access to both to anyone who wants it.
  • Wikimedia projects are probably the least expensive and most organizationally neutral place to develop a system for managing all citations while hosting and enriching all open access sources, and doing so is practical.
  • There is no identified major opposition to setting this up on a Wikimedia project. If you can imagine a reason to oppose it, please comment; it would be a favor to the organizers.
  • Because no standard exists, because Wikimedia projects have the user base, popularity, need, and technical capacity to make that standard, because it is practical to do it here, and because nothing opposes trying, a team that wants to implement the system has a project worthy of support.

The actual project works like this:

  1. Create a database listing every academic publication ever. This is easier than it sounds.
  2. Instead of keeping citations hosted locally on each Wikimedia project, put them into Wikidata. Suppose that one paper is cited in 10 Wikipedia articles in each language, and that there are 200 language Wikipedias. With current practices, the citation would have to be recreated 2000 times, which is extremely inefficient. If it were on Wikidata, it could be created once and then called from the database each time.
  3. While this catalog is perpetually being developed, if a paper cited on any Wikimedia project is open access, copy it to Wikisource. This means putting the text on Wikisource, the media on Wikimedia Commons, the citation on Wikidata, and then the reference on Wikipedia or wherever it is used. See the example below to see how it looks, and especially look at the "text on Wikisource". That text is automatically generated by a bot and it looks good.
  4. The end result is that whenever anyone cites an academic paper in any Wikimedia project, such as Wikipedia, a bot checks that paper. It puts a citation for it on Wikidata, checks whether the paper is open access, copies the text and media automatically if it is, and sets up metrics for how the citation is used.
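
The four steps above can be sketched in a few lines of Python. This is only an illustration of the workflow, not the project's actual bot: the names `Paper`, `Stores`, `cite` and `local_copies` are hypothetical, and the dictionaries are in-memory stand-ins for Wikidata, Wikisource and Wikimedia Commons rather than calls to any real API.

```python
from dataclasses import dataclass, field


@dataclass
class Paper:
    """A cited paper (hypothetical, minimal model)."""
    doi: str
    title: str
    open_access: bool
    text: str = ""
    media: list = field(default_factory=list)


@dataclass
class Stores:
    """In-memory stand-ins for the Wikimedia projects involved."""
    wikidata: dict = field(default_factory=dict)    # doi -> citation record
    wikisource: dict = field(default_factory=dict)  # doi -> mirrored full text
    commons: dict = field(default_factory=dict)     # doi -> media queued for review
    usage: dict = field(default_factory=dict)       # doi -> pages citing the paper


def cite(paper, page, stores):
    """Record a citation of `paper` on `page` and mirror it if it is open access."""
    # Step 2: the citation record is created once, centrally, and reused afterwards.
    record = stores.wikidata.setdefault(
        paper.doi, {"title": paper.title, "open_access": paper.open_access}
    )
    # Step 3: only open access papers are copied, and only the first time.
    if paper.open_access and paper.doi not in stores.wikisource:
        stores.wikisource[paper.doi] = paper.text
        stores.commons[paper.doi] = list(paper.media)
    # Step 4: track where the citation is used.
    stores.usage.setdefault(paper.doi, []).append(page)
    return record


def local_copies(articles_per_language, languages):
    """Number of times a citation is recreated when each wiki keeps its own copy."""
    return articles_per_language * languages
```

With the figures from step 2, `local_copies(10, 200)` gives 2000 locally maintained copies, whereas the central approach keeps a single record in `stores.wikidata` no matter how many pages call `cite` for the same DOI.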

It is extremely motivating to authors, publishers, organizations, governments, and universities to know how their papers are cited, and very empowering to the Wikimedia community to be able to police the use of citations at the source rather than at the article level. This could generate a level of quality control that attracts new and expert users in new ways.

There are some example open access papers already placed on Wikisource and Wikimedia Commons as tests for the importer robot, and these can be seen at s:Wikisource:WikiProject Open Access/Programmatic import from PubMed Central. There are some bugs, but the papers are quite usable. The proposal is to do this kind of importing for everything that can be imported.

As further incentive, once these papers are imported they can be enriched in all kinds of ways by categorization, interlinking, or limitless other proposals.

Background information

Definition of Open Access

By "open access" to this literature, we mean its free availability on the public internet, permitting any users to read, download, copy, distribute, print, search, or link to the full texts of these articles, crawl them for indexing, pass them as data to software, or use them for any other lawful purpose, without financial, legal, or technical barriers other than those inseparable from gaining access to the internet itself. The only constraint on reproduction and distribution, and the only role for copyright in this domain, should be to give authors control over the integrity of their work and the right to be properly acknowledged and cited.

"Open access" is a social movement outside of Wikipedia. The Open Access Movement suggests that science is improved when scientists have access to research publications. From a Wikipedia perspective, open access is interesting because any Wikipedian can read and reuse open access publications, which is not true of typical publications.

Other places to read about efforts related to this project

  • Wikipedia:WikiProject Open - a project showcasing the spectrum of open initiatives on Wikipedia, including open access, open educational resources, and open source software.
  • Commons:Commons:Open Access File of the Day - a collection of open access non-text media, mostly images, archived to Wikimedia Commons and each used multiple times in other Wikimedia projects. Typically, these images illustrate Wikipedia articles.
  • en:Wikisource:Wikisource:WikiProject Open Access - this project discusses the mirroring of open access publications on Wikisource, and what tools could be made to give Wikimedians elsewhere better access to these works.
    • en:Wikisource:Wikisource:WikiProject Open Access/Programmatic import from PubMed Central - a demo collection of what this project's robots can do without human intervention. View any of these uploaded papers. The bot knew that the paper was open access, so it copied the text of the paper to Wikisource and automatically uploaded the non-text media used in the paper to Wikimedia Commons. There are mistakes in this process, but the upload is workable and most people would say it looks fine.

CC BY and CC0/PD icons

The wording of the definition above, which comes from the Budapest Open Access Initiative (BOAI), is essentially identical to that of the Creative Commons Attribution License (CC BY), which was drafted a little later. For this reason, the fact that some materials are open access according to that definition could in principle be signaled by some of the icons used for CC BY:

Of course, materials available under CC0 or in the Public Domain would also meet the BOAI definition. Some relevant icons are:

NISO Workgroup on Open Access Metadata and Indicators

  • An initiative directed at standardizing the way licensing metadata is expressed is the NISO Workgroup on Open Access Metadata and Indicators (OAMI)
    • Draft recommendations of the Working Group were released on January 6, 2014. They include
      • no definition of the term Open Access
      • a <free_to_read> tag intended to signal whether and when a publication is available publicly without a requirement for payment or registration
      • a <license_ref> tag intended to point to a URI containing the licensing terms
      • provisions for adding dates to both tags, so as to account for embargoes
      • no specification as to which licenses are allowed, or whether and how they should be version-controlled
      • no provision for icons that may be suitable for signaling the content of the proposed tags.
    • Comments are invited until February 4, and can be left here.
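
To illustrate how the two proposed tags and their dates could be read by a tool, here is a minimal Python sketch. The tag names `free_to_read` and `license_ref` come from the draft as summarized above; the wrapping `record` element, the `start_date` and `end_date` attribute names, the sample values, and the helper functions `is_free_to_read` and `license_on` are assumptions made for this example, not part of the recommendation as described here.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Hypothetical record for an article that becomes free to read, under CC BY,
# once an embargo ends on 2014-07-01. Element and attribute layout is assumed.
SAMPLE = """<record>
  <free_to_read start_date="2014-07-01"/>
  <license_ref start_date="2014-07-01">https://creativecommons.org/licenses/by/4.0/</license_ref>
</record>"""


def _parse(value):
    """Turn an ISO date string into a date, or return None if it is missing."""
    return date.fromisoformat(value) if value else None


def is_free_to_read(record_xml, on):
    """Return True if any free_to_read tag covers the day `on`."""
    root = ET.fromstring(record_xml)
    for tag in root.iter("free_to_read"):
        start = _parse(tag.get("start_date"))
        end = _parse(tag.get("end_date"))
        if (start is None or start <= on) and (end is None or on <= end):
            return True
    return False


def license_on(record_xml, on):
    """Return the license URI in force on the day `on`, or None if there is none."""
    root = ET.fromstring(record_xml)
    best = None
    for tag in root.iter("license_ref"):
        # A license_ref without a start date is treated as always in force.
        start = _parse(tag.get("start_date")) or date.min
        if start <= on and (best is None or start >= best[0]):
            best = (start, (tag.text or "").strip())
    return best[1] if best else None
```

For the sample record, `is_free_to_read(SAMPLE, date(2014, 1, 6))` is false because the embargo has not yet ended, while the same call for any day from July 1, 2014 onwards is true and `license_on` returns the CC BY URI.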

Jisc Workgroup on Vocabularies for Open Access

  • An initiative directed at standardizing the way metadata related to Open Access (not limited to licensing information) is expressed is the Jisc Workgroup on Vocabularies for Open Access (V4OA)
    • Draft recommendations of the Working Group were released on October 2, 2013. They include
      • no definition of the term Open Access
      • a <readable> tag intended to signal whether a publication is available publicly without a requirement for payment or registration; in contrast to NISO, they do not consider delayed access here, as they handle embargoes differently
      • a tag intended to point to a URI containing the licensing terms, similar to NISO's <license_ref> tag but available in two forms (<rights-xml> and <rights-human-readable>) to cater to machines and humans, respectively.
      • no specification as to which licenses are allowed, or whether and how they should be version-controlled (though version control for scholarly articles is addressed)
      • no provision for icons that may be suitable for signaling the content of the proposed tags.
      • recommendations for handling embargoes (which differ from the mechanisms proposed by the NISO Workgroup)
      • thoughts on whether and how it could be signaled whether Article Processing Charges have been paid for a given article
    • Comments were invited until October 21, 2013. The feedback is currently (as of January 2014) being integrated into a revision of the draft.

Existing Open Access icons

However, these CC BY symbols apply to any materials under that license, so some symbols have been put forward to signal open access more specifically. The one shown here in particular has found broad traction (for instance, it features in almost all materials related to Open Access Week), but amidst considerable confusion of terminology, it has increasingly been used in the sense of free to read (example) rather than the more comprehensive set of use and reuse rights laid out by BOAI and CC BY. Color variants exist (see below). A less popular icon with similar ambiguity is at de:Datei:Open access.svg. As an alternative, another icon has been proposed (a slightly more open variant of which is in use with that very meaning of BOAI compliance here). Another alternative could be a combination of a look icon with an icon depicting some document, as in this slide.

  • PubMed Central uses icons to mark journals depositing some or all of their content in the Open Access subset, as explained here and here.


The meaning of the colors can be confusing.

Much of the debate around open access is framed in terms of Green vs. Gold, which boils down to whether the paper is available from the publisher (Gold) or from another place (Green). In current practice, neither of the two guarantees CC BY. To make things worse, a number of resources use completely different color codes - for instance, most of what would be "Gold" in the above sense is Green in the sense of SHERPA/RoMEO, which also uses blue, white and yellow to signal the availability of pre- or postprints, irrespective of licensing.

Some libraries, on the other hand (Regensburg example), use green to indicate to a user of their online catalogs that they have full-text access to the journal, red that they do not have access to the journal, and yellow if full-text access is available for some issues of the journal.

There are color variants of the above-mentioned orange lock icon, e.g. a blue one (usage example), or purple. Inverted colors (e.g. as in this Twitter avatar) could also be an option.

Signalling individual rights differently

In principle, each and every use and reuse right could be signaled independently by dedicated icons, e.g. the icon pictured here could stand for an item that is free to read. Several essential icons would then be needed to signal BOAI compliance (i.e. CC BY): at least one for reuse (which might have to be split into the different kinds of reuse listed in the BOAI definition), and perhaps one specifically for reuse by machines (i.e. crawling, text mining). The more icons we use, the higher the potential for confusion amongst readers.

Existing initiatives around the CC BY literature

Ways to signal OA-ness on the English Wikipedia

There are several ways in which OA-ness is currently being signaled on the English Wikipedia:

Practical issues

  • Ideally, the icons would be clickable and lead to a page that explains their meaning. In normal Wikimedia practice, a click on an image leads to the description page of the image file (which contains metadata about the icon, including provenance and licensing) rather than to a page that explains the meaning of the icon in the context of the click. An explanatory page of the latter kind can only be linked from the icon if the icon is in the public domain (as in Open Access logo PLoS transparent.svg or Closed Access logo alternative.svg), i.e. if there is no requirement for attribution. This is not the case for the Creative Commons icons.
  • Bitmap graphics tend not to be rendered properly on mobile devices.
  • It is perhaps best to start deploying the system in a limited number of articles, such as those within the scope of a WikiProject (e.g. Medicine).
  • In the long run, the information underlying the OA-ness signalling would best be served via Wikidata, which would also make it available beyond the English Wikipedia.
  • A system aimed at signaling the non-OA-ness of articles is currently being developed - the OA button. How could the two be combined?
  • In the long run, it would be desirable to signal not just the OA-ness of the cited references, but also of the data and code associated with the references. As there is not much OA-ness to signal there yet, this may be left for later.

Blog posts

an exploratory study of citations by DOI and DOI prefix (topmost of those that are cited using the {{Cite doi}} template)



A repository has been set up at to host all software related to this project.


The following individuals and organizations will be helpful to moving this project forward. They will be invited to consult and get involved in implementing a pilot program.

  • OCLC
  • Right to Research Coalition
  • Open Knowledge Foundation


The project is supported through a grant from the Open Society Foundations, the proposal for which is available here.

See also