Wikipedia:Spotting possible copyright violations

This is a guide to spotting violations of the Wikipedia copyright policy that are simple copy-and-pastes from other websites. Please remember to assume good faith and to avoid copyright paranoia when doing the important work of keeping Wikipedia compliant with CC-BY-SA and, where co-licensed, GFDL.

Signs that an article might be copy-and-pasted

There are a number of signs that an article might be copy-and-pasted. None of them is conclusive on its own, but a copy-and-pasted article usually shows more than one.

Indicative, but by no means conclusive, signs:

  • The text is not wikified or is over-wikified, with every occurrence of a word or phrase made into a wiki link (as if search-and-replace had been used to insert the links)
  • The text was added all at once by one person in finished form with no spelling or other errors.
  • The writing style is "too good to be true"
  • The text has a strange tone of voice, such as an overly informal tone or a very slanted marketing voice with weasel words
  • The text contains non-standard characters such as Microsoft "smart quotes" (note that these may only mean the text was drafted offline in Microsoft Word or another word processor)

Strong signs of copy and pasting:

  • Out of context phrases like "this site/page/book/whitepaper"
  • Isolated or out-of-context words or phrases such as "top", "go to top", "next page", and "click here", that were originally part of the navigation structure of the original website
  • Use of trademark symbols (™,®) and similar typical signs of commercial text
  • A writing style that rarely occurs outside of a specific, invariably copyrighted, use, such as an advertisement or press release
  • A contribution from a user who has a history of violating copyright
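The textual signs in this list lend themselves to a rough automated first pass. The following is a minimal sketch, not an established tool: the function name `strong_signs` and the phrase lists are hypothetical, taken from the examples above, and the bare word "top" is left out because it is far too common in ordinary prose. A hit only means the text deserves a closer look by a human.

```python
import re

# Hypothetical phrase lists based on the examples in this guide; extend as needed.
NAVIGATION_PHRASES = ("go to top", "next page", "click here")
SELF_REFERENCES = re.compile(r"\bthis (site|page|book|whitepaper)\b", re.IGNORECASE)
TRADEMARK_SYMBOLS = ("™", "®")


def strong_signs(text):
    """Return a list of human-readable warnings found in the text."""
    findings = []
    lowered = text.lower()
    # Leftover navigation text from the original website.
    for phrase in NAVIGATION_PHRASES:
        if phrase in lowered:
            findings.append(f"navigation phrase: {phrase!r}")
    # Out-of-context references such as "this site" (first occurrence only).
    match = SELF_REFERENCES.search(text)
    if match:
        findings.append(f"self-reference: {match.group(0)!r}")
    # Trademark symbols typical of commercial copy.
    for symbol in TRADEMARK_SYMBOLS:
        if symbol in text:
            findings.append(f"trademark symbol: {symbol}")
    return findings
```

An empty list does not clear the text, and a non-empty one does not prove copying; the remaining signs (writing style, contributor history) still need human judgment.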

Irrefutable evidence:

  • Pages which exhibit the above characteristics, and include the original site's copyright notice, copied intact!
  • A copy of the page source, including links to other pages on the same server which would not occur on Wikipedia or a wiki (e.g., a link to /home/news/latest.html)
  • A URL, labeled as "reference" or "source", which links to a page on a copyrighted website containing the exact (or almost exact) same text
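Leftover page source and intact copyright notices can also be searched for mechanically. The sketch below assumes you have the raw wikitext of the article as a string; the function name `leftover_markup` and both regular expressions are illustrative assumptions, and they will miss notices or links written in other forms.

```python
import re

# Server-relative href/src values such as /home/news/latest.html.
# Protocol-relative URLs (starting with //) are deliberately excluded.
RELATIVE_LINK = re.compile(
    r'''(?:href|src)\s*=\s*["'](/[^/"'][^"']*)["']''', re.IGNORECASE
)
# A copyright marker followed by a four-digit year, e.g. "Copyright 2004".
COPYRIGHT_NOTICE = re.compile(r"(©|\(c\)|copyright)\s*\d{4}", re.IGNORECASE)


def leftover_markup(text):
    """Collect server-relative links and copyright notices found in the text."""
    links = RELATIVE_LINK.findall(text)
    notices = [m.group(0) for m in COPYRIGHT_NOTICE.finditer(text)]
    return {"relative_links": links, "copyright_notices": notices}
```

Any notice found this way still has to be read in context, since an article may legitimately quote or discuss a copyright statement.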

Checking it out

Once alerted by one or more of these suspicious signs, you can check the article by copying a sentence or non-trivial sentence fragment that is unlikely to occur by chance in many documents and pasting it into a search engine. Then check the matching pages, if any, for further correspondence to the submitted article. Be aware that many sites mirror content from Wikipedia, so a search engine may find several sites with exactly the same content. Those sites should list Wikipedia as the source of the article.
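Choosing and preparing the search phrase can be sketched in a few lines. This is only an illustration under stated assumptions: the helper names are hypothetical, the longest sentence is used as a crude stand-in for "unlikely to be found by chance", and wrapping the phrase in double quotes assumes a search engine that treats quoted text as an exact-phrase match.

```python
import re
from urllib.parse import quote_plus


def pick_search_phrase(text, min_words=8, max_words=12):
    """Pick the longest sentence and trim it to a phrase worth searching."""
    # Naive sentence split on whitespace that follows ., ! or ?
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    candidates = [s for s in sentences if len(s.split()) >= min_words]
    if not candidates:
        return None
    longest = max(candidates, key=lambda s: len(s.split()))
    words = longest.rstrip(".!?").split()[:max_words]
    return " ".join(words)


def exact_phrase_query(phrase):
    """Wrap the phrase in double quotes and URL-encode it for a query string."""
    return quote_plus(f'"{phrase}"')
```

The encoded value can be appended to whatever search URL you normally use. If the first phrase returns nothing, try a second sentence from a different part of the article before concluding the text is original.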

You can also compare text from an article to text in various databases of published information that is not generally available via a search engine.

For extra thoroughness, you may also want to use the "Groups" option in Google to confirm that the article is not copied from Usenet.

An image taken from another website is often uploaded here under its original filename. If you suspect an image is a copyright violation, try searching Google Images for the filename to see whether the same image appears on other websites. Even if the image was uploaded under a different name, a Google Images search for relevant terms might help you find the original.

If you suspect that a page is a copyright infringement

If you suspect an infringement, you should at the very least raise the issue on that page's talk page. Others can then examine the situation and take action if needed. The most helpful piece of information you can provide is a URL or other reference to what you believe may be the source of the text.

  • Remember: please don't bite the newbies. Many copy-and-paste contributors may not understand that what they are doing is wrong, and some may turn into valuable contributors if educated rather than punished. You can use the user's talk page to discuss your concerns with them. The {{nothanks}} template may be useful for this.
  • Some cases will be false alarms. For example, if the contributor is in fact the author of text that is published elsewhere under different terms, that does not affect their right to post it here under CC-BY-SA and GFDL; point them to Wikipedia:Donating copyrighted materials. Material from public domain resources is sometimes republished with unclear or misleading copyright notices that obscure its origin. An article from another language's Wikipedia might be translated and published here (bringing with it seemingly suspicious anomalies, particularly if the contributor's understanding of English or wikification is limited); as long as attribution is supplied to meet licensing requirements, this is not a copyright violation. Also, sometimes you will find text elsewhere on the Web that was copied from Wikipedia. In any of these cases, it is a good idea to make a note on the talk page to prevent the same false alarm in the future.
  • Please see the Wikipedia copyright policy document for what to do in difficult cases, such as where a user continues to post copyrighted material in spite of warnings.

See also