Wikipedia:WikiProject Punctuation

Project Punctuation is a project to fix missing punctuation in Wikipedia articles.


This project exists to correct common typographical and grammatical errors in Wikipedia. Candidate errors are discovered automatically by software that crawls offline dumps. Because these errors are difficult to recognize and correct automatically, the candidates are collected into lists that volunteers process manually. Fortunately, this task is very easy for humans.

Project status

There is only one analysis right now, which detects the lack of punctuation at the end of a paragraph of text.
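
As a rough illustration of this kind of check, here is a minimal sketch in Python. It is not the project's actual software, whose code is not shown on this page; the function names, the set of accepted punctuation marks, and the blank-line paragraph split are all assumptions made for the example.

```python
import re
from typing import List

# Assumption: these characters count as acceptable paragraph-ending punctuation.
TERMINAL = ".!?:;"
# Assumption: closing quotes and brackets may follow the final punctuation mark.
TRAILING_CLOSERS = "\"')]}"


def lacks_terminal_punctuation(paragraph: str) -> bool:
    """Return True if the paragraph does not end in punctuation."""
    text = paragraph.rstrip()
    # Ignore closers such as ]] or ) so that "[[U.S.]]" counts as punctuated.
    core = text.rstrip(TRAILING_CLOSERS)
    if not core:
        return False  # empty or closer-only paragraphs are not flagged
    return core[-1] not in TERMINAL


def flag_paragraphs(article_text: str) -> List[str]:
    """Split text on blank lines and return the paragraphs that look unfinished."""
    paragraphs = re.split(r"\n\s*\n", article_text)
    return [p.strip() for p in paragraphs if lacks_terminal_punctuation(p)]
```

A check this simple flags many paragraphs that are fine as they are, which is why the results still need a human reviewer.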

In July 2005, we completed the first round, checking and correcting every article in the English Wikipedia.

To catch anything that slipped through the cracks, and to fix newly introduced mistakes, we started a second round on 16 September 2005. The second round was completed on 4 May 2006.

How to help

The output of the analysis is written to a series of dump files, which, because of their size, are not stored on Wikipedia. To help, choose one of the dump files from the list below. Go through the entire dump file and fix every article that needs it. When you're done, edit this page to remove that dump file from the list, so that nobody duplicates your work.

To help people find this project, consider using an edit summary like the following: Missing period(s) ([[Wikipedia:WikiProject Punctuation|You can help!]])

Fixing articles

The dump files appear as a series of article titles (which are links to their Wikipedia pages), each followed by a list of paragraphs that may have problems. The paragraphs are flagged by a computer program, so not every article or paragraph listed needs to be fixed. Also, because the dumps are based on the last downloadable version of Wikipedia (currently 9 September 2005), someone may have fixed the mistakes independently, or the article may not even exist any more. However, the rate of actual errors is very high, so there is plenty to do.

Here's what to fix:

  • If a paragraph of English text does not end in punctuation, add it. Exceptions:
      • Items in a list. Most are filtered out, but some are formatted using paragraphs and markup; these are usually easy to spot because many list items appear together in the dump.
      • Paragraphs that end with See also: [link]. It is standard Wikipedia practice to omit the period after "see also", "main article" and similar (many are filtered out).
      • Paragraphs that end with parenthetical citations or links (but the previous sentence should end with punctuation!).
      • Quotations with an attribution.
        Example: "Brevity is the soul of wit." — William Shakespeare
      • Paragraphs that end in a parenthetical remark with internal punctuation. This is bad style, but we are fixing only incontrovertible mistakes (usually filtered out automatically).
        Example: The king commanded him to leave (but he didn't.)
      • Paragraphs that end in an abbreviation or word that contains a period. This is also bad style, but not incontrovertibly a mistake.
        Example: Many English Wikipedia editors come from the [[U.S.]]
        Example: He became an [[Associated Press]] [[reporter]] based in [[Washington, D.C.]]
          • However, you should fix a period that has been included inside the [[ ]] link when it does not belong there.
            Example: Links to articles that don't exist will be [[red.]]
            Example: The link to an article should not include a [[punctuation|period.]]
      • Sentences that do not actually end, because they are followed by a list, equation, table, or image (many of these are filtered out).
      • Sentences that are incomplete. If it's not obvious how to repair them, you can leave a message on the Talk page asking for help.
        Example: In 1926, the president declared
      • Missing punctuation in text that is otherwise badly broken. If a page is marked for cleanup, you might just leave it for the person who ultimately cleans it up. If it's not marked, consider marking it with {{cleanup}}.
        Example: nascar is totaly sweet & its generally agreed the cars are very very fast
  • Sometimes you will also catch stray characters that have been inserted, or broken Wiki markup. You should fix these too.
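
The fix for a period caught inside a [[ ]] link, described in the list above, is mechanical enough to sketch in Python. This is only an illustration under stated assumptions, not a tool the project provides: the function name is hypothetical, and the rule "leave the link alone if its visible text contains another period" is a guess at how to spare abbreviations such as [[U.S.]]. Single-period abbreviations would still be changed wrongly, so the output needs human review.

```python
import re

# Matches a simple wiki link: [[target]] or [[target|label]].
WIKILINK = re.compile(r"\[\[([^\[\]]+)\]\]")


def move_period_outside_link(wikitext: str) -> str:
    """Move a trailing period from inside a wiki link to just after it."""

    def fix(match: "re.Match[str]") -> str:
        inner = match.group(1)
        if not inner.endswith("."):
            return match.group(0)
        # The visible text is the label if there is a pipe, else the target.
        visible = inner.rsplit("|", 1)[-1]
        body = visible[:-1]
        # Heuristic (an assumption): an earlier period suggests an abbreviation
        # such as "U.S." or "D.C.", which this project leaves untouched.
        if not body or "." in body:
            return match.group(0)
        return "[[" + inner[:-1] + "]]."

    return WIKILINK.sub(fix, wikitext)
```

For example, `move_period_outside_link("will be [[red.]]")` returns `"will be [[red]]."`, while a link such as `[[Washington, D.C.]]` is returned unchanged.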

Generally, if you're not sure it's a mistake, or covered in the scope of this project, don't fix it. There are plenty of real mistakes to tackle first.

Dump files to process

There were 415 dump files in the second round, numbered sequentially. Each dump file contains approximately 50 identified articles, about a third of which will be real errors. You can edit this page to indicate to others that you are working through a dump file (by adding to a row: -- ~~~~ '''working'''). Don't forget to remove the link here once you process it!

There is no active round, so there are no dumps to process. We will begin a new round in a few months.

Other ways to help

Missing punctuation continues to be a problem in new articles, so I will continue to run periodbot occasionally. Suggestions for patterns to filter out automatically are appreciated, since we currently do nothing to avoid seeing the same false positives again and again.
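
To make such suggestions concrete, a filter pattern could be expressed as a regular expression that marks a flagged paragraph as a probable false positive. The Python sketch below is hypothetical and does not describe how periodbot works; the three patterns are assumptions modeled on exceptions listed on this page (list items, "see also" endings, and trailing parentheticals).

```python
import re
from typing import List

# Hypothetical filter patterns, each mirroring an exception from this page.
FILTER_PATTERNS = [
    # Wiki list items and indented lines start with *, #, : or ;
    re.compile(r"^\s*[*#:;]"),
    # Paragraphs ending in "See also: [link]" or "Main article: [link]"
    re.compile(r"\b(see also|main article)\b[^.!?]*$", re.IGNORECASE),
    # Paragraphs ending in a parenthetical citation or remark
    re.compile(r"\)\s*$"),
]


def is_probable_false_positive(paragraph: str) -> bool:
    """Return True if any filter pattern matches the paragraph."""
    return any(pattern.search(paragraph) for pattern in FILTER_PATTERNS)


def filter_candidates(paragraphs: List[str]) -> List[str]:
    """Keep only the flagged paragraphs that no filter pattern explains away."""
    return [p for p in paragraphs if not is_probable_false_positive(p)]
```

Keeping the patterns in a single list means a new suggestion can be added as one more entry without changing the rest of the code.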

I am also interested in other analyses to run. Are there any other common patterns of mistakes in Wikipedia articles? Post your suggestions on the talk page!


Feel free to add your name (use ~~~) to this list if you have helped process the dump files. Thank you, everyone, for making Wikipedia a better place!

21:11, 31 August 2007 (UTC)

