Wikipedia talk:WikiProject Check Wikipedia



  • #46 seems to detect a few internal links inside image descriptions: there were 5 articles on frwiki in the list with that problem. --NicoV (Talk on frwiki) 16:31, 30 August 2013 (UTC)
    Can I see the examples? Bgwhite (talk) 17:13, 30 August 2013 (UTC)
    Forgot to put the link... The 5 articles done in the list: fr:Billet de banque (: au premier plan le [[5000 francs Flameng]]), fr:Lyricon (à vent [[Computone]]), fr:Moteur avec cylindres en ligne (en ligne de la [[Honda CBX|Honda CBX 1000]]), fr:Multi One Design (Environnement (MOD 70)|Veolia Environnement]]), and fr:Pollution sonore ([[circulation automobile]]). --NicoV (Talk on frwiki) 17:20, 30 August 2013 (UTC)
    Also in itwiki. Both when the "true" link is at the bottom it:Ampelio eremita and when it isn't it:Ascari del Cielo. Thanks! --AlessioMela (talk) 21:08, 4 November 2013 (UTC)
    Both cases follow the pattern. There is an image tag at the very beginning. There is a bracket error in the articles, but checkwiki shows the error in the image tag. I know why it is happening in the code, but I haven't found a way around it. Bgwhite (talk) 21:26, 4 November 2013 (UTC)

It seems to be happening again on frwiki (fr:Antihéros, fr:Insulte, ...) but I don't find anything wrong in the articles, even somewhere else. --NicoV (Talk on frwiki) 09:21, 5 April 2014 (UTC)


  • Hmmm, this shouldn't be happening. Looks like it is counting ]]] as having two ]] possibilities. Will look into it. Bgwhite (talk) 20:25, 21 September 2013 (UTC)
    • Matěj Suchánek, interesting case. The problem going on is: [[metr za sekundu|[m/s]]] was followed by a statement with a broken bracket, [ran/[[minuta|min]]. If it wasn't followed by a broken bracket, checkwiki would not say this was an error. Normally I'd say this is a rare case and checkwiki correctly said there is an error on the page, thus this is a real low priority. However, the problem in the code is similar to the problem in the code for #46 error report above this report. So, a solution in one probably fixes the other error. Problem is, I've yet to figure out the #46 error after many hours. Bgwhite (talk) 08:52, 29 October 2013 (UTC)
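A minimal sketch of how "]]]" can be read as two "]]" possibilities. It assumes the miscount comes from counting overlapping occurrences of the closing token; the actual checkwiki code may work differently, and the function names here are hypothetical.

```python
def count_overlapping(text: str, token: str) -> int:
    """Count every position where token starts, so overlaps are counted twice."""
    return sum(1 for i in range(len(text)) if text.startswith(token, i))


def count_non_overlapping(text: str, token: str) -> int:
    """Count occurrences the way str.count does: matches never overlap."""
    return text.count(token)


# The link from the cswiki report: a piped link whose label starts with "[".
link = "[[metr za sekundu|[m/s]]]"

# Overlapping count sees one "[[" but two "]]", so the link looks unbalanced.
print(count_overlapping(link, "[["), count_overlapping(link, "]]"))

# Non-overlapping count sees one "[[" and one "]]", so the link looks balanced.
print(count_non_overlapping(link, "[["), count_non_overlapping(link, "]]"))
```

Neither count is a full bracket parser; the point is only that the two counting strategies disagree on exactly this input.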

Error #39

NicoV Magioladitis After looking at some of the articles in a list of #39 errors not fixed by a bot, I've noticed some "false positives". I use quotation marks because it is actually errors in MediaWiki that are causing the problem.

Newlines don't function in <blockquote>, {{quote}}, {{cquote}} and {{quotation}}. I have the checkwiki code skip these for error #39. After looking at the new list of articles, <ref>, [[Image: and {{bq}} also don't work.

<skip several hours>

I have the bug bookmarked and brought it up. Lo and behold, the patch that was submitted in December 2011 was finally accepted. Final changes were made today on enwiki. Turns out Visual Editor was assuming newlines worked the same everywhere... silly VE. So, VE started the move to finally fix the problem. Hey, who knew, VE was actually helpful for the first time ever. According to the log, it only took 8 1/2 years to fix.

I've verified that {{quote}}, {{cquote}}, {{quotation}}, <blockquote> and {{bq}} now treat newlines correctly. I've verified that <ref> and [[Image: still barf on newlines.

I need to add the ref and various image tags to #39's code and remove the currently skipped templates in #39's code. Bgwhite (talk) 05:22, 16 October 2013 (UTC)

Bgwhite this means that now AWB can replace p tags inside blockquote with newlines? -- Magioladitis (talk) 06:12, 16 October 2013 (UTC)
Magioladitis. I'm confused. It doesn't work on Aristotle#Geology, but it does work below.



Bgwhite (talk) 06:42, 16 October 2013 (UTC)
Asked a question at User talk:Kaldari#bug 6200 and quote templates. Bgwhite (talk) 06:55, 16 October 2013 (UTC)
Comment was made at bugzilla bug 6200 about the problem. Also bug 55674 for newlines in ref tags. Bgwhite (talk) 09:05, 16 October 2013 (UTC)
6200 marked as fixed. -- Magioladitis (talk) 14:45, 28 October 2013 (UTC)

We can re-enable search inside <blockquote> since bug fixed. -- Magioladitis (talk) 23:49, 15 September 2014 (UTC)


Hi, I know that you're always looking for more work since it's so easy to use Labs ;-)

I'd like to suggest adding some statistics for Check Wiki to give us some information on how errors evolve on each wiki. Would it be possible to add a table with the following information?

  • One line for each error
  • Several columns for each day for a month: number of articles detected for the error after the daily scan, number of articles marked as fixed for the error during the day, possibly the number of articles marked as fixed during the day but that were detected again

--NicoV (Talk on frwiki) 10:21, 6 November 2013 (UTC)

Great idea. Though I am not sure if Bgwhite has enough time to implement it. --Meno25 (talk) 16:25, 9 November 2013 (UTC)
I know, it's just wishful thinking, no emergency and no problem if it's not implemented. --NicoV (Talk on frwiki) 12:08, 14 November 2013 (UTC)

Include pages in namespace 104 on arwiki

Please include pages in namespace "ملحق" (NS:104) on Arabic Wikipedia (arwiki) in the lists generated by Checkwiki script. This namespace contains lists and years pages. Pages in that namespace are counted in the number of articles (magic word: {{NUMBEROFARTICLES}}) and AWB's Auto-Tagger already tags articles in that namespace. --Meno25 (talk) 12:11, 23 November 2013 (UTC)

Meno25, I'm going to wait on this for a bit. I've held off on 104, commonswiki and File namespace. I'm using code optimized to grab only the Article namespace from the dump file. Changing that will cause a severe decrease in speed. I'll have to do some other changing around to insert the code, but everything else is set up for it. For example, there are if statements that say only Article and 104 namespace can check certain errors. Bgwhite (talk) 08:29, 24 November 2013 (UTC)
@Bgwhite: Thank you for the explanation. Take your time. We are not in a hurry. --Meno25 (talk) 12:21, 24 November 2013 (UTC)

Ideas for new errors

Time to start thinking about what new errors should be added to Checkwiki.

Ping: Magioladitis, NicoV, Meno25, Crazy1880, LindsayH, GoingBatty, Matěj Suchánek, Josve05a, ChrisGualtieri, Graham87. I think that is everybody. If not, add them to the list.

What should or should not be added will be determined by several factors:

  1. How easy is it to code up?
  2. Is it something that AWB or WPCleaner already can find?
  3. Is it something that AWB, WPCleaner or a bot can currently fix?
  4. Is it an accessibility issue?
  5. Is it a serious issue? Are the errors on the high or medium lists?
       1. High priority: error corrupts or distorts the content posted in Article
       2. Medium priority: improving the encyclopedic content or readability of the article
       3. Low priority: improving maintenance or MOS fixes

Some examples:

  1. Replacing <strike> with <s>. It would take a copy/paste to code up. WPCleaner finds and fixes the problem. It would be Low priority.
  2. Finding cases of url=http://http:// This is a common error I see. It would be High priority. It is fixed by AWB.
  3. Blank lines in bulleted vertical lists. This is an accessibility issue per Wikipedia:Accessibility#Blocked elements. This causes problems for screen readers.
  4. No blank space after the comma in DEFAULTSORT values. An example would be: Bush,George. The article would be sorted first for all surnames beginning with Bush. Currently not fixed by AWB or WPCleaner. Probably medium priority.

Bgwhite (talk) 01:34, 26 November 2013 (UTC)
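Examples 2 and 4 above can be sketched as regular expressions. This is only an illustration of the idea, not the checkwiki implementation; the pattern names and helper functions are hypothetical, and real wikitext has more variants (for instance localized DEFAULTSORT aliases) than these patterns cover.

```python
import re

# {{DEFAULTSORT:Surname,Given}} where the first comma is directly followed by text.
DEFAULTSORT_NO_SPACE = re.compile(r"\{\{DEFAULTSORT:[^{}|,]+,(?=[^\s}])[^{}]*\}\}")

# A doubled scheme such as url=http://http://...
DOUBLE_HTTP = re.compile(r"https?://https?://", re.IGNORECASE)


def find_defaultsort_without_space(wikitext: str) -> list[str]:
    """Return every DEFAULTSORT call whose first comma has no space after it."""
    return [m.group(0) for m in DEFAULTSORT_NO_SPACE.finditer(wikitext)]


def find_double_http(wikitext: str) -> list[str]:
    """Return every doubled URL scheme found in the text."""
    return [m.group(0) for m in DOUBLE_HTTP.finditer(wikitext)]


print(find_defaultsort_without_space("{{DEFAULTSORT:Bush,George}}"))
print(find_defaultsort_without_space("{{DEFAULTSORT:Bush, George}}"))
print(find_double_http("{{cite web|url=http://http://example.org}}"))
```

The first call reports the unspaced sort key, the second reports nothing, and the third reports the doubled scheme.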

How about putting the TOC in the standard position in the wiki-markup, which is also an accessibility issue? Not sure about automating this though. Graham87 01:39, 26 November 2013 (UTC)
Seems that there are lots of new citation style errors, some of which appear in red text in the references section. Those might be something worth exploring. GoingBatty (talk) 02:08, 26 November 2013 (UTC)

A few suggestions:

  1. An error to detect non-existent files (red linked files). We have a bot on Arabic Wikipedia that removes such links. However, the bot works on all pages of the Main namespace. Generating a list of pages for the bot to work on would be a good idea. See Wikipedia:Database reports/Articles containing deleted files.
  2. Detecting user signatures in articles (articles containing links to user pages). To be worked on manually. See Wikipedia:Database reports/Articles containing links to the user space.
  3. Detecting fat redirects (redirects obscuring page content). To be checked manually. See Wikipedia:Database reports/Redirects obscuring page content. --Meno25 (talk) 06:43, 26 November 2013 (UTC)

The errors I suggested are covered by the Database reports on English Wikipedia. Database reports are updated regularly only on enwiki, Commons and Meta. Moving the errors to checkwiki means that the reports would get generated for other wikis too. So, maybe disable those errors for enwiki and enable them for other wikis. --Meno25 (talk) 06:43, 26 November 2013 (UTC)

Meno25 you can request similar databases for other wikis. -- Magioladitis (talk) 09:19, 26 November 2013 (UTC)

CHECKWIKI is more about common syntax errors. We need to focus on that. If lists are already generated by other bots/projects we do not need to duplicate the job. Bgwhite's idea of unspaced DEFAULTSORT is a great example of what we are after. WPC's extended list is another good example. I have some minor suggestions:

I don't know if this is an error or maybe already monitored but:

  1. {{cite web}} without access dates.
  2. When only <ref></ref> is used without title/description. This is to prevent link rot.
  3. When two (or more) refs with the same information have different ref names.
  4. When the time (e.g. 08:45 or 8 am) or the day (e.g. Monday or Saturday) is used inside |accessdate=.

(tJosve05a (c) 11:52, 26 November 2013 (UTC)

Hi, I think new errors should be generic enough to work on most wikis, so avoid very specific errors (for example: {{cite web}} without access dates should be dealt with by the template itself: put the page in a maintenance category if access dates are missing). Otherwise, some of WPCleaner errors in the #5xx numbers:

  • #502: Useless "Template" in {{Template:...}} (low)
  • #508: Non-existent templates (medium ?)
  • #511: Internal link written as an external link (medium ?)
  • #512: Interwiki link written as an external link (low ?)
  • #513: Internal link inside an external link (medium ?)
  • #517: <strike>...</strike>
  • #519: <a>...</a>
  • I like some of the previous proposals: missing space after a comma in a DEFAULTSORT, doubled http, blank lines in bulleted lists, non-existent files, ...

Some of them are probably hard to develop or require access to a lot more information, so they will be difficult to add (non-existent templates / files, ...) --NicoV (Talk on frwiki) 12:57, 26 November 2013 (UTC)

NicoV I agree with you. My first suggestion is not good either. I think the most suitable new additions are WPCleaner errors in the #5xx numbers. For non-existent files etc. I disagree that we should do something about them. There are databases for those already. -- Magioladitis (talk) 13:39, 26 November 2013 (UTC)
NicoV: #508 is already listed, see Special:WantedTemplates, the files are in Category:Pages with missing files.
I have once suggested a link to a year which has another description ([[2012|2013]], medium or high).
Some inspiration: de:Benutzer:Stefan Kühn/Check Wikipedia#Next features. Matt S. (talk | cont. | cs) 15:16, 26 November 2013 (UTC)

A few more:

  1. More than one blue link per * on a disambig-page. (Per WP:MOSDAB)
  2. Refs and reflist on a disambig-page. (per WP:MOSDAB)
  3. When an article does not have "nbsp" between a number and its unit (e.g. 15 km, 2,5 miles and 3 cm)

-(tJosve05a (c) 16:12, 26 November 2013 (UTC)

Moin, I would like a check for a space inside a category link, as medium priority. Example: right: "[[category:xyz]]" and wrong: "[[categorie: xyz]]". Stefan Kühn had code for the persondata script. Regards --Crazy1880 (talk) 09:09, 29 November 2013 (UTC)
Crazy1880, Error 22 should be picking those up. Bgwhite (talk) 02:20, 3 December 2013 (UTC)
Bgwhite, oh, yes it did. On the German Wikipedia the question came up whether ID 69 will check for "ISBN:", because the linked site only uses ISBN. (example: ISBN: 978-3-7657-2781-8 > ISBN 978-3-7657-2781-8) Regards --Crazy1880 (talk) 20:19, 4 January 2014 (UTC)
Crazy1880, #69 checks for ISBN: and ISBN- Bgwhite (talk) 22:21, 4 January 2014 (UTC)

Round 2

Ping: Magioladitis, NicoV, Meno25, Crazy1880, LindsayH, GoingBatty, Matěj Suchánek, Josve05a, ChrisGualtieri, Graham87.

Following is a list of errors that I think could be added. Some notes:

  • English database reports that Meno25 mentioned are not being ported to other languages unless somebody is willing to take on the task. Very few have been ported. So, if a report meets the "standard", I see no reason not to add it to checkwiki.
  • Most citation style errors would be a pain in the butt to code, too many articles that take too long to correct and are not really syntax errors. The one exception that I can think of is missing "url=" when the web address is given.
  • NicoV and Magioladitis, could you add WPCleaner or AWB to the appropriate errors and columns?
  • Any errors not in the list that you think should be added? Any other comments?
Description | Priority | Coding | Tools to detect | Tools to fix | Other
Useless "Template" in {{Template:...}} | low | Done | WPC, AWB | WPC, AWB | #1 (#502)
Internal link written as an external link | medium | Done | WPC | WPC & Frescobot | #90 (#511)
Interwiki link written as an external link | low | Done | WPC | WPC | #91 (#512)
Internal link inside an external link | medium | - | WPC (#513) | WPC | -
<strike>...</strike> | low | Done | WPC, AWB | WPC, AWB* | #42 (#517). Obsolete in HTML5. Use <s>...</s> instead
<a>...</a> | low | Done | WPC | WPC | #4 (#519)
URL without http:// | high | Done | WPC, AWB | WPC, AWB | #62
Finding cases of url=http://http:// | medium | Done | WPC, AWB | WPC, AWB | #93
Blank lines in bulleted vertical lists | medium | - | - | - | Accessibility issue per Wikipedia:Accessibility#Blocked elements
Putting the TOC in the standard position | medium | Done | WPC | - | #96 and #97. Accessibility issue per MOS Elements of the lead
No blank space after the comma in DEFAULTSORT | low | Done | WPC, AWB | WPC, AWB | #89
Unbalanced ref tags | medium | Done | WPC, AWB | WPC, AWB | #94
Detecting user signatures in articles | low | Done | WPC, AWB | WPC, AWB | #95
Detecting fat redirects (redirects obscuring page content) | low | - | - | - | -
<span class="plainlinks"> in articles | low | - | - | - | -
Pipe in external link [http:/|Wikipedia] | low | - | - | - | -
Link to a year which has another description ([[2012|2013]]) | low | - | - | - | This error is often caused by VE.
Cases of {{cite web|| title= | medium | - | - | - | -
Move anchor in front title in heading | - | - | - | - | -
Detect non-existent files (red linked files) | - | - | - | - | -
Detect non-existent templates | - | - | WPC (#508) | - | -
Detect refs <ref name=> | low | easy | - | - | often detected as #56
Category with double colon | - | easy | AWB | - | -
More same parameters in template | medium | medium | - | - | -
Good :-). I've added the information about what WPCleaner can currently detect and fix (automatic or bot, at least partially). For errors I've already coded with a #5xx number, feel free to use an error number following what CW currently manages or keep the #5xx number. For other errors, I don't see any problem for implementing them in WPCleaner, but it will probably have to wait 2 months, as I will be almost completely unavailable for several weeks. --NicoV (Talk on frwiki) 20:05, 3 December 2013 (UTC)

Errors added

Magioladitis, NicoV, Meno25, GoingBatty, Matěj Suchánek, Josve05a, ChrisGualtieri

  • #01 - Template with the useless word "template"
  • #04 - HTML text style element <a>
  • #42 - HTML text style element <strike>
  • #62 - URL containing no http://
  • #89 - DEFAULTSORT with no space after the comma
  • #90 - Internal link written as an external link
  • #91 - Interwiki link written as an external link
  • #93 - External link with double http://
  • #94 - Reference tags with no correct match
  • #95 - Editor's signature or link to user space
  • #96 - TOC after first headline
  • #97 - Material between TOC and first headline


  • Only turned on for enwiki for right now. Will start to expand after NicoV's return.
  • Just added #90 and #91. So, there will probably be some problems.
  • For #90 and #91, it will only search for articles written as an external link. Talk pages or special pages will not be searched. History of Wikipedia has examples of why it is done this way.
The description of #91 must be changed to: The script found an external link that should be replaced with an interwiki link. An example would be on enwiki [ Wall] should be written as [[:fr:Larry Wall]] so it says in the external link and not -(tJosve05a (c) 21:07, 24 December 2013 (UTC)
And #90 must be changed to -(tJosve05a (c) 21:11, 24 December 2013 (UTC)
Another thing is that it should not say [...]/wiki/Larry Wall]. It should say [...]/wiki/Larry_Wall Larry Wall].(tJosve05a (c) 21:14, 24 December 2013 (UTC)
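The rewrite discussed for #91 can be sketched as follows. This is a hedged illustration only: the URL pattern (a `<lang>.wikipedia.org/wiki/` address), the function name and the sample link are assumptions made for the example, not the wording or logic that checkwiki, AWB or WPCleaner actually use.

```python
import re
from urllib.parse import unquote

# Bracketed external link pointing at another language edition of Wikipedia.
# Group 1: language code, group 2: page title as written in the URL,
# group 3: optional link label.
EXTERNAL_WIKI_LINK = re.compile(
    r"\[https?://([a-z\-]+)\.wikipedia\.org/wiki/([^\s\]]+)(?:\s+([^\]]*))?\]"
)


def to_interwiki(wikitext: str) -> str:
    """Rewrite matching external links as [[:lang:Title]] or [[:lang:Title|label]]."""

    def repl(match: re.Match) -> str:
        lang, raw_title, label = match.group(1), match.group(2), match.group(3)
        # URLs carry underscores and percent-escapes; wikilinks use plain titles.
        title = unquote(raw_title).replace("_", " ")
        label = (label or "").strip()
        if label and label != title:
            return f"[[:{lang}:{title}|{label}]]"
        return f"[[:{lang}:{title}]]"

    return EXTERNAL_WIKI_LINK.sub(repl, wikitext)


# Hypothetical input; note the underscore in the URL form of the title.
print(to_interwiki("[http://fr.wikipedia.org/wiki/Larry_Wall Larry Wall]"))
```

The label is dropped when it equals the title, which gives the `[[:fr:Larry Wall]]` form mentioned in the comment above.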

Errors modified

  • #22 - Finds more cases of a space in a category
  • #19 - Finds headlines that start with one "=" anywhere in the article instead of only at the start of the article.


Bgwhite I've updated WPCleaner (version 1.31) for the following errors for all wikis: #1 (previously #502), #4 (previously #519), #42 (previously #517), #90 (previously #511), #91 (previously #512). Still have to do: #62, #89, #93, #94. Old #62 and #89 have been disabled. --NicoV (Talk on frwiki) 21:51, 22 January 2014 (UTC)

Thank you Nico. Do you want me to turn on the new errors for frwiki or wait? I'm sure Josve05a will have found a bug before I write this. :). New error #95 will be an editor's signature found in an article. Bgwhite (talk) 22:49, 22 January 2014 (UTC)
Bgwhite, will 95 include UTC, CET, CEST etc.? Since this error might not only be used on enwp? (tJosve05a (c) 22:55, 22 January 2014 (UTC)
(BTW Bgwhite I'm 16 in 5...4...3...2...1...HAPPY BIRTHDAY TO ME!) (tJosve05a (c) 23:00, 22 January 2014 (UTC)
Hey, I already wished you a happy birthday, which you already complained about. Now you want another... pfffft.  :) Time is irrelevant for #95 as I'm only looking for a signature. Bgwhite (talk) 23:09, 22 January 2014 (UTC)
@Bgwhite Yes, I think you can turn the new errors on for frwiki, I'll check what has to be changed in the translation file. --NicoV (Talk on frwiki) 08:47, 23 January 2014 (UTC)
Bgwhite, NicoV, (#91) WPCleaner changes [ Hurley on the [[Internet Movie Database]]] to [[:imdbname:0403424|Hurley on the]][[Internet Movie Database]]]. I see multiple issues with this. It removes the blank space, and it leaves 3 brackets at the end (without WPCleaner reporting it). (Found on Colin Hurley). (tJosve05a (c) 10:52, 23 January 2014 (UTC)
@Josve05a: It will be fixed in a next version. It's due to the incorrect syntax of having an internal link inside an external link. It can be reported by WPC if #513 is activated. --NicoV (Talk on frwiki) 19:54, 23 January 2014 (UTC)
@Bgwhite: If possible, please start the new checks for cswiki, too. I will modify the configuration file. Within a week, you can also enable skwiki.
@Josve05a: Happy birthday! You are now the same age as I am (for the next 10 months). Matt S. (talk | cont. | cs) 18:28, 23 January 2014 (UTC)

NicoV and Matt S., in theory frwiki and cswiki should start seeing the new errors at the next 0z run.... if the database is up. Today's outage was caused by a disc getting full. Bgwhite (talk) 07:38, 24 January 2014 (UTC)

Hi! It's strange: I've modified the translation file in frwiki 4 days ago, but the old description is still displayed in WMFLabs for #1, #4, #42, #62. No errors are found. --NicoV (Talk on frwiki) 12:15, 27 January 2014 (UTC)
@Bgwhite For frwiki, I've changed the translation file a few days ago: descriptions for new error numbers (#93, ...) have been taken into account on WMF Labs, but not the modified descriptions for old error numbers that have been recycled (#1, #4, #42, #62, #89, #90, #91). Is it a problem to have kept the old descriptions as comments? --NicoV (Talk on frwiki) 09:12, 29 January 2014 (UTC)
NicoV, hmmm, I didn't see the message above this one. Sorry for that. The translation file and every other program has been bombing lately, so that was probably why you didn't see it right away. Between database problems and mounting problems, I'm ready to go screaming into the night. The frwiki dump processing is still running. Which is very amazing that it hasn't died yet.
Do you mean as comments in the translation file as you have done for the French one? I see no problems.
For #96 and #97 I've thrown in a little regex in the English translation file to account for templates being used with a space and no space.
For #95, I only have English "User" and "User talk". I'll get individual wiki's words in a bit. I'll get them thru the API. Bgwhite (talk) 09:31, 29 January 2014 (UTC)
Bgwhite For example, on WMFLabs, #1 is still displayed as "Pas de texte en gras" (the old description, which is commented out in the translation file) when the translation file has been changed 6 days ago; whereas the translations for the new errors (#93 and so on) are correctly used even if they have been changed later (only 2 days ago). --NicoV (Talk on frwiki) 16:47, 29 January 2014 (UTC)
NicoV, looking at the code, it grabs the first variable, commented or not. So, putting the commented lines second does the trick. Bgwhite (talk) 22:32, 29 January 2014 (UTC)
Ok, thanks a lot! --NicoV (Talk on frwiki) 11:08, 30 January 2014 (UTC)

Error #62

Error #39 (again)

Hi all. In Demons (novel) the section headed "Characters" employs paragraphs within a bulleted list. This has been coded per the advice given here, but Yobot (and, I think, other AWB-based robots) persists in making "corrections": [2] [3] [4] [5] [6] [7] and so on. Aside from destroying the logical structure of the section, this is also contrary to accessibility guidelines.

I note that the detection of error #39 has already been modified to accept the use of <p>s within certain tags, such as <blockquote>. Can this tolerance be extended to include <p>s within lists?

(I was uncertain whether to raise this concern here, with Yobot, or with AWB. If I've chosen the wrong place, could you please let me know, and I'll try again.) In the meantime, thanks for your collective good work with checkwiki: fighting the good fight, and at scale! — Simon the Likable (talk) 13:49, 10 February 2014 (UTC)

Simon the Likable hi. Thanks for starting the discussion. I was not aware of this problem. Bots tend to revisit a page unless something is changed. -- Magioladitis (talk) 13:54, 10 February 2014 (UTC)
Simon the Likable can you please check if you like my version? -- Magioladitis (talk) 13:57, 10 February 2014 (UTC)
This is both a Checkwiki and AWB issue, so having a discussion at either spot is just fine.
@Graham87: As this is also an accessibility issue, Graham is the one to ask. Current version of Demons (novel) uses * and : to create paragraphs inside lists. This version uses * and standard html paragraph tag. Is the current version acceptable or should the older version be used? Bgwhite (talk) 18:24, 10 February 2014 (UTC)
@Simon the Likable, Magioladitis, Bgwhite: The older version is better, but even there, the gaps between the list items would need to be removed. In the newer version, the HTML lists finish at the end of each paragraph (as can be seen by checking the HTML source). It might be easier to use HTML rather than wiki-markup to create the lists. Graham87 01:13, 11 February 2014 (UTC)
Sorry guys but on my laptop, both versions have the same visual result. I must be semi-blind or something. This happens to me after working on my laptop for several hours. Can someone explain to me what the visual differences are? Thanks, Magioladitis (talk) 06:59, 11 February 2014 (UTC)
Visually they are the same. On a screen reader, it breaks up the list. Take the first item on the list, the one with the <p> tags: with the : markup it appears as a one-item list to a screen reader. Bgwhite (talk) 07:12, 11 February 2014 (UTC)
Thanks Magioladitis. As Bgwhite has outlined, your solution is impeccable visually, but will not allow visually impaired readers good access using a screen reader. I have therefore reverted your change (reinstating the <p>s), but have also taken on board Graham87's point and removed the blank lines between list items. Thus, I think the current version covers both visual and accessibility requirements, and follows recommended coding practices in Help:List#Paragraphs_in_lists and now WP:LISTGAP.
This leaves open my original issue: checkwiki and AWB both regard this recommended markup as an error. Can checks for error #39 be modified to accept the use of <p>s within lists? (Or perhaps there is some other solution?) — Simon the Likable (talk) 13:59, 11 February 2014 (UTC)
Thanks Simon; sounds good here now! Graham87 14:03, 11 February 2014 (UTC)
Hey guys. Any chance that this is a Mediawiki bug and we should report it? -- Magioladitis (talk) 14:06, 11 February 2014 (UTC)
I looked at source code for the latest version and Magioladitis' version. It does not appear to be a bug. In the latest version of Demons (novel), it is one long list made up of <li> tags. If a blank line happens, the list ends. In Magioladitis' version, it starts as a list. When the first : happens, the list is ended. The HTML tags to produce the layout for the : consist of <dl> and <dd> tags. The use of the dl and dd tags is standard HTML practice when text needs varying indentation. The source for this talk page is full of dl and dd tags. Bgwhite (talk) 06:23, 12 February 2014 (UTC)
Yes, both Checkwiki and AWB should not call this an error. Finding a solution is another matter. My brain isn't coming up with an answer. For the time being, I've added the article to a whitelist, so Checkwiki will not find a <p> error in the article. Bgwhite (talk) 06:23, 12 February 2014 (UTC)


It would be very helpful if the check could recognize and ignore

  • ISBNistFormalFalsch=J
Example: de:Erich Burgener - {{Literatur | Autor=Bertrand Zimmermann | Titel=Erich Burgener | Verlag= Editions de la Thèle| Ort=Yverdon-les-Bains | Jahr=1987 | ISBN=2-8283-0024 | ISBNistFormalFalsch=J }}
  • http://xxxxx/isbn/282830024

--Tsor (talk) 09:09, 2 March 2014 (UTC)

Tsor, as usual, I'm confused. Why give a bad ISBN in the first place? I did a Google search and only two non-Wikipedia derived websites give this number and one of them is Wikipedia. Bgwhite (talk) 23:43, 2 March 2014 (UTC)
Hello Bgwhite, this is just a (bad) example. Sometimes we find in a book an ISBN which is formally wrong. Some people use the template Vorlage:Literatur where they can mark such invalid ISBNs with "ISBNistFormalFalsch=J". There is another template Vorlage:Falsche ISBN which can mark such invalid ISBNs: {{Falsche ISBN|3-123-45678-9}} leads to "ISBN 3-123-45678-9 (formal falsche ISBN)". This template is used very often:
I will look for a better example for an invalid ISBN. --Tsor (talk) 10:10, 3 March 2014 (UTC)
PS: An additional column in the error-list "marked as invalid" would help. --Tsor (talk) 10:18, 3 March 2014 (UTC)
Tsor, I'm slow, but I still fail to see what is wrong. It would be best to use a correct ISBN? A better example would help me understand. TMg, could you help me out.
There are whitelists in which articles can be added so they won't be raised as an error again. Too many things can go wrong with a "marked as invalid" button... There is already a problem of vandalism by people clicking done when they have no intention of fixing errors. Bgwhite (talk) 3 March 2014 (UTC)
Here are 349 examples. --Tsor (talk) 11:10, 3 March 2014 (UTC)
I just looked at the first one in the list, de:Charles de Melun and I don't understand why the ISBN is qualified as bad: the checksum is correct. Is it normal to have "ISBNistFormalFalsch=J" with an ISBN that seems correct? Edit: idem for second example de:Bussard (Einheit). --NicoV (Talk on frwiki) 12:26, 3 March 2014 (UTC)
Hmm, you are right, in de:Charles de Melun the ISBN is marked as bad but it is ok. Same for your second example. I will have a closer look. --Tsor (talk) 13:26, 3 March 2014 (UTC)
Please repeat your calculation. The checksum digit is false: if the first 9 digits are correct, the checksum digit at the end should be a 1, so the ISBN should be 2902091311 and not 2902091312. --Cepheiden (talk) 19:15, 5 March 2014 (UTC)
Well, you're just not looking at the version I was looking at: the page has been modified since my comment and the ISBN changed completely: an ISBN-13 with a coherent checksum was replaced by an ISBN-10 with a non-coherent checksum. --NicoV (Talk on frwiki) 20:21, 5 March 2014 (UTC)
I'm sorry, you are right, I didn't notice the edit. --Cepheiden (talk) 17:48, 8 March 2014 (UTC)
I also looked at others; a lot seem to be in the same situation. There are also cases where the ISBN indeed has a wrong checksum, but the book can be found with the correct ISBN on the internet: de:Mare Imbrium and the corresponding book on google. I've spent quite some time on frwiki fixing ISBNs reported by CW (still quite some work to do), but I've found very few situations where the ISBN with the incorrect checksum was confirmed as being the ISBN (it's usually fixed at some point). --NicoV (Talk on frwiki) 15:51, 3 March 2014 (UTC)
Yes, there are cases of ISBN's with false checksum digits used as the original ISBN (printed in book and listed in databases of libraries etc.). If someone cites this book with this ISBN we mark them as "formally false" like some libraries do. So what's the point here? --Cepheiden (talk) 19:15, 5 March 2014 (UTC)
My point was that I was surprised by the size of the list (349 pages), because as I said, I fixed a lot of ISBNs on frwiki, and didn't find so many situations where the ISBN with the non-coherent checksum had to be kept. Given that the first hits in the search seemed to be mistakes, I was wondering if it was normal that you have so many pages with ISBNs tagged as formally false. --NicoV (Talk on frwiki) 20:26, 5 March 2014 (UTC)
This was more a reply to Bgwhite (like Tsor already did). --Cepheiden (talk) 17:48, 8 March 2014 (UTC)
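The checksum calculation referred to in this thread follows the standard ISBN-10 rule: weight the first nine digits 10 down to 2, and choose the check digit so that the total is divisible by 11 (a value of 10 is written "X"). A minimal sketch, with hypothetical function names, that reproduces the 2902091311 versus 2902091312 case:

```python
def isbn10_check_digit(first_nine: str) -> str:
    """Compute the ISBN-10 check character for the first nine digits."""
    if len(first_nine) != 9 or not first_nine.isdigit():
        raise ValueError("expected exactly nine digits")
    # Weights run from 10 (first digit) down to 2 (ninth digit).
    total = sum(int(d) * w for d, w in zip(first_nine, range(10, 1, -1)))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def is_valid_isbn10(isbn: str) -> bool:
    """Return True if isbn (hyphens and spaces allowed) has a coherent check digit."""
    digits = isbn.replace("-", "").replace(" ", "").upper()
    if len(digits) != 10 or not digits[:9].isdigit():
        return False
    return isbn10_check_digit(digits[:9]) == digits[9]


print(isbn10_check_digit("290209131"))   # check digit for the first nine digits
print(is_valid_isbn10("2902091311"))
print(is_valid_isbn10("2902091312"))
print(is_valid_isbn10("2-8283-0024"))    # only nine digits, so not a complete ISBN-10
```

A checker that honours markers such as ISBNistFormalFalsch=J would still need to decide separately whether to report a number that fails this test.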

Just an example for the second point: found in de:28 Stories über Aids in Afrika. --Tsor (talk) 22:08, 3 March 2014 (UTC)

It links to "Page not found", the correct link seems to be at (different last 2 digits ISBN). --NicoV (Talk on frwiki) 22:29, 3 March 2014 (UTC)

Adjacent references ?

Hi, what do you think of adding a detection for adjacent references, like <ref>...</ref><ref>...</ref><ref>...</ref> ? This error probably won't be of any interest for enwiki because reference numbers are put between square brackets [1][2][3]. But on frwiki reference numbers are displayed without any decoration so adjacent references may look like only one reference 123, so we're generally using a template {{,}} between references. --NicoV (Talk on frwiki) 22:14, 27 May 2014 (UTC)

NicoV, could you get me some articles with the problem as test subjects. <maniacal laugh> Test Subjects </maniacal laugh> I take it I need to look for cases of: </ref><ref> and <ref name=ack /><ref ? I also saw your message above about adding to the done pages. Bgwhite (talk) 05:29, 28 May 2014 (UTC)
Ok, will try to find some... The subject was brought on WPCleaner's talk page for this modification, but the page is fixed now. --NicoV (Talk on frwiki) 07:17, 28 May 2014 (UTC)
Bgwhite, I checked a lot of articles but I haven't found another example yet... --NicoV (Talk on frwiki) 12:16, 28 May 2014 (UTC)
fr:Utilisateur:Zetud/Pb Ref should have a list. --NicoV
Bgwhite, fr:Leetchi, with at least 2 problems in the introduction. --NicoV (Talk on frwiki) 07:34, 2 July 2014 (UTC)
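The two cases mentioned above (`</ref><ref>` and a self-closing `<ref name=... />` directly followed by `<ref`) can be sketched with one regular expression. This is only an assumption about how such a check could look, with hypothetical names; it ignores whitespace between tags and nested templates.

```python
import re

# A closing </ref> or a self-closing <ref ... /> immediately followed by another <ref.
ADJACENT_REFS = re.compile(r"(?:</ref>|<ref\b[^<>]*/>)(?=<ref\b)", re.IGNORECASE)


def count_adjacent_refs(wikitext: str) -> int:
    """Count the places where two references touch with nothing in between."""
    return len(ADJACENT_REFS.findall(wikitext))


print(count_adjacent_refs("Text<ref>a</ref><ref>b</ref><ref>c</ref>."))
print(count_adjacent_refs("Text<ref name=ack /><ref>b</ref>."))
print(count_adjacent_refs("Text<ref>a</ref>{{,}}<ref>b</ref>."))
```

The last call shows the frwiki convention of separating references with {{,}}, which the pattern does not report.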

Error #31

Discussion in User_talk:Frietjes#Infoboxes_to_take_of revealed that most probably Error #31 needs expansion to cover more HTML table tags. -- Magioladitis (talk) 22:45, 31 May 2014 (UTC)

@Frietjes, Magioladitis:. #31 only checks for the case of <table. There are legitimate cases where <td> can be used. Will first check the upcoming June dump file to see the lay of the land for tr and td tags. Bgwhite (talk) 06:47, 1 June 2014 (UTC)
@Frietjes, Magioladitis:, I've added checking for <tr>. I do expect articles to go onto the whitelist. A listing of articles can be found at User:Bgwhite/Sandbox1. Bgwhite (talk) 00:25, 16 September 2014 (UTC)

New error type

Hello! I'd like to propose detecting a new error type: sometimes in-page interlanguage links are written as regular interlanguage links, i.e. without a leading colon. But they are obviously in-page links since they contain a pipe symbol. For example, this situation occurred on the page 男同性恋免疫缺乏症 of the Chinese Wikipedia (I don't know of such examples on enwiki), which contained two such links: [[en:Kaposi's sarcoma|卡波西氏肉瘤]] and [[en:Pneumocystis pneumonia|卡氏肺囊虫肺炎]]. The part of the link after the pipe symbol is obviously useless for regular interwikis, so this situation is undoubtedly an error. --Emaus (talk) 14:35, 2 June 2014 (UTC)

Emaus @Magioladitis:. In theory, error #31, interwiki before last heading, should catch these situations. Since interwiki use should be minimal now, renaming this error would be a good thing. Maybe "interlanguage link with incorrect syntax"? Bgwhite (talk) 20:12, 2 June 2014 (UTC)
@Bgwhite, Emaus: AWB will react by moving the interwiki to the bottom unless the interwiki matches the project code. -- Magioladitis (talk) 08:05, 3 June 2014 (UTC)
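A minimal sketch of the check Emaus proposes: a link whose prefix is a language code, with no leading colon, and with a pipe. The `LANGUAGE_CODES` set is a small hypothetical sample (a real check would need the full interwiki list), and the function name is made up for illustration.

```python
import re

# Hypothetical, incomplete sample of language prefixes.
LANGUAGE_CODES = {"en", "fr", "de", "it", "pl", "zh", "el", "cs", "es", "ar"}

# [[xx:Target|label]] with no colon before the prefix.
PIPED_PREFIX_LINK = re.compile(r"\[\[([a-z-]+):([^\[\]|]+)\|([^\[\]]*)\]\]")


def find_piped_interlanguage_links(wikitext):
    """Return links written like regular interlanguage links but carrying a label."""
    return [
        m.group(0)
        for m in PIPED_PREFIX_LINK.finditer(wikitext)
        if m.group(1) in LANGUAGE_CODES
    ]
```

Correct in-page links such as [[:en:Kaposi's sarcoma|卡波西氏肉瘤]] are not reported, since the leading colon prevents the match.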

Error #64

@Bgwhite, NicoV: [[[[foo]]]] is caught as #64 by CHECKWIKI but as #10 by WPCleaner. It is not fixed by AWB. -- Magioladitis (talk) 06:51, 18 June 2014 (UTC)

Hi Magioladitis. What do you think we should do ? I don't see why it's detected as #64 (link equal to link text): do you mean #46 (Square brackets not correct begin)? WPCleaner should detect both #10 and #46. --NicoV (Talk on frwiki) 13:05, 20 June 2014 (UTC)

OK. I am getting rusty. Sorry again. This one shows that AWB did not fix #64, but this may be due to the order in which things are done. Same here. -- Magioladitis (talk) 13:14, 20 June 2014 (UTC)

Ok, I understand better, especially with the next modification. Maybe internal link is not correctly recognized by AWB due to the extra brackets? WPCleaner edit seems fine (#10, #46 and #64), except for the automatic comment ("null"...), I have to fix this one. NicoV (Talk on frwiki) 13:28, 20 June 2014 (UTC)

Whitelists not always exclude things

@Bgwhite: After the last dump I realised that the whitelist for #48 never works. Same for the #101 whitelist. -- Magioladitis (talk) 08:09, 18 June 2014 (UTC)

These two were fixed. -- Magioladitis (talk) 09:59, 21 September 2014 (UTC)

@Bgwhite: Error 24 whitelist does not work. -- Magioladitis (talk) 08:46, 21 September 2014 (UTC)

I may have fixed it with this edit. -- Magioladitis (talk) 08:49, 21 September 2014 (UTC)

@Bgwhite: Error 31 and 49 whitelists do not work. -- Magioladitis (talk) 09:59, 21 September 2014 (UTC)

Magioladitis, #49 had the same problem as #24 and I fixed it a few weeks back. #31 and #49 haven't been updated on my computer. I was a little blindsided by the timing of this month's dump and didn't do any updates before hand. Bgwhite (talk) 22:44, 21 September 2014 (UTC)

Error #48

Done

We should exclude anything inside timeline tags. -- Magioladitis (talk) 07:10, 19 June 2014 (UTC)
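One way to exclude timeline content, sketched under the assumption that #48 looks for a wikilink to the article's own title (as described further down this page): remove <timeline> blocks first, then search. The function name is hypothetical, and the sketch ignores MediaWiki title normalisation (first-letter case, underscores).

```python
import re

# Everything between <timeline ...> and </timeline>, including the tags.
TIMELINE = re.compile(r"<timeline\b[^>]*>.*?</timeline\s*>", re.DOTALL | re.IGNORECASE)


def title_linked_in_text(title, wikitext):
    """Return True if the article links to its own title outside of timeline tags."""
    text = TIMELINE.sub("", wikitext)
    # Match [[Title]] or [[Title|label]], but not [[Title and more]].
    pattern = re.compile(r"\[\[\s*" + re.escape(title) + r"\s*(?:\||\]\])")
    return pattern.search(text) is not None
```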

Error #101

Done

We should exclude search inside {{Not a typo}}. -- Magioladitis (talk) 07:49, 20 June 2014 (UTC)

Error 3 on elwiki

I think WPCleaner catches the list found at el:Βικιπαίδεια:WikiProject_Check_Wikipedia/Μετάφραση while CHECKWIKI script does not. -- Magioladitis (talk) 16:48, 27 June 2014 (UTC)

I think the problem is only with the last line. Now that I updated the code, I noticed that all errors shown are connected to the last line. -- Magioladitis (talk) 05:26, 8 July 2014 (UTC)

False positives for #87

Hi, with the latest full dump, there seem to be a lot of false positives for #87 (HTML entities without ;). Examples from the first 25 pages reported:

--NicoV (Talk on frwiki) 20:54, 21 July 2014 (UTC)

NicoV We turned off #87 on enwiki because of the false positives. I'm not sure how to fix this. The hard part is there can be letters or numbers after an entity. Any ideas? Bgwhite (talk) 22:32, 21 July 2014 (UTC)
Bgwhite Apart from the last 2, I think the only thing that could be done is filtering out the errors when they are found in special places (URL, attribute of a tag, timeline, image, ...). For the last one, I only see doing a case-sensitive comparison. And for the &phis;, I don't know... Not very helpful, sorry. --NicoV (Talk on frwiki) 06:00, 22 July 2014 (UTC)
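A sketch of the two filters NicoV suggests, case-sensitive comparison and skipping URLs, could look like the following. It relies on Python's standard `html.entities.name2codepoint` table as the list of known entity names (an assumption: checkwiki has its own list), and the function name is made up. It deliberately only reports a name that is not followed by a letter, digit or semicolon, so the "letters or numbers after an entity" case Bgwhite mentions is left unsolved.

```python
import re
from html.entities import name2codepoint

URL = re.compile(r"https?://\S+")
# "&name" where the whole alphanumeric run is the name and no ";" follows.
CANDIDATE = re.compile(r"&([A-Za-z][A-Za-z0-9]*)(?![A-Za-z0-9;])")


def find_unterminated_entities(wikitext):
    """Return (offset, name) pairs for known entity names that lack the closing ';'."""
    # Blank out URLs with spaces so that offsets stay valid for the original text.
    masked = URL.sub(lambda m: " " * len(m.group(0)), wikitext)
    return [
        (m.start(), m.group(1))
        for m in CANDIDATE.finditer(masked)
        if m.group(1) in name2codepoint  # dictionary lookup is case-sensitive
    ]
```

Masking URLs matters because query strings routinely contain sequences like &lang=en, where "lang" happens to be a valid entity name.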

Analysis of an article

Hi @Bgwhite:, I was wondering if we could enhance the integration between Check Wiki and tools like WPCleaner, by providing access to the direct analysis of an article in Check Wiki: I'd like to be able to send a request to Check Wiki script checkwiki_bots.cgi (with the following parameters: wiki, article title, article text) and receive an answer telling me which errors are still detected and where (character position ?). I don't know how much work that would be on your side, but that could be very helpful to users when WPCleaner doesn't detect the problem CW detected: we would know if CW thinks that the problem is still present and where, so I could tell the user where it is on their current version of the article. --NicoV (Talk on frwiki) 20:01, 10 August 2014 (UTC)

New error : empty titles ?

Hi, I was thinking about a new error for detecting empty titles, like the ones VE is creating on a regular basis (== <nowiki /> ==). --NicoV (Talk on frwiki) 18:10, 13 August 2014 (UTC)

NicoV, I did a scan for enwiki and came up with 83 articles. The VE edits all appear old. I wonder if they have fixed the problem in new VE builds? Bgwhite (talk) 22:31, 22 August 2014 (UTC)
Bgwhite, apparently it's still not fixed, the last VE edit I found with this problem is from last night. --NicoV (Talk on frwiki) 09:10, 23 August 2014 (UTC)
Thanks for the list Bgwhite, I've added error #522 to detect empty titles and fixed all the occurrences. --NicoV (Talk on frwiki) 12:21, 24 August 2014 (UTC)

And also, of the same kind, a new error for empty internal links, like in this edit ([[Boom Fm|<nowiki/>]] and [[Roger Blackburn|<nowiki/>]]). --NicoV (Talk on frwiki) 10:11, 23 August 2014 (UTC)
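Both patterns from this thread, the empty title (== <nowiki /> ==) and the empty internal link ([[Boom Fm|<nowiki/>]]), can be sketched with two regular expressions. The names below are hypothetical and this is not WPCleaner's #522 implementation.

```python
import re

# A heading line whose content is only spaces, tabs and/or <nowiki /> tags.
EMPTY_HEADING = re.compile(
    r"^(={1,6})(?:[ \t]|<nowiki\s*/>)+\1[ \t]*$",
    re.MULTILINE,
)
# An internal link whose label consists only of <nowiki /> tags.
EMPTY_LINK = re.compile(r"\[\[[^\[\]|]+\|\s*(?:<nowiki\s*/>\s*)+\]\]")


def find_empty_headings(wikitext):
    """Return heading lines that have no visible title."""
    return [m.group(0) for m in EMPTY_HEADING.finditer(wikitext)]


def find_empty_links(wikitext):
    """Return internal links whose displayed text is empty."""
    return [m.group(0) for m in EMPTY_LINK.finditer(wikitext)]
```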

About software

I can see that this wikiproject uses scripts and tools to assist work of the participants. I have a feeling that (usually) routinely done tasks are to be done server-side instead. What wiki software features would ease this work? Gryllida (talk) 04:13, 17 September 2014 (UTC)

Gryllida I don't understand your question, but that isn't unusual for me. Every fix is done by a person or bot. Bots can't do everything. Both AWB and WPCleaner can be used manually or in bot mode. There is a listing of what each tool can or cannot fix. WPCleaner is written in Java, AWB is written in .NET, and Auto-Formatter is JavaScript. Bgwhite (talk) 04:45, 17 September 2014 (UTC)

Comment on the WikiProject X proposal

Hello there! As you may already know, most WikiProjects here on Wikipedia struggle to stay active after they've been founded. I believe there is a lot of potential for WikiProjects to facilitate collaboration across subject areas, so I have submitted a grant proposal with the Wikimedia Foundation for the "WikiProject X" project. WikiProject X will study what makes WikiProjects succeed in retaining editors and then design a prototype WikiProject system that will recruit contributors to WikiProjects and help them run effectively. Please review the proposal here and leave feedback. If you have any questions, you can ask on the proposal page or leave a message on my talk page. Thank you for your time! (Also, sorry about the posting mistake earlier. If someone already moved my message to the talk page, feel free to remove this posting.) Harej (talk) 22:47, 1 October 2014 (UTC)

Unclosed center tags (error 102?)

Maybe it's time to add unclosed center tags as error #102? Errors 28 and 39 have been reduced and we need a new game to play with. -- Magioladitis (talk) 08:24, 3 October 2014 (UTC)

Error number 48 title linked in text

I saw a bot correction of a citation I posted the other day, and the edit summary referred me here to the description of error number 48, title linked in text. But the cite template documentation says that the title of a source can be wikilinked to an existing Wikipedia article, as I attempted to do. Did I throw the error with my citation because the span of text wikilinked was not letter-for-letter identical with the title of the book in the template title field? If so, I can fix the problem by setting up a redirect to the article. The citation I put in new articles the other day is shown here (the raw mark-up of this question in edit mode will show exactly how I coded the template).

Flynn, James R. (2009). What Is Intelligence?: Beyond the Flynn Effect (expanded paperback ed.). Cambridge: Cambridge University Press. ISBN 978-0-521-74147-7. Lay summary (6 October 2014). 

Thanks for any advice you have about this. -- WeijiBaikeBianji (talk, how I edit) 18:06, 8 October 2014 (UTC)

WeijiBaikeBianji, #48 doesn't have anything to do with citations. That was the primary reason the bot arrived at the article. Depending on what bot did the edit, the summary may have contained something like, "Do general fixes and cleanup if needed". The citation edit probably would fall under that. However, it would help if you could give the edit in question. I could give a better answer if I could see what happened. Bgwhite (talk) 22:45, 8 October 2014 (UTC)
Oh, I see. The edit was in the article that is about the book, and thus the removal of the wikilink from the citation template had nothing to do with the format of the template's fields, but everything to do with where the template was inserted. (That means, I guess, that I can still wikilink the book title when I cite the book in other articles on Wikipedia.) Thanks for your reply. -- WeijiBaikeBianji (talk, how I edit) 23:17, 8 October 2014 (UTC)
WeijiBaikeBianji, I think you got it and yes, you can still wikilink the book title in other articles. Bgwhite (talk) 00:59, 9 October 2014 (UTC)

No errors at plwiki

For the last few days Check Wikipedia reports no errors at all at the Polish Wikipedia. Please have a look. ToSter (talk) 12:47, 16 October 2014 (UTC)

ToSter, probably somebody went thru and marked all the bugs fixed. Happens on enwiki too. A new dump is available every two or so weeks and the errors will get repopulated then.
The bigger problem is the latest plwiki dump came out two days ago and a new checkwiki run wasn't done. Looking around, I found the dump files were not being updated at WMFLabs, again. I've filed a bug report at WMFLabs to have them fix this. Ironically, I got an email this morning saying my last bug report for the same thing was finally closed after a month. I wouldn't have caught this for another week or two, so thanks to you, it will get fixed sooner. Thank you. Bgwhite (talk) 21:52, 16 October 2014 (UTC)
Bgwhite, thanks for the explanation. Good to know that I inspired you to find the error. As for the disappearing errors, wouldn't it be better to scan all the pages which have been lately marked as done too? If all pages which are not marked as done are scanned regularly, that cannot have a great impact on performance. It's simple to click "done" accidentally. And even if it's done on purpose, the script should check it on its own. In my opinion, the distinction between "done" and "not done" should be used solely for the purpose of editors who are fixing the errors - sometimes concurrently. ToSter (talk) 19:16, 21 October 2014 (UTC)
ToSter, a couple of reasons not to do it. 1) New lists are generated every ~15 days (whenever a dump is available). This is a relatively short amount of time. 2) There's only an occasional problem of people blanking errors. 3) A majority of errors are fixed via WPCleaner. It automatically marks done if the error was fixed, so not much of a problem of accidentally hitting done. Bgwhite (talk) 22:40, 21 October 2014 (UTC)
Bgwhite, until now we haven't used WPCleaner at plwiki and I sometimes use pywikipediabot. Could you please describe what checkwiki does exactly on a daily basis? I cannot find this documented. ToSter (talk) 07:10, 22 October 2014 (UTC)
ToSter, see Wikipedia:WikiProject Check Wikipedia#Operation
The main programs used for manual and bot fixing are WPCleaner and AWB. There are also some pywikipediabots. WPCleaner does have a Polish translation. Not sure about AWB, but Magioladitis would know. To see what errors these tools fix, look at the List of errors. I know bots have been approved on multiple Wikipedias.
I saw you edited the "Polish Translation" file. Both Checkwiki and WPCleaner use the same file. Anyone can change what errors Checkwiki will and will not look for, also change priority settings. Feel free to change the file. One can add a whitelist and a "template" listing to the translation file (see the English file as an example). The "template" listing can be a listing of whitelisted templates (see #59's listing) or adds templates to check (#61's add templates to check for punctuation after the template). A common template listing to add is for #78 as different language projects have their own reference templates. Bgwhite (talk) 07:54, 22 October 2014 (UTC)
Yes, I have edited the Polish translation file but it seems to have no impact on checkwiki - the labels are still in English or even blank. Bgwhite, could you please have a look? ToSter (talk) 11:11, 24 October 2014 (UTC)
ToSter, I removed the deprecated parameters from the template file. The descriptions that were in English are now in Polish. I did notice one problem, the whitelists. The whitelist parameter should point to a file. There can be a lot of articles on the whitelist, so a file is easier to maintain. Look at the English translation page to see the syntax and also view an English whitelist file to see its syntax.
And yes, I have read the "Operation" section but the point "For a few Wikipedias, the program scans newly revised articles on a daily basis to create a new list for users, omitting already-corrected articles." doesn't say much. Which Wikipedias are these and what does "newly revised" mean? ToSter (talk) 11:13, 24 October 2014 (UTC)
There are six Wikipedias, English, French, German, Spanish, Arabic and Czech, that are updated daily. The first three because they are the largest Wikipedias, the last two because they were requested. Every ten minutes, checkwiki grabs the last 500 articles that were edited. At 0z every day, these articles are checked for problems. In the case of Arabic or Czech, that is probably every article that was edited that day. For the others, because of such high volume of editing, not every edited article will be checked.

Code used for generating the lists

Is the code (or list of regular expressions) available? I believe I could suggest some improvements for cutting down on false positives and/or the number of whitelisted articles for some of the lists. Frietjes (talk) 15:35, 17 October 2014 (UTC)

Frietjes check here. -- Magioladitis (talk) 16:21, 17 October 2014 (UTC)
thank you. my first improvement would be to add the following on line 1132 of
$test_text =~ s/\{\{\{\|safesubst:\}\}\}//g;
this would fix all the false positives from RFD discussion tags in list 28 (i.e. remove these), unless that's already been fixed? Frietjes (talk) 16:31, 17 October 2014 (UTC)
my second improvement would be to change '<tr' to '<tr[^a-z]' in error_031_html_table_elements which would avoid matching '<transcript>' and other non-table tags that start with tr. Frietjes (talk) 16:34, 17 October 2014 (UTC)
Frietjes, Magioladitis both changes implemented. The safesubst change was also added to errors #34 and #43 as it showed up there too. Bgwhite (talk) 20:35, 17 October 2014 (UTC)
It looks like the fix was to ignore all pages with {{{|safesubst:}}}, which is suboptimal :( I suppose the better thing would be to fix Module:RfD, but it seems as though there was a logical reason for adding it there. not sure if there is any other solution, but we shall see. it would be a shame to have to resort to such hacks since, technically, {{{|safesubst:}}} is a programming element. Frietjes (talk) 21:08, 17 October 2014 (UTC)
Bgwhite can we undo the 'safesubst:' hack? this change was just made, so in a few weeks, we shouldn't have any of these left. the fix for the tr tags is great though since it means we won't have to hack around Additive Manufacturing File Format, Event Programming Language, GPS Exchange Format, ... Frietjes (talk) 22:11, 17 October 2014 (UTC)
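For readers following along without the script's source, the two changes Frietjes proposes can be sketched in Python as below. The regular expressions are the ones quoted in this thread; the function names are hypothetical, and lower-casing the text before the <tr test is an assumption made here so that <TR> is also caught.

```python
import re

# The literal parameter default {{{|safesubst:}}} left behind by the RfD tags.
SAFESUBST_MARKER = re.compile(r"\{\{\{\|safesubst:\}\}\}")
# "<tr" followed by anything that is not a letter, so "<transcript>" is not matched.
TABLE_ROW_TAG = re.compile(r"<tr[^a-z]")


def strip_safesubst_marker(text):
    """Remove {{{|safesubst:}}} before the bracket checks run on the text."""
    return SAFESUBST_MARKER.sub("", text)


def has_html_table_row(text):
    """Return True if the text contains an HTML <tr> tag (with or without attributes)."""
    return TABLE_ROW_TAG.search(text.lower()) is not None
```

Stripping only the marker keeps the rest of the page under test, which is different from ignoring every page that contains it.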

#14 false positives

Two false positives at plwiki are reported. To remove such cases, you might check only for "<source ", not "<source", and skip code which is in a <source> by itself. ToSter (talk) 19:20, 21 October 2014 (UTC)

The solution is again "<source[^a-z]". -- Magioladitis (talk) 06:13, 22 October 2014 (UTC)

Magioladitis, Error #14 doesn't use a regex. It uses the same subroutine used for checking imbalanced nowiki, pre, comment, syntaxhighlight, code, math, hiero, and score. The regex also doesn't solve the problem with the articles ToSter mentioned. The problem with the articles... there are valid, unbalanced source tags inside source tags.
The following scenario is in ToSter's articles, where the second source is not an HTML source tag.
<source> [text] <source> [text] </source>
Problem is... how does one differentiate between ToSter's scenario and a scenario where the first <source> tag is actually missing a closing tag, especially when editors don't always put extra parameters inside source tags. Bgwhite (talk) 07:26, 22 October 2014 (UTC)
We also have false positives on frwiki, which don't seem to fall into the above category:
  • fr:Apache Ant: a <sourcePath> tag is detected as being a <source> tag
  • fr:Vidéo HTML5: there are 3 self-closing <source /> tags inside a <syntaxhighlight> tag. The third one is reported.
--NicoV (Talk on frwiki) 13:41, 25 October 2014 (UTC)
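The two frwiki cases can be avoided with a simple counting sketch like the one below. It is not the shared subroutine Bgwhite describes, and the names are hypothetical. It also does not solve the plwiki scenario of a literal <source> inside a source block, which it still reports as unbalanced.

```python
import re

# Content of <syntaxhighlight> blocks is code, so it is dropped before counting.
SYNTAXHIGHLIGHT = re.compile(
    r"<syntaxhighlight\b[^>]*>.*?</syntaxhighlight\s*>",
    re.DOTALL | re.IGNORECASE,
)
# "<source" not followed by a letter, so "<sourcePath>" is ignored.
OPEN_SOURCE = re.compile(r"<source(?![a-z])[^>]*>", re.IGNORECASE)
CLOSE_SOURCE = re.compile(r"</source\s*>", re.IGNORECASE)


def source_tags_balanced(wikitext):
    """Return True if opening and closing <source> tags occur in equal numbers."""
    text = SYNTAXHIGHLIGHT.sub("", wikitext)
    # Self-closing tags such as <source src="a" /> need no closing tag.
    opens = [m for m in OPEN_SOURCE.finditer(text) if not m.group(0).endswith("/>")]
    closes = CLOSE_SOURCE.findall(text)
    return len(opens) == len(closes)
```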