Wikipedia talk:WikiProject Check Wikipedia/Archive 1


Small style element

What are the possibilities or recommendations for replacing the <small> style element (see Wikipedia:WikiProject Check Wikipedia#HTML_text_style_element_.3Csmall.3E)? In particular, is there an easy solution for Wikipedia:Taxobox usage#Synonyms? (Personally, I would prefer not to write the authority in small letters at all.) --Snek01 (talk) 12:46, 19 December 2008 (UTC)

I'm not sure I understand the objection to the usage of the small tags. Is it just that the html is superfluous or difficult to edit for noobs? The reason for the usage, often, is that authorities can sometimes be quite lengthy and instead of wrapping the text in the taxobox, nbsp and the small tags are used to keep everything on one line. See Utricularia stellaris for example. I've used this html tag throughout my edits in the Utricularia species. I also find that it separates the authority from the species visually; many people aren't used to seeing authorities, so with a link and a different text style, it gives them an idea that it means something different from a species or variety name. I'd be open to suggested changes that would maintain some sort of font difference, though. --Rkitko (talk) 13:11, 19 December 2008 (UTC)
Yes, the authority can still be written in smaller letters (as long as it is recommended in the Taxobox documentation). Authorities can always be easily distinguished, because the scientific name is in italics. As for length: there is no reason to change the content for a better view; rather, the view should accommodate its content. (Sorry for my English, I hope I used appropriate words.) --Snek01 (talk) 17:07, 19 December 2008 (UTC)

This project, WikiProject Check Wikipedia, recommends not using the style element <small>. So the question is whether we really should replace <small> with something else (which can have the same appearance), and what the possibilities are. Maybe there is no reason to change it, or maybe no better option exists. I do not know. --Snek01 (talk) 17:07, 19 December 2008 (UTC)

<span style="font-size:80%">make me small</span> maybe? --213.168.121.43 (talk) 02:45, 21 December 2008 (UTC)
Normally, we don't need <small>. In Wikipedia we only write text; the stylesheet (CSS) turns this text into the right output. I hope we can eliminate all <small> tags from the text and put all formatting in the stylesheet. In XHTML, small is not allowed; there you can only use <span style="font-size:80%">make me small</span>. -- sk (talk) 16:04, 21 December 2008 (UTC)
Hello Stefan. In XHTML a <small/> element is allowed. Cascading Style Sheets (CSS) are used only to describe the presentation (the look and formatting) of a document. They do not describe the "meaning" of elements (that is, CSS is non-semantic markup). If you replace, e.g., <small>testo</small> with <span style="font-size:90%;">testo</span>, web-based screen readers will fail to interpret the meaning of the element's content correctly. You can also use CSS to describe the look of a <small/> element (instead of a <span/> element). --DaBler (talk) 15:10, 28 February 2009 (UTC)
The "Small" element is completely valid in some HTML/XHTML; however, wikitext should be used on Wikipedia whenever possible... after all, we use "==" instead of "<h2></h2>", so we should also use {{small}} to make text smaller. -Drilnoth (talk) 16:37, 28 February 2009 (UTC)
Hello Drilnoth. A <small> element is valid in all versions of HTML 4 ([1]), XHTML 1.0 ([2]), XHTML 1.1 (Presentation Module, [3]). Wikipedia (MediaWiki) declares the XHTML 1.0 Transitional document type. However, the syntax == causes the creation of an <h2> element (it is semantically correct markup). The template {{small}} generates a <span> element, not a <small>. A <span> element is a generic language/style container. Using a <span> element instead of <small> is valid, but its semantic interpretation is incorrect. --DaBler (talk) 13:01, 2 March 2009 (UTC)
I was under the impression that, generally, WikiSyntax (including templates which simply add CSS style ids and classes to the page) was preferred over actual HTML and XHTML. However, I don't really care one way or the other (I, personally, have started using {{small}} because it's quicker to type than both the start and end tags), and I think that you should probably talk to the creator of the script if you think it should be removed. I don't really worry about changing that in articles unless I'm editing around it anyway, but Stefan might have some reasons that I have either overlooked or forgotten. -Drilnoth (talk) 03:18, 3 March 2009 (UTC)
There is a template already available at Template:Small. --Snek01 (talk) 22:19, 1 January 2009 (UTC)
Not correct | Correct
<b>testo</b> | '''testo'''
<i>testo</i> | ''testo''
<u>testo</u> | <span style="text-decoration:underline;">testo</span>
<s>testo</s> | <span style="text-decoration:line-through;">testo</span>
<strike>testo</strike> | <span style="text-decoration:line-through;">testo</span>
<tt>testo</tt> | <span style="font-family:monospace;">testo</span>
<big>testo</big> | <span style="font-size:120%;">testo</span>
<b><big>testo</big></b> | <span style="font-size:larger; font-weight:bold;">testo</span>
<small>testo</small> | <span style="font-size:90%;">testo</span>
<p> | <p></p>
<br> | <br /> or <br></br>
<center>testo</center> | <p style="text-align:center;">testo</p>
<p align="center">testo</p> | <p style="text-align:center;">testo</p>
<font color="#224466">testo</font> | <span style="color:#224466;">testo</span>
<font style="text-decoration:overline">testo</font> | <span style="text-decoration:overline;">testo</span>

Wow

This page is awesome. Great work. :-) --MZMcBride (talk) 05:31, 24 December 2008 (UTC)

Thanks. -- sk (talk) 18:05, 1 January 2009 (UTC)

Parenthesis and punctuation spacing errors, if possible.

I often see errors of a missing space before or after a parenthesis, or after a comma. E.g., "Smith traveled from New York,where he had studied, to North Carolina(accounts of the dates vary)". Can we get a list of those? bd2412 T 20:52, 1 January 2009 (UTC)

I will test this. Thanks for this info. -- sk (talk) 17:10, 2 January 2009 (UTC)
This is not easy: "Antimon(III,V)-oxid", "(Anti-)Atomkraft" and so on. I have tried it with different regular expressions, but I have had no good results. -- sk (talk) 21:16, 3 January 2009 (UTC)
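
To make the difficulty concrete, here is a minimal Python sketch of the kind of check being discussed. The two patterns and the function name are assumptions made up for illustration, not what the Check Wikipedia script actually uses. It flags a comma followed directly by a letter, and a lowercase letter followed directly by an opening parenthesis.

```python
import re

# Hypothetical patterns, written only for this illustration.
# A comma immediately followed by a letter, e.g. "York,where".
MISSING_SPACE_AFTER_COMMA = re.compile(r",(?=[A-Za-z])")
# A lowercase letter immediately followed by "(" and a letter,
# e.g. "Carolina(accounts".
MISSING_SPACE_BEFORE_PAREN = re.compile(r"(?<=[a-z])\((?=[A-Za-z])")


def find_spacing_candidates(text):
    """Return sorted (offset, snippet) pairs for suspected spacing errors."""
    hits = []
    for pattern in (MISSING_SPACE_AFTER_COMMA, MISSING_SPACE_BEFORE_PAREN):
        for match in pattern.finditer(text):
            start = max(match.start() - 10, 0)
            hits.append((match.start(), text[start:match.end() + 10]))
    return sorted(hits)
```

Run on the example sentence from this thread, the sketch reports both real errors, but it also reports two false positives inside "Antimon(III,V)-oxid", which is the problem described above. Any real check would probably need a whitelist or human review.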

Duration

Duration: 255 minutes 19 secounds

Wait a minute! Secounds? Simply south not SS, sorry 17:06, 5 January 2009 (UTC)

Table not correct end

Great work!

{{End}} and its redirects are used in some cases of incorrect table ends... These are therefore semi-false positives... Out of laziness I'm just doing a list compare between the current 200 and what transcludes that (and {{End box}}, which seems to be the most common)... Reedy 23:37, 7 January 2009 (UTC)

I have added "end", "end box" and "End box". Are there more? -- sk (talk) 20:44, 4 February 2009 (UTC)

Article with <ref> and no <references/>

I think there are many false positives for this as some redirect templates are missed on en: {{ref-list}}, {{reflink}} are a couple. Thanks Rjwilmsi 21:11, 14 January 2009 (UTC)

No more ref-list.. Reedy 23:31, 31 January 2009 (UTC)
Done, I have added "reflink" to my script. -- sk (talk) 20:40, 4 February 2009 (UTC)
[4] would be them all Reedy 20:43, 5 February 2009 (UTC)
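
As a rough illustration of why template aliases matter for this check, the following Python sketch treats a page as an error only if it contains a <ref> tag and neither a <references/> tag nor one of a configurable set of reference-list templates. The alias tuple is an assumption for the example ("reflink" and "ref-list" come from this thread, "reflist" is the usual template name); the real script's list and logic may differ.

```python
import re

# Assumed alias list for illustration; the complete list of redirects
# is not reproduced here.
REFERENCE_TEMPLATES = ("reflist", "reflink", "ref-list")


def has_ref_without_list(wikitext, templates=REFERENCE_TEMPLATES):
    """Return True if the text uses <ref> but never outputs the references."""
    # "<ref>" or "<ref name=...>"; "<references" does not match here.
    if re.search(r"<ref[\s>]", wikitext, re.IGNORECASE) is None:
        return False
    if re.search(r"<references\s*/?>", wikitext, re.IGNORECASE):
        return False
    for name in templates:
        # Match "{{name}}" or "{{name|...", ignoring case and spacing.
        pattern = r"\{\{\s*" + re.escape(name) + r"\s*(\||\}\})"
        if re.search(pattern, wikitext, re.IGNORECASE):
            return False
    return True
```

With this shape, a missed redirect only requires adding one more name to the tuple rather than changing the check itself.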

Table Formatting

A few of the tables have 2 columns, where the 2nd is empty. Would be nice if you could tidy that up

Reedy 22:16, 14 January 2009 (UTC)

People editing the page...

Is it me, or is it just a bit pointless? You're creating nearly 0.5 megabyte revisions to remove a few lines, which would be removed anyway at a later update?

Reedy 20:56, 20 January 2009 (UTC)

Good point, I've been editing just to avoid overlap. Any suggestion how to get round that without new revs? Cubathy (talk) 17:01, 17 February 2009 (UTC)
I think that it's kind of silly to remove five or six lines at a time, but if you finish a section or need to take a break and want to mark off the twenty+ that are done, that makes sense to me. -Drilnoth (talk) 02:06, 21 February 2009 (UTC)

Double Categories

What is the policy on double categories like Alpha Centauri, where two items in the category (two HD objects/HIP objects in this case) point to the same page (because it's a binary star system)? Seems to make some sense the way it is, but should we be creating two new pages for this? —Preceding unsigned comment added by Cubathy (talk • contribs) 09:51, 17 February 2009 (UTC)


Mismatched square bracket

Just went through the list for error 10. There were a few articles in the table which did not appear to have issues. For example:

There were about 10 in total. Seems like the script has difficulty with more complex structure (images with double bracket links in the description) and also ignores the links which Wikipedia doesn't process (i.e. the code blocks on the REBOL page). Any chance of getting these issues looked at? Cubathy (talk) 14:55, 18 February 2009 (UTC)

A similar error for the table without end tag error:
In REBOL it is code, so you should use the <source> or <code> tag. -- sk (talk) 21:16, 18 February 2009 (UTC)
In "Black Site" it is the complex image description. But if this occurs only in this article, then it should be changed there. Normally an image and its description stand on a single line. -- sk (talk) 21:20, 18 February 2009 (UTC)
Also in "Wikipedia" it is a complex description with references. Maybe I can fix this in the future. -- sk (talk) 21:21, 18 February 2009 (UTC)
In "Borel functional calculus" my script found "{|". In Wikipedia this is the beginning of a table. If you want to write it in this article, then use the <math> tag. -- sk (talk) 21:25, 18 February 2009 (UTC)
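
The two issues raised in this thread (nested links inside image descriptions, and brackets inside code) can be sketched in Python as follows. This is only an illustration under assumptions: the list of skipped tags and the function names are invented here, and the actual script may work quite differently. The idea is to drop literal blocks first and then track nesting depth instead of matching single links with one regular expression.

```python
import re

# Tags whose contents are treated as literal text (assumed list).
SKIPPED_TAGS = ("source", "code", "math", "nowiki")


def strip_literal_blocks(wikitext):
    """Remove <source>, <code>, <math> and <nowiki> blocks from the text."""
    for tag in SKIPPED_TAGS:
        pattern = r"<{0}\b[^>]*>.*?</{0}>".format(tag)
        wikitext = re.sub(pattern, "", wikitext, flags=re.DOTALL | re.IGNORECASE)
    return wikitext


def double_brackets_balanced(wikitext):
    """Check that every "[[" has a later "]]", allowing nested links."""
    depth = 0
    for token in re.findall(r"\[\[|\]\]", strip_literal_blocks(wikitext)):
        depth += 1 if token == "[[" else -1
        if depth < 0:  # a "]]" appeared before its "[["
            return False
    return depth == 0
```

A depth counter accepts an image link whose caption contains another link, which a flat "one opening, one closing" pattern would report as an error.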

automated mistakes?

Please check edits like http://en.wikipedia.org/w/index.php?title=2008_South_Ossetia_war&diff=next&oldid=272623918. Instead of correcting something, it removes valid and needed brackets. No idea how this project works, but something is going wrong here. --Xeeron (talk) 16:04, 23 February 2009 (UTC)

This is related to the square bracket errors in the section above. The article contains 'complicated' bracketing which the script reports as an error (but fix is not automated so this one shouldn't have been updated). Cubathy (talk) 16:32, 23 February 2009 (UTC)
Ok, no big deal, but please be more careful when checking (not directed at you Cubathy), at pages with less traffic, such edits might go unreverted for a while. --Xeeron (talk) 16:50, 23 February 2009 (UTC)
Oops! My bad on that one. I guess I just got messed up because there aren't typically references within wikilinks, and I just didn't notice that it was part of an image. I'll be on the lookout for that in the future. Thanks for mentioning it! -Drilnoth (talk) 00:22, 24 February 2009 (UTC)

Gallery Without Description

This is one of the projects, but mw:Help:Images#Gallery of images states that captions are all optional. Why is this project present? Yellowweasel (talk) 00:12, 26 February 2009 (UTC)

I think that the captions are optional, but are highly recommended... it just doesn't specifically say that. Tools such as screen readers need the captions to "read" the images, so it's really an accessibility issue. -Drilnoth (talk) 00:21, 26 February 2009 (UTC)
The problem is that in many cases, captions for each image would be redundant, such as for American Beaver. Yellowweasel (talk) 00:53, 26 February 2009 (UTC)
Ah... good point. In that case I'd recommend posting something at de:Benutzer Diskussion:Stefan Kühn/Check Wikipedia, so that the script designer can take a look. -Drilnoth (talk) 00:55, 26 February 2009 (UTC)
If the description is redundant, then the image is redundant too. I think we can then put these images on Commons and not in the article. In American Beaver these images are redundant:
They contain no new information about this animal. I think it is a good idea to really describe every other image in this gallery. -- sk (talk) 16:52, 26 February 2009 (UTC)
Good point... when images are that redundant, you generally only need one. -Drilnoth (talk) 16:55, 26 February 2009 (UTC)

Arrows... It's showing an up arrow when the numbers have stayed the same...

As above Reedy 10:22, 8 March 2009 (UTC)

No, it is normal. Yesterday I played with enwiki and crashed the statistics. After this I fixed the list of articles, and today my script found many errors again. -- sk (talk) 17:24, 8 March 2009 (UTC)

Possible AWB Plugin

de:Benutzer_Diskussion:Stefan_Kühn#Re:WikiProject_Check_Wikipedia.E2.80.8E

Asked if there is some way we can get an XML format or similar for AWB to use.. As AWB can easily be used to fix numerous of the errors.. And possibly more automated Reedy 15:25, 8 March 2009 (UTC)

See here. I write you an email. -- sk (talk) 17:04, 8 March 2009 (UTC)

Translation page

Hi, I have now added the translation page in English, so you can write a better description or activate and deactivate the errors yourself. -- sk (talk) 19:06, 16 March 2009 (UTC)

Awesome! Thanks. –Drilnoth (TC) 12:41, 17 March 2009 (UTC)

Reformatting

Okay, so for the past few days there's been a Wikimedia Foundation error whenever I've tried to update the list with the content from the toolserver... apparently because of page size. Therefore, I have split the page into three subpages, with each one transcluded onto the main page, thereby fixing the length issue. Use of the project shouldn't change, but it will take longer to update each time... in fact, I'd recommend just doing it every other day to save the effort. Here's a breakdown of the three pages:

I will continue to try to update this every other day, and will occasionally check to see if the updates can be done normally again. –Drilnoth (TC) 18:03, 20 March 2009 (UTC)

Maybe it is better to reduce the number of articles per error from 100 to 25 or so. I can change this in my script, and then you will not have so many problems. -- sk (talk) 18:06, 20 March 2009 (UTC)
That could work for some of the problems... I think that 25 would be good for all of them other than "Headlines start with three "="", which I try to go through whenever I have the time and I'd probably pass 25 pretty quickly each day, so maybe keep that at 100. Sorry about being WP:BOLD and just doing this... it was just starting to get a little frustrating. However, if you can make that change, it would probably work. Thanks! –Drilnoth (TC) 18:16, 20 March 2009 (UTC)
Please retain 100 errors at least for the highest priority ones -- in general there are much less than that, though. -- Laddo 66.131.214.76 (talk) 23:29, 20 March 2009 (UTC)
I think that the script can be configured for each wiki separately, so the French output would be unchanged. –Drilnoth (TC) 23:41, 20 March 2009 (UTC)
At the moment I am redesigning many things in this script, and I think distinguishing between high and low priority is a good idea. -- sk (talk) 12:32, 21 March 2009 (UTC)
Hmm... seems to work now. Thanks for updating it! –Drilnoth (TC) 21:52, 23 March 2009 (UTC)

Automation

This shouldn't require editing the pages manually. Can't a script be set up to sync the wiki pages with the text files every 12 hours? Seems trivial to do. --MZMcBride (talk) 20:38, 20 March 2009 (UTC)

Good question (and maybe it could be set up to just change every time that the toolserver updates, as the timeframe really varies). –Drilnoth (TC) 20:44, 20 March 2009 (UTC)
Well, if it tries to edit and the content is the same, it just won't save the revision. MediaWiki doesn't allow duplicate revisions to be saved. :-) Would you like me to look into automating this? --MZMcBride (talk) 21:17, 20 March 2009 (UTC)
That would be great, thanks! –Drilnoth (TC) 21:24, 20 March 2009 (UTC)
OMGZ BOT Reedy 01:38, 21 March 2009 (UTC)
...
I guess you mean a bot request is probably the best thing to do? –Drilnoth (TC) 01:44, 21 March 2009 (UTC)
Yeah. It's non-controversial and can be done pretty easily. Reedy 23:36, 21 March 2009 (UTC)
Done. Thanks for the tip! –Drilnoth (TC) 23:41, 21 March 2009 (UTC)

punctuation after references

Can this be done: like cleaning up '<ref>abc</ref>.' to '.<ref>abc</ref>' and similar other mistakes? Thanks.--GDibyendu (talk) 12:44, 21 March 2009 (UTC)

It might make more sense to use Special:Random in conjunction with WP:FORMATTER, which I think fixes punctuation after refs automatically, but this could be a good idea. –Drilnoth (TC) 21:09, 21 March 2009 (UTC)
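
For reference, the requested cleanup could look roughly like the Python sketch below. It is an illustration only, not how WP:FORMATTER or any existing tool does it, and the names are invented for the example. It moves a single punctuation mark that directly follows one or more <ref>...</ref> blocks to the front of those blocks.

```python
import re

# One or more consecutive <ref ...>...</ref> blocks followed by punctuation.
# The (?:(?!</ref>).)* part stops the match from running across the end of
# one reference into the next; self-closing <ref name="x" /> is not handled.
REF_THEN_PUNCT = re.compile(
    r"((?:<ref\b[^>/]*>(?:(?!</ref>).)*</ref>)+)([.,;:])",
    re.DOTALL,
)


def move_punctuation_before_refs(wikitext):
    """Rewrite 'text<ref>abc</ref>.' as 'text.<ref>abc</ref>'."""
    return REF_THEN_PUNCT.sub(r"\2\1", wikitext)
```

Because quotations, abbreviations and other edge cases exist, output from a sketch like this would still need a human look before saving.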

Dashes

I have often corrected hyphens to dashes in the situations described at WP:DASH, so this got my attention: "en dash or em dash The article had a dash. Write for –; better "–" or —; better "—"." If "The article had a dash" is an error, then the Manual of Style guideline WP:DASH is a bigger error. You then contradict that thought with the ungrammatical "Write for –; better "–" or —; better "—".", which seems to be telling me to use dashes after all, even though the previous sentence considers "The article had a dash" to be an error. So is this consistent with WP:DASH or isn't it? Art LaPella (talk) 04:22, 24 March 2009 (UTC)

Probably it's a case that the original German explanation hasn't been translated correctly – the Check Wikipedia originates from de-wiki. I agree the wording needs to be clarified. Rjwilmsi 08:02, 24 March 2009 (UTC)
My understanding was: &ndash; should be replaced with –, and &mdash; with —. Other interpretations would not make sense in light of what MoS says. Should be clarified. GregorB (talk) 10:10, 24 March 2009 (UTC)
What GregorB said. I'll see if I can alter the translation page to clarify this. –Drilnoth (TC) 13:22, 24 March 2009 (UTC)

underline

What's the wiki alternative to <u>? It's listed as a syntax error but I don't know of an alternative. OrangeDog (talk • edits) 22:49, 25 March 2009 (UTC)

Template:Underline should do the trick. –Drilnoth (TC) 16:22, 26 March 2009 (UTC)

Title in text

This isn't an error. Wiki software will automatically render it as Title. Leaving it as a link is easier for future splits and does no harm. OrangeDog (talk • edits) 23:05, 25 March 2009 (UTC)

I think that all of those errors are for when the article links to exactly its own title, not to a redirect to it (although I could be wrong). Having bold text in the article without having a good reason is generally discouraged, as random bolding of words could be distracting. –Drilnoth (TC) 16:23, 26 March 2009 (UTC)
Isn't it in WP:MOS or just recommended way of doing it? Reedy 16:50, 26 March 2009 (UTC)
WP:BOLDTITLE says that the article name in the lead section should be bold, but does not prescribe a "correct" way of doing it. If the article name is "Foo", then in the same article '''Foo''' and [[Foo]] work exactly the same, and that's why some people use the second technique in the article intro. So, if I understand correctly, title link in text is not an error in the introductory sentence, but is an error anywhere else in the text. GregorB (talk) 17:47, 26 March 2009 (UTC)
At least in the first sentence, it seems to me that it has always been '''Title'''. -- User:Docu

I was thinking of substituted nav templates (they do exist) and similar. In the main text it's probably wrong. OrangeDog (talk • edits) 22:21, 2 April 2009 (UTC)

I am not able to update the page..

I am getting the following error: Request: POST http://en.wikipedia.org/w/index.php?title=Wikipedia:WikiProject_Check_Wikipedia&action=submit, from 71.231.176.196 via sq16.wikimedia.org (squid/2.7.STABLE6) to 208.80.152.43 (208.80.152.43) Error: ERR_READ_TIMEOUT, errno [No Error] at Sun, 29 Mar 2009 10:01:03 GMT --Anshuk (talk) 10:03, 29 March 2009 (UTC)

Probably related to your connection and the size of the page. Reedy 12:32, 29 March 2009 (UTC)
its the page-size. updated omitting lowest priority --AwOc 12:57, 29 March 2009 (UTC)

maybe there should be a sub-page for each priority. this would also reduce the revision sizes on small edits. updating would then mean to edit four pages, which is quite annoying. maybe a bot could solve this. --AwOc 13:18, 29 March 2009 (UTC)

I am able to update. I only tried to edit a section, rather than the whole page. BTW, in any case, it will be updated in a few hours. So, probably removing entries may not be important. Also, depending on when data collection for today started, some of things that you may have fixed today, may again appear on today's list.--GDibyendu (talk) 13:49, 29 March 2009 (UTC)
The low priority couldn't be added today because of size, either? Weird. It had been working for a while. –Drilnoth (TC) 15:36, 30 March 2009 (UTC)

Errors suitable to be fixed with AWB

The descriptions can be edited at WikiProject Check Wikipedia/Translation.

To edit a description, copy the text from the default description (desc_script) to desc_enwiki, e.g. for error 1:

 error_001_desc_script=This article has no bold title like '''Title'''. END
 error_001_desc_enwiki=This article has no bold title like '''title'''. END
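
For anyone scripting against this format, the two lines above suggest a simple layout: `error_NNN_desc_<scope>=text END`. The Python sketch below reads such entries and prefers the wiki-specific text over the default. The fallback rule is an assumption inferred from the instructions above, and the function name and regular expression are made up for this example.

```python
import re

# One entry per line: error_<3 digits>_desc_<scope>=<text> END
ENTRY = re.compile(r"^\s*error_(\d{3})_desc_(\w+)=(.*?)\s*END\s*$", re.MULTILINE)


def load_descriptions(text, wiki="enwiki"):
    """Map error number -> description, preferring desc_<wiki> over desc_script."""
    defaults, overrides = {}, {}
    for number, scope, value in ENTRY.findall(text):
        if scope == "script":
            defaults[int(number)] = value
        elif scope == wiki:
            overrides[int(number)] = value
    # Entries from the chosen wiki replace the script defaults.
    return {**defaults, **overrides}
```

Called on the two example lines, this returns the enwiki wording for error 1; for a wiki with no override it falls back to the desc_script text.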

I think it would be helpful if any element that can be fixed with AWB be marked as such. -- User:Docu

That would probably be a good idea... I'd add it myself except that I don't know what AWB fixes! Anyone? –Drilnoth (TC) 00:52, 30 March 2009 (UTC)
I'll have a go at this later today. Rjwilmsi 11:05, 30 March 2009 (UTC)
Hmm, I got sidetracked, but will work on this over the next few days. Rjwilmsi 22:37, 30 March 2009 (UTC)

I marked some of them. They seem to be the type of changes after which my usual AWB settings skip saving. -- User:Docu

For error 7, I made a feature request at WT:AutoWikiBrowser/Feature requests#Section header level (WikiProject Check Wikipedia #7). It might as well be fixed in any article. -- User:Docu

A clever trick - or not?

From Bihar:

{{#switch: {{#expr: {{CURRENTHOUR}} mod 1}}
|0 = [[Image:Secretariat Building patna.JPG|left|170px|thumb|Vidhansabha Building, [[Patna]]]] 
|1 = [[Image:Patnahighcourt.jpg|left|180px|thumb|Patna high court, [[Patna]]]]
}}

Opinions? GregorB (talk) 19:20, 30 March 2009 (UTC)

I think that it is an interesting concept, but doesn't really make sense. The article changes based on the time of day? What?! It just doesn't really seem quite right. Something could be brought up at one of the village pumps about this, perhaps a template being created to standardize such pseudo-randomization across a number of articles. –Drilnoth (TC) 21:10, 30 March 2009 (UTC)
A clever trick yes, but per standards, no. If there are multiple relevant images for the article and not enough space to show them all full size then surely a gallery should be used to make them all available to readers. Rjwilmsi 22:36, 30 March 2009 (UTC)
I tend to agree... Although we all accept the notion of article changing through revisions, the idea that an article changes in a somewhat random way, without the underlying content being revised, is a bit unsettling to me. Of course, for the purposes of Check Wikipedia, this should be either done through a template, or not be done at all. GregorB (talk) 11:35, 31 March 2009 (UTC)
I have removed the code and just left the images in the article. –Drilnoth (TC) 12:17, 31 March 2009 (UTC)
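
A side note on the quoted expression: any whole number modulo 1 is 0, so with `mod 1` as written only the `0` branch of the #switch can ever be selected. The small Python sketch below uses `%` as a stand-in for the parser function's `mod` (an assumption that holds for the non-negative hour values involved) and lists which branches are reachable for a given modulus.

```python
def branches_reached(modulus):
    """Return the set of #switch keys reachable over the 24 hours of a day."""
    return {hour % modulus for hour in range(24)}


# With "mod 1" only branch 0 is ever chosen; "mod 2" would alternate.
print(branches_reached(1))
print(branches_reached(2))
```

So the snippet as quoted would always have shown the first image; a modulus of 2 would have been needed for the two images to alternate by hour.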

Nice changes

Nice updates, Stefan! Do you know if it is yet possible to include the "lowest-priority" pages in the list? –Drilnoth (TC) 21:10, 30 March 2009 (UTC)

Double pipe in one link -- sometimes the second pipe enhances formatting

Like in the following examples:

Johnny Valentine [[World Class Championship Wrestling|Southwest Sports, Inc. | NWA Big Time Wrestling]]''
Mick Foley [[Extreme Championship Wrestling|Eastern Championship Wrestling | Extreme Championship
Roddy Piper [[World Wrestling Entertainment|World Wrestling Federation|World Wrestling Entertainment]]''
Scott Levy [[World Wrestling Entertainment|World Wrestling Federation '''|''' World Wrestling

What do you think we should do about this?--Anshuk (talk) 08:26, 31 March 2009 (UTC)

It displays as "Southwest Sports, Inc. | NWA Big Time Wrestling". I'd remove it. Reading "World_Class_Championship_Wrestling#Big_Time_Wrestling:_1966-1981", I think "NWA Big Time Wrestling" should do, but one could replace it with a colon. -- User:Docu

working on bot - question.

I'm working (slowly) on a semi-automated bot to fix some of the simpler problems on this list. I'm curious, though: is there any way to access the full list of found problems (i.e., get around the 'output was limited to 50 articles' issue)? --Ludwigs2 19:05, 1 April 2009 (UTC)

You'd need to ask Stefan... what kinds of problems do you think your bot can fix? A lot of these need human attention. –Drilnoth (TC) 19:06, 1 April 2009 (UTC)
Check http://toolserver.org/~sk/checkwiki/enwiki/
I think AWB can do quite a few of them already, Rjwilmsi might annotate those later (see #Errors_suitable_to_be_fixed_with_AWB). I fixed error 16 by bot and just got criticized at WP:ANI for having done so. -- User:Docu
Yeah, AWB will be able to fix a number of the errors around wikilinks, square brackets etc. I am going to specify in the translation exactly what can be done, and in the longer term work on increasing the range of AWB fixes to handle what's here. There's still going to be plenty of stuff that's manual, like image descriptions, though perhaps AWB could be used to make the process faster. Rjwilmsi 22:07, 1 April 2009 (UTC)
Most template fixes will probably need to be done manually. Anyways, maybe we could just set everything that can be done by AWB to same priority level. Not sure if we could create an additional one (4). Those that are mostly AWB, but need some checks (2) and those that need to be manual to (1). Everything else would be (3). -- User:Docu
oh, there's a lot here that can be done with a manually assisted bot - I was looking particularly at the sections on breaks after list items, regularizing headers and fixing bad <br /> constructions, but with artful use of regex and a little human guidance most of the things here can be streamlined significantly. mostly the bot would automate the boring details. for instance, to regularize headings (now), I need to see what needs to be changed, copy the wikitext into a text editor, run regular expressions or other edits to fix the headings, copy the revised text back into the browser, and save the results. with a semi-auto bot, I could do all of that in one step (just specify the changes and click 'go!'). some things (like list-breaks) can be fully automated, which is why I was wondering about getting larger segments.
with respect to AWB - I'm a mac user, no joy. :(
well, let me get the thing working, and then I'll talk to Stefan about expanding it. no sense putting the cart before the horse. --Ludwigs2 01:23, 2 April 2009 (UTC)
It's not real grand, but my CodeFixer user script can fix some of the errors automatically. –Drilnoth (TC) 01:31, 2 April 2009 (UTC)
Oh, that's useful. --Ludwigs2 02:25, 2 April 2009 (UTC)
Glad to hear you like it. I'm still working on adding some more things to it, but it does a fair bit now (mainly converting those pesky XML and HTML character encodings to be actual symbols... it's — not &mdash;). Anyway, I hope to add some more things to it soon, but please let me know if you have any ideas. –Drilnoth (TC) 02:35, 2 April 2009 (UTC)
well, the only thing that strikes me immediately is the auto-submit aspect: that's great if you're code-fixing some random page, but not so good if you just want to clean up the code in a section you're working on and check it over. personally, I'd rather take the extra step of submitting it myself. maybe change it so there are options - submit, preview, or do nothing on run. I'll give it some thought for more stuff, though. --Ludwigs2 03:15, 2 April 2009 (UTC)
It just clicks show changes; it doesn't actually save the page until you do so manually. If wanted, though, I could add in some configuration to allow you to choose how it should act (diff, preview, continue edit without looking at changes, or save). –Drilnoth (TC) 13:30, 2 April 2009 (UTC)
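
The entity conversion mentioned in this thread can be approximated in a few lines of Python. This is a sketch under assumptions, not the CodeFixer implementation (which is a user script and not shown here): it uses the standard library's `html.unescape` and leaves a small, assumed set of entities alone, because decoding them would change how the wikitext is parsed or displayed.

```python
import html
import re

# Named entities left untouched (assumed list); numeric forms such as
# "&#60;" are NOT protected in this sketch.
PROTECTED = {"&lt;", "&gt;", "&amp;", "&nbsp;", "&quot;"}
ENTITY = re.compile(r"&(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);")


def decode_entities(wikitext):
    """Replace character references with the characters they stand for."""
    def replace(match):
        entity = match.group(0)
        return entity if entity in PROTECTED else html.unescape(entity)
    return ENTITY.sub(replace, wikitext)
```

For example, `decode_entities("it's &mdash; not")` yields the text with a literal dash character, while `&lt;` and `&amp;` pass through unchanged.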

What the heck?

Where is everything? Surely there are still errors. –Drilnoth (TC) 01:00, 2 April 2009 (UTC)

sorry, no. all errors on wikipedia have been fixed, both technically and content-wise. in fact, there's really no reason to edit the encyclopedia anymore. :) --Ludwigs2 02:22, 2 April 2009 (UTC)
Oh. Duh. –Drilnoth (TC) 02:32, 2 April 2009 (UTC)
That's good. All this talk above about bots had made me think "Wow... did they do something already?" –Drilnoth (TC) 02:33, 2 April 2009 (UTC)

:) it's back. -- User:Docu

So did Stefan do that, or was that one of you? Regardless, excellent work. –Drilnoth (TC) 13:30, 2 April 2009 (UTC)
It was a bug in some other languages and April 1 here ;) It was from a wiki where they do get to zero. - User:Docu
A wiki where they do get to zero. That would be nice. –Drilnoth (TC) 15:39, 2 April 2009 (UTC)
Sorry, my fault. A wiki with zero errors is pdcwiki. They have only daily errors (today only 2). When will en reach this level? You must work harder and faster! :-) -- sk (talk) 08:25, 3 April 2009 (UTC)
EN's never had 0. :( –Drilnoth (TC) 13:00, 3 April 2009 (UTC)

I'm confused :(

I really haven't been staying fully up to date here but... what's going on? There's only a handful of pages listed as having HTML italics/. "Headlines start with three "="" has only 52 results... come on, there's a lot more than that! (or has someone really been going through them?). Headline hierarchy is down to 14. I know that there have been a lot of changes recently, so I'm just wondering... is this on purpose or is there a bug? If it's on purpose, why? Thanks. –Drilnoth (TC) 17:45, 2 April 2009 (UTC)

Check de:User talk:Stefan_Kühn/Check_Wikipedia#6000_issues_solved??.
Which is why I left yesterday's entries there. I think it gives a good idea how much piles up in one day. -- User:Docu
Okay; thanks for the link. –Drilnoth (TC) 20:30, 2 April 2009 (UTC)


Statistics

I was just wondering how much we covered. Which percentage of the last dump is scanned?

It looks like many of the checks with mid-sized results got down into the 50s range thanks to much work. Others keep increasing even though we also work on these, probably because the underlying sample changes.

Obviously, some of the checks dig up pages with really odd formatting that take quite some time to improve.-- User:Docu

I have no idea... I think that this project is being much more productive than it was, say, a few months ago, thanks to the use of AWB, but with all the new errors there's no easy way to tell what progress is being made. –Drilnoth (TC) 21:05, 6 April 2009 (UTC)
My understanding is that a full English dump was generated March 13 that would have been fully scanned on March 14 (the analysis took a bit longer on that day). For all detection types that existed at the time, their numbers increase only due to: a) new articles - b) new errors introduced by modifications to existing articles - c) improvements to rules of that detection. For all detection types created since the last full scan, the full dump was never analyzed; error counts of those detections increase daily due to a) errors in new articles - b) modified articles that get scanned for the first time with that detection - c) modifications to articles, introducing new occurrences of that error - d) improvements to rules of that detection. Check the "News" section immediately below the summary table to see what detections got enhanced or changed recently. -- Laddo 66.131.214.76 (talk) 03:44, 7 April 2009 (UTC)
Thanks Laddo. What he wrote is correct. Also, I made a mistake on 1 April/2 April, where many errors were deleted. But at the moment I think there is no reason to make a new scan of the old dump. You have enough errors in enwiki to work on. :-) We will wait for the next dump. -- sk (talk) 06:16, 7 April 2009 (UTC)
The counts on many errors seem to confirm this (e.g. 5, 40, 45, 49, 65, 51, 60, 3, 8, 19, 32, 55, 58, 52, 66 all showing mainly new issues). This is good news.
Others, such as the result for check #30 (#Image without description), seem to increase on a daily basis. It's a check that was already there on March 14. If the above is correct, that means that either it's frequent in new modifications or its detection rules changed much. I concede that I don't necessarily add a description when it could be equal to "general view of pagename". The results for check #7 seem to increase in a similar way.
BTW, I'm not concerned that we don't have enough to work on ;), just wondering about the size of the iceberg -- User:Docu

Summary from user talk: The total for one check (e.g. 2000) is not updated, unless:

  • the dump is completely (re-)scanned
  • the fixed items are within the first fifty being scanned on a daily basis

-- User:Docu
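
The update rule summarised above can be sketched in a few lines. This is only an illustration, not sk's actual script: the function name `daily_rescan`, the `still_has_error` callback and the default `limit` of 50 are assumptions made for the example.

```python
from typing import Callable, List, Tuple


def daily_rescan(stored: List[str],
                 still_has_error: Callable[[str], bool],
                 limit: int = 50) -> Tuple[List[str], List[str]]:
    """Rescan stored titles from the front until `limit` errors are confirmed.

    Returns (displayed, remaining): the titles shown on the daily page, and
    the stored list with the titles that were found to be fixed removed.
    """
    displayed: List[str] = []
    fixed = set()
    for title in stored:
        if len(displayed) >= limit:
            break  # everything after this point is not looked at today
        if still_has_error(title):
            displayed.append(title)
        else:
            fixed.add(title)
    remaining = [t for t in stored if t not in fixed]
    return displayed, remaining


# 200 stored pages; the first 30 and pages 150-169 have been fixed by editors.
stored = [f"Page {i}" for i in range(200)]
fixed_pages = set(stored[:30]) | set(stored[150:170])
displayed, remaining = daily_rescan(stored, lambda t: t not in fixed_pages)
print(len(displayed), len(remaining))
```

Under these assumptions the total only drops for fixes near the front of the list. Pages fixed further down are still counted until a rescan reaches them, which would explain totals that do not move even after a day of fixing.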

AWB and List of all articles with error X

Is there any way to have AWB properly load the contents of the pages referenced at "List of all articles with error X"?Naraht (talk) 05:17, 12 April 2009 (UTC)

Try this. Note that items in the lists beyond #50 don't get updated on a daily basis (as per #Statistics). -- User:Docu
That usually works for me. –Drilnoth (TC) 12:58, 12 April 2009 (UTC)
Quick question about AWB, I'm trying to work on the ISBN mistakes, but I can't figure out how to search in the text for the string ISBN. ctrl-F doesn't work and it doesn't seem like the massive Find and Replace concept is the way to go.Naraht (talk) 15:12, 12 April 2009 (UTC)
If you're just searching for the string, isn't there a box in the "start" menu for AWB where you can plug that in? –Drilnoth (TC) 16:07, 12 April 2009 (UTC)
You want the little box, at the bottom, just to the left of the edit window. I continually try to use CTRL-F to no avail, but that works quite well most of the time (press it a few times if it doesn't work straight away). - Jarry1250 (t, c) 16:43, 16 April 2009 (UTC)

Image Description with Small

Some of these are deliberate, where it isn't the entire description, but rather a part of it that is within angle-small-angle, and Wiki will make that even smaller than the 94% that the rest is in. Is this still an error if Wikipedia is handling it correctly and it is doing what they intended?Naraht (talk) 17:37, 15 April 2009 (UTC)

Well, it makes the text far too small (IMHO) when there are alternatives. - Jarry1250 (t, c) 16:49, 16 April 2009 (UTC)
A difference of opinion over style is not a syntax error. OrangeDog (talkedits) 20:32, 17 April 2009 (UTC)
I'm not sure that the concerns are so much based in style arguments as usability and accessibility ones. - Jarry1250 (t, c) 20:58, 17 April 2009 (UTC)
That still doesn't make them syntax errors, and there's no policy saying you can't use different text sizes in an image caption. Even if there were, people might want to ignore such a rule in special circumstances. OrangeDog (talkedits) 01:40, 18 April 2009 (UTC)
They are welcome to IAR it if they want. Who said it was a syntax error anyway? I certainly didn't. - Jarry1250 (t, c) 10:30, 18 April 2009 (UTC)

Forcing a section update

I think the Headlines start with three "=" is quite a bit out of date (D6 did a load a fortnight ago). What's the proper method for forcing an updated list to be produced (if possible)? - Jarry1250 (t, c) 12:30, 18 April 2009 (UTC)

To my knowledge there isn't one... you'd need to ask SK. –Drilnoth (TCL) 12:32, 18 April 2009 (UTC)
Sorry for fixing them. ;)
Currently, the full lists are only updated if a new dump is available (#Statistics). If you ask sk, maybe he will slip the 3000 pages of bug #7 into the daily scan for changes.
-- User:Docu
I asked him at de:User talk:Stefan_Kühn/Check_Wikipedia#Check_7_on_en.wp. -- User:Docu
The "List of all articles with error X" is updated daily, but not completely. It is updated with every daily scan, but the script does not scan all articles in the list each day: only the first articles of the list, until it has found 50 errors. I think the list is OK at the moment. Maybe my script found more than 50 errors in the new articles, so the list will not go down. But I think it will not help to make a complete scan of the old dump. I will wait for the next new dump. -- sk (talk) 18:59, 18 April 2009 (UTC)
Kein Problem. Now I understand how it works, I won't be surprised in future. - Jarry1250 (t, c) 19:03, 18 April 2009 (UTC)

<unindent>

The result from the old dump would be the same, no? Anyways, for this check, it might be better if new pages were listed once old results are dealt with, e.g. the version of April 12 lists two pages deleted in the meantime. -- User:Docu

Standard sortkeys

The report lists the usual "*" sortkey as error (Wikipedia:WikiProject_Check_Wikipedia#DEFAULTSORT_with_special_letters). As it's the standard way to sort defining articles before others in the category, it shouldn't be included. -- User:Docu

Editnotice

When editing the project page, it now displays an edit notice. I made an initial version. It can be edited at Editnotice-4-WikiProject Check Wikipedia or by suggesting an update below. -- User:Docu

It's now at Wikipedia:WikiProject Check Wikipedia/notice 1 and another one at Wikipedia:WikiProject Check Wikipedia/notice 2. -- User:Docu

Missing opening or closing brackets, table and template markup

At Wikipedia:WikiProject Check Wikipedia/AWB, there is a series of samples from checks 46, 10, 28, 47, 43.

AWB could repair two of #46 (Square brackets not correct begin) by removing an additional bracket from an external link [5][6]. All others were done manually. Fixes consisted of removing or adding tags. In most cases this was within the highlighted section. In one case, I had to restore from a previous version.

The general steps seem to be:

  • 1. open the page in edit mode
  • 2. search for the extracted section
  • 3. fix it manually
  • 4. save it
  • 5. go on to the next page.

The question is: which is the best tool to do this? Personally, I haven't managed to do this efficiently in AWB. Possibly it could be modified to do steps 1, 2, 4, 5 more or less automatically. -- User:Docu

Have a go with the SVN snapshot of AWB (link from AWB page) which has more fixing logic I added for template and link brackets etc. Rjwilmsi 11:15, 9 April 2009 (UTC)
I'm testing the SVN version. It seems to be dealing with a few of 46 and 10 only, possibly the same as before (BTW quite scary the new redundant reference removal). Probably it's in the nature of the checks that they can't be fixed easily.
For 47, 43, quite a few of the broken templates seem to be cite templates. This is probably why they are not noticed. Table markup seems to be able to deal with missing closing tags (28). -- User:Docu
Table markup is still problematic without closings, because it can cause some bugs even if they can't readily be seen. And I agree that most of the errors 47 and 43 are citation templates... I cleaned up about 50 a few days ago, and was amazed by just how many had similar errors. –Drilnoth (TC) 13:01, 10 April 2009 (UTC)

A series of new features are being implemented to help with some of these. I'm looking forward to a new build allowing to test them. -- User:Docu

Awesome; can't wait. –Drilnoth (TC) 12:41, 13 April 2009 (UTC)
They are live in build SVN 4218. I went through the list of check #43 (Template not correct end). Cool! -- User:Docu
If you come across any more common bracket errors that AWB could reliably fix automatically, let me know and I can add them to AWB. I haven't looked at any of the table-related errors yet. Rjwilmsi 18:42, 17 April 2009 (UTC)
Looks like you already fixed quite a few cite templates testing the new feature. I did a few of #47 -- not too many at once, it's a bit consuming  ;).
I'm not sure if more fixes could be automated. Besides that, two minor points come to my mind:
  1. Where there are too many closing braces, it might be worth placing the cursor on the unmatched one or to color them.
  2. If the feature is to work mainly for curly brackets, I'd name them "braces" or "curly brackets".
Tables seem easier to do by hand. In general, they can't be overlooked as the broken (cite) templates. -- User:Docu
Are you using 'Options -> highlight first unbalanced bracket if found'? I'm investigating using colour. Rjwilmsi 20:57, 17 April 2009 (UTC)
I do. The other options I currently use are "Apply changes automatically", "On load: show changes". Watching closely, it looks like the edit box is losing focus immediately after the cursor highlights the position. -- User:Docu 21:29, 17 April 2009 (UTC)
Finally I used it on some of checks 10 and 46 (square brackets): it works. I updated the description accordingly. Another point I came across: the "Alert box" doesn't update when one uses "re-parse". If one fixes a set of brackets and wants to make sure that all problems are fixed, one needs to save and reload the page several times. It's rare that there are several, but I didn't manage to get through (the markup in) Tibetan sovereignty debate and its many {{quote}}. -- User:Docu 10:14, 18 April 2009 (UTC)
For #47, one error that might be fixed automatically by AWB would be this one. -- User:Docu
Also, ((cite web -> with curly brackets. I'm liking the improvements, however. - Jarry1250 (t, c) 18:40, 18 April 2009 (UTC)
rev 4229 Both done. Rjwilmsi 21:38, 18 April 2009 (UTC)
Good, I'm thankful for each I don't have to fix myself.
If I remember right, for this one, AWB suggested adding curly brackets instead of removing the ones (SVN 4218).
BTW my edit box still keeps losing focus. -- User:Docu
Yep, no bleedin' idea why that loss of focus is happening. It's on my TODO list but I have higher priorities. I'll have a look at braces behaviour on that revision of Vistula Veneti later. Rjwilmsi 17:53, 19 April 2009 (UTC)
I thought it might have been limited to mine. Anyways, it still works. We did get to the bottom of the 43+47 (some nasty ones remain) and, even better, some of the new entries get fixed directly. -- User:Docu
rev 4233 Resolves Vistula Veneti issue - AWB now just highlights as unbalanced. Rjwilmsi 22:11, 19 April 2009 (UTC)
At Sri Lanka Indo-Portuguese language (this version), AWB should probably skip the sequence as the page now includes <code></code> [7] (SVN 4218). I added {{nobots}} to Comparison of programming languages (object-oriented programming) as it doesn't have such tags yet. Besides that, the fixes for check #10 went well. -- User:Docu
I had allowed for 'nowiki' and 'math' tags but not 'code'. rev 4242 for that. Rjwilmsi 18:06, 21 April 2009 (UTC)
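
For readers wondering what "highlight first unbalanced bracket" involves, here is a minimal sketch of the general technique. It is not AWB's code: the name `first_unbalanced` is hypothetical, only doubled brackets (`{{ }}` and `[[ ]]`) are considered, and the skipped tags are the three named in this thread (nowiki, math, code).

```python
import re
from typing import List, Optional, Tuple

# Regions whose contents are ignored (assumption: only these three tags).
SKIP = re.compile(r"<(nowiki|math|code)\b[^>]*>.*?</\1\s*>",
                  re.DOTALL | re.IGNORECASE)

PAIRS = {"{{": "}}", "[[": "]]"}
OPENER_FOR = {close: open_ for open_, close in PAIRS.items()}
TOKEN = re.compile(r"\{\{|\}\}|\[\[|\]\]")


def first_unbalanced(text: str) -> Optional[int]:
    """Return the offset of the first unbalanced {{ }} / [[ ]] token, or None."""
    # Blank out skipped regions with spaces so offsets still match the input.
    masked = SKIP.sub(lambda m: " " * len(m.group(0)), text)
    stack: List[Tuple[str, int]] = []  # (opening token, offset)
    for m in TOKEN.finditer(masked):
        tok = m.group(0)
        if tok in PAIRS:
            stack.append((tok, m.start()))
        elif stack and stack[-1][0] == OPENER_FOR[tok]:
            stack.pop()
        else:
            return m.start()  # closing token without a matching opener
    # Any opener left on the stack was never closed; report the earliest one.
    return stack[0][1] if stack else None


print(first_unbalanced("text {{cite web|url=x"))
print(first_unbalanced("<code>{{</code> fine"))
```

Single brackets (external links) and triple braces are deliberately left out, so this would miss some of the cases reported by checks 10 and 46.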

Check 59: Template value ends with break

Shall we try to fix these (1), keep the report running in case someone is interested (2), or deactivate it (3)? I'm a bit hesitant.

Today (April 19) the report lists 2714 occurrences. As the check was added after the last dump (mid-March), I assume that there should be more to come.

The displayed value doesn't change in MediaWiki, but, e.g. for templatetiger, the available data would be cleaner without. -- User:Docu

I'd go with (2). It doesn't seem like it's as pressing as some of the other errors, but it is still an error which someone can fix when the more important problems are resolved. –Drilnoth (TCL) 12:37, 19 April 2009 (UTC)
To start to fix them, the current result will still be found in the page's history. We could also save the full list from toolserver. In the meantime, if it isn't handled, we could just de-activate it, possibly re-activate it just before the next dump. With some work, it should be possible to use pywikipediabot to do most of them. -- User:Docu
Okay; I don't really care either way. –Drilnoth (TCL) 14:01, 19 April 2009 (UTC)
Let's see what others say. If there is no demand, we could turn it off. -- User:Docu
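
If someone does want to try a bot run for check 59, the text transformation itself could look roughly like the sketch below. It is plain Python, independent of any bot framework. The name `strip_trailing_breaks` is hypothetical, and the function assumes it is handed the wikitext of a template call only, since the same pattern would also touch pipes inside tables or file links.

```python
import re

# One or more <br>, <br/> or <br /> directly before the next "|" or the closing "}}".
TRAILING_BREAK = re.compile(r"(?:\s*<br\s*/?>)+(\s*)(?=\||\}\})", re.IGNORECASE)


def strip_trailing_breaks(template_text: str) -> str:
    """Remove line breaks that end a template parameter value."""
    # Keep the whitespace that followed the break (e.g. a newline before "}}").
    return TRAILING_BREAK.sub(r"\1", template_text)


before = "{{Infobox|name=Foo<br />|born=1900<br>\n}}"
print(strip_trailing_breaks(before))
```

Breaks inside a value (`line1<br>line2`) are left alone; only the ones at the very end of a value are dropped.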

Headline hierarchy

Hi! Are you really sure the headline hierarchy is a problem? I mean: does it generate any real problem? In my opinion, due to the very small difference in font size between the level 1 and level 2 headlines, many users sometimes just choose to use the level 3 in place of 2 in order to obtain a better layout and a clearer structure within the article. Is there enough consensus about this level gap ban? -- Basilicofresco (msg) 07:46, 10 March 2009 (UTC)

IMO, I think that when a headline jumps from level 2 to level 4, it looks pretty ugly on the screen. Also, the MOS states (at WP:MOSHEAD) that "primary headings are then ==H2==, followed by ===H3===, ====H4====, and so on." So, no, it doesn't generate any real problem (it's not going to destroy the wiki or anything!), but there is community consensus and (I believe) it's been that way for a long time. –Drilnoth (TC) 17:27, 10 March 2009 (UTC)
See Organizing a page using headings at the Web Content Accessibility Guidelines 2.0 (11 December 2008). --Red Power (talk) 15:09, 2 April 2009 (UTC)

See also: User_talk:WWGB#Headline_levels_on_Deaths_in_January_2009_etc -- User:Docu

*sigh* What reason can there possibly be for opposition? It's in the MOS. Although the MOS is not always correct and is a guideline, I see no reason why this should be any different from other articles. –Drilnoth (TC) 13:29, 4 April 2009 (UTC)
  • lol... a bit of insight for you: the phrase 'there's an exception to every rule' really means that every rule has someone who takes exception to it. :) --Ludwigs2 19:29, 4 April 2009 (UTC)
Heh... –Drilnoth (TC) 19:31, 4 April 2009 (UTC)
Well the whole set of monthly pages is formatted that way (at least since 2006). I understand their explanation about how they are "growing" them. Inserting additional headers is probably a good way to keep them more or less the same.
My bot did several hundred of #7 (WP:WikiProject Check Wikipedia#Headlines start with three_"=") on articles that didn't have a "==" level, but this didn't quite reduce the numbers (the stats aren't simple to read). If we want the others to be fixed, maybe we need to have the TOC feature "changed" as it currently adjusts automatically for some of the headers on the wrong level. -- User:Docu

I have fixed manually more than 50 of these recently but the number just keeps growing. Maybe the script could be improved to create a new list about articles with new headline hierarchy issues, i.e. articles that previously did not have an issue but that have the issue now. Then the users who create these issues could be tracked and educated. This would be applicable to the other checks, too. —ZeroOne (talk / @) 08:29, 22 April 2009 (UTC)

The ones displayed are generally the ones in articles recently changed (not only to add a specific error though).
The total doesn't necessarily change from one day to the other even if you fix most of them (see #Statistics). On the other side, e.g. today check #7 dropped by 134 even though we probably didn't fix more than 50 yesterday, but the ones the script had stored to rescan next were already done in the days before.
At least for check 7, some of these are easily fixed, as sometimes all levels are just one off. I don't think it's helpful to lecture contributors that created their first article about a somewhat minor point while there is much more to be done. Once it's fixed in an article, eventually they will figure it out. If it can be fixed by bot, it wasn't much effort for me either. There are articles where the structure is due to the growth of the article, possibly by numerous contributors, and someone has to try to adjust the structure at some point. More tricky are articles that use a predefined structure made for longer articles. At least for these, one has to find an intermediary solution of some sort and we wouldn't want to stop their growth just to reach a perfect TOC. BTW a numbered TOC adjusts for most cases. -- User:Docu
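
To make the rule under discussion concrete, here is a small sketch of a hierarchy check: a heading may go at most one level deeper than the heading before it, with the page title counted as level 1. The names `heading_levels` and `hierarchy_problems` are made up for this example, and the Check Wikipedia script may well apply different rules.

```python
import re
from typing import List, Tuple

# A heading line: the same number of "=" on both sides of the title.
HEADING = re.compile(r"^(=+)\s*(.+?)\s*\1\s*$", re.MULTILINE)


def heading_levels(wikitext: str) -> List[Tuple[int, str]]:
    """Return (level, title) for every heading line, in page order."""
    return [(len(m.group(1)), m.group(2)) for m in HEADING.finditer(wikitext)]


def hierarchy_problems(wikitext: str) -> List[str]:
    """List headings that skip a level relative to the previous heading."""
    problems = []
    previous = 1  # the page title acts as the level-1 heading
    for level, title in heading_levels(wikitext):
        if level > previous + 1:
            problems.append(
                f"'{title}' jumps from level {previous} to level {level}")
        previous = level
    return problems


page = "Lead text\n== History ==\n==== Details ====\n=== Later ===\n"
for problem in hierarchy_problems(page):
    print(problem)
```

Because the page title counts as level 1, a page whose first heading is `===` is reported as well, which corresponds to the "Headlines start with three =" case.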


Check 30: Image without description

Shall we keep this running or turn it off? -- User:Docu

I can go either way with this one... on one hand it kind of is an error, but on the other it isn't really a "syntax" error, but a "content"/"style" error. Whatever we do, the "image gallery without description" check should be the same. –Drilnoth (T • C • L) 22:32, 21 April 2009 (UTC)
Image gallery is just 394, but this one is 11,260. It seems to rise fast since the last dump (/old, not sure if the script changed). If it's being used, I don't mind having it. -- User:Docu
I have massively changed the script for error 30. Now my script will detect many more errors. I think this is one of the important errors. No bot can fix this. Here we need manpower. A bot can only leave a note on the discussion page that there is an image without description. We do this in dewiki. -- sk (talk) 19:17, 22 April 2009 (UTC)
If it doesn't slow down the scan for the other errors, let's leave it running. I added the "new" tag to the results. BTW one could try to import the descriptions from Commons .. -- User:Docu
I saved the current list at /030, in case someone wants to work on it. -- User:Docu

Extraneous links in hatnotes

Hatnotes should only contain links to the desired possible other target. See Wikipedia:HATNOTE#Extraneous links. Would need human review probably, as there may be exceptions. –xeno talk 16:31, 22 April 2009 (UTC)

Ideally, yes, templatetiger can give an overview of things. First, it might be worth trying to convert notes that don't use {{dablink}}, {{otheruses}} and the like to one of the templates. -- User:Docu

Reference

Some articles don't use <references /> but {{Reference}}, which is the same as {{reflist}}. Kwiki (talk) 06:59, 10 May 2009 (UTC)

It looks like this makes Cynthia L. Bauerly appear on the list for check #3 - Wikipedia:WikiProject Check Wikipedia#Reference tag .3Creferences .2F.3E missing (partial AWB) - where it shouldn't. There are several other redirects at Special:WhatLinksHere/Template:Reflist. -- User:Docu
I left a note for Stefan (de:User talk:Stefan_Kühn/Check_Wikipedia#Check_3_at_en.wp). They should be gone after the next update. -- User:Docu
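
As an illustration of what the detection for check #3 has to allow for, the sketch below treats a page as fine if it has either a `<references />` tag or one of a configurable set of list templates. The function name `missing_reference_list` is hypothetical and the default template names are just the two mentioned in this thread; the other redirects to Template:Reflist would have to be added by whoever uses it.

```python
import re
from typing import Iterable

REF_TAG = re.compile(r"<ref[\s>]", re.IGNORECASE)            # <ref> or <ref name=...>
REFERENCES_TAG = re.compile(r"<references\s*/?>", re.IGNORECASE)


def missing_reference_list(wikitext: str,
                           list_templates: Iterable[str] = ("reflist", "reference")) -> bool:
    """True if the page uses <ref> but nothing that would display the footnotes."""
    if not REF_TAG.search(wikitext):
        return False  # no footnotes, so nothing is missing
    if REFERENCES_TAG.search(wikitext):
        return False
    for name in list_templates:
        # Matches {{name}} and {{name|...}}; case is ignored for the whole
        # name, which is slightly more permissive than MediaWiki itself.
        pattern = r"\{\{\s*" + re.escape(name) + r"\s*(\||\}\})"
        if re.search(pattern, wikitext, re.IGNORECASE):
            return False
    return True


print(missing_reference_list("Text<ref>Source</ref>\n{{Reference}}"))
print(missing_reference_list("Text<ref name=a>Source</ref>"))
```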

New dump?

I take it the +179,000 bytes equates to the new dump having been scanned through? - Jarry1250 (t, c) 18:25, 15 May 2009 (UTC)

Looks like it:
http://download.wikimedia.org/enwiki/20090512/
2009-05-14 23:57:24 done Articles, templates, image descriptions, and primary meta-pages.
2009-05-14 23:57:23: enwiki 8521847 pages (99.613/sec), 8521847 revs (99.613/sec), 77.1% prefetched, ETA 2009-05-16 15:45:17 [max 22793793]
  • This contains current versions of article content, and is the archive most mirror sites will probably want.
  • pages-articles.xml.bz2 4.8 GB
I was wondering how much was still to come, but I'm still surprised. -- User:Docu
If we edit one article per minute, we will be done in 4 months .. -- User:Docu
Heh...
I have DrilBot on "headlines end with colon". –Drilnoth (T • C • L) 19:07, 15 May 2009 (UTC)
Starting from the back of the list since I see D6 got some of them already. –Drilnoth (T • C • L) 19:12, 15 May 2009 (UTC)

No, this is the old dump from 2009-03-13 01:27:21. I started the scan of the old dump 3 days ago. -- sk (talk) 19:17, 15 May 2009 (UTC)

Oh wonderful. We have that to look forward to. - Jarry1250 (t, c) 19:20, 15 May 2009 (UTC)
Wow. The new scan will register changes since this one, right? –Drilnoth (T • C • L) 19:23, 15 May 2009 (UTC)
I'm guessing, but I think that the jump is because this is the first dump that a whole bunch of new / updated checks are being run on (as opposed to just edited). So probably not quite so many more. - Jarry1250 (t, c) 19:29, 15 May 2009 (UTC)
I think that's how it works. It's a great script regardless; thanks sk! –Drilnoth (T • C • L) 19:48, 15 May 2009 (UTC)
It would be just another two or three weeks of errors (dump is mid March, most checks were in place at the beginning of April). Luckily it's the weekend, traffic is low [8] and the server doesn't lag [9] (as of 02:49, 16 May 2009 (UTC)). -- User:Docu
Drilnoth, do you think we could talk Jarry into running another bot to crunch through the lists? -- User:Docu
I could manage an AWB bot, certainly. There's a python one going through the approvals process now, but I guess, DrilBot's in the best position for expandability. Sorry, I don't really feel like coding much more than merely setting AWB to stun mode. - Jarry1250 (t, c) 08:13, 16 May 2009 (UTC)
Given the amount of pages to process, I think it would be worth it. Ideally, we would try to process most pages before the next dump (in June probably). Given the way the script works, it's unlikely that we will get updated full lists before.
BTW the unicodify function on python (converts many) should probably be harmonized with the one in AWB (has a few exceptions). Ideally, the selection in the script sk is using should be similar (could be shorter though). -- User:Docu
I hope to plug in some improved unicodification manually, although I agree that having it be default in AWB (maybe something like Wikipedia:AutoEd/unicodify.js's changes?) would be better. –Drilnoth (T • C • L) 10:52, 16 May 2009 (UTC)

Check 19 (Headlines start with one "=")

Usually there were just a few new articles listed. Now there are 2535. It's probably worth fixing these by bot, lowering all headers by one level. -- User:Docu

When I go through these it seems like quite often there will be a page where the headers are one level high for about half the article or something and then be accurate... fixing those by bot would create a page just as incorrect as the previous one. –Drilnoth (T • C • L) 10:51, 16 May 2009 (UTC)
Some are broken in odd ways, others are just all one level off. I suppose one would have to check first if there is more than one header with level "=". The good thing is that some have already been fixed since March ;) -- User:Docu
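
A bot pass along the lines discussed here might look like the sketch below: it lowers every heading by one level, but only when the page has more than one level-"=" heading, which is one reading of the pre-check suggested above. The name `lower_headings` and the exact guard are assumptions. It does not detect the mixed pages where only half the article is one level off, so those would still need a manual look.

```python
import re
from typing import Optional

# A well-formed heading line: equal runs of "=" around a title that does not
# itself start with "=". Malformed lines such as "== Foo =" are not matched.
HEADING_LINE = re.compile(r"^(=+)([^=\n][^\n]*?)\1[ \t]*$", re.MULTILINE)


def lower_headings(wikitext: str) -> Optional[str]:
    """Add one "=" to each side of every heading, or return None to skip the page."""
    levels = [len(m.group(1)) for m in HEADING_LINE.finditer(wikitext)]
    top_level_count = sum(1 for level in levels if level == 1)
    # Skip pages with fewer than two "=" headings, and pages that already use
    # level 6, since there is no level 7 to move those headings to.
    if top_level_count < 2 or max(levels) >= 6:
        return None
    return HEADING_LINE.sub(
        lambda m: f"={m.group(1)}{m.group(2)}{m.group(1)}=", wikitext)


print(lower_headings("= A =\n== A1 ==\n= B =\n"))
```

Returning `None` rather than the unchanged text makes it easy for a caller to log the skipped pages for manual review.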

Page update

18:52, 17 May 2009 (UTC) I'm trying to update the page, but it keeps timing out. -- User:Docu

Seems like you got it. :) –Drilnoth (T • C • L) 20:06, 17 May 2009 (UTC) Oops, I'm blind. –Drilnoth (T • C • L) 20:08, 17 May 2009 (UTC)
Finally. I wonder if it's some new abuse filter that slows it down so much -- User:Docu

Title linked in text

Can ↑ be done reliably by bot? I'm asking because it was brought up at Wikipedia talk:AutoWikiBrowser/Bugs#Bold names and it seems that making this change on image maps could be problematic, as discussed here. Are there any other times that this could be a bad edit? –Drilnoth (T • C • L) 12:53, 18 May 2009 (UTC)

The one in the image map looks more or less what it should be doing (though one could add an exclusion for <imagemap>).
He is complaining that the self link on the image at 50000_Quaoar#Size is being removed. Personally, I think it's even more important to remove these confusing ones than to fix the usual ones. Anyways, in general, there is always a trade-off between fixing 1000 and possibly breaking a few. -- User:Docu (15:32)
Hmm .. the missing link on http://en.wikipedia.org/w/index.php?title=50000_Quaoar&oldid=290470688#Size does make the image disappear completely. Too bad the extension has no maintenance category associated with it. -- User:Docu

It was fixed at 15:32. -- User:Docu

Excellent. –Drilnoth (T • C • L) 15:54, 18 May 2009 (UTC)

Double pipe in one link

I've been through a few of these with AWB. A lot of them are of the form [[article || text]] or [[article|text | ]]. This kind of error looks eminently like bot-work to me; is there one active, or could one be modified to suit? Mr Stephen (talk) 18:05, 18 May 2009 (UTC)

I might be able to have DrilBot fix the type that you mentioned, although a lot would need to be done by hand. I'm not entirely sure though; I'm still not good enough with RegExp. –Drilnoth (T • C • L) 20:20, 18 May 2009 (UTC)
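
For the two shapes mentioned above, a pair of regular expressions along these lines might be a starting point. This is a sketch in Python rather than an AWB rule, and the names `EMPTY_MIDDLE`, `EMPTY_TAIL` and `fix_double_pipe` are invented for the example. It only handles links with exactly one empty segment and leaves everything else for manual fixing.

```python
import re

# [[article || text]]  ->  [[article|text]]
EMPTY_MIDDLE = re.compile(
    r"\[\[([^\[\]|]+?)\s*\|\s*\|\s*([^\[\]|]+?)\s*\]\]")
# [[article|text | ]]  ->  [[article|text]]
EMPTY_TAIL = re.compile(
    r"\[\[([^\[\]|]+?)\s*\|\s*([^\[\]|]+?)\s*\|\s*\]\]")


def fix_double_pipe(wikitext: str) -> str:
    """Collapse an empty pipe segment in a wikilink that has one too many pipes."""
    wikitext = EMPTY_MIDDLE.sub(r"[[\1|\2]]", wikitext)
    return EMPTY_TAIL.sub(r"[[\1|\2]]", wikitext)


print(fix_double_pipe("See [[article || text]] and [[article|text | ]]."))
```

File links with several non-empty parameters, such as `[[File:X.jpg|thumb|A caption]]`, do not match either pattern and are left untouched.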

Ignore articles tagged for deletion

Is this possible? Would it slow down the scan? More importantly, do we want it? Would it be beneficial to not "waste" time fixing articles that are later deleted, or is this an important service - letting people see the content of the article as it was intended to be displayed. Discuss. - Jarry1250 (t, c) 13:30, 10 May 2009 (UTC)

I guess it's probably possible. It's a good question. For some reports it was a bit annoying to get flooded with new articles likely to be deleted while there were many old ones that needed fixing. There are some I skipped, others I reformatted - to avoid seeing them for another week in the reports. -- User:Docu
What Docu said for the most part. When I'm using a tool like AWB or AutoEd I usually just fix the error, but if something is being done by hand it seems kind of pointless. –Drilnoth (T • C • L) 14:53, 10 May 2009 (UTC)
Not all the articles tagged will end up deleted, though, right? Even so, I guess it's reasonable to wait until they're untagged, just in case. --Auntof6 (talk) 03:39, 26 May 2009 (UTC)

44: Headlines with bold

Treating this as an "error" is brain-damaged. There are plenty of valid reasons why bold could appear in a headline, for example mathematical notation often relies on fixed typefaces. Automatic removal of bold tags as in this edit is dead wrong. — Emil J. 10:50, 19 May 2009 (UTC)

Hmm... well, it still would only matter in level 2 headlines since headlines of level 3 and below aren't visually affected by having the bold text. –Drilnoth (T • C • L) 13:46, 19 May 2009 (UTC)
Frequently, it looks a bit like bold text within text that is already bold, or italics and underscores combined.
In general, italics seem a viable option for additional emphasis within headline. In the sample above, <math></math> seems a good option as well. -- User:Docu

Numbers are low

I just signed up for this page and I will start going through them as well, but I do have a couple of comments. First, I think some of the numbers are low. For example, I know that there are more pages than listed here with incorrect breaks (i.e. <br>, <BR>, <br., etc.). I also know that there are more pages with incorrect characters or invalid formatting in the defaultsort. Not criticizing, because I am glad that someone created this list, but I wanted to let you know. My next comment is based on the rather minimal impact of some of these edits, such as the breaks. I personally follow the belief that if you watch the pennies the dollars will mind themselves (even the small edits are important over the long term), but some would argue that some of these edits are a waste of resources and fill up editors' watchlists (also not a problem for me personally). Since AWB specifically requests that some of these edits, such as the breaks, not be done with AWB as standalone changes, are we ok to go ahead and do them?--Kumioko (talk) 20:30, 19 May 2009 (UTC)

(de-capitalized header) I think that the list of breaks is about correct... things like <br> and <BR> are correct; the list here only has those which have an error like <.br>, <<br>, or <\br>. My bot ran through the defaultsort list a couple of days ago and fixed a lot of them, although it looks like the list might not have taken that into account yet... weird. Anyway, my feeling is that the things that don't really change much (e.g., the location of categories) and which can be done by bot should be done by bot... then it doesn't waste human resources and the edits don't show up on watchlists. –Drilnoth (T • C • L) 20:47, 19 May 2009 (UTC)
Kumioko, would you have samples of pages that were missed? The reports keep getting improved, but they are not meant to be exhaustive (at least for now ;) ). -- User:Docu
Drilnoth, when you have a moment, would you run your bot through 43/47 (broken templates)? It's easier to look manually at the remaining ones once these are done. For an up-to-date list of what remains, we might have to wait for the next dump though. -- User:Docu 11:00, 20 May 2009 (UTC)
Sure; I'll start it running right after the next update. –Drilnoth (T • C • L) 12:38, 20 May 2009 (UTC)
Before the next dump (in June supposedly) would be sufficient (hopefully in June we won't get April data ;) ). -- User:Docu
For 43, thanks for doing a first pass by bot. I just finished today's 50. Luckily one page filled half the list ;) -- User:Docu
You're welcome; thanks for mentioning it. I don't think that DrilBot can really do much with #47... AWB can't pick up very many of them to fix automatically. –Drilnoth (T • C • L) 15:18, 22 May 2009 (UTC)
As I have to stop at each page to check it manually, it helps if the automatic ones are gone. Besides, too many of these at once give me a headache. -- User:Docu
I read your previous note too quickly. You mentioned the other report. It does also work for #47 (missing opening brackets), see rev 4229 mentioned in /Archive#Missing opening or closing brackets, table and template markup. When editing with a new release, I have actually seen it being fixed! -- User:Docu
I know that it does work automatically some of the time... the problem is that there aren't enough that AWB can auto fix, and when I'd had my bot going through the list it was making a lot of edits that didn't fix that error. –Drilnoth (T • C • L) 14:11, 23 May 2009 (UTC)
If I'm not sure which type of problem AWB will fix, I'm just using "clean-up" as edit summary. It happens sometimes that I forget to switch it from a more specific one. For check#47, you could run it with a summary "Clean-up, general fixes (batch #47)" this might be sufficiently descriptive for the type of operation. If it's done just once for each dump, I think it's acceptable. One could also link general fixes to WP:GENFIXES, this way interested editors can easily find the full set of possible fixes. -- User:Docu 06:09, 24 May 2009 (UTC)
Eh... I'd do that except for the AN/I report about DrilBot, with the consensus being that more descriptive edit summaries are needed. –Drilnoth (T • C • L) 16:32, 24 May 2009 (UTC)
It's not a coincidence that I wrote the above. Anyways, WP:GENFIXES is very descriptive (IMHO), maybe you could even copy it to a separate page. The problem is that if the edit summary is too descriptive and the edit doesn't make the change that is in the summary, it's more problematic. I suppose it would be possible to set AWB to have changes that trigger edits (with a specific summary) and add all other gen fixes behind. Whatever solution you choose, after each 20k of edits, you will get a new thread on ANI ;) -- User:Docu
35k edits. :) I'm working on User:DrilBot/Summaries to create a more descriptive guide on these edits. –Drilnoth (T • C • L) 17:00, 24 May 2009 (UTC)
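
Coming back to the break tags discussed near the top of this section: the three malformed forms named there (`<.br>`, `<<br>` and `<\br>`) are regular enough for a small pattern. The sketch below is an assumption about how such a fix could be written, not the rule used by the report or by any bot. The name `fix_breaks` and the choice of `<br />` as the replacement are mine.

```python
import re

# "<" followed by one stray ".", "<" or "\" and then a plain br tag.
MALFORMED_BREAK = re.compile(r"<[.<\\]br\s*/?>", re.IGNORECASE)


def fix_breaks(wikitext: str) -> str:
    """Replace the three malformed break forms with a well-formed tag."""
    return MALFORMED_BREAK.sub("<br />", wikitext)


print(fix_breaks("line one<.br>line two<<br>line three"))
```

Correct tags such as `<br>` and `<BR>` do not match and are left as they are. Unclosed forms like `<br.` are not covered here.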

Non-editable and unreliable (check 7)

Heed this edit! I will be back to do more of these later. Can someone supply the secret contact information mentioned in the edit summary? Michael Hardy (talk) 11:39, 24 May 2009 (UTC)

The list on the server isn't completely up to date, from the introduction "# The number of items on this page is limited. For a longer list see tools:~sk/checkwiki/enwiki/. These aren't updated daily though. When working on the toolserver lists, it is suggested to start from A. The next day, the script will use these items to re-generate this page omitting already fixed articles.". An estimated one third is already fixed. I'm not quite sure when he will be doing it, but the list will be split into two separate ones (the ones without level "==" headings and the ones with, according to (de:User talk:Stefan Kühn/Check Wikipedia#Check 7 at en.wp)). -- User:Docu

I wasn't worried about missing items from the list, but about items on the list that shouldn't be.

Why is Bell polynomials on the list? Someone edited it to change some subsections to first-tier sections. I hit the "rollback" button. They were intended to be subsections. Michael Hardy (talk) 12:11, 24 May 2009 (UTC)
In fact, I fixed the error on the page more than a month ago myself diff. Items on the toolserver lists date from the scan done 10 days ago of the March 2009 dump. Items are rescanned every day to generate a list of 50 current items. That means the items here are still not done as of yesterday. Obviously this system works better for lists where we don't have that many open. -- User:Docu
Well, someone "fixed" it again today and I reverted to the "unfixed" version and I'll have to do that as many times as someone does that same "fix".
So is it strictly forbidden to have a subsection in the initial section that has no main header? Michael Hardy (talk) 17:53, 24 May 2009 (UTC)
Per WP:MOSHEAD, headlines should start with "==" with subsections being "===", "====", and so on... so consensus is against having a subsection in the lead of the articles. –Drilnoth (T • C • L) 18:21, 24 May 2009 (UTC)
Michael Hardy, I saw the re-"fixing", someone was a bit in a hurry, I'm glad you repaired it.
The article title is a level <h1> heading, thus the next lower level and first level in article text, would be a <h2> level ("=="). The peculiar thing about Wikipedia is that the lead section doesn't have a header which is somewhat asymmetric. Interestingly even on the Main page, they managed starting with h1 and going to h2! -- User:Docu 18:25, 24 May 2009 (UTC)

This check is also broken for special pages like disambiguation pages, where smaller headers are appropriate. This check needs to be eliminated or seriously refined. —Centrxtalk • 18:12, 24 May 2009 (UTC)

Why should it be different for dabs? –Drilnoth (T • C • L) 18:21, 24 May 2009 (UTC)
Disambiguation pages are often short, and there may only be a couple of entries in each section. The ==-level header, which also adds an underline, is far too prominent for disambiguation pages. The section header should not be the same size, or half the size, as the entire section. The standard promulgated by this Check may be theoretically best for articles, but not for many different types of other pages. —Centrxtalk • 18:28, 24 May 2009 (UTC)
Also, the script or bot or logic that was automatically promulgating this Check, is additionally broken in at least three ways. —Centrxtalk • 18:33, 24 May 2009 (UTC)
Which are they? -- User:Docu

Start copy from User talk:Centrx.

Inspection reveals two major classes of page where this Check was automatically implemented: a) disambiguation pages, where a lesser section header was specifically intended; b) blatantly non-wikified pages, that need far more help than tweaking section headers.
Also, without exception, the bot did not even implement the Check correctly. It does not normalize section headers, it simply chops off one =. For example, ==== is reduced to === even if it is supposed to be == at the top level.
This analysis does not even enter the situation of general bugginess, as evidenced by the fact that it eliminated correct sub-sections in [10]; and other problems and objections on User talk:PigFlu Oink. —Centrxtalk • 19:11, 24 May 2009 (UTC)
Bell polynomials is clearly wrong as I had to do it manually myself. It would be interesting to know what caused it. a) isn't incorrect as MoS does warrant level "==" headers. I don't see a problem with b) as such articles never get fixed in one step. "====" to "===" happens with AWB too. -- User:Docu
Looks like it chopped off a level from the first level "===" and below headers it found. Not good. -- User:Docu
  • General MoS does not apply to special cases in special pages. Even if the disambiguation page headers are incorrect, the proper correction is to change them to mere Bolded title as used in Wikipedia:Manual of Style (disambiguation pages), not to change them to grand section headers.
  • Depending on the page, tweaking a broken page either means 1) actual errors are obscured by making the page look superficially correct, but still be wrong header-wise (e.g. [11]); or 2) little is lost by incidentally reverting a page that needs to be majorly fixed by the first editor to come along anyway.
  • Unless someone will manually inspect 15000+ edits to save a handful of sound corrections, reverting all is the only way to correct the mistakes caused by an unauthorized, blatantly buggy bot. —Centrxtalk • 19:38, 24 May 2009 (UTC)

End copy from User talk:Centrx.
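
For illustration, the difference between chopping one "=" off every heading and normalizing the hierarchy (the "====" that should become "==" in the example above) can be sketched in Python. This is only a sketch under assumptions: it is not the code of any bot or of AWB, the function names are hypothetical, and it only looks at heading lines with the same number of "=" on both sides.

```python
import re

# Matches a wikitext heading line such as "=== Title ===".
HEADING = re.compile(r"^(=+)\s*(.*?)\s*(=+)\s*$")

def heading_levels(text):
    """Return the level of every balanced heading line, in order."""
    levels = []
    for line in text.splitlines():
        m = HEADING.match(line)
        if m and len(m.group(1)) == len(m.group(3)):
            levels.append(len(m.group(1)))
    return levels

def normalize_headings(text):
    """Shift all headings so that the shallowest one becomes level 2 ("==").

    Relative nesting is preserved; nothing changes if the page already
    has a heading at level 2 or above.
    """
    levels = heading_levels(text)
    if not levels:
        return text
    shift = min(levels) - 2
    if shift <= 0:
        return text
    out = []
    for line in text.splitlines():
        m = HEADING.match(line)
        if m and len(m.group(1)) == len(m.group(3)):
            marks = "=" * (len(m.group(1)) - shift)
            line = f"{marks} {m.group(2)} {marks}"
        out.append(line)
    return "\n".join(out)
```

A page whose shallowest heading is "====" is moved up by two levels here, rather than by one. A sketch like this still cannot tell whether a subsection in the lead was intended, which is the Bell polynomials case.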

See Also => See also

For consistency, "== See Also ==" should be capitalised "== See also ==" Tabletop (talk) 02:42, 26 May 2009 (UTC)

Category duplication (check 17) due to template that includes a category

A lot of the asteroid articles have duplicate categories because they're stubs, and the stub template includes a category that's also hard-coded. To me, it's not obvious that the stub templates include the categories (I had to research it a little). Therefore, I'm inclined not to resolve these so that future removal of the stub templates won't remove the category altogether. What say ye? --Auntof6 (talk) 05:13, 26 May 2009 (UTC)

The reports don't cover this, as far as I know. The category needs to be defined twice in the article itself, e.g. Category:Main Belt asteroids in 10515 Old Joe [12]. -- User:Docu
OK. Maybe the ones I spot-checked already had the duplicate removed, then. I just went through all the ones that start with numbers and did find some with the category specified twice (and fixed them, of course!). I'll wait and see if they're still there if/when the list gets regenerated. --Auntof6 (talk) 08:27, 26 May 2009 (UTC)
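
As a rough illustration of the distinction made above: a check that only reads the article's own wikitext sees hard-coded category links, not categories added by a stub template. The Python sketch below makes that assumption explicit. It is not the Check Wikipedia script, and the function name is hypothetical.

```python
import re
from collections import Counter

# A hard-coded category link, with an optional sort key after "|".
CATEGORY = re.compile(
    r"\[\[\s*Category\s*:\s*([^\]|]+?)\s*(?:\|[^\]]*)?\]\]",
    re.IGNORECASE,
)

def duplicate_categories(wikitext):
    """Return the categories that are written out more than once."""
    names = []
    for m in CATEGORY.finditer(wikitext):
        # Underscores and spaces are equivalent in titles.
        name = " ".join(m.group(1).replace("_", " ").split())
        # The first letter of a title is not case-sensitive on en.wp.
        names.append(name[:1].upper() + name[1:])
    counts = Counter(names)
    return sorted(n for n, c in counts.items() if c > 1)
```

A category that appears once in the text and once through a template is not reported by this sketch, because the template is never expanded.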

Suggestion: stub templates on articles that are not stubs

AWB seems to remove stub templates from articles that are over a certain size (I don't know what size that is, though). Maybe this project could generate a list of non-stub articles that have stub templates. --Auntof6 (talk) 09:32, 26 May 2009 (UTC)

Database_reports has a weekly Long_stubs report. -- User:Docu

Error 2 incorrect entry

[edit] Article with false
(AutoEd)
...deleted...
article info
Comparison of layout engines (Non-standard HTML) * <tt id="trident_wbr"><wbr>Naraht (talk) 15:13, 28 May 2009 (UTC)

I cleaned that up so that it shouldn't be listed in the future. It should have been using &lt; and &gt; entities originally, anyway. Thanks! –Drilnoth (T • C • L) 15:12, 28 May 2009 (UTC)

May 28 update

It seems like the toolserver problems canceled the update. -- User:Docu 00:15, 29 May 2009 (UTC)

Appears so. Ah, well; maybe tomorrow. –Drilnoth (T • C • L) 01:54, 29 May 2009 (UTC)
Apparently, after 9 hours, it did finally get through. BTW we are down
from "342,337 ideas for improvement in 304,586 articles" (May 15)
to "257,460 ideas for improvement in 233,446 articles" (May 28).
Even if many of the articles of the March dump scanned on May 15 were already fixed by May, I think we made substantial progress. -- User:Docu
Woah; that's a nice statistic. Although wasn't the "headlines start with the '='" check split into two? I'm guessing that that reset those counters until the next database dump. –Drilnoth (T • C • L) 13:12, 29 May 2009 (UTC)
I just hope the new dump won't be as old as the last one. BTW 7 and 83 combined should still equal the old 7. The current distribution between the two isn't "accurate" yet, as the server lists are checked to see which pages still fail the new check 7 or go under 83. Anyways, I think it's quite encouraging. -- User:Docu

Automated edits to heading hierarchies

I've opened an RfC with regards to whether the community stands by the MoS, but mainly about whether it should be enforced using automated edits. Cheers, - Jarry1250 (t, c) 14:53, 30 May 2009 (UTC)

Grand!

Another huge database dump! :) I'm guessing that this is a more recent one? –Drilnoth (T • C • L) 19:36, 1 June 2009 (UTC)

Or... did it just rescan the same dump with the new errors? DrilBot is finding a lot of already-fixed articles with #53. –Drilnoth (T • C • L) 20:45, 1 June 2009 (UTC)
WTF? Locos ~ epraix Beaste~praix 21:00, 1 June 2009 (UTC)
#16 was mostly ok. It might be the dump from 2009-05-24 21:39:17. -- User:Docu
That sounds like it could be accurate... I think that DrilBot ran through this specific list after that. –Drilnoth (T • C • L) 21:55, 1 June 2009 (UTC)
Soon will be up2date, nothing left to do! -- User:Docu
But that's a good thing, isn't it? :) –Drilnoth (T • C • L) 22:15, 1 June 2009 (UTC)
I'm seeing something similar with #17. You sure it didn't scan an old dump? I've been working through that list, and the areas I've already worked on (numbers through J) suddenly have a lot of articles again. --Auntof6 (talk) 22:14, 1 June 2009 (UTC)
It was the new dump. I changed my script after the new server started making a new dump of each language (de, fr, ...) every 4-5 days, which was too fast for my script. Now the script starts on the 1st, 8th, 15th and 22nd of each month, searches for new dumps and scans them. So we may get a new dump about every 7 days. The new en dump is from 2009-05-24 21:39:17; my script found it on 2009-06-01 and scanned it. -- sk (talk) 07:47, 3 June 2009 (UTC)
Okay; thank you for clarifying. –Drilnoth (T • C • L) 12:30, 3 June 2009 (UTC)

More contributors

Today I had the idea of recruiting more contributors. Under Wikipedia:WikiProject_Wiki_Syntax#Thank_you_to_contributors we can find many Wikipedians who helped in this old project. Maybe they don't know about the new "WikiProject Check Wikipedia". Some of you could inform these people on their user talk pages. What do you think about this? -- sk (talk) 08:05, 3 June 2009 (UTC)

Hmm... neat. I might be able to use AWB to send all of them messages about this. –Drilnoth (T • C • L) 12:32, 3 June 2009 (UTC)

Add logic to AWB to include more of these errors.

I have been looking at the errors and have noticed that there are several that I think could or should be added to AWB as general fixes or at least as a custom module. Before I go adding a bunch of feature requests though does anyone have any thoughts about which ones they feel could or should be added to AWB?--Kumioko (talk) 18:57, 3 June 2009 (UTC)

I'd say, just go ahead and request them. Note that "reference duplication" has already been requested, as has "template with Unicode control characters". –Drilnoth (T • C • L) 15:40, 4 June 2009 (UTC)
OK and I think they added a few others along with the change from using a text box to a richtextbox (to make and display the edits). Maybe I'll wait till the next version comes out before I make the suggestions.--Kumioko (talk) 17:42, 4 June 2009 (UTC)
You could just download the most recent SVN build; that's what I do. –Drilnoth (T • C • L) 17:44, 4 June 2009 (UTC)
Handy link (that's unless you want to hook into the SVN directly). - Jarry1250 (t, c) 17:48, 4 June 2009 (UTC)

Reactivated errors

I have reactivated the following errors:

  • #30: Image without description and #35: Image gallery without description. These are both errors because they cause accessibility issues, e.g., for people using screen readers. I can't really see any reason why they should be deactivated, as the vast majority should have descriptions per W3C standards.
  • #36: Redirect not correct. Many of the redirects listed here work, but they use improper syntax (e.g., a carriage return between #REDIRECT and the target page's name). I believe that AWB can be configured to fix these, so a bot can repair them easily and those won't require human attention. The redirects on the list that are malfunctioning can then be fixed manually with ease.
  • #79: External link without description. "Bare links" should almost never be used... they should have some description to help indicate where they lead to. I'm not sure why this was turned off.

Reactivating these may increase the amount of time that the scan takes, but I don't think that that really matters... it will still be daily, just at a different time of day. If you think that any of these should be deactivated again, please don't hesitate to post here. –Drilnoth (T • C • L) 16:39, 6 June 2009 (UTC)
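
To make the #36 case concrete, here is a minimal Python sketch of the kind of repair meant above: a redirect that has a line break between #REDIRECT and its target is rewritten onto one line. This is an assumption about how such a fix could look, not AWB's actual logic, and the function name is hypothetical. It assumes that the target is the first wikilink after the keyword.

```python
import re

# "#REDIRECT", then any whitespace (including line breaks), then the target link.
REDIRECT = re.compile(r"^\s*#REDIRECT\s*(\[\[[^\]\n]+\]\])", re.IGNORECASE)

def normalize_redirect(wikitext):
    """Return the page text with a one-line redirect, or None if the
    page does not start with a recognizable redirect."""
    m = REDIRECT.match(wikitext)
    if m is None:
        return None
    # Keep whatever follows the target (e.g. categories) unchanged.
    return "#REDIRECT " + m.group(1) + wikitext[m.end():]
```

Pages for which the function returns None would still need a human to look at them.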

For 2, I agree with you completely. As for 1 and 3, the question here is about whether they are worth listing for now. They invariably involve large numbers of articles which need to be fixed by hand. Not all images need descriptions by the way. We're never going to get through them all, so why bother listing them? I mean, that's a pessimistic view I know, but seriously, 30,000 images? That's a hell of a job. - Jarry1250 (t, c) 16:57, 6 June 2009 (UTC)
I'm kind of torn myself on point 1... the way I see it, saying "we're never going to finish them, so why bother?" doesn't really make sense... the question is more like "How many of these don't need descriptions? And is this a good job to have on CHECKWIKI, which primarily lists cosmetic and code changes, not content problems?" Ditto with the external links. I thought that they should maybe be reactivated and we can see what they come up with before making a final decision. I guess that I'm not really opposed to having them deactivated (just kind of neutral on it) for the reasons that I outlined (is it a good job for CHECKWIKI?). Feel free to re-deactivate them if you'd like; I won't oppose it. –Drilnoth (T • C • L) 17:02, 6 June 2009 (UTC)
One idea for #30 and #35: many of these images without descriptions are flags. I think it would be no problem to create a bot that checks these images, finds flags like Flag of Germany.svg and creates a description like Flag of Germany. On dewiki we have 40,000 articles with at least one image without a description, and 5,000 of these 40,000 are flags. I have tried to get a bot for this, but at the moment nobody has the time. Maybe someone on enwiki can create this bot. Using all the flags in commons:Category:SVG_flags would help. If #30 is activated, he can also use this file. -- sk (talk) 19:46, 6 June 2009 (UTC)
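
The flag idea can be sketched in a few lines of Python. This is only a sketch under assumptions, not an existing bot: the function names are hypothetical, a file counts as a flag only if its name starts with "Flag of", and only image links with no parameters at all are touched (a link that has a size but no description is left alone).

```python
import re

# An image link with no parameters, e.g. [[Image:Flag of Germany.svg]].
BARE_IMAGE = re.compile(
    r"\[\[\s*((?:Image|File)\s*:\s*([^\]|]+?))\s*\]\]",
    re.IGNORECASE,
)

def describe_flag(filename):
    """Derive a description from a flag file name, or return None."""
    stem = filename.rsplit(".", 1)[0].replace("_", " ").strip()
    if stem.lower().startswith("flag of "):
        return stem
    return None

def add_flag_descriptions(wikitext):
    """Append a description to bare flag images; leave all others alone."""
    def repl(m):
        desc = describe_flag(m.group(2))
        if desc is None:
            return m.group(0)
        return f"[[{m.group(1)}|{desc}]]"
    return BARE_IMAGE.sub(repl, wikitext)
```

A real bot would more likely check file names against the list from commons:Category:SVG_flags, as suggested above, instead of relying on the "Flag of" prefix.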

Sharing regexes

I have been building a new AWB plugin-system thing, designed to make it possible to subscribe to blocks of regexes and to collaboratively work to improve them. I'm calling it FRONDS at the moment (a working title) and you can read all about it at Wikipedia:AutoWikiBrowser/Fronds. I'm designing it (in the broader sense of the term) with CheckWiki in mind. See what you think: expand that page, or put questions on the associated talk page. Cheers, - Jarry1250 (t, c) 19:23, 9 June 2009 (UTC)

(I edited the above to avoid WP:TLDR.) A beta's almost ready now, and it'd be nice to get some regular expressions in the system. I'll be adding mine today. - Jarry1250 (t, c) 10:07, 12 June 2009 (UTC)

No new updates for a few days

See [13]. I hope that that can be fixed easily enough. Anyway, there's quite a lot to work through here already. –Drilnoth (T • C • L) 15:56, 12 June 2009 (UTC)

# 81 Reference duplication

I went through this category and I did most of this one but there seemed to be a lot of false positives so the number still isn't at zero. You might want to rerun the list using your script and see what's left.--Kumioko (talk) 18:43, 3 June 2009 (UTC)

If you find a false positive, then please write the article title here so that I can check it. -- sk (talk) 10:13, 4 June 2009 (UTC)
Ok I will do that next time I go through.--Kumioko (talk) 12:42, 4 June 2009 (UTC)
I couldn't find one for 81 but I just found one for 19. European grid is showing on #19 as having only 1 = for a section but the only rogue = I could find was related to a math calculation.--Kumioko (talk) 13:13, 4 June 2009 (UTC)
See this change. If my script finds a "=" at the first position of a new line, then it thinks it is a headline. Normally a "=" should not be at the first position. -- sk (talk) 20:26, 4 June 2009 (UTC)
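
For comparison, the rule described above (a "=" in the first position means headline) and a stricter variant can be sketched in Python. This is not the script's real code: the function names are hypothetical and the sample formula line in the usage note is made up, since the exact line in European grid is not quoted here.

```python
import re

def naive_is_heading(line):
    """The rule as described: any line that starts with "=" is a headline."""
    return line.startswith("=")

# Stricter: "=" markers at both ends of the line, with some text in between.
STRICT = re.compile(r"^(=+)(.+?)(=+)\s*$")

def strict_is_heading(line):
    """Require the same number of "=" on both sides and a non-empty title."""
    m = STRICT.match(line)
    return (
        m is not None
        and len(m.group(1)) == len(m.group(3))
        and m.group(2).strip() != ""
    )
```

With these, a continuation line of a formula such as "= 2 \pi r" is a headline for the naive rule but not for the strict one, while "= Title =" is still recognized as a level-1 headline by both.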

Well it may be the convention to avoid 'reference duplication', but it's a great advantage for readers not to have to scroll down and back up again, especially where there are relevant quotations within the refs. Readers like me, that is, who might want to get an immediate idea of the field of reference that's being used in the article. If I wanted to buck the convention a) would you allow it and b) how would it be done manually? I.e., is the question about reference duplication simply a matter of duplicating the numbers, and can I place the references where I think is sensible but using one continuous number sequence? I hope this is the right place to put this message: if not, please redirect to the right place! Dungur (talk) 09:15, 16 June 2009 (UTC)

Add details to the different error sections

As this project grows and we add more and more users and errors I think it would be good if in each of the error sections we give a link (where possible or practical) to the reference in WP that identifies the format or error. I know what some of them are but before I start making a large change like that I wanted to mention it here first.--Kumioko (talk) 18:54, 3 June 2009 (UTC)

Agreed; I'll try to do this when I have the time. –Drilnoth (T • C • L) 15:39, 4 June 2009 (UTC)
Can we especially get the link for error 78 (reference list duplication)? After I took care of those, an editor said he'd put <references/> into individual sections of Shamanic music on purpose, to prevent readers from having to scroll to the bottom of the article to see the references. The version with multiple <references/> tags is here. I couldn't find anything that specifically says only one references tag is allowed. Most of the other cases were really errors (written by editors who didn't understand how it worked), but the editor may have a point about this one. What do y'all think? --Auntof6 (talk) 06:53, 16 June 2009 (UTC)
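
For reference, a check of the kind behind error 78 can be approximated by counting the <references/> tags in a page. The Python sketch below is only an assumption about how such a check might work, not the project's actual script, and the function names are hypothetical. It cannot tell a mistake from a deliberate per-section list like the one in Shamanic music.

```python
import re

# <references/> with optional whitespace, in any letter case.
REFERENCES_TAG = re.compile(r"<\s*references\s*/\s*>", re.IGNORECASE)

def count_reference_lists(wikitext):
    """Count the <references/> tags in the page text."""
    return len(REFERENCES_TAG.findall(wikitext))

def has_duplicate_reference_list(wikitext):
    """True if the page contains more than one <references/> tag."""
    return count_reference_lists(wikitext) > 1
```

Whether a page flagged this way is really an error remains the editorial question asked above.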