
Wikipedia talk:WikiProject Check Wikipedia: Difference between revisions

From Wikipedia, the free encyclopedia
Docu (talk | contribs)
Revision as of 07:25, 15 June 2009

Check 69: ISBN syntax and MediaWiki:Booksources-summary

At MediaWiki:Booksources-summary, there is the summary displayed when users click on an ISBN number (1) or use Special:BookSources (2). Its previous text may account for some of the errors on Wikipedia:WikiProject_Check_Wikipedia#ISBN_wrong_syntax_(CodeFixer) or the other reports. Personally, I think it's less likely that people use (2) and we could probably word it even more for (1). -- User:Docu

Reference

Some articles don't use <references /> but {{Reference}} which is the same as {{reflist}}. Kwiki (talk) 06:59, 10 May 2009 (UTC)[reply]

It looks like this makes Cynthia L. Bauerly appear on the list for check #3 - Wikipedia:WikiProject Check Wikipedia#Reference tag .3Creferences .2F.3E missing (partial AWB) - where it shouldn't. There are several other redirects at Special:WhatLinksHere/Template:Reflist. -- User:Docu
I left a note for Stefan (de:User talk:Stefan_Kühn/Check_Wikipedia#Check_3_at_en.wp). They should be gone after the next update. -- User:Docu

Ignore articles tagged for deletion

Is this possible? Would it slow down the scan? More importantly, do we want it? Would it be beneficial to not "waste" time fixing articles that are later deleted, or is this an important service - letting people see the content of the article as it was intended to be displayed. Discuss. - Jarry1250 (t, c) 13:30, 10 May 2009 (UTC)[reply]

I guess it's probably possible. It's a good question. For some reports it was a bit annoying to get flooded with new articles likely to be deleted while there were many old ones that needed fixing. There are some I skipped, others I reformatted - to avoid seeing them for another week in the reports. -- User:Docu
What Docu said for the most part. When I'm using a tool like AWB or AutoEd I usually just fix the error, but if something is being done by hand it seems kind of pointless. –Drilnoth (T • C • L) 14:53, 10 May 2009 (UTC)[reply]
Not all the articles tagged will end up deleted, though, right? Even so, I guess it's reasonable to wait until they're untagged, just in case. --Auntof6 (talk) 03:39, 26 May 2009 (UTC)[reply]

New dump?

I take it the +179,000 bytes equates to the new dump having been scanned through? - Jarry1250 (t, c) 18:25, 15 May 2009 (UTC)[reply]

Looks like it:
http://download.wikimedia.org/enwiki/20090512/
2009-05-14 23:57:24 done Articles, templates, image descriptions, and primary meta-pages.
2009-05-14 23:57:23: enwiki 8521847 pages (99.613/sec), 8521847 revs (99.613/sec), 77.1% prefetched, ETA 2009-05-16 15:45:17 [max 22793793]
  • This contains current versions of article content, and is the archive most mirror sites will probably want.
  • pages-articles.xml.bz2 4.8 GB
I was wondering how much was still to come, but I'm still surprised. -- User:Docu
If we edit one article per minute, we will be done in 4 months .. -- User:Docu
Heh...
I have DrilBot on "headlines end with colon". –Drilnoth (T • C • L) 19:07, 15 May 2009 (UTC)[reply]
Starting from the back of the list since I see D6 got some of them already. –Drilnoth (T • C • L) 19:12, 15 May 2009 (UTC)[reply]

No, this is the old dump from 2009-03-13 01:27:21. I started the scan of the old dump 3 days ago. -- sk (talk) 19:17, 15 May 2009 (UTC)[reply]

Oh wonderful. We have that to look forward to. - Jarry1250 (t, c) 19:20, 15 May 2009 (UTC)[reply]
Wow. The new scan will register changes since this one, right? –Drilnoth (T • C • L) 19:23, 15 May 2009 (UTC)[reply]
I'm guessing, but I think that the jump is because this is the first dump that a whole bunch of new / updated checks are being run on (as opposed to just edited). So probably not quite so many more. - Jarry1250 (t, c) 19:29, 15 May 2009 (UTC)[reply]
I think that's how it works. It's a great script regardless; thanks sk! –Drilnoth (T • C • L) 19:48, 15 May 2009 (UTC)[reply]
It would be just another two or three weeks of errors (dump is mid March, most checks were in place at the beginning of April). Luckily it's the weekend, traffic is low [1] and the server doesn't lag [2] (as of 02:49, 16 May 2009 (UTC)). -- User:Docu
Drilnoth, do you think we could talk Jarry into running another bot to crunch through the lists? -- User:Docu
I could manage an AWB bot, certainly. There's a python one going through the approvals process now, but I guess, DrilBot's in the best position for expandability. Sorry, I don't really feel like coding much more than merely setting AWB to stun mode. - Jarry1250 (t, c) 08:13, 16 May 2009 (UTC)[reply]
Given the amount of pages to process, I think it would be worth it. Ideally, we would try to process most pages before the next dump (in June probably). Given the way the script works, it's unlikely that we will get updated full lists before.
BTW the unicodify function on python (converts many) should probably be harmonized with the one in AWB (has a few exceptions). Ideally, the selection in the script sk is using should be similar (could be shorter though). -- User:Docu
I hope to plug in some improved unicodification manually, although I agree that having it be default in AWB (maybe something like Wikipedia:AutoEd/unicodify.js's changes?) would be better. –Drilnoth (T • C • L) 10:52, 16 May 2009 (UTC)[reply]

Check 19 (Headlines start with one "=")

Usually there were just a few new articles listed. Now there are 2535. It's probably worth fixing these by bot, lowering all headers by one level. -- User:Docu

When I go through these it seems like quite often there will be a page where the headers are one level high for about half the article or something and then be accurate... fixing those by bot would create a page just as incorrect as the previous one. –Drilnoth (T • C • L) 10:51, 16 May 2009 (UTC)[reply]
Some are broken in odd ways, others are just all one level off. I suppose one would have to check first if there is more than one header with level "=". The good thing is that some have already been fixed since March ;) -- User:Docu
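The check suggested above (only touch pages with more than one level "=" header) could be sketched roughly as follows. This is a minimal illustration, not the script or bot actually in use; the function name `demote_if_consistent` and the rule "at least two level-1 headers" are assumptions made for the example. Pages where only part of the article is one level off would still need manual review.

```python
import re

# A headline needs the same run of "=" on both sides of the title,
# so a stray "=" at the start of a line (e.g. in a formula) is not matched.
HEADING = re.compile(r"^(={1,6})[ \t]*(.+?)[ \t]*\1[ \t]*$", re.MULTILINE)


def demote_if_consistent(wikitext: str) -> str:
    """Lower every header by one level, but only if the page has
    more than one level-1 header (assumed heuristic)."""
    levels = [len(m.group(1)) for m in HEADING.finditer(wikitext)]
    # Leave the page alone if there are fewer than two "=" headers,
    # or if demoting would push a header past level 6.
    if levels.count(1) < 2 or max(levels) >= 6:
        return wikitext
    return HEADING.sub(
        lambda m: f"={m.group(1)} {m.group(2)} ={m.group(1)}", wikitext
    )
```

A page with a single level-1 header is returned unchanged, which leaves the oddly broken pages for a human editor.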

Page update

18:52, 17 May 2009 (UTC) I'm trying to update the page, but it keeps timing out. -- User:Docu

Seems like you got it. :) –Drilnoth (T • C • L) 20:06, 17 May 2009 (UTC) Oops, I'm blind. –Drilnoth (T • C • L) 20:08, 17 May 2009 (UTC)[reply]
Finally. I wonder if it's some new abuse filter that slows it down so much -- User:Docu

Title linked in text

Can ↑ be done reliably by bot? I'm asking because it was brought up at Wikipedia talk:AutoWikiBrowser/Bugs#Bold names and it seems that making this change on image maps could be problematic, as discussed here. Are there any other times that this could be a bad edit? –Drilnoth (T • C • L) 12:53, 18 May 2009 (UTC)[reply]

The one in the image map looks more or less what it should be doing (though one could add an exclusion for <imagemap>).
He is complaining that the self link on the image at 50000_Quaoar#Size is being removed. Personally, I think it's even more important to remove these confusing ones than to fix the usual ones. Anyways, in general, there is always a trade-off between fixing 1000 and possibly breaking a few. -- User:Docu (15:32)
Hmm .. the missing link on http://en.wikipedia.org/w/index.php?title=50000_Quaoar&oldid=290470688#Size does make the image disappear completely. Too bad the extension has no maintenance category associated with it. -- User:Docu

It was fixed at 15:32. -- User:Docu

Excellent. –Drilnoth (T • C • L) 15:54, 18 May 2009 (UTC)[reply]

Double pipe in one link

I've been through a few of these with AWB. A lot of them are of the form [[article || text]] or [[article|text | ]]. This kind of error looks eminently like bot-work to me; is there one active, or could one be modified to suit? Mr Stephen (talk) 18:05, 18 May 2009 (UTC)[reply]

I might be able to have DrilBot fix the type that you mentioned, although a lot would need to be done by hand. I'm not entirely sure though; I'm still not good enough with RegExp. –Drilnoth (T • C • L) 20:20, 18 May 2009 (UTC)[reply]
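For the two forms mentioned in this thread, [[article || text]] and [[article|text | ]], a regular-expression sketch might look like the following. It is written in Python rather than as an AWB rule, the helper name `fix_double_pipe` is hypothetical, and it deliberately ignores anything more complicated (image links, nested links), which would still need fixing by hand.

```python
import re

# [[article || text]]  ->  [[article|text]]
DOUBLE_PIPE = re.compile(r"\[\[([^\[\]|]+?)\s*\|\s*\|\s*([^\[\]|]+?)\s*\]\]")
# [[article|text | ]]  ->  [[article|text]]
TRAILING_PIPE = re.compile(r"\[\[([^\[\]|]+?)\s*\|\s*([^\[\]|]+?)\s*\|\s*\]\]")


def fix_double_pipe(wikitext: str) -> str:
    """Repair the two simple double-pipe patterns; leave all other links alone."""
    wikitext = DOUBLE_PIPE.sub(r"[[\1|\2]]", wikitext)
    wikitext = TRAILING_PIPE.sub(r"[[\1|\2]]", wikitext)
    return wikitext
```

Links with several legitimate pipes, such as [[File:x.jpg|thumb|caption]], do not match either pattern and are left untouched.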

44: Headlines with bold

Treating this as an "error" is brain-damaged. There are plenty of valid reasons why bold could appear in a headline, for example mathematical notation often relies on fixed typefaces. Automatic removal of bold tags as in this edit is dead wrong. — Emil J. 10:50, 19 May 2009 (UTC)[reply]

Hmm... well, it still would only matter in level 2 headlines since headlines of level 3 and below aren't visually affected by having the bold text. –Drilnoth (T • C • L) 13:46, 19 May 2009 (UTC)[reply]
Frequently, it looks a bit like bold text within text that is already bold, or italics and underscores combined.
In general, italics seem a viable option for additional emphasis within headline. In the sample above, <math></math> seems a good option as well. -- User:Docu

Numbers are low

I just signed up for this page and I will start going through them as well but I do have a couple comments. First I think some of the numbers are low. For example I know that there are more pages than listed here with incorrect breaks (i.e. <br>, <BR>, <br., etc.). I also know that there are more pages with incorrect characters or invalid formatting in the defaultsort. Not criticizing because I am glad that someone created this list but wanted to let you know. My next comment is based on the rather minimal impact of some of these edits such as the breaks. I personally follow the belief that if you watch the pennies the dollars will mind themselves (Even the small edits are important over the long term) but some would argue that some of these edits are a waste of resources and fill up editors' watchlists (also not a problem for me personally). Since AWB specifically requests that some of these edits such as the breaks not be done with AWB as standalone changes are we ok to go ahead and do them?--Kumioko (talk) 20:30, 19 May 2009 (UTC)[reply]

(de-capitalized header) I think that the list of breaks is about correct... things like <br> and <BR> are correct; the list here only has those which have an error like <.br>, <<br>, or <\br>. My bot ran through the defaultsort list a couple of days ago and fixed a lot of them, although it looks like the list might not have taken that into account yet... weird. Anyway, my feeling is that the things that don't really change much (e.g., the location of categories) and which can be done by bot should be done by bot... then it doesn't waste human resources and the edits don't show up on watchlists. –Drilnoth (T • C • L) 20:47, 19 May 2009 (UTC)[reply]
Kumioko, would you have samples of pages that were missed? The reports keep getting improved, but they are not meant to be exhaustive (at least for now ;) ). -- User:Docu
Drilnoth, when you have a moment, would you run your bot through 43/47 (broken templates)? It's easier to look manually at the remaining ones once these are done. For an up-to-date list of what remains, we might have to wait for the next dump though. -- User:Docu 11:00, 20 May 2009 (UTC)
Sure; I'll start it running right after the next update. –Drilnoth (T • C • L) 12:38, 20 May 2009 (UTC)[reply]
Before the next dump (in June supposedly) would be sufficient (hopefully in June we won't get April data ;) ). -- User:Docu
For 43, thanks for doing a first pass by bot. I just finished today's 50. Luckily one page filled half the list ;) -- User:Docu
You're welcome; thanks for mentioning it. I don't think that DrilBot can really do much with #47... AWB can't pick up very many of them to fix automatically. –Drilnoth (T • C • L) 15:18, 22 May 2009 (UTC)[reply]
As I have to stop at each page to check it manually, it helps if the automatic ones are gone. Besides, too many of these at once give me a headache. -- User:Docu
I read your previous note too quickly. You mentioned the other report. It does also work for #47 (missing opening brackets), see rev 4229 mentioned in /Archive#Missing opening or closing brackets, table and template markup. When editing with a new release, I have actually seen it being fixed! -- User:Docu
I know that it does work automatically some of the time... the problem is that there aren't enough that AWB can auto fix, and when I'd had my bot going through the list it was making a lot of edits that didn't fix that error. –Drilnoth (T • C • L) 14:11, 23 May 2009 (UTC)[reply]
If I'm not sure which type of problem AWB will fix, I'm just using "clean-up" as edit summary. It happens sometimes that I forget to switch it from a more specific one. For check#47, you could run it with a summary "Clean-up, general fixes (batch #47)" this might be sufficiently descriptive for the type of operation. If it's done just once for each dump, I think it's acceptable. One could also link general fixes to WP:GENFIXES, this way interested editors can easily find the full set of possible fixes. -- User:Docu 06:09, 24 May 2009 (UTC)
Eh... I'd do that except for the AN/I report about DrilBot, with the consensus being that more descriptive edit summaries are needed. –Drilnoth (T • C • L) 16:32, 24 May 2009 (UTC)[reply]
It's not a coincidence that I wrote the above. Anyways, WP:GENFIXES is very descriptive (IMHO), maybe you could even copy it to a separate page. The problem is that if the edit summary is too descriptive and the edit doesn't make the change that is in the summary, it's more problematic. I suppose it would be possible to set AWB to have changes that trigger edits (with a specific summary) and add all other gen fixes behind. Whatever solution you choose, after each 20k of edits, you will get a new thread on ANI ;) -- User:Docu
35k edits. :) I'm working on User:DrilBot/Summaries to create a more descriptive guide on these edits. –Drilnoth (T • C • L) 17:00, 24 May 2009 (UTC)[reply]
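The distinction drawn earlier in this thread, that <br> and <BR> are fine while forms like <.br>, <<br>, or <\br> are errors, can be illustrated with a small sketch. The function name `find_bad_breaks` and the exact set of accepted forms are assumptions for the example; the real check may use different rules.

```python
import re

# Anything that looks like a break tag: optional stray characters around "br".
BREAK_CANDIDATE = re.compile(r"<+[^<>a-zA-Z]*br[^<>a-zA-Z]*>", re.IGNORECASE)
# Forms assumed to be acceptable: <br>, <BR>, <br/>, <br />.
VALID_BREAK = re.compile(r"<br\s*/?>", re.IGNORECASE)


def find_bad_breaks(wikitext: str) -> list:
    """Return break-like tags that are not one of the accepted forms."""
    return [
        tag
        for tag in BREAK_CANDIDATE.findall(wikitext)
        if not VALID_BREAK.fullmatch(tag)
    ]
```

Tags with attributes, such as a break with a clear attribute, contain extra letters and are skipped by this sketch rather than judged either way.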

Non-editable and unreliable (check 7)

Heed this edit! I will be back to do more of these later. Can someone supply the secret contact information mentioned in the edit summary? Michael Hardy (talk) 11:39, 24 May 2009 (UTC)[reply]

The list on the server isn't completely up to date, from the introduction "# The number of items on this page is limited. For a longer list see tools:~sk/checkwiki/enwiki/. These aren't updated daily though. When working on the toolserver lists, it is suggested to start from A. The next day, the script will use these items to re-generate this page omitting already fixed articles.". An estimated one third is already fixed. I'm not quite sure when he will be doing it, but the list will be split into two separate ones (the ones without level "==" headings and the ones with, according to (de:User talk:Stefan Kühn/Check Wikipedia#Check 7 at en.wp)). -- User:Docu

I wasn't worried about missing items from the list, but about items on the list that shouldn't be.

Why is Bell polynomials on the list? Someone edited it to change some subsections to first-tier sections. I hit the "rollback" button. They were intended to be subsections. Michael Hardy (talk) 12:11, 24 May 2009 (UTC)[reply]
In fact, I fixed the error on the page more than a month ago myself diff. Items on the toolserver lists date from the scan done 10 days ago of the March 2009 dump. Items are rescanned every day to generate a list of 50 current items. That means the items here are still not done as of yesterday. Obviously this system works better for lists where we don't have that many open. -- User:Docu
Well, someone "fixed" it again today and I reverted to the "unfixed" version and I'll have to do that as many times as someone does that same "fix".
So is it strictly forbidden to have a subsection in the initial section that has no main header? Michael Hardy (talk) 17:53, 24 May 2009 (UTC)[reply]
Per WP:MOSHEAD, headlines should start with "==" with subsections being "===", "====", and so on... so consensus is against having a subsection in the lead of the articles. –Drilnoth (T • C • L) 18:21, 24 May 2009 (UTC)[reply]
Michael Hardy, I saw the re-"fixing", someone was a bit in a hurry, I'm glad you repaired it.
The article title is a level <h1> heading, thus the next lower level and first level in article text, would be a <h2> level ("=="). The peculiar thing about Wikipedia is that the lead section doesn't have a header which is somewhat asymmetric. Interestingly even on the Main page, they managed starting with h1 and going to h2! -- User:Docu 18:25, 24 May 2009 (UTC)

This check is also broken for special pages like disambiguation pages, where smaller headers are appropriate. This check needs to be eliminated or seriously refined. —Centrxtalk • 18:12, 24 May 2009 (UTC)[reply]

Why should it be different for dabs? –Drilnoth (T • C • L) 18:21, 24 May 2009 (UTC)[reply]
Disambiguation pages are often short, and there may only be a couple of entries in each section. The ==-level header, which also adds an underline, is far too prominent for disambiguation pages. The section header should not be the same size, or half the size, as the entire section. The standard promulgated by this Check may be theoretically best for articles, but not for many different types of other pages. —Centrxtalk • 18:28, 24 May 2009 (UTC)[reply]
Also, the script or bot or logic that was automatically promulgating this Check, is additionally broken in at least three ways. —Centrxtalk • 18:33, 24 May 2009 (UTC)[reply]
Which are they? -- User:Docu

Start copy from User talk:Centrx.

Inspection reveals two major classes of page where this Check was automatically implemented: a) disambiguation pages, where a lesser section header was specifically intended; b) blatantly non-wikified pages, that need far more help than tweaking section headers.
Also, without exception, the bot did not even implement the Check correctly. It does not normalize section headers, it simply chops off one =. For example, ==== is reduced to === even if it is supposed to be == at the top level.
This analysis does not even enter the situation of general bugginess, as evidenced by the fact that it eliminated correct sub-sections in [3]; and other problems and objections on User talk:PigFlu Oink. —Centrxtalk • 19:11, 24 May 2009 (UTC)[reply]
Bell polynomials is clearly wrong as I had to do it manually myself. It would be interesting to know what caused it. a) isn't incorrect as MoS does warrant level "==" headers. I don't see a problem with b) as such articles never get fixed in one step. "====" to "===" happens with AWB too. -- User:Docu
Looks like it chopped off a level from the first level "===" and below headers it found. Not good. -- User:Docu
  • General MoS does not apply to special cases in special pages. Even if the disambiguation page headers are incorrect, the proper correction is to change them to mere Bolded title as used in Wikipedia:Manual of Style (disambiguation pages), not to change them to grand section headers.
  • Depending on the page, tweaking a broken page either means 1) actual errors are obscured by making the page look superficially correct, but still be wrong header-wise (e.g. [4]); or 2) little is lost by incidentally reverting a page that needs to be majorly fixed by the first editor to come along anyway.
  • Unless someone will manually inspect 15000+ edits to save a handful of sound corrections, reverting all is the only way to correct the mistakes caused by an unauthorized, blatantly buggy bot. —Centrxtalk • 19:38, 24 May 2009 (UTC)[reply]

End copy from User talk:Centrx.

TOC check

There is a list of longer articles where it could be worth checking the structure, it's at Wikipedia:WikiProject Check Wikipedia/AWB. -- User:Docu

See Also => See also

For consistency, "== See Also ==" should be capitalised "== See also ==" Tabletop (talk) 02:42, 26 May 2009 (UTC)[reply]
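A replacement along these lines could be expressed as a single regular expression. The sketch below is only an illustration in Python (the name `fix_see_also` is hypothetical); it keeps whatever header level the page already uses.

```python
import re

# Match "See Also" at any header level, keeping the "=" run via a backreference.
SEE_ALSO = re.compile(r"^(=+)[ \t]*See[ \t]+Also[ \t]*\1[ \t]*$", re.MULTILINE)


def fix_see_also(wikitext: str) -> str:
    """Rewrite "== See Also ==" headers as "== See also ==" at the same level."""
    return SEE_ALSO.sub(r"\1 See also \1", wikitext)
```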

Category duplication (check 17) due to template that includes a category

A lot of the asteroid articles have duplicate categories because they're stubs, and the stub template includes a category that's also hard-coded. To me, it's not obvious that the stub templates include the categories (I had to research it a little). Therefore, I'm inclined not to resolve these so that future removal of the stub templates won't remove the category altogether. What say ye? --Auntof6 (talk) 05:13, 26 May 2009 (UTC)[reply]

The reports don't cover this, as far as I know. The category needs to be defined twice in the article itself, e.g. Category:Main Belt asteroids in 10515 Old Joe [5]. -- User:Docu
OK. Maybe the ones I spot-checked already had the duplicate removed, then. I just went through all the ones that start with numbers and did find some with the category specified twice (and fixed them, of course!). I'll wait and see if they're still there if/when the list gets regenerated. --Auntof6 (talk) 08:27, 26 May 2009 (UTC)[reply]
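As described above, the check only concerns a category that is written out twice in the article text itself; a category that arrives through a stub template is invisible to it. A minimal sketch of that idea, with the hypothetical helper `duplicate_categories` and without handling details such as first-letter case, might be:

```python
import re
from collections import Counter

# Category link with an optional sort key: [[Category:Name]] or [[Category:Name|key]].
CATEGORY = re.compile(
    r"\[\[\s*Category\s*:\s*([^\[\]|]+?)\s*(?:\|[^\[\]]*)?\]\]", re.IGNORECASE
)


def duplicate_categories(wikitext: str) -> list:
    """Return names of categories that appear more than once in the raw wikitext."""
    counts = Counter(
        m.group(1).replace("_", " ") for m in CATEGORY.finditer(wikitext)
    )
    return sorted(name for name, n in counts.items() if n > 1)
```

Because templates are not expanded, a stub template that adds the same category does not count as a duplicate here.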

Suggestion: stub templates on articles that are not stubs

AWB seems to remove stub templates from articles that are over a certain size (I don't know what size that is, though). Maybe this project could generate a list of non-stub articles that have stub templates. --Auntof6 (talk) 09:32, 26 May 2009 (UTC)[reply]

Database_reports has a weekly Long_stubs report. -- User:Docu

Error 2 incorrect entry

Article with false
(AutoEd)
...deleted...
article info
Comparison of layout engines (Non-standard HTML) * <tt id="trident_wbr"><wbr>Naraht (talk) 15:13, 28 May 2009 (UTC)[reply]

I cleaned that up so that it shouldn't be listed in the future. It should have been using &lt; and &gt; entities originally, anyway. Thanks! –Drilnoth (T • C • L) 15:12, 28 May 2009 (UTC)[reply]

May 28 update

It seems like the toolserver problems canceled the update. -- User:Docu 00:15, 29 May 2009 (UTC)

Appears so. Ah, well; maybe tomorrow. –Drilnoth (T • C • L) 01:54, 29 May 2009 (UTC)[reply]
Apparently, after 9 hours, it did finally get through. BTW we are down from "342,337 ideas for improvement in 304,586 articles" (May 15) to "257,460 ideas for improvement in 233,446 articles" (May 28).
Even if many of the articles of the March dump scanned on May 15 were already fixed by May, I think we made substantial progress. -- User:Docu
Woah; that's a nice statistic. Although wasn't the "headlines start with the '='" check split into two? I'm guessing that that reset those counters until the next database dump. –Drilnoth (T • C • L) 13:12, 29 May 2009 (UTC)[reply]
I just hope the new dump won't be as old as the last one. BTW 7 and 83 combined should still equal the old 7. The current distribution between the two isn't "accurate" yet, as the server lists are checked to see which pages still fail the new check 7 or go under 83. Anyways, I think it's quite encouraging. -- User:Docu

Automated edits to heading hierarchies

I've opened an RfC with regards to whether the community stand by the MoS, but mainly about whether it should be enforced using automated edits. Cheers, - Jarry1250 (t, c) 14:53, 30 May 2009 (UTC)[reply]

Grand!

Another huge database dump! :) I'm guessing that this is a more recent one? –Drilnoth (T • C • L) 19:36, 1 June 2009 (UTC)[reply]

Or... did it just rescan the same dump with the new errors? DrilBot is finding a lot of already-fixed articles with #53. –Drilnoth (T • C • L) 20:45, 1 June 2009 (UTC)[reply]
WTF? Locos ~ epraix Beaste~praix 21:00, 1 June 2009 (UTC)[reply]
#16 was mostly ok. It might be the dump from 2009-05-24 21:39:17. -- User:Docu
That sounds like it could be accurate... I think that DrilBot ran through this specific list after that. –Drilnoth (T • C • L) 21:55, 1 June 2009 (UTC)[reply]
Soon will be up2date, nothing left to do! -- User:Docu
But that's a good thing, isn't it? :) –Drilnoth (T • C • L) 22:15, 1 June 2009 (UTC)[reply]
I'm seeing something similar with #17. You sure it didn't scan an old dump? I've been working through that list, and the areas I've already worked on (numbers through J) suddenly have a lot of articles again. --Auntof6 (talk) 22:14, 1 June 2009 (UTC)[reply]
It was the new dump. I changed my script after the new server started producing a new dump of a language (de, fr, ...) every 4-5 days. This was too fast for my script. Now a script starts on the 1st, 8th, 15th and 22nd of each month, searches for new dumps and scans them. So we may use a new dump about every 7 days. The new en dump is from 2009-05-24 21:39:17; my script found it on 2009-06-01 and scanned it. -- sk (talk) 07:47, 3 June 2009 (UTC)[reply]
Okay; thank you for clarifying. –Drilnoth (T • C • L) 12:30, 3 June 2009 (UTC)[reply]
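The schedule sk describes (the script looks for new dumps on the 1st, 8th, 15th and 22nd of each month) can be made concrete with a few lines of date arithmetic. This is only an illustration; the name `next_scan_day` is hypothetical and says nothing about how the real script is written.

```python
from datetime import date

# Days of the month on which the scan script is said to look for new dumps.
SCAN_DAYS = (1, 8, 15, 22)


def next_scan_day(today: date) -> date:
    """Return the first scheduled scan day on or after the given date."""
    for day in SCAN_DAYS:
        if day >= today.day:
            return today.replace(day=day)
    # After the 22nd, the next scan is on the 1st of the following month.
    if today.month == 12:
        return date(today.year + 1, 1, 1)
    return date(today.year, today.month + 1, 1)
```

For a dump finished on 2009-05-24, this gives 2009-06-01 as the next scan day, which matches the dates in the comment above.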

More contributors

Today I had the idea of recruiting more contributors. Under Wikipedia:WikiProject_Wiki_Syntax#Thank_you_to_contributors we find many Wikipedians who helped in this old project. Maybe they don't know about the new "WikiProject Check Wikipedia". Some of you could inform these people on their user talk pages. What do you think about this? -- sk (talk) 08:05, 3 June 2009 (UTC)[reply]

Hmm... neat. I might be able to use AWB to send all of them messages about this. –Drilnoth (T • C • L) 12:32, 3 June 2009 (UTC)[reply]

# 81 Reference duplication

I went through this category and I did most of this one but there seemed to be a lot of false positives so the number still isn't at zero. You might want to rerun the list using your script and see what's left.--Kumioko (talk) 18:43, 3 June 2009 (UTC)[reply]

If you find a false positive, then please write the article title here so I can check it. -- sk (talk) 10:13, 4 June 2009 (UTC)[reply]
Ok I will do that next time I go through.--Kumioko (talk) 12:42, 4 June 2009 (UTC)[reply]
I couldn't find one for 81 but I just found one for 19. European grid is showing on #19 as having only 1 = for a section but the only rogue = I could find was related to a math calculation.--Kumioko (talk) 13:13, 4 June 2009 (UTC)[reply]
See this change. If my script finds a "=" at the first position of a new line, then it thinks it is a headline. Normally a "=" should not be at the first position. -- sk (talk) 20:26, 4 June 2009 (UTC)[reply]

Add details to the different error sections

As this project grows and we add more and more users and errors I think it would be good if in each of the error sections we give a link (where possible or practical) to the reference in WP that identifies the format or error. I know what some of them are but before I start making a large change like that I wanted to mention it here first.--Kumioko (talk) 18:54, 3 June 2009 (UTC)[reply]

Agreed; I'll try to do this when I have the time. –Drilnoth (T • C • L) 15:39, 4 June 2009 (UTC)[reply]

Add logic to AWB to include more of these errors.

I have been looking at the errors and have noticed that there are several that I think could or should be added to AWB as general fixes or at least as a custom module. Before I go adding a bunch of feature requests though does anyone have any thoughts about which ones they feel could or should be added to AWB?--Kumioko (talk) 18:57, 3 June 2009 (UTC)[reply]

I'd say, just go ahead and request them. Note that "reference duplication" has already been requested, as has "template with Unicode control characters". –Drilnoth (T • C • L) 15:40, 4 June 2009 (UTC)[reply]
OK and I think they added a few others along with the change from using a text box to a richtextbox (to make and display the edits). Maybe I'll wait till the next version comes out before I make the suggestions.--Kumioko (talk) 17:42, 4 June 2009 (UTC)[reply]
You could just download the most recent SVN build; that's what I do. –Drilnoth (T • C • L) 17:44, 4 June 2009 (UTC)[reply]
Handy link (that's unless you want to hook into the SVN directly). - Jarry1250 (t, c) 17:48, 4 June 2009 (UTC)[reply]

Reactivated errors

I have reactivated the following errors:

  • #30: Image without description and #35: Image gallery without description. These are both errors because they cause accessibility issues, e.g., for people using screen readers. I can't really see any reason why they should be deactivated, as the vast majority should have descriptions per W3C standards.
  • #36: Redirect not correct. Many of the redirects listed here work, but they use improper syntax (e.g., a carriage return between #REDIRECT and the target page's name). I believe that AWB can be configured to fix these, so a bot can repair them easily and those won't require human attention. The redirects on the list that are malfunctioning can then be fixed manually with ease.
  • #79: External link without description. "Bare links" should almost never be used... they should have some description to help indicate where they lead to. I'm not sure why this was turned off.

Reactivating these may increase the amount of time that the scan takes, but I don't think that that really matters... it will still be daily, just at a different time of day. If you think that any of these should be deactivated again, please don't hesitate to post here. –Drilnoth (T • C • L) 16:39, 6 June 2009 (UTC)[reply]
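For the #36 case described in the list above, where a redirect works but has a line break between #REDIRECT and the target, the repair is mechanical. A rough Python sketch follows; it is not an AWB configuration, and the helper name `normalize_redirect` is hypothetical.

```python
import re

# "#REDIRECT", any whitespace (including line breaks), then the target link.
REDIRECT = re.compile(r"^\s*#redirect\s*(\[\[[^\[\]\n]+\]\])", re.IGNORECASE)


def normalize_redirect(wikitext: str) -> str:
    """Put the redirect keyword and its target on one line, separated by a space."""
    m = REDIRECT.match(wikitext)
    if not m:
        return wikitext  # not a redirect page; leave untouched
    return "#REDIRECT " + m.group(1) + wikitext[m.end():]
```

Pages that do not start with a redirect are returned unchanged, so malfunctioning redirects would still show up for manual fixing.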

For 2, I agree with you completely. As for 1 and 3, the question here is about whether they are worth listing for now. They invariably involve large numbers of articles which need to be fixed by hand. Not all images need descriptions by the way. We're never going to get through them all, so why bother listing them? I mean, that's a pessimistic view I know, but seriously, 30,000 images? That's a hell of a job. - Jarry1250 (t, c) 16:57, 6 June 2009 (UTC)[reply]
I'm kind of torn myself on point 1... the way I see it, saying "we're never going to finish them, so why bother?" doesn't really make sense... the question is more like "How many of these don't need descriptions? And is this a good job to have on CHECKWIKI, which primarily lists cosmetic and code changes, not content problems?" Ditto with the external links. I thought that they should maybe be reactivated and we can see what they come up with before making a final decision. I guess that I'm not really opposed to having them deactivated—just kind of neutral on it—for the reasons that I outlined (is it a good job for CHECKWIKI?). Feel free to re-deactivate them if you'd like; I won't oppose it. –Drilnoth (T • C • L) 17:02, 6 June 2009 (UTC)[reply]
One idea for #30 and #35. Many of these images without description are flags. I think it would be no problem to create a bot that checks these images, finds flags and creates a description like Flag of Germany. In dewiki we have 40000 articles with at least one image without description. 5000 of these 40000 are flags. I have tried to get a bot, but at the moment nobody has had time. Maybe in enwiki someone can create this bot. Using all the flags on Commons in commons:Category:SVG_flags would help. If #30 is activated, then he can also use this file. -- sk (talk) 19:46, 6 June 2009 (UTC)[reply]
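The flag idea could be prototyped along the following lines. This is a sketch under several assumptions: it only recognises file names that literally start with "Flag of ", it treats a short hypothetical list of layout keywords as "not a description", and the function name `describe_flags` is invented for the example. A real bot would more likely work from the Commons category mentioned above.

```python
import re

# Image link with optional pipe-separated options and no nested links.
IMAGE_LINK = re.compile(r"\[\[(Image|File):([^\[\]|]+)((?:\|[^\[\]|]*)*)\]\]")
# Options that only control layout and therefore are not a description (assumed list).
LAYOUT_OPTION = re.compile(
    r"^(\d+px|x\d+px|\d+x\d+px|thumb|thumbnail|frame|frameless|border"
    r"|left|right|center|none)$"
)


def describe_flags(wikitext: str) -> str:
    """Append "Flag of X" as description to flag images that have none."""

    def repl(m):
        name, opts = m.group(2).strip(), m.group(3)
        options = [o.strip() for o in opts.split("|")[1:]] if opts else []
        has_description = any(o and not LAYOUT_OPTION.match(o) for o in options)
        base = name.rsplit(".", 1)[0]  # file name without its extension
        if has_description or not base.startswith("Flag of "):
            return m.group(0)
        return f"{m.group(0)[:-2]}|{base}]]"

    return IMAGE_LINK.sub(repl, wikitext)
```

Any option the sketch does not recognise is treated as an existing description, so it errs on the side of leaving images alone.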

Broken character entity references

Any chance you could run a script to find things like [6], [7]. I've been finding a lot of these lately where the semi-colon is missing. Obviously this would be listed as a higher-priority error. — CharlotteWebb 21:07, 6 June 2009 (UTC)[reply]

I'll mention it to sk. –Drilnoth (T • C • L) 23:10, 6 June 2009 (UTC)[reply]

Many instances of this error were introduced by certain bots. Take for example [8] (page created by bot), [9] (bot-generated link title). Odd that they would have this bug in common. — CharlotteWebb 21:42, 13 June 2009 (UTC)[reply]

You might want to drop a note to the operators of the two bots. Especially the first one is fairly recent. -- User:Docu
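A scan for character entity references that are missing their semicolon, as requested in this thread, might start from something like the sketch below. The list of entity names is a small assumed sample, not the list any real check uses, and `find_broken_entities` is a hypothetical name.

```python
import re

# A few common entity names (assumed sample; a real check would need a full list).
ENTITY_NAMES = ("amp", "lt", "gt", "nbsp", "quot", "ndash", "mdash")

# An entity name that is not followed by ";", another word character, or "=".
# Excluding "=" avoids flagging URL parameters such as "&lt=5".
BROKEN_ENTITY = re.compile(r"&(?:%s)(?![;\w=])" % "|".join(ENTITY_NAMES))


def find_broken_entities(wikitext: str) -> list:
    """Return entity references that appear to be missing their semicolon."""
    return [m.group(0) for m in BROKEN_ENTITY.finditer(wikitext)]
```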

Headline ALL CAPS

Is it possible to have an "exceptions" list for this test? About 42 of the top 50 items are valid capitalised items, mainly either abbreviations, keywords or titles. welsh (talk) 19:21, 7 June 2009 (UTC)[reply]

I will create a whitelist for all errors in the near future. Then we can solve this problem. -- sk (talk) 08:56, 8 June 2009 (UTC)[reply]
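How a whitelist might interact with the ALL CAPS check can be sketched as follows. The whitelist entries and the function name `all_caps_headlines` are made up for illustration; the actual whitelist format was not yet defined at the time of this discussion.

```python
import re

HEADLINE = re.compile(r"^(=+)[ \t]*(.+?)[ \t]*\1[ \t]*$", re.MULTILINE)

# Hypothetical whitelist of headlines that are legitimately all capitals.
WHITELIST = {"NASA", "HTML"}


def all_caps_headlines(wikitext: str, whitelist=WHITELIST) -> list:
    """Return headlines written entirely in capitals that are not whitelisted."""
    hits = []
    for m in HEADLINE.finditer(wikitext):
        title = m.group(2)
        letters = [c for c in title if c.isalpha()]
        if letters and all(c.isupper() for c in letters) and title not in whitelist:
            hits.append(title)
    return hits
```

Abbreviations, keywords or titles that are valid in capitals would be added to the whitelist instead of being "fixed" in the article.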

Sharing regexes

I have been building a new AWB plugin-system thing, aimed at making it possible to subscribe to blocks of regexes and to collaboratively work to improve them. I'm calling it FRONDS at the moment (a working title) and you can read all about it at Wikipedia:AutoWikiBrowser/Fronds. I'm designing it (in the broader sense of the term) with CheckWiki in mind. See what you think: expand that page, or put questions on the associated talk page. Cheers, - Jarry1250 (t, c) 19:23, 9 June 2009 (UTC)[reply]

(I edited the above to avoid WP:TLDR.) A beta's almost ready now, and it'd be nice to get some regular expressions in the system. I'll be adding mine today. - Jarry1250 (t, c) 10:07, 12 June 2009 (UTC)[reply]

No new updates for a few days

See [10]. I hope that that can be fixed easily enough. Anyway, there's quite a lot to work through here already. –Drilnoth (T • C • L) 15:56, 12 June 2009 (UTC)[reply]