Wikipedia talk:AutoWikiBrowser/Archive 1

From Wikipedia, the free encyclopedia

I'm interested

I would love to try out the new AutoWikiBrowser and provide feedback on it. Send an email to floresg2 AT gmail.com. Thank you. Gflores Talk 22:20, 8 December 2005 (UTC)

Me Too

I would like to try this out also; it looks a lot friendlier than the pywiki framework --Kaiserb 00:23, 9 December 2005 (UTC)

Downloadable

Anyone interested can download it here; bear in mind that it is a development version! Martin 09:43, 9 December 2005 (UTC)

Is that an installer, or a self-contained executable? —Phil | Talk 12:52, 9 December 2005 (UTC)
Self-contained executable. Gflores Talk 16:43, 9 December 2005 (UTC)

weird edits?

Your auto-thingy wasn't just doing a recat in this edit [1]. You can see that it also added a space after the initial asterisk in lists (which is unnecessary), and also made some other changes that I couldn't figure out (removing spaces?). BlankVerse 22:54, 10 December 2005 (UTC)

I removed some white space, and added some spaces to the bullet points to make the wiki text more readable at the same time as re-catting; what's weird about that? Martin 23:05, 10 December 2005 (UTC)
Since I'd never seen your edits before yesterday, I decided to look at one today. Instead of the expected re-cat, the diff shows a whole bunch of changes plus the re-cat. That's why it looked weird to me. BlankVerse 23:35, 10 December 2005 (UTC)
Yeah, among other improvements I am making more descriptive edit summaries easier ;) Also I think I won't bother with the very very minor edits from now on anyway. Martin 23:48, 10 December 2005 (UTC)

Works well

The bot works quite well. The only issue I have had is when a blank page loads, i.e. an image with no tag. The bot carries the information from the previous edit and tries to fill in the blank page. —  KaiserB 23:31, 11 December 2005 (UTC)

Thanks, I hadn't noticed that, I'll try and get it fixed soon. Martin 23:34, 11 December 2005 (UTC)

Can you please add me (User:KaiserbBot) to the official users list for this bot software. Thank you, —  KaiserB 16:40, 13 December 2005 (UTC)

Can't see the bottom of the window

I suspect that this is a problem I have encountered before, whereby I can't see the stuff right at the bottom of the AWB window; there's no scroll bar so I simply can't get at it.

My Windows toolbar is double-height, to allow more stuff to be displayed in the "clock" area, and to allow more window buttons. You can see a sliver of the top of my toolbar right at the bottom of the screenshot. I suspect that AWB positions its buttons either relative to the top of the window, or based on incorrect information as to where the bottom of the window actually lies.

It's really frustrating, since this looks like an excellent tool…I'd like to see and use all of it :-)

In the meantime, I've a couple of questions:

  • There are no access keys: "Make list" could be "Make list" maybe…
  • Tools→Options does nothing for me: should it?
  • There's a new tab: what does "messaging" do?
  • What are the "general fixes"? Can we pick and choose?

HTH HAND —Phil | Talk 11:16, 14 December 2005 (UTC)

OK, in order...

  1. It can be resized by "grabbing" the middle horizontal border between the buttons area and the browser; hopefully I'll fix it properly soon.
  2. Yup, I still need to add shortcut keys and stuff (admittedly I had forgotten, as I never use them).
  3. The "Options" should not do anything, I was fiddling and forgot to remove it ;)
  4. Messaging is for appending a message to the bottom of a page, i.e. for spamming user talk pages (someone asked me for this feature). I have tried to make the tooltip text (you know, the popup help text that you get when you hold the pointer over something) as explanatory as possible.
  5. General fixes are mainly errors in the "See also" and "External links" sections (e.g. mis-capitalisation, or a non-standard name) and removing excess white space. I need to add more to these general fixes.

Let me know if there are other features that I could add; I have some that I haven't enabled yet as well.

thanks - Martin 11:58, 14 December 2005 (UTC)

p.s. You'll need the newest version for it to work. Martin 12:23, 14 December 2005 (UTC)
  • You beauty, now I can see it in all its glory! Cheers —Phil | Talk 14:14, 14 December 2005 (UTC)

Search limitations?

I came across a user's signature which had been broken by the brief outage of HTML Tidy a while back, and I thought that it would provide an ideal exercise for me to get to know AWB. However, AWB doesn't seem to be able to find the broken text, even here when I can see it right there in the textbox! The text in question is:

[[User_talk:Admrboltz|<sup>=/\=</sup>]]

and I was attempting to replace it with:

[[User_talk:Admrboltz|<sup>=/\=</sup>]]

(note that in the wikitext I have had to double-escape the first version: the second version is actually the broken one which is rendering just fine here :-). Are there some settings I should be tweaking? Should I be double-escaping the HTML entities like I had to just above? —Phil | Talk 14:30, 14 December 2005 (UTC)

Hmmm, it was because it has a \ in it, which AWB interpreted as an escape sequence I think, so I escaped the escape sequence by sticking another \ into the "find" and it was OK. I'll work out how to fix it; hopefully it won't be a problem in the meantime. Martin 14:59, 14 December 2005 (UTC)
Actually, on second thought it's pretty good needing to escape some characters; it means the search can be much more advanced. Martin 16:01, 14 December 2005 (UTC)

More escaping

I just discovered you have to take great care sometimes… I was searching for:

[[Box of Delights]]

which I had just turned into a REDIRECT, so I could short-circuit it. I fetched the list of "links to" articles and told AWB to replace that with:

[[The Box of Delights]]

which is the proper article. Imagine my horrified interest when the first article appeared with every single instance of "?]]" replaced with [[The Box of Delights]]: it was obviously treating the [ ] like a character range. But I didn't have "Is regex" selected: should it really be doing this? If that option is switched off, shouldn't it treat the string as completely raw? It works fine, BTW, if you use

\[\[Box of Delights]]

(which obviously won't work now 'cos I've been and gone and done it :-) HTH HAND —Phil | Talk 16:25, 14 December 2005 (UTC)

Yeah I found that as well, I didn't realise that it would treat it like a range, but I guess I could call it a new feature! Martin 16:37, 14 December 2005 (UTC)


Whoops, I just realised that what I had done initially was correct (I was starting to get worried), but I was passing the wrong parameter to my find and replace method, and as such regex was permanently on (as you suggested). It's fixed now (simple find and replace really is simple now!) Martin 16:56, 14 December 2005 (UTC)
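Martin's fix amounts to escaping the find text whenever "Is regex" is unticked, so that characters such as [ ] and \ are matched literally. A minimal sketch with Python's re module follows; AWB is written in C#, where Regex.Escape plays the role of re.escape. The function name and parameters are hypothetical, and this is not AWB's own code.

```python
import re


def find_and_replace(text, find, replace, is_regex=False):
    """Hypothetical find-and-replace with an optional regex mode.

    With is_regex=False the find text is escaped, so brackets, backslashes,
    dots and asterisks are matched literally, and the replacement is
    inserted exactly as typed.
    """
    if is_regex:
        return re.sub(find, replace, text)
    literal = re.escape(find)
    # A function replacement is inserted verbatim, so a backslash in
    # `replace` is never read as a group reference.
    return re.sub(literal, lambda match: replace, text)
```

With the escaping in place, searching for [[Box of Delights]] in plain-text mode finds exactly that link instead of treating the brackets as a character range.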

Question

For the option, 'Ignore if contains:', is that only for the title of the page or for anything in the article? Also, is there a way to add several entries (like User: Wikipedia: talk:), basically I just want the main article namespace.

Secondly, I think this tool would be great for disambiguating links, but I really don't know how that would function in the program. Perhaps using the "What links here" option and somehow getting a list of links from the term's disambiguation page (or the person could just create this in Notepad), the person could then enter a link to disambiguate in the search field, and AWB would search for it in the small textarea and perhaps highlight it. The user would then change the link to point to its appropriate location. I don't know; that may be difficult to implement. If I'm not being clear, let me know. I'm way too tired right now. BTW, can't wait for the local text file support. Thanks. Gflores Talk 04:37, 15 December 2005 (UTC)

It ignores pages based on their article text. To separate out the main namespace, sort the list alphabetically, then remove the other pages from it. Martin 08:58, 15 December 2005 (UTC)
I will add regex to the "ignore if contains" as well so you can search for as much or as little as you like. I am also working on a filter to leave only main space articles, but it is easy to do it manually so it's not a big priority. Martin 09:02, 15 December 2005 (UTC)

Previewing manual changes

Is it possible to preview the effect of manual changes? I'm not that confident in my typing to be certain I always get it right first time… HTH HAND —Phil | Talk 08:55, 15 December 2005 (UTC)

I initially had a preview, but it caused problems, I'll work on it. Martin 09:02, 15 December 2005 (UTC)

Slow auto-touch

So I haven't a bot account, but I would like to have AWB sit in the background, touching a whole list of articles, fixing "links to" issues for updated templates. Can I set it to go off at 1 minute intervals, just wandering down the list of articles? What's the best way to make sure it just touches, doing nothing substantive at all? HTH HAND —Phil | Talk 09:56, 15 December 2005 (UTC)

turn all the options off. Martin 15:24, 15 December 2005 (UTC)

Regex

I would like to delink solitary years e.g. [[2005]]. But I need to avoid years if they are not solitary. I have been searching for 'in [[2005]]' and ' of [[2005]]' but that is laborious. Presumably I can make use of regex to avoid ']] [[2005]]' and ']], [[2005]]' etc.

I know that you are not a regex helpdesk, but could you let me know how a regex matches square brackets and how it can be made to ignore certain cases? Bobblewik 10:05, 15 December 2005 (UTC)
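One way to answer both parts: square brackets are special in a regex (they start a character class), so literal brackets are escaped as \[ and \]. Skipping unwanted cases can be done with a negative lookbehind, which refuses a match when certain text comes right before it. The sketch below uses Python's re module, which writes the replacement as \1 where AWB uses $1; the pattern and function name are illustrative assumptions, and it only protects years that follow "]] " or "]], ".

```python
import re

# A linked 3- or 4-digit year such as [[2005]]. "\[" and "\]" match literal
# square brackets; unescaped, "[...]" would be a character class.
# The two negative lookbehinds skip a year that directly follows another
# link ("]] " or "]], "), as in [[January 11]], [[2005]]. Python's re needs
# each lookbehind to be fixed-width, hence two separate ones.
SOLITARY_YEAR = re.compile(r"(?<!\]\] )(?<!\]\], )\[\[(\d{3,4})\]\]")


def delink_solitary_years(text):
    """Replace [[YYYY]] with YYYY unless it follows "]] " or "]], "."""
    return SOLITARY_YEAR.sub(r"\1", text)
```

Years followed by another link, or separated from a preceding link by other text, are not protected by this sketch.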

Add to Watch-list

I would like the option to be able to inhibit adding articles to my watchlist. If it were possible to preserve the current situation (i.e. only "add" them if I am already watching), that would be great. In the meantime, I'm hoping that, despite the abjurations not to click in the browser pane, if I untick "add to watchlist" it will be respected :-) HTH HAND Phil | Talk 10:12, 15 December 2005 (UTC)

Just clicking that won't cause any problems. Martin 15:30, 15 December 2005 (UTC)
I'd like AWB to take care of it, so that I could just set it off ticking away in the background: maybe it could be an option in the menu? Of course, this issue of side-effects in other applications might need some attention before I can do that :-( HTH HAND Phil | Talk 12:57, 19 December 2005 (UTC)
Interacting with the webpage is difficult; no one I asked had any idea how to do it. Getting text from the edit form is not too bad, and "clicking" the save and diff buttons is a bit of a hack, but I have yet to master how to check and uncheck the tickboxes. Martin 13:06, 19 December 2005 (UTC)
It's a while since I managed any serious VBA programming, and I stopped doing it for money shortly before VB.NET arrived, but I might be able to help (if only to make some really obvious observations and make sure every base has been covered :-). Is the source code anywhere I could take a peek? How do you get the text out of the edit form? Could that be modified to access the other "controls" on the web-page? —Phil | Talk 15:11, 19 December 2005 (UTC)
Getting text from a form is quite easy when you know how; something like webBrowser.Document.GetElementById("wpTextbox1").InnerText. To be honest I haven't looked into this issue that much; I had other priorities (such as getting the "preview" button etc.). Now that's done, I'll give it some more thought. I doubt anyone other than myself could understand the source, as it needs to be tidied up a lot and commented properly ;( If there is anything specific you want to know, just ask though! I know that when I find the answer it will be simple, like the code above. thanks Martin 15:29, 19 December 2005 (UTC)
OK, I've worked it out now ;) Martin 16:30, 19 December 2005 (UTC)
I assume that it would be something along the lines of (very roughly) webBrowser.Document.GetElementById("wpWatchthis").Checked=False. How wrong am I? Phil | Talk 16:37, 19 December 2005 (UTC)
Along the right lines; C# sees it as an element with properties, so you have to set the properties like webBrowser.Document.GetElementById("wpMinoredit").SetAttribute(string attributeName, string value); Martin 16:47, 19 December 2005 (UTC)

Inhibit adding

I note that there is now an option to "add to watchlist". Actually what I need is the opposite: I already have this option set in my Preferences. What I want is the option that the pages I deal with through AWB not show up in my watchlist unless I specifically say so. Sorry to be a pest, but would this be difficult now you've figured out how to control that check-box? —Phil | Talk 10:48, 20 December 2005 (UTC)

  • I think for the moment the best way might be to temporarily change one's preferences, to remove the automatic "add to watchlist" while using AWB. Then it will presumably not do so. If editing manually in a different window at the same time, you would need to remember to set this, as it would not be auto-set. Then, after closing AWB, reset the preference. DES (talk) 21:10, 21 December 2005 (UTC)

Startling drop-out when attempting to cut/paste

Yikes!

I think I just discovered that the editing box doesn't use the standard "Ctrl-X = cut" shortcut: I tried "cutting" out some text to move it down a line and AWB dropped out. What was really disturbing is the fade effect you seem to have applied: for a significant number of seconds it gave the impression that this machine had blown several fuses. Maybe you could put something into those "Options" you left blank, and inhibiting the fade-out could be the first… So what is permissible for editing in that box? —Phil | Talk 11:36, 15 December 2005 (UTC)

I'll change quit to Ctrl-Q instead; like I said, I never use the keyboard shortcuts so I didn't know it was cut as well. thanks Martin 15:53, 15 December 2005 (UTC)

Urgent correction needed to bug in auto-fix

The auto-fix of section headings has a bug. See what it did in the following edit [2]. Bobblewik 14:01, 15 December 2005 (UTC)

That's odd anyway: I wasn't aware that you didn't need an actual newline after a section heading. Looks like AWB is cleaning out the "extraneous" white-space after the "="s but failing to replace it with a newline. Maybe it should always stick a newline in?—Phil | Talk 14:49, 15 December 2005 (UTC)
Erm, I am not responsible for cleaning up after you, go and fix it. The way that page was done is dumb anyway, there is no point in me fixing that problem, as it is extremely rare, and you are actually supposed to check what you're saving. Martin 15:21, 15 December 2005 (UTC)
OK. I did not know that it was extremely rare. Now I see that the article originally had a single space character instead of a new line. It seems that Wikipedia accepts either a single space character or a newline as an 'end of section heading'. So a possible solution would be to add a single space character. That would be unnecessary in almost all cases. If, as you say, the problem is rarely encountered, then that is ok by me.
Another issue of page blanking cropped up. Any thoughts on what happened? See the diff: [3]. Bobblewik 18:28, 15 December 2005 (UTC)
No never seen that before. You really should check your changes. Martin 18:57, 15 December 2005 (UTC)
It happened again. See [4]. Are you still inviting feedback on such matters? I know that it is caveat emptor. Regards Bobblewik 00:06, 16 December 2005 (UTC)
I imagine it's probably more to do with the wiki server side of things than the program, not that I couldn't do something to avoid it. Unfortunately it is impossible to replicate the problem; I'll get around to making it re-load or skip the page if it thinks it's blank. But you really should check what you are saving! Martin 00:35, 16 December 2005 (UTC)
Hence my request for a preview option. This is especially important when manual changes are added in the edit box. It would also be helpful to be able to see if someone has been messing with the templates you might be using (grrr!) whilst you're in the middle of using them (GRRRRR!) —Phil | Talk 09:46, 16 December 2005 (UTC)
Bobblewik, if it's happening only to you, maybe you should stop using AutoWikiBrowser until you sort out the problem. :-) -- Mpt 13:57, 16 December 2005 (UTC)

Ok, I've added security to both potential sides of the problem, available in next release. thanks Martin 01:15, 16 December 2005 (UTC)
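A check of the kind Martin describes, skipping a page that comes back blank instead of saving it, could look like the following Python sketch. The function name and the 50% threshold are arbitrary assumptions for illustration, not what AWB actually does.

```python
def looks_like_accidental_blanking(original, edited, min_fraction=0.5):
    """Hypothetical pre-save check: True if the edit looks like a blanking.

    Flags an edit when the page had content but the edited text is empty,
    or when the edited text is shorter than min_fraction of the original.
    """
    if not original.strip():
        return False  # nothing to lose on a page that was already empty
    if not edited.strip():
        return True
    return len(edited) < min_fraction * len(original)
```

A tool using such a check would reload or skip the page whenever it returns True, rather than saving.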

Racing to the download link right now :-) Phil | Talk 09:46, 16 December 2005 (UTC)

Affecting other applications

AWB seems to be affecting other applications when it's working. If I send it off to fetch a page for processing, and switch to another application during the pause, I seem to be getting an "enter" keystroke being spontaneously generated which is making my other application do stuff. Is this a side-effect of how AWB works? If so, can it be stopped? Please? —Phil | Talk 09:54, 16 December 2005 (UTC)

yeah I need to work out a better way of calling the save and diff buttons. Martin 15:14, 16 December 2005 (UTC)

Don't add duplicate entries to list

If you add more entries to the list of articles, AWB should check whether an entry is already there and not add it twice. (As a side-comment, I've had some interesting phenomena when removing items from the list, but I'm unable to reproduce them: removed items staying put, selections becoming multiple, random stuff. I'll let you know if it recurs.) HTH HAND —Phil | Talk 13:59, 16 December 2005 (UTC)

I'll get around to it. Martin 15:15, 16 December 2005 (UTC)
Done! Martin 22:37, 18 December 2005 (UTC)

Bug?

[Screenshot: WikiBrowser bug.jpg]

Whenever I use this program I get a weird bug. See the screencap. Broken S 19:27, 17 December 2005 (UTC) By the way this program is really sweet (I am still using it even though I have to click through the error messages [3 errors per page fixed]). Broken S 19:30, 17 December 2005 (UTC)

That is a problem with Internet Explorer (hence the error "Internet explorer script error" ;) ), there is something I can do to avoid it though (I think). thanks Martin 20:03, 17 December 2005 (UTC)
OK, I have disabled script errors now (v0.85), I imagine that will keep it happy, (thanks btw!) Martin 20:07, 17 December 2005 (UTC)
Yep that fixed it, thanks for the quick response. Now it's even better. Broken S 20:10, 17 December 2005 (UTC)
Actually, it's still doing it (now it's not doing it when I generate lists)...oh well. Broken S 20:14, 17 December 2005 (UTC)
I figured it out, I understand, never mind me. Broken S 20:16, 17 December 2005 (UTC)

Redirects

While doing a touch run (doing null edits) to update the list of used templates in articles I noticed that AWB on redirects opens the redirect page and not the target page of the redirect (Example Plzen). For this specific kind of run I would have needed that AWB opens the target page, not the redirect page. I made the list from the "What links here" of template:if. – Adrian | Talk 13:53, 18 December 2005 (UTC)

Done! Martin 22:37, 18 December 2005 (UTC)
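Opening the target instead of the redirect first requires recognising a redirect page. A rough Python sketch, assuming the usual #REDIRECT [[Target]] form at the start of the page (matched case-insensitively here); the pattern and function name are hypothetical, and this is not a description of how AWB does it.

```python
import re

# Assumed redirect syntax: "#REDIRECT [[Target]]" at the start of the page,
# optionally with a section anchor, e.g. "[[Target#Section]]".
REDIRECT = re.compile(r"\s*#REDIRECT\s*\[\[([^\]|#]+)[^\]]*\]\]", re.IGNORECASE)


def redirect_target(wikitext):
    """Return the redirect target title, or None if the page is not a redirect."""
    match = REDIRECT.match(wikitext)
    return match.group(1).strip() if match else None
```

A tool would then load the returned title in place of the redirect page.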

New features?

Ok, what features can I add to make your lives easier? Are there any specific jobs that the program could be adapted to better suit? Martin 00:36, 19 December 2005 (UTC)

Also, are there any more general fixes that I could add? Martin 00:40, 19 December 2005 (UTC)

How about a way to tag images... I'm not exactly sure how I would work it. Some should have {{unverified|~~~~~}}, others might need {{unknown}}. And for others you might be able to decide fairuse, or maybe they forgot to tag it gfdl (but wrote it down in plain text). Maybe a drop-down box? Broken S 00:42, 19 December 2005 (UTC)
I would love a way to import a list from a text file. Has this feature been scrapped? This would be very helpful for fixing typos, since most people at WP:Typos use the more effective Google search to find pages with typos. Gflores Talk 00:59, 19 December 2005 (UTC)
I am working on it. Martin 09:31, 19 December 2005 (UTC)
Also, I'm still getting the thumbs error from above in the new version (0.90). A redirect fixer would be nice: you tell it to fix links pointing at a redirect. You can sort of approximate that now, but it can be confused by piped links (if "link" is moved to "linkname" and I use your program to fix the redirects, [[link|linkname]] gets changed to [[linkname|linkname]]). Broken S 01:32, 19 December 2005 (UTC)
What is the "thumbs error"? If you mean the script error, then I am 99% sure that is IE's fault, make sure you have the newest version. An automatic link fixer is a bit difficult, but I'll think about it! Martin 09:31, 19 December 2005 (UTC)
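The piped-link problem Broken S describes can be handled by looking at the pipe text: if it already equals the new title, the link collapses to a plain link. The Python sketch below is hypothetical (names and behaviour are assumptions) and, as its docstring notes, it does not handle MediaWiki's case-insensitive first letter or underscores in titles.

```python
import re


def bypass_redirect(text, old, new):
    """Hypothetical sketch: point links aimed at the redirect `old` at `new`.

    Matches `old` exactly as written, so it ignores MediaWiki's
    case-insensitive first letter and space/underscore equivalence.
    """
    pattern = re.compile(r"\[\[" + re.escape(old) + r"(\|[^\]]*)?\]\]")

    def fix(match):
        pipe = match.group(1)
        if pipe is None:
            # [[old]] keeps its visible text: [[new|old]]
            return "[[%s|%s]]" % (new, old)
        label = pipe[1:]
        if label == new:
            # Avoid [[new|new]]; a plain link shows the same text.
            return "[[%s]]" % new
        return "[[%s|%s]]" % (new, label)

    return pattern.sub(fix, text)
```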
How about being able to plug into a webservice or something like that which then corrects the fixes (and also stores the list)? Indeed, then it would be User:Humanbot!
Seriously though, the User:Humanbot script needs a major rewrite and if your program could be a better interface, perhaps integrating them would be the way to go. :) r3m0t talk 10:58, 19 December 2005 (UTC)
(sorry for delayed response) I think it would be difficult to integrate them; my program would probably work more easily from a list of articles with typos generated from the database, with the spelling correction code built in to the program. While on the subject, are you planning on running Humanbot again soon? Martin 21:18, 20 December 2005 (UTC)
There's no point - a recent Greasemonkey release did good things in general, but broke my script. I could modify it but the interface wouldn't be very good. I seriously think that Humanbot and this could go well together - didn't you know that Humanbot worked with a list of articles on a central server? :) Perhaps the correcting function can be at the client-side, but that isn't always a good idea. Consider, for example, automatically adding links (which requires a lot of data), and... well... maybe that's about it. But I would like it anyway! ;) r3m0t talk 21:34, 20 December 2005 (UTC)
Just some random feedback
  • The 'Filter' or 'Sort' buttons are no longer there. I can't find them in the menu. So I can't get a list that is sorted and has no user or talk pages in it.
  • Could it identify the 'Inuse' tag or other pages that should not or cannot be edited?
  • Something odd happens when I reach the last page in the list. I think it just comes to a halt and will not edit it. I may be mistaken.
  • Could you make the list window a little wider?
Many thanks. Bobblewik 13:43, 19 December 2005 (UTC)
Filter and sort are in the context menu of the list, it was getting a bit cluttered otherwise.
I'll look at the other stuff. Martin 13:51, 19 December 2005 (UTC)
Aha. I found them. Thanks. A general usability recommendation (I haven't got a reference) is that contextual menus are a convenience but not a replacement for drop down menus. This is so that users can explore all functions without having to right click in all parts of the interface. If you get time, could you add them to the drop down menus too?
Also, can I suggest 'Filter out duplicates' -> 'Remove duplicates', 'Sort alphebetically' -> 'Sort alphabetically'.
It would be useful (for me anyway) if it could do an initial 'Remove duplicates' just after the list is loaded. Links on a page often produce duplicates. I cannot imagine any benefit in a non-alphabetical order so an initial 'Sort alphabetically' would be useful too. I know this will add complexity and duration so I understand if you don't want to add it to the wishlist.
Thanks for doing a great job. Bobblewik 14:51, 19 December 2005 (UTC)
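The initial clean-up Bobblewik suggests, together with the main-namespace filter discussed earlier, can be sketched in a few lines of Python. The namespace list is an assumed, non-exhaustive set of English Wikipedia prefixes, and the function names are hypothetical; only known prefixes count, so article titles that merely contain a colon are kept.

```python
# Assumed, non-exhaustive list of non-article namespace names (lower case).
NON_ARTICLE_NAMESPACES = {
    "talk", "user", "user talk", "wikipedia", "wikipedia talk",
    "image", "image talk", "mediawiki", "mediawiki talk",
    "template", "template talk", "help", "help talk",
    "category", "category talk", "portal", "portal talk", "special",
}


def is_article(title):
    """True unless the title starts with a known namespace prefix."""
    prefix, sep, _ = title.partition(":")
    name = prefix.strip().replace("_", " ").lower()
    return not (sep and name in NON_ARTICLE_NAMESPACES)


def tidy_list(titles):
    # The set comprehension drops duplicates; sorted() orders the result.
    return sorted({t for t in titles if is_article(t)})
```

Note that sorted() orders by character code, so capitalised titles come before lower-case ones.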
Small request... can you add shortcuts to some of the functions, mainly Save and Ignore, please. BTW, thanks for the textfile support, it sure makes fixing typos a lot easier. Appreciate it. Gflores Talk 00:58, 20 December 2005 (UTC)
  • I'd love to be able to preview selected articles in the browser screen to see if I need to remove them from my list. I hate to switch between my regular browser and the program so much. - Mgm|(talk) 22:46, 27 December 2005 (UTC)
It has an option in the menu to go to the preview rather than diff to start with, if that's what you mean. Martin 22:49, 27 December 2005 (UTC)
  • I know you said somewhere else that AWB only works with the en Wikipedia, but I was wondering if you could add a feature that converts old-style characters like &#12345; to the appropriate Unicode symbols. See this edit for an example of why this is useful. (You don't need to be able to read kanji to understand that the post-edit is much easier to edit than the pre-edit.) Neier 08:57, 31 December 2005 (UTC)
  • Also, instead of removing the year links, maybe it would be better to offer the choice to change them to something more appropriate. On the Indianapolis Colts page, someone took out the links but it struck me as a good idea to link to XXXX NFL Season where appropriate. So, 1995 -> 1995 NFL season (or 1995 in politics etc.) could be set as a default replacement by the tool. In context, editors should be able to determine which year links are worth keeping. Neier 08:57, 31 December 2005 (UTC)
  • In regard to the first suggestion, I would love to do that, but I just don't know how I would find out what all the old-form and new Unicode characters are. I wonder if anyone knows how the pywikibots do it? In regard to the second suggestion, it wouldn't really be practical; it would have to be at the editor's discretion. Martin 19:55, 31 December 2005 (UTC)
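For the numeric form at least, no lookup table is needed: &#NNNN; (decimal) and &#xHHHH; (hexadecimal) give the Unicode code point directly. A Python sketch follows; the function name is hypothetical, and leaving ASCII references and invisible characters alone is an assumption meant to avoid disturbing deliberately escaped wiki markup. Named entities such as &eacute; would need a table; Python's html.unescape has one, but it decodes everything, including &lt; and &amp;.

```python
import re
import unicodedata

# Decimal (&#26085;) or hexadecimal (&#x65E5;) numeric character reference.
NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")


def decode_numeric_refs(text):
    """Replace numeric character references with the characters they encode.

    ASCII references such as &#60; (<) are kept, since they may be deliberate
    escapes of wiki markup, as are separators (e.g. no-break space), control
    and format characters, which would be invisible in the edit box.
    """
    def to_char(match):
        hex_digits, dec_digits = match.groups()
        code = int(hex_digits, 16) if hex_digits else int(dec_digits)
        if code < 128 or code > 0x10FFFF:
            return match.group(0)
        char = chr(code)
        if unicodedata.category(char)[0] in "ZC":
            return match.group(0)
        return char

    return NUMERIC_REF.sub(to_char, text)
```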

release the source code?

You should really consider releasing the source code to the AutoWikiBrowser. That way, other people can help improve it! :) --Ixfd64 10:47, 19 December 2005 (UTC)

Maybe one day, I need to clean up the code first. Martin 10:02, 21 December 2005 (UTC)

Publishing the date delinking regex

Information for users (Martin feel free to copy and use this info any way you want):

Martin has been kind enough to use my regex in the date delinking section. I am not as good at regex as I would like to be. Here are the concepts and the details:

  • The idea is to delink date elements that fail the date preference test. There are exceptions (see below). I do not use date preferences (personally: I tolerate any sequence when the month is non-numeric). Nor do I know how the date preference code works. But some info is at:

My ideal regex would match:

  • Any day of the week: (Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday|Mondays|Tuesdays|Wednesdays|Thursdays|Fridays|Saturdays|Sundays)
  • Any month: (January|February|March|April|May|June|July|August|September|October|November|December)
  • Any decade: ([0-9]{4}s)
  • Any three digit or four digit year: ([0-9]{3}|[0-9]{4})
  • Any century: ([0-9][a-z]{2} [Cc]entury|[0-9]{2}[a-z]{2} [Cc]entury)
  • Any month/year combination such as 'February 2002'


I also try to avoid pages that discuss calendars and the origins of week/month names. My crude way is to search for the word 'calendar' and 'god'. But that could be tightened.

Here is the search regex I have used in the past:

  • ([^\]]{4})\[\[([0-9]{4}|[0-9]{4}s|[0-9]{3}|January|February|March|April|May|June|July|August|September|October|November|December|Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday)\]\]

Note that the first few characters test for preceding square brackets. That is a crude method of avoiding a year link that is part of a full date.

Here is the replace field that I have used in the past:

  • $1$2

Here is the ignore field that I have used in the past:

  • calendar|Calendar|god|God

This is in an attempt to avoid controversy. For example, a page might talk about 'Odin' and link to Wednesday.

Known weaknesses:

  • The regex above does not test for ISO date formats. A 'false positive', although rare.
  • The regex above does not test for the year to be first in the sequence of a valid date. A 'false positive', less rare.
  • Ignoring dates with preceding square brackets is too crude. A 'miss'.
  • The regex above does not test for centuries. A 'miss'.
  • The regex above does not test for month/year combinations. A 'miss'.
  • The ignore field above is too crude. A 'miss'.

Suggested improvements (regex coding is not my speciality) would be welcome. Bobblewik 15:56, 19 December 2005 (UTC)
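To see the hits and misses listed above, the search regex from this post can be run through Python's re module. The pattern below is copied unchanged, only split across string literals; the replacement is written \1\2 instead of $1$2, which is Python's spelling of group references. The function name is hypothetical, and AWB's .NET engine may differ from Python's in edge cases.

```python
import re

# The past search regex, split over several string literals for readability.
OLD_SEARCH = re.compile(
    r"([^\]]{4})\[\[([0-9]{4}|[0-9]{4}s|[0-9]{3}|January|February|March|April"
    r"|May|June|July|August|September|October|November|December"
    r"|Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday)\]\]"
)


def delink_with_old_regex(text):
    # $1$2 in AWB corresponds to \1\2 in Python.
    return OLD_SEARCH.sub(r"\1\2", text)
```

A year link at the very start of the text is also a miss, because the pattern needs four characters before the opening brackets.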

The part for three or four digit years could be combined to [0-9]{3,4} which'd make the expression a tad bit shorter. --Mairi 22:02, 19 December 2005 (UTC)
Thanks. Very useful. Perhaps you also know a way to shorten the search for centuries. Would ([0-9]{1,2}[a-z]{2} [Cc]entury) work instead of the one I suggested above? Could we do a similar thing for days of the week?
I am particularly keen on finding out how to avoid adjacent valid elements. The piece ([^\]]{4}) is supposed to trap 11 January 2005 by looking for the square brackets. Unfortunately it also traps London 2005, January 2005 and 1990-1995. Furthermore, there appears to be no limit to the number of spaces a valid date can have and it will not trap [[January 11]], [[2005]] because it has 5 consecutive character spaces. A similar problem applies when the year is the first element i.e. 2005-January 11. Bobblewik 23:10, 19 December 2005 (UTC)
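Both questions about shorter forms can be checked mechanically. The Python sketch below compares the two year spellings on a few inputs and tests a shortened century pattern and a weekday pattern of the (?:Mon|...)days? form against whole link targets. The [ _] before "century" allows a space or an underscore, since links with underscores also work; all names are illustrative.

```python
import re

# Original alternation and the shorter bounded repeat for 3- or 4-digit years.
YEAR_ALTERNATION = re.compile(r"[0-9]{3}|[0-9]{4}")
YEAR_BOUNDED = re.compile(r"[0-9]{3,4}")

# Shortened century pattern: one or two digits, a two-letter suffix,
# a space or underscore, then "century" or "Century".
CENTURY = re.compile(r"[0-9]{1,2}[a-z]{2}[ _][Cc]entury")

# Weekday names, singular or plural (Monday, Mondays, ...).
WEEKDAY = re.compile(r"(?:Mon|Tues|Wednes|Thurs|Fri|Satur|Sun)days?")


def whole(pattern, s):
    """True if the pattern matches the entire string, like a link target."""
    return pattern.fullmatch(s) is not None
```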
Discussion of possible new regex (century, decade, year, month, day):
([^\]]{4})
\[\[
(\d*....entury
|\d{3,4}s
|\d{3,4}
|January|February|March|April|May|June|July|August|September|October|November|December
|(?:Mon|Tues|Wednes|Thurs|Fri|Satur|Sun)days?)
\]\]
([^-\[])
Bobblewik 23:41, 19 December 2005 (UTC)
That would work for centuries. Although you might want to use [_ ] instead of the space, as links with underscores work just as well. Another way of shortening things would be to replace all the [0-9] with \d (they function pretty much the same way).
There's also three digit decades, but I suspect links to them are far less common.
As far as false positives go, you might want to avoid changing anything in articles with titles that are dates (such as 1010s), as they seem to have a lot of [desirable] year, decade and century links.
I'll think about how to avoid adjacent elements and let you know if I come up with anything... --Mairi 00:05, 20 December 2005 (UTC)
I have amended the century bit as you suggest. I replaced the space with an underscore as you suggest. I had not thought of 3 digit decades, that is now included but it could come out again. I do not know how to trap date titles but I simply do not list them for processing. Bobblewik 00:53, 20 December 2005 (UTC)

You don't want to use .* for anything intended to be within [[ ]] as it will continue "eating" characters past the first set of ]] (and probably continue til the last set of ]] on the page). It'd also end up matching things like [[Monday Night Football]]. Is it just, say, Monday and Mondays that you want to match? --Mairi 01:49, 20 December 2005 (UTC)

Yes. I just want to match Monday and Mondays. I definitely don't want an unlimited match. Thanks for the warning. Feel free to amend it anyway you think would work. Bobblewik 01:53, 20 December 2005 (UTC)
That ought to work. It's a little non-intuitive, tho. It won't change the group numbering for replacement either, which is why it uses (?: ) instead of ( ).
It might also be a good idea to make the whole expression non-case-sensitive (if you can do that easily). --Mairi 02:10, 20 December 2005 (UTC)
Thank you very much. I bow to your superior knowledge on this. Feel free to do whatever you think will improve it. As long as it works, I don't care how. You are a great help. Bobblewik 02:27, 20 December 2005 (UTC)


Martin has just told me that it can do multiple passes. Here is my latest proposed regex:
First pass looks for century, decade, month, day:
Search: \[\[(\d*.. [Cc]entury|\d{3,4}s|January|February|March|April|May|June|July|August|September|October|November|December|(?:Mon|Tues|Wednes|Thurs|Fri|Satur|Sun)days?)\]\]
Replace: $1

Second pass looks for year:
Search: ([^\]]{4})\[\[(\d{3,4})\]\]([^-\[])
Replace: $1$2$3

Comments? Bobblewik 00:11, 21 December 2005 (UTC)

Breaking it apart sounds good. For the second pass:
Search: (?<!\]\s*,?\s*)\[\[(\d{3,4})\]\](?!-|\s*,?\s*\[)
Replace: $1
Ought to have fewer misses. (I'm not as sure about the part in the first set of parens; if it doesn't work, just switch it to what was there before and use $1$2.) Do you want it to remove links for things like [[1992]]-[[1993]] also? I also made a subpage of my userpage with a bunch of different dates for testing. Feel free to use it and add other cases to it. --Mairi 05:19, 21 December 2005 (UTC)
Yes, it should remove anything that is not valid for date preferences. So with a pair of dates such as [[1992]]-[[1993]], both should be delinked. That has been a particularly frustrating 'miss' and it looks bad to others. I was thinking that we should simply make it do a second pass to solve that.
I have also been wondering if our preceding link detection should actually look inside the preceding link for 'y]]' or '\d]]'. How about (?<![yhletr\d]\]\]\s*,?\s*) That could catch a lot of things like London 2005. We could do a similar thing when we check the following link e.g. (?!-|\s*,?\s*\[\[[JFMASONDjfmasond\d]) Bobblewik 08:40, 21 December 2005 (UTC)
I forgot to add anything about solitary month/year combinations. We would just add something like \d* (?:January|February|March|April|May|June|July|August|September|October|November|December
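A TypeScript sketch of the preceding-link lookbehind proposed above. It assumes JavaScript's lookbehind (ES2018 and later, which allows variable-length lookbehind as .NET does) behaves like the .NET one for this pattern; the lookahead half is left out to keep it focused, and the sample strings are illustrative.

```typescript
// Month names all end in y, h, l, e, t or r (January, March, April, June,
// August, September...), and a link like [[March 23]] ends in a digit.
// The lookbehind skips a year link that follows such a link, because that
// pair may be a date that user preferences need to format.
const yearAfterLink = /(?<![yhletr\d]\]\]\s*,?\s*)\[\[(\d{3,4})\]\]/g;

const delinkYears = (s: string): string => s.replace(yearAfterLink, "$1");

const kept = delinkYears("[[March 23]], [[2005]]"); // date target: unchanged
const unlinked = delinkYears("[[London]] [[2005]]"); // "[[London]] 2005"
const missed = delinkYears("[[Germany]] [[2005]]"); // ends in "y": a harmless miss
```

The last case shows the trade-off: any link ending in one of those letters also protects the year, which costs a miss but never a false positive.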


Latest proposed regex:
First pass delinks century, decade, month, day:
Search: \[\[(\d*.. [Cc]entury|\d{3,4}s|January|February|March|April|May|June|July|August|September|October|November|December |(?:Mon|Tues|Wednes|Thurs|Fri|Satur|Sun)days?)\]\]
Replace: $1

Second pass delinks month/year combination like [[January 2002]]:
Search: \[\[((?:January|February|March|April|May|June|July|August|September|October|November|December) [\d]{3,4})\]\] \]\]
Replace: $1

Third pass delinks years that are not part of a date preference target. Should be no false positives and only a few misses:
Search:::(?<![yhletr\d]\]\]\s*,?\s*)\[\[(\d{3,4})\]\](?!-|\s*,?\s*\[\[[JFMASONDjfmasond\d])
Replace: $1$2$3

Fourth pass is a repeat of the regex in the third pass. This is to delink the second link of [[2002]]-[[2005]]. Should be no false positives and only a few misses:
Search:::(?<![yhletr\d]\]\]\s*,?\s*)\[\[(\d{3,4})\]\](?!-|\s*,?\s*\[\[[JFMASONDjfmasond\d]) \[\[(January|February|March|April|May|June|July|August|September|October|November|December) ([\d]{3,4})\]\]

Not fully tested yet. Improvements welcome. Bobblewik 12:47, 21 December 2005 (UTC)

Think you want to change (?!-|\s*,?\s*\[\[[JFMASONDjfmasond\d]) to (?!-\[\[\d\d-\d\d\]\]|\s*,?\s*\[\[[JFMASONDjfmasond\d]) which ought to get rid of all the misses for dates followed by a hyphen. And I think the 4th pass ought not to be necessary, but I'm not sure. --Mairi 19:49, 21 December 2005 (UTC)
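A sketch of the suggested lookahead change, assuming JavaScript regexes mirror the .NET behaviour here; `yearLink` and `delinkYear` are illustrative names. It shows the distinction the tweak draws between a plain hyphenated range and a two-part ISO date.

```typescript
// Third-pass pattern with the hyphen lookahead tightened as suggested:
// a year followed by "-[[dd-dd]]" (two-part ISO date) is kept, but a
// plain "-" range such as [[1992]]-[[1993]] is no longer protected.
const yearLink =
  /(?<![yhletr\d]\]\]\s*,?\s*)\[\[(\d{3,4})\]\](?!-\[\[\d\d-\d\d\]\]|\s*,?\s*\[\[[JFMASONDjfmasond\d])/g;

const delinkYear = (s: string): string => s.replace(yearLink, "$1");

const rangeResult = delinkYear("[[1992]]-[[1993]]"); // "1992-1993"
const isoResult = delinkYear("[[2005]]-[[02-28]]"); // unchanged: date preference target
```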
I will take your word for it. I don't fully understand the regex. I think the second pass (month/year) combinations can be merged into the first. If the 4th pass is not needed, that is good too. Change the proposed regex in any way you think best and most efficient. We should test it and then perhaps it will be time to ask Martin if he will adopt it. Thanks. Bobblewik 18:43, 22 December 2005 (UTC)
I tried this yesterday and it appeared that the fourth pass was needed; at least, it was catching things not found by the third. DES (talk) 20:36, 22 December 2005 (UTC)

I can't get these regexes to get any matches; are you testing them in AWB or some other way? Martin 22:14, 22 December 2005 (UTC)

In AWB, pasted in from this page. DES (talk) 22:17, 22 December 2005 (UTC)
Have you tried the most recent set of regexes? I can't get any results from them; the older ones seem better, though. Martin 22:38, 22 December 2005 (UTC)

The ones I have been using are the 4-pass set listed above under "Latest proposed regex", as they were when first posted to this page; I haven't checked for any edits in place after I copied them to a text file for easy access. I typically get hits on passes 1, 3 and 4; I have rarely, if ever, seen a hit on pass 2 so far. I am doing a larger run now and I'll report on my results. I note that I have had to check the "remove all date links" option on the beta tab for these to work. So far, based on insufficient testing, I need both the regexes in the set options tab and the checkbox on the beta tab enabled. I will confirm that with a specially constructed test page later today or tomorrow; I have to log off in a minute. DES (talk) 22:47, 22 December 2005 (UTC)

I paste them into the Find field in AutoWikiBrowser and run it on: User:Mairi/Date formatting. The first, third and fourth passes seem to work for me. The second pass does not work because it is faulty. Replace the second pass with:
Second pass delinks month/year combination like [[January 2002]]:
Search: \[\[(January|February|March|April|May|June|July|August|September|October|November|December) (\d{3,4})\]\]
Replace (there is space character before second dollar character): $1 $2
Bobblewik 23:37, 22 December 2005 (UTC)
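Why the earlier second pass never matched, and what the corrected one does, in a short TypeScript sketch (JavaScript regex syntax assumed equivalent to .NET for these constructs; `months`, `faulty` and `monthYear` are illustrative names).

```typescript
const months =
  "January|February|March|April|May|June|July|August|September|October|November|December";

// Pass 2 as first posted: it also requires a stray " ]]" after the link,
// so it can never match ordinary wikitext.
const faulty = new RegExp(`\\[\\[((?:${months}) [\\d]{3,4})\\]\\] \\]\\]`);

// Corrected pass 2: month in group 1, year in group 2, rejoined with a space.
const monthYear = new RegExp(`\\[\\[(${months}) (\\d{3,4})\\]\\]`, "g");

const fixed = "[[January 2002]]".replace(monthYear, "$1 $2"); // "January 2002"
const dayMonth = "[[January 15]]".replace(monthYear, "$1 $2"); // unchanged: only 2 digits
const faultyHits = faulty.test("[[January 2002]]"); // false
```

Requiring 3 or 4 digits is what keeps day links such as [[January 15]] intact for date preferences.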

Ah, thanks. Could you just clarify what the best regexes are, as the 4th pass doesn't have a "replace" now? And do you use Mairi's change? Thanks Martin 00:21, 23 December 2005 (UTC)


Yes, I use Mairi's change. The 4th replace is identical to the 3rd. To be explicit, here it is:

First pass:
Search: (?i)\[\[(\d*.. century|\d{3,4}s|January|February|March|April|May|June|July|August|September|October|November|December|(?:Mon|Tues|Wednes|Thurs|Fri|Satur|Sun)days?)\]\]
Replace: $1

Second pass:
Search: (?i)\[\[(January|February|March|April|May|June|July|August|September|October|November|December) (\d{3,4})\]\]
Replace (there is space character before second dollar character): $1 $2

Third pass:
Search: (?i)(?<![yhletr\d]\]\]\s*,?\s*)\[\[(\d{3,4})\]\](?!-|\s*,?\s*\[\[[jfmasond\d])
Replace: $1

Fourth pass:
Search and replace: identical to third pass.
Bobblewik 12:42, 23 December 2005 (UTC)
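A sketch of how the four passes above fit together, written in TypeScript rather than taken from AWB's code. Assumptions: JavaScript regexes (ES2018 or later, which allow the variable-length lookbehind these patterns need) stand in for the .NET ones; the leading `(?i)` becomes the `i` flag because JavaScript does not accept it inline; `delinkDates`, `passes` and `yearPass` are hypothetical names.

```typescript
const months =
  "January|February|March|April|May|June|July|August|September|October|November|December";

// Passes 3 and 4 are identical, so one pattern object serves both.
const yearPass = {
  find: /(?<![yhletr\d]\]\]\s*,?\s*)\[\[(\d{3,4})\]\](?!-|\s*,?\s*\[\[[jfmasond\d])/gi,
  replace: "$1",
};

const passes: ReadonlyArray<{ find: RegExp; replace: string }> = [
  {
    // Pass 1: centuries, decades, bare months, days of the week.
    find: new RegExp(
      `\\[\\[(\\d*.. century|\\d{3,4}s|${months}|(?:Mon|Tues|Wednes|Thurs|Fri|Satur|Sun)days?)\\]\\]`,
      "gi",
    ),
    replace: "$1",
  },
  // Pass 2: month/year links such as [[January 2002]].
  { find: new RegExp(`\\[\\[(${months}) (\\d{3,4})\\]\\]`, "gi"), replace: "$1 $2" },
  yearPass, // pass 3: bare years, unless a neighbouring link could form a date
  yearPass, // pass 4: identical to pass 3
];

// Each pass runs over the output of the previous one.
function delinkDates(wikitext: string): string {
  return passes.reduce((text, p) => text.replace(p.find, p.replace), wikitext);
}
```

With this sketch, `delinkDates("[[1987]], [[1992]], [[1995]], and [[2002]]")` leaves the first three links alone: each sits next to a link ending in a digit, and the repeated fourth pass sees exactly the same neighbours. Likewise `[[1992]]-[[1993]]` only loses its second link, because the bare `-` after the first year still blocks it.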


Brilliant, I'll implement it soon (bed now though!). thanks Martin 00:56, 23 December 2005 (UTC)


Please do not use your tool in order to enforce style decisions that do not have broad consensus. Some authors (me included) like to link years in order to point readers to background about the discussed period; other authors do not. This is an issue similar to British English vs. American English: it should be left to individual editors; marching in with overpowering technology is out of place. Thanks, AxelBoldt 18:22, 23 December 2005 (UTC)

To be fair, there is a consensus to do this, and all the opposition to that (including from myself) is against a bot doing it automatically. Martin 19:02, 23 December 2005 (UTC)
Martin, just letting you know that the 4 pass regex was updated on 12:40, 23 December 2005 Bobblewik 20:50, 23 December 2005 (UTC)

Testing

I have been running tests of the 4-pass regex at User:DESiegel/Date Test. Please take a look at the history. I will report results more fully shortly. DES (talk) 21:08, 23 December 2005 (UTC)

After further tests with the latest version of the date regex in the find/replace box, as shown above, I find there are some interesting limits. Here is a diff showing the net effect of 4 passes: diff.

The points I note are:

  1. There seem to have been no false positives; that is, no dates were unlinked where date preferences would have functioned.
  2. [[19th century]] was not unlinked.
  3. Single years followed by hyphens without spaces were not unlinked, while single years followed by en dashes or em dashes implemented as HTML entities were.
  4. Years with fewer than 4 digits were not unlinked.
  5. Day of week abbreviations [[Mon]], [[Tues]], [[Wed]], [[Thurs]], [[Fri]], [[Sat]], and [[Sun]] were not unlinked.

This may be of interest. DES (talk) 22:15, 23 December 2005 (UTC)

Thanks for the extensive testing. You provided an important benefit. Taking your points in turn:
  • The absence of false positives is very welcome.
  • The failure to delink [[19th century]] is because that regex is case sensitive. The proposed new regex is case insensitive. That is what the (?i) piece at the beginning of the regex does.
  • Single years followed by hyphens are a known 'miss'. A hyphen is used in ISO format dates ([[2005]]-[[02-30]]). So it avoids dates with a hyphen. The new regex will reduce the number of misses a bit further. The ndash and mdash are not part of an ISO date so it does not need to avoid them.
  • The first regex only acted on 4 digit years. The proposed new regex will act on 3 or 4 digit years.
  • There are no plans to act on abbreviations of days of the week. If they are common, more work could be done. The proposed new regex will increase the scope to include plurals ([[Mondays]]) and be case insensitive to act on ([[monday]] etc).
Many thanks. That was useful. A test like that of the new one when it comes out will be very welcome. The more people testing and using it, the better. Bobblewik 23:53, 23 December 2005 (UTC)
New one is out now! Martin 23:56, 23 December 2005 (UTC)
Excellent. I will take a look. Bobblewik 23:58, 23 December 2005 (UTC)
First impressions are that it is much better. However, with year pairs separated by a hyphen ([[1990]]-[[1995]]), it still only captures one of them. The 4th pass was supposed to capture the second of the pair. Hmm. Bobblewik 00:09, 24 December 2005 (UTC)
  • Ah, I see about the ISO format dates. Note that the guideline now recommends the single-part ISO format, such as [[2005-02-30]], but the two-part form is still valid and must be allowed for. However, an ISO date is always a hyphen followed by a 2-digit number, followed by another hyphen; [[1990]]-[[1995]] is not a valid ISO format. Whether the regex can reliably be tweaked to make this distinction I'm not sure. Note that the problem was with the year followed by the hyphen, i.e. the first year in the range. This is far more likely to come up in biography articles than the reverse: birth dates known only to a year while death dates are exact are common, while the reverse is rather less common. DES (talk) 00:51, 24 December 2005 (UTC)
It is possible to improve the regex. It could look for [[1990]]-[[1995]] but it would have to avoid [[1990]]-[[1995]]-[[02-30]]. Determining the topic of an article (biography), or of the link (birth) is unlikely to be part of an efficient solution. There are a *lot* of permutations of false positives and misses. Having a pair of dates with only one linked looks very bad. A solution must be found. I don't think it will be that difficult, but I just do not know how to do it. I am hoping somebody else does. Bobblewik 13:32, 24 December 2005 (UTC)
I wasn't suggesting that looking for biographies should be part of the solution, just that that was a context in which the issue commonly arises. DES (talk) 01:07, 26 December 2005 (UTC)


Additional false negative cases:

  • Piped dates, such as [[1923]]—[[1927|7]] or [[1923]]—[[1934|34]], are not converted. This is not infrequently done to express a date range.
  • Lists of dates in serial comma form, such as [[1987]], [[1992]], [[1995]], and [[2002]]: only the last date is delinked. I presume this is due to the logic for detecting [[1995]], [[23 March]].

You can see my latest test with version 0.9.9.5 in this edit

Anyone may feel free to use User:DESiegel/Date Test for tests, please revert to a fully linked version after performing any tests. DES (talk) 01:07, 26 December 2005 (UTC)

Proposed improvements (tackles years in one pass instead of two, includes consecutive linked years, includes piped years):
The third pass would become:
Search: (?i)(?<!(?:january|february|march|april|may|june|july|august|september|october|november|december| \d{1,2})\]\]\s*,?\s*)\[\[(?:(\d{3,4})|\d{3,4}\|(\d{1,2}))\]\](?!-\[\[\d\d-|\s*,?\s*\[\[(?:january|february|march|april|may|june|july|august|september|october|november|december|\d{1,2} ))
Replace: $1$2
Note that the search regex contains two space characters.

The fourth pass would be deleted.
Seems ok but further testing is wanted.
Bobblewik 19:12, 26 December 2005 (UTC)
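The proposed single year pass translated to TypeScript, as a sketch for testing rather than AWB's own code. Assumptions: JavaScript lookbehind (ES2018 and later) behaves like .NET's for this pattern, and the `i` flag replaces the leading `(?i)`; `monthNames`, `yearLink` and `delinkYears` are illustrative names. The two literal spaces mentioned above sit in `| \\d{1,2}` and `\\d{1,2} `.

```typescript
const monthNames =
  "january|february|march|april|may|june|july|august|september|october|november|december";

// Group 1 is a plain year; group 2 is the visible text of a piped year such
// as [[1927|7]]. Only one of them participates, so "$1$2" keeps that one.
const yearLink = new RegExp(
  // Not right after [[23 March]] or [[March 23]] (optionally ", " between).
  `(?<!(?:${monthNames}| \\d{1,2})\\]\\]\\s*,?\\s*)` +
    // [[1927]] or [[1927|7]].
    `\\[\\[(?:(\\d{3,4})|\\d{3,4}\\|(\\d{1,2}))\\]\\]` +
    // Not right before -[[02-28]], [[March ...]] or [[23 ...]].
    `(?!-\\[\\[\\d\\d-|\\s*,?\\s*\\[\\[(?:${monthNames}|\\d{1,2} ))`,
  "gi",
);

function delinkYears(wikitext: string): string {
  return wikitext.replace(yearLink, "$1$2");
}
```

In this sketch the serial-comma list, the hyphenated range and the em-dash piped range all lose their links, while `[[March 23]], [[2005]]`, `[[23 March]] [[2005]]` and `[[2005]]-[[02-28]]` are left for date preferences to format.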

Martin, can you confirm if this is the regex version in AutoWikiBrowser? I just tested 0.9.9.5 and it does not appear to be. Bobblewik 20:39, 26 December 2005 (UTC)
It is in version 1.0 Martin 20:46, 26 December 2005 (UTC)
I can't access version 1.0 but I will wait till I can. Thanks. Bobblewik 20:59, 26 December 2005 (UTC)
You should be able to download it, is there a specific problem? Martin 22:07, 26 December 2005 (UTC)
I could not before. Perhaps it was something to do with page caching. I have version 1.0 now and from a quick check, it seems to be working as expected. Bobblewik 01:31, 27 December 2005 (UTC)

User approved list

I think it is letting me edit even if my username isn't on the approved list (I checked it using another one of my accounts). Broken S 22:04, 19 December 2005 (UTC)

Seems ok to me, It only checks on start up. Martin 22:22, 19 December 2005 (UTC)
oh, yeah I switched it to another user account in the middle. Broken S 22:30, 19 December 2005 (UTC)

Removal cats altogether?

This is an awesome tool. I have one request for instruction (or a feature request...): I tried setting the "New category" to a blank box, but it's too clever. Is there a way I can carry out a blanket category removal without a replacement? -Splashtalk 22:35, 19 December 2005 (UTC)

BUG: Unicode limitations?

AWB just failed to load Michał Wiśniowiecki, King of Poland: it wants to load Michal Wisniowiecki, King of Poland instead (I've made the latter REDIRECT to the former). I copied that first link from the text-box having removed the entry from the list. There is obviously some translation problem between the list-box and the browser control. HTH HAND —Phil | Talk 16:18, 20 December 2005 (UTC)

It's an annoying problem; so far I haven't been able to narrow down the problem, or reproduce it at all in such a way as to make the problem visible. Thankfully it only affects very few articles. Martin 19:24, 20 December 2005 (UTC)


I have found the root of the problem.

If you navigate to

http://en.wikipedia.org/wiki/Michał_Wiśniowiecki,_King_of_Poland

it automatically converts the URL (in Firefox and IE) to

http://en.wikipedia.org/wiki/Micha%C5%82_Wi%C5%9Bniowiecki%2C_King_of_Poland


which is right. But if you navigate to the edit URL

http://en.wikipedia.org/w/index.php?title=Michał_Wiśniowiecki,_King_of_Poland&action=edit

in Firefox it does it properly and navigates to

http://en.wikipedia.org/w/index.php?title=Micha%C5%82%20Wi%C5%9Bniowiecki,%20King%20of%20Poland&action=edit

however in IE it navigates to

http://en.wikipedia.org/w/index.php?title=Michal_Wisniowiecki%2C_King_of_Poland&action=edit

which is clearly wrong. I don't know why it does this; seeing as Firefox doesn't have the problem, maybe it's a bug in IE. Martin 12:29, 21 December 2005 (UTC)
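One way to avoid depending on how the browser control converts non-ASCII titles is to percent-encode the title as UTF-8 before building the edit URL. This is a hypothetical sketch in TypeScript, not a description of AWB's eventual fix; `editUrl` and `michalUrl` are illustrative names.

```typescript
// Hypothetical helper: build an edit URL with the title already
// percent-encoded as UTF-8, so the URL handed over is plain ASCII.
function editUrl(title: string): string {
  // MediaWiki titles use "_" for spaces; encodeURIComponent then encodes
  // "ł" as %C5%82, "ś" as %C5%9B and "," as %2C.
  const encoded = encodeURIComponent(title.replace(/ /g, "_"));
  return `http://en.wikipedia.org/w/index.php?title=${encoded}&action=edit`;
}

const michalUrl = editUrl("Michał Wiśniowiecki, King of Poland");
```

The encoded title matches the form Firefox and IE produce for the article URL quoted above.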

Not in my version of IE (IE6.0.2900.2180) it doesn't. I shift-clicked on that first edit URL and the correct page came up just fine: interestingly the URL in the address bar remains the same. The second URL does the same (although obviously showing the different URL in the address bar). So there's obviously something cronky going down :-( Phil | Talk 13:25, 21 December 2005 (UTC)
Try typing the URLs into the browser (or copying and pasting them, at least); that has a different result from clicking on the link, which further indicates that something fishy is going on. It doesn't affect all pages with unusual characters in the title, so it isn't a massive problem. Martin 13:42, 21 December 2005 (UTC)

I am not sure how AutoWikiBrowser works. Can you advise?

The date delinking regex has a lot of 'misses'. It is difficult to distinguish between 11 January 2005 and January 2005, so I simply avoid consecutive links. Thus it will delink the January but it leaves the 2005 intact. There are many other permutations that I miss with the huge regex. Unless I do all articles twice, there are lots of misses.

If it operated sequentially, I could do a lot more. For example, it could tackle day, month, decade, century links first. I do not have to check for consecutive links in those cases. Then if it did another search of the same article, a search for year links could be more focussed and effective.

I expect that AutoWikiBrowser simply has a huge regex. But I don't actually know what it does. Does it, or could it, go through the article more than once? Bobblewik 23:45, 20 December 2005 (UTC)

There is no huge regex; the one I copied from you is the biggest one. It can run through an article as many times as you want. If you create the regexes you want to use, then I can implement them reasonably easily. Martin 23:51, 20 December 2005 (UTC)
Fantastic! Thanks. Bobblewik 00:01, 21 December 2005 (UTC)

another suggestion: AWB "lite"

Would it be a bad idea to make a "lite" version (with limited features) for people without authorization? --Ixfd64 02:36, 21 December 2005 (UTC)

If you can cycle through articles quickly, it is a good tool to vandalise with, and if I disabled that then it wouldn't really have any benefit. Thanks Martin 09:49, 21 December 2005 (UTC)

BUG: another limitation

The article In The End: Live & Rare just came up for processing, but AWB can't deal with it. It keeps trying to load the article In The End: Live which obviously gives it gyp. The list contains "In The End: Live &amp; Rare" which doesn't necessarily help: fixing it did nothing. Another encoding problem? HTH HAND Phil | Talk 15:56, 21 December 2005 (UTC)

Same bug as above, I've done a brute force fix, so it will work in the next release. Martin 16:20, 21 December 2005 (UTC)
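Why an unencoded "&" cuts the title short: a query string is split on "&" before its values are decoded, so a raw "&" inside the title ends the `title` parameter. A TypeScript sketch; `titleParam` is a simplified, hypothetical parser, and the URLs are built for illustration only.

```typescript
// Simplified query parser: split on "&" first, then percent-decode each
// value, roughly as a server splits a query string.
function titleParam(url: string): string | undefined {
  const query = url.split("?")[1] ?? "";
  for (const pair of query.split("&")) {
    const [key, value = ""] = pair.split("=");
    if (key === "title") return decodeURIComponent(value);
  }
  return undefined;
}

const pageTitle = "In The End: Live & Rare";
// Naive concatenation: the raw "&" ends the title parameter early.
const naiveUrl = "http://en.wikipedia.org/w/index.php?title=" + pageTitle + "&action=edit";
// Encoded: "&" becomes %26 and survives as part of the title.
const safeUrl =
  "http://en.wikipedia.org/w/index.php?title=" + encodeURIComponent(pageTitle) + "&action=edit";
```

Here `titleParam(naiveUrl)` yields only "In The End: Live " (with a trailing space), while `titleParam(safeUrl)` yields the full title.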

Multiple search/replace

How difficult would it be to arrange that we could search/replace more than one thing at a time? I'm thinking that it would be nice if we could do these at the same time, maybe as a side-effect of other stuff:

  • &mdash; → "—"
  • &ndash; → "–"
  • &hellip; → "…"
  • &rarr; → "→"

HTH HAND —Phil | Talk 18:06, 21 December 2005 (UTC)
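The requested conversions amount to a small table of literal replacements. A TypeScript sketch of such a table; `entityFixes` and `replaceEntities` are hypothetical names, not AWB's general-fix code.

```typescript
// The four entity-to-character conversions listed above, as data.
const entityFixes: ReadonlyArray<readonly [string, string]> = [
  ["&mdash;", "\u2014"], // em dash
  ["&ndash;", "\u2013"], // en dash
  ["&hellip;", "\u2026"], // horizontal ellipsis
  ["&rarr;", "\u2192"], // rightwards arrow
];

// Plain literal replacement (split/join), so no regex escaping is needed.
function replaceEntities(wikitext: string): string {
  return entityFixes.reduce(
    (text, [entity, ch]) => text.split(entity).join(ch),
    wikitext,
  );
}
```

Keeping the conversions as data means adding another entity is a one-line change to the table.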

Can those changes always be made? if so I'll add it to general fixes or maybe as another option. Martin 19:36, 21 December 2005 (UTC)
I think what would be nice would be if the "general fixes" could be presented as an option list, available from the "Options" menu, so that we could pick/choose which we wanted to apply at any given time: I've been wondering exactly what was involved in this, and I generally simply switch it off. You could have a little table of "entity replacements" like the above to which we could add our favourites: it should be a snip to include a tick-box for each saying "yes, do this one". HTH HAND —Phil | Talk 10:15, 22 December 2005 (UTC)
The idea of the general fixes is that they are minor things that can always be applied to main namespace articles, being able to turn them off individually would just add coding and usability complications. I have added the above conversions in now. Martin 10:54, 22 December 2005 (UTC)
  • It would, however, be nice if there were a list of the general fixes available somewhere, so we know what we are potentially doing. Also, if no change is found for the primary task (such as date unlinking) but a general fix is found, the edit comment will be misleading at best. Perhaps no general changes should be applied if a specific change is specified and not found? That may be too much work to be worthwhile; the user can always click "skip". DES (talk) 20:39, 22 December 2005 (UTC)

Logging in

When I first started to use AWB yesterday, it somehow operated as User:205.210.232.62 (my usual IP when not logged in) until I realized this and clicked on the log-in link in the AWB browser window. Must one normally log in separately in AWB even if already logged in on another browser? Your demo video does not show this. Or did I do something incorrect? DES (talk) 20:42, 22 December 2005 (UTC)

It only checks that you are logged in when you first start it, so if it became logged out it would carry on working; I'll change this at some point in the future. Also, it uses the Internet Explorer core, so if you are logged in in that then you will remain logged in in this program. Martin 20:52, 22 December 2005 (UTC)
I use IE. I was logged in on an instance of IE when I started the program. As far as I know (but I did not verify absolutely), I did not become logged out in my regular IE sessions. But the first edit done using AWB was done logged out. It is possible that my cookie expired while I figured out how to do things, but when I switched back to an IE window I still seemed to be logged in; it was the absence of the AWB edits in "My contributions" that prompted me to check whether AWB was logged out. I report this for what it is worth, if anything. If it recurs repeatably I will let you know; I know how hard an unrepeatable issue is to address, and I know we are all volunteers. Again, MANY thanks for AWB. I just downloaded ver 0.99 and I'm about to try it. DES (talk) 21:11, 22 December 2005 (UTC)
At the moment the verification for logging in is pretty basic, I'll change it soon so it checks cookies, possibly on every edit. thanks. Martin 21:15, 22 December 2005 (UTC)
I am getting problems with login. I am not at my usual computer. I was logged in to Wikipedia in my normal browser, then I started AutoWikiBrowser and tried to process a page. It complained that I was not logged in and showed the Wikipedia login page. I went to Wikipedia in my normal browser, logged out then logged in again. I closed AutoWikiBrowser and launched it again. But it gave the same symptoms. Bobblewik 11:30, 29 December 2005 (UTC)
What is your normal browser? When it showed the Wikipedia login page, were you actually logged in then (i.e. did it have all the "Sign in / create account")? And did you try entering your login details in the window it loaded for you? Thanks Martin 11:40, 29 December 2005 (UTC)
I use Firefox. When AutoWikiBrowser showed the login page, I was not logged in and it showed the "Sign in/create account" page. I entered my login details in AutoWikiBrowser and it worked. Bobblewik 12:36, 29 December 2005 (UTC)
That is what should happen, it is totally unrelated to any settings/ cookies in firefox. thanks Martin 12:41, 29 December 2005 (UTC)
Really? It is the first time I have had to do it. It had me fooled for a while because I have followed the instructions not to click in the window. Well, at least I know the exception to the rule now. Thanks. Bobblewik 13:01, 29 December 2005 (UTC)
Sorry, I'll make the messages more descriptive. Martin 13:20, 29 December 2005 (UTC)

Manual changes

When trying out AWB for the first time yesterday (I made a date-delink run on Category:Biography), I attempted to make a manual change on one article (correcting an incorrect category) in the AWB browser. It appeared as if this change was undone when I saved the article, but I am not sure. Are manual changes supposed to be saved? I gather that there is no way to do a diff or preview for manual changes at the moment; is this correct? DES (talk) 20:45, 22 December 2005 (UTC)

Mind you, even with these limitations this is a wonderful tool. DES (talk) 20:45, 22 December 2005 (UTC)
It saves all manual changes, if you click the "Diff" button (or "preview") it will show you the extra changes you have made, but it will always save them. Martin 20:50, 22 December 2005 (UTC)
Thank you. I assume you mean "It will not show you..." above? DES (talk) 21:13, 22 December 2005 (UTC)
No, it does show the extra changes you have made. Martin 22:58, 22 December 2005 (UTC)
Thanks. I was confused by the "but" in your wording. I now realize that I incorrectly made changes via the browser window above, not the edit box on the right, and that is why they were not saved. Sorry for the false alarm. DES (talk) 23:05, 22 December 2005 (UTC)

Was it worth it?

I noticed that B/AWB recently made the earth-shattering change from

== Municipalities ==

to

==Municipalities== in one article.

Just wondering... Why?

Picapica 23:50, 22 December 2005 (UTC)

It does a lot more than that, it is up to the user as to whether to save or not. Martin 00:01, 23 December 2005 (UTC)

A truly Delphic response, Martin. It's not the outer limits represented by "a lot more than that" that bother me, however, but the inner pickiness of a routine that goes to all the bother of removing two spaces with the result, as far as I can see, of affecting what appears on screen not one jot... Is it not legitimate to comment upon that? -- Picapica 00:49, 23 December 2005 (UTC)

It removes the spaces to make other changes easier, not specifically because it is directly a good thing to do. It is just as easy to click ignore as it is save. Martin 00:53, 23 December 2005 (UTC)
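Martin does not say which pattern AWB uses for this. As a purely hypothetical sketch of such a normalisation in TypeScript (`heading` and `normaliseHeadings` are illustrative names): once every heading has one fixed form, a later pattern only has to match that single form.

```typescript
// Hypothetical normalisation: strip spaces just inside the "=" markers of
// a heading line, so "== Municipalities ==" becomes "==Municipalities==".
// The "m" flag makes ^ and $ match at each line; \1 requires the same
// "=" run on both sides.
const heading = /^(=+) *(.*?) *\1 *$/gm;

function normaliseHeadings(wikitext: string): string {
  return wikitext.replace(heading, "$1$2$1");
}
```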

Many thanks for the speedy response, Martin, but this "click ignore" is a new thing to me, even though I've been editing Wikipedia for what seems like yonks now. How does it work? And how precisely does your having removed spaces which make not one jot of difference to what appears on screen (see "weird edits" above) "make other changes easier"? -- Picapica 01:06, 23 December 2005 (UTC)

The person running the script should tell the program to skip a page (click ignore) if the only change is something as minor as you described. Standard style is not to use the spaces, so if the program is doing something on the page it might as well fix small nitpicky things that aren't really worth making edits for. Have you even used the program? I think you are being a bit unfair; the program is really quite good. Broken S 05:29, 23 December 2005 (UTC)

Not really "unfair" (I hope), BrokenSeque, just plain ignorant: I don't even know what "running the script" means. Clearly I've stumbled into a parallel wikipedia world involving some kind of automated editing. Me, I just do the old-fashioned one-man look-think-and-if-necessary edit routine: I haven't come across anything in the "advice to editors" introductory pages dealing with "using programs" which "remove spaces to make other changes easier". I was just wondering why anyone/anything would go to the bother of carrying out makes-no-difference changes to articles when there is so much else that needs to be checked. I shall have to investigate further... Merry Christmas, anyway. -- Picapica 16:16, 25 December 2005 (UTC)

  • OK, I'll just clarify a few things:
  • "Running the script" means running the program (it isn't a script, but that's just a technical difference).
  • The software is new and still being developed, so it's not a surprise you haven't heard about it.
  • We still edit the old-fashioned way too!
  • It isn't designed for making trivial changes; it is designed to make repetitive tasks easier (e.g. stub sorting, re-categorisation...) and thus leave more time for editing the normal way!
  • thanks Martin 18:59, 25 December 2005 (UTC)

"Ignore if contains"

"Ignore if contains" does not appear to work if the text is in the title.

It would be nice to have the ability to have two fields: one to match the title text; and another field (as now) to match body text. [unsigned comment by Bobblewik]

OK, but surely all articles have title text in the article. Martin 00:18, 23 December 2005 (UTC)
A reasonable assumption but not true for all articles. This page contains User talk:Bluemoose in the title but not the body. Not a very good example, but it disproves the assumption. The "ignore if contains" field is a way of avoiding false positives. Processing a page that has relevant text in the title would be a false positive. It would be rare though. Bobblewik 00:38, 23 December 2005 (UTC)
Thanks Martin, for adding this feature. Yet another notch on your 'great guy' stick. Bobblewik 14:01, 24 December 2005 (UTC)

Checking for improper year removals

I asked User:Bobblewik about this on his talk page, and he directed me here. In removing standalone year links, there are some which should not be removed because the article on the year contains a factoid about the original article, and (especially in earlier years) it helps frame the current article with other events of the same year (century links would apply, also). The example I cited is the link to 1117 from Mii-dera.

So, since the Special:Whatlinkshere/Mii-dera shows the reverse link from 1117, if AWB can limit its date regexp matches to anything NOT found in the Whatlinkshere pages, it would prevent the wrongful removal of years. If it can't, then anyone using AWB for date link removal needs to be careful not to remove anything important.

Can the author clarify whether the above is possible or not?? Neier 08:55, 23 December 2005 (UTC)
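A hypothetical sketch of the check being asked about, assuming the list of pages that link back (what Special:Whatlinkshere shows) has already been fetched; fetching it for every article is the costly part and is not shown. The simple year pattern here omits the date-preference guards used elsewhere on this page, and `delinkYearsExcept` is an illustrative name.

```typescript
// `backlinks` holds titles of pages that link to the article being edited.
// A year link is kept when that year's page links back (e.g. 1117 -> Mii-dera).
function delinkYearsExcept(wikitext: string, backlinks: ReadonlySet<string>): string {
  return wikitext.replace(/\[\[(\d{3,4})\]\]/g, (link: string, year: string) =>
    backlinks.has(year) ? link : year,
  );
}
```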

It is possible, but overly complicated and beyond the scope of this software; it would also use an enormous amount of bandwidth. Sorry, Martin 09:14, 23 December 2005 (UTC)

So, basically we have several projects that are busily linking dates, and then this one de-linking them. Why? Somebody just delinked 1967 as the year of the Six-Day War. Not useful!

What we really need is making sure that any year near any month and day is linked. Please don't de-link dates.

--William Allen Simpson 12:33, 30 December 2005 (UTC)
This isn't a project to de-link dates. It is a piece of software that can be used to help with some tasks. Also, I don't know of any project to link dates; if there is one, it is going against guidelines. Martin 12:45, 30 December 2005 (UTC)

Extend functionality of replace specification

I'm floundering a bit for the actual name for this, but it'll probably come to me just after I click "Save". What I'd like is to be able to specify sub-expressions in the Search box and refer to them in the Replace box, like in MS Word. For example:

Search for  : {{sodium}}<sub>([01-9]*)</sub>
Replace with: {{sodium|\1}}

where the \1 refers to the first item in (…) as shown. HTH HAND —Phil | Talk 10:58, 23 December 2005 (UTC)

Regular expressions can handle this type of thing; try:
Search for  : {{sodium}}<sub>([01-9]*)</sub>
Replace with: {{sodium|$1}}
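The same search/replace as a TypeScript sketch, assuming JavaScript follows the .NET convention here: `\1` inside a pattern is a backreference, while `$1` in the replacement text inserts group 1. The `{{sodium}}` template comes from the example above; `sodiumSub` and `templatiseSodium` are illustrative names.

```typescript
// (...) captures the subscript digits; $1 in the replacement re-inserts them.
// [01-9] above is the same class as [0-9]; braces are escaped for clarity.
const sodiumSub = /\{\{sodium\}\}<sub>([0-9]*)<\/sub>/g;

function templatiseSodium(wikitext: string): string {
  return wikitext.replace(sodiumSub, "{{sodium|$1}}");
}
```

Only subscripts directly after `{{sodium}}` are touched; the carbonate's `<sub>3</sub>` in `{{sodium}}<sub>2</sub>CO<sub>3</sub>` stays as it is.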


Template replacement

Do you think it would ever be possible for AWB to be taught to understand templates and parameters? This would make it much easier to mass-change templates which have parameters renamed, or which have been moved or replaced.

So given a template name, it would look for

{{template name

and then search for the matching "}}". If you told it how many parameters and what the new name for each one should be, it could match up the number of "|" symbols. Obviously this is a bit blue sky right now but I thought I'd set it down for consideration. HTH HAND —Phil | Talk 10:58, 23 December 2005 (UTC)
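A hypothetical sketch of the idea, not an AWB feature: locate `{{name`, walk forward counting nested `{{ }}` and `[[ ]]` so that the `|` inside links or inner templates is ignored, split the top-level parameters, and rename named ones. All names here are illustrative; `indexOf` is a prefix match, so `{{name` would also hit `{{namesake`, which a real version would need to rule out.

```typescript
interface TemplateCall {
  start: number; // index of the opening "{{"
  end: number; // index just past the matching "}}"
  head: string; // text between "{{" and the first top-level "|"
  params: string[]; // top-level parameters, in order
}

function findTemplate(text: string, name: string): TemplateCall | null {
  const start = text.indexOf("{{" + name);
  if (start < 0) return null;
  const parts: string[] = [];
  let depth = 0;
  let partStart = start + 2;
  for (let i = start; i < text.length - 1; i++) {
    const pair = text.slice(i, i + 2);
    if (pair === "{{" || pair === "[[") {
      depth++; // nested template or link: its "|" are not ours
      i++;
    } else if (pair === "}}" || pair === "]]") {
      depth--;
      if (depth === 0) {
        parts.push(text.slice(partStart, i));
        return { start, end: i + 2, head: parts[0] ?? "", params: parts.slice(1) };
      }
      i++;
    } else if (text[i] === "|" && depth === 1) {
      parts.push(text.slice(partStart, i));
      partStart = i + 1;
    }
  }
  return null; // no matching "}}"
}

function renameParams(text: string, name: string, renames: Map<string, string>): string {
  const call = findTemplate(text, name);
  if (call === null) return text;
  const params = call.params.map((p) => {
    const eq = p.indexOf("=");
    if (eq < 0) return p; // positional parameter: left as it is
    const newKey = renames.get(p.slice(0, eq).trim());
    return newKey === undefined ? p : newKey + p.slice(eq);
  });
  return (
    text.slice(0, call.start) +
    "{{" + [call.head, ...params].join("|") + "}}" +
    text.slice(call.end)
  );
}
```

Counting the `|` symbols, as suggested, corresponds to `params.length` here; the depth counter is what keeps `[[b|c]]` from being split.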

Certainly worth thinking about; the problem with this type of thing is reliability, but I'll look into it. Martin 11:32, 23 December 2005 (UTC)

License, source code?

Under what license is this software published? I take it from discussion on this page that the source code is not available? AxelBoldt 18:19, 23 December 2005 (UTC)

What license is this type of thing normally published under? PD I would guess. Martin 18:58, 23 December 2005 (UTC)
Normally, in my experience, such things are published under some version of shareware or freeware; that is, the creator retains copyright but allows others to freely use the software. Something like the GPL in intent but less formal, often just "Copyright by PersonX, permission to use is granted, provided..." with whatever restrictions the creator wants: non-removal of the copyright notice, non-commercial use, whatever. In this case I would suggest limiting use to within-policy edits by approved users on Wikipedia, and making the license revocable on notice. That gives you all the protection you might plausibly need. But it is up to you, as the creator, to set your terms. DES (talk) 21:34, 23 December 2005 (UTC)
If you want to release it into the public domain, you have to actively do it (with a note to that effect somewhere on the AutoWikiBrowser page for instance); by default you retain all copyrights and nobody but you is allowed to copy the program. AxelBoldt 19:02, 24 December 2005 (UTC)

en dash to hyphen change?

Comment moved from User talk:Bobblewik
Your bot is changing birth and death dates from an en-dash to a hyphen. Not only is this change grammatically incorrect, it seems like a rather controversial change to assign to a bot. Where did the decision to do this gain consensus?--Alabamaboy 21:00, 23 December 2005 (UTC)

Can you give an example of the article where this has been done? Bobblewik 21:10, 23 December 2005 (UTC)

Here it changed

to

which are the same, I think this is what he means. Martin 21:20, 23 December 2005 (UTC)

Aha. So there was no change. He must be assuming that the ndash he sees on the screen is a hyphen. It is an easy mistake to make, both characters look the same to me too. Bobblewik 21:41, 23 December 2005 (UTC)
He just went through and changed all the &ndash;s and &mdash;s in Butter, too. I realize either style is allowed, but I prefer the former and find this change slightly disruptive. —Bunchofgrapes (talk)
I do not quite understand what it did wrong. Can you be specific about the change you think it made? Bobblewik 22:22, 23 December 2005 (UTC)
I don't see how that is anything other than helpful; if you could explain otherwise, that would be good (p.s. "it" is a "he") Martin 22:29, 23 December 2005 (UTC)


It seems to be changing the HTML entities &ndash; and &mdash; to the equivalent actual characters. This change has no effect on display, but it does affect later editing. Using the HTML entity makes it clear to an editor which kind of dash has been used; using the actual character makes this much harder for a later editor to determine. DES (talk) 22:32, 23 December 2005 (UTC)
These are, in fact, the first two changes listed above in the section "Multiple search/replace". DES (talk) 22:33, 23 December 2005 (UTC)
Exactly. As an editor, I prefer "&ndash;" to "–", since I cannot easily tell which dash is which otherwise. —Bunchofgrapes (talk) 00:44, 24 December 2005 (UTC)

Me too. I much prefer using the html entities than the literal characters. Look:

Hyphen, &ndash;, &mdash;: these all look different in the edit box.
Hyphen, literal en dash, literal em dash: these all look exactly the same in the edit box.

See what we mean? -- ALoan (Talk) 02:20, 24 December 2005 (UTC)

I'll remove this option from the next version then, although I wish the devs would create some wiki markup to make these characters, as the HTML looks terrible, and 90% of people would have no idea what it is. Martin 11:56, 24 December 2005 (UTC)

Thanks. This was confusing but glad things worked out. Best, --Alabamaboy 13:43, 24 December 2005 (UTC)

Yes I agree that html versions are difficult to read and should be made easier if possible. I have long thought that this should be applied to superscript characters. There was a discussion in the Manual of Style (now archived) about this. My main concern was that it should be browser independent and accessible (e.g. to screen readers). I think there was agreement for editing superscript html to something else but it got a bit too technical for me. Bobblewik 13:58, 24 December 2005 (UTC)
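
DES's point is that an entity and its literal character render identically but cannot be told apart in the edit box. Below is a small Python sketch (not part of AWB) of two helpers in that spirit. One names each hyphen or dash using the Unicode database. The other converts literal en and em dashes back into the entities the editors above preferred. The helper names are hypothetical. The final html.unescape round trip in the usage check confirms that the conversion does not change what readers see.

```python
import html
import unicodedata

# Literal dash characters and the named entities some editors prefer to keep.
DASH_ENTITIES = {
    "\u2013": "&ndash;",  # EN DASH
    "\u2014": "&mdash;",  # EM DASH
}


def describe_dashes(text):
    """Return (index, Unicode name) for every hyphen, en dash or em dash."""
    return [(i, unicodedata.name(ch)) for i, ch in enumerate(text)
            if ch in "-\u2013\u2014"]


def dashes_to_entities(text):
    """Turn literal en/em dashes back into entities; hyphens are untouched."""
    return "".join(DASH_ENTITIES.get(ch, ch) for ch in text)
```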

Minor date bug in .9.9.5

(354-430) comes out as (354-430). (Roman Catholic Church)--SarekOfVulcan 08:58, 26 December 2005 (UTC)

Also, the 1946 in the first line of It's a Wonderful Life wasn't delinked -- leaving it intact for now so you can debug more easily.--SarekOfVulcan 09:20, 26 December 2005 (UTC)

This is a problem with the regex used to find the dates, see the above section "Publishing the date delinking regex". thanks Martin 10:29, 26 December 2005 (UTC)
Thank you very much for the bug report. Just to be more explicit: We are aware of the problem and can replicate it easily in test articles. So we don't need those particular articles for debugging, feel free to delink them properly if you wish. The problem is solvable and we are working on it. But we have not yet solved it. Join in the "Publishing the date delinking regex" discussion. Bobblewik 16:34, 26 December 2005 (UTC)

Minor bug in .995 Template alpha sort

Looks like it's taking out template links when it alpha-sorts during cleanups. See: [6]

It didn't remove them, but put them at the bottom of the page, because it thought they were stubs. I'll get it to ignore stub templates of the form {{tl|mil-vehicle-stub}}. thanks Martin 09:47, 27 December 2005 (UTC)
Fixed. thanks Martin 17:27, 27 December 2005 (UTC)

suggestion

In addition to ordering and alphabetizing categories and language links, it would be good if the same could be done with {{Link FA}} templates. A lot of articles have none, or so few that it isn't worth it, but some articles have more than enough to make it worthwhile, especially as the template comes into wider use. JtkieferT | C | @ ---- 09:59, 27 December 2005 (UTC)

I'll add it to the todo list! Martin 17:27, 27 December 2005 (UTC)

More suggestions

Here are some suggestions:

  • One place to choose binary options. For example "Apply general fixes" and "Remove all date links" are binary choices.
  • One place to set "Search in namespace:" options. The current options would presumably be "Template" and "Main". Instead of removing unwanted articles after the search, the user would simply modify the default options before the search.
  • Arrangement of widgets:
    • The 'Save' and 'Ignore' buttons are like 'OK' and 'Cancel'. It might be nice to arrange the buttons so that they are aligned along the bottom *after* the 'Summary'. OK should be to the left of Cancel, so 'Save' would be to the left of 'Ignore'.
    • The 'Category' field seems to me to be the first logical thing and I think it should be above the 'Make from' field (which seems to me to be the second thing).
    • The 'Make list' button seems to me to be just like a 'Search' button. Perhaps that might be a more obvious label.
    • The 'Diff' button seems to have the same function as the 'Show changes' button in the Wikipedia editor. The latter may be a more obvious label.
    • The 'Start!' button seems to me to be just like an 'Edit' button. Perhaps that might be a more obvious label, even though it also does other things.
    • The 'Messaging' bit fooled me when we first started. If it deals with the talk page, perhaps we should say that explicitly like 'Edit talk page'.
  • The list can get very long; it would be nice to be able to increase its length and width, perhaps with scroll bars and/or by actually increasing the size.
  • It would be nice to have a longer summary field.


Just some thoughts. Feel free to use or ignore them. Bobblewik 12:13, 27 December 2005 (UTC)

Thanks for the suggestions, the problem with a few of them is simply not having the space (e.g. "Show changes" is just too long to put on the button). Also the list box does have scroll bars. I'll see what I can do though! Martin 12:30, 27 December 2005 (UTC)
I agree there is limited space. That is partly why I suggested copying the Wikipedia edit layout and putting the buttons horizontally. The whole thing might look more immediately recognisable if it copied elements of the Wikipedia edit design. You could put search functions at the top and/or left (perhaps using the whole vertical space) and edit functions at the bottom. Bobblewik 13:33, 27 December 2005 (UTC)
Check out version 1.1 thanks Martin 17:27, 27 December 2005 (UTC)
Thanks. Nice improvements of detail. I also just noticed the standard summaries, I was going to suggest that after somebody complained to me about using a generic summary. If we could do it all without tabs, that would be nice too. Bobblewik 17:44, 27 December 2005 (UTC)

another minor bug

I've only tested it when moving lists of articles from one category to another, but when using Make list with a category that has an ampersand (&) in its name, it doesn't give any results, even though there are articles in the category. JtkieferT | C | @ ---- 22:52, 27 December 2005 (UTC)

It's that dang IE problem; I'll have it fixed for the next release. Martin 22:57, 27 December 2005 (UTC)
I figured it was IE handling the character wrongly. JtkieferT | C | @ ---- 23:01, 27 December 2005 (UTC)
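
Martin attributes this to IE. A common cause of exactly this symptom is an ampersand that reaches a query string unescaped, where the server reads it as the start of a new parameter. Whether that was AWB's actual bug is an assumption. The Python sketch below only illustrates that failure mode and the usual fix, which is to percent-encode the title. naive_url, safe_url and title_seen_by_server are hypothetical helpers; the base URL is en.wikipedia's script path.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# en.wikipedia's script path; the title parameter carries the page name.
BASE = "https://en.wikipedia.org/w/index.php"


def naive_url(title):
    # Pastes the title straight into the query string: an '&' in the name
    # starts a new parameter, so the server sees a truncated title.
    return BASE + "?title=" + title


def safe_url(title):
    # urlencode percent-encodes '&' (as %26), ':' and spaces.
    return BASE + "?" + urlencode({"title": title})


def title_seen_by_server(url):
    """Decode the title parameter the way a server would."""
    return parse_qs(urlsplit(url).query)["title"][0]
```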

Requested feature: remove redundant links

One of my favorite cleanups is to remove all occurrences of a link after the first one. Is this straightforward enough to do? I could write it in VFP (and I still might), but if it were included in AWB, it would make things easier for me.

Thanks for all the hard work to date!--SarekOfVulcan 06:50, 28 December 2005 (UTC)

hmmm, yes I think that is easy enough, another thing to the todo list! Martin 10:23, 28 December 2005 (UTC)
That would be a great feature. A complication would be that it might have to avoid double links used for date preferences. Bobblewik 11:30, 28 December 2005 (UTC)
Good point, to start with I'll make it so it tells you where the extra links are, rather than actually removing them automatically. Martin 11:36, 28 December 2005 (UTC)
A crude way to avoid valid configurable dates would be to avoid any link that contains either a digit or a complete month name. If you publish the regex, we could work together to make it less crude. Bobblewik 16:32, 28 December 2005 (UTC)
Sure, but I don't know when or exactly how I will do it; I'll let you know though. Martin 16:49, 28 December 2005 (UTC)

I feel that in a long article it is often valid to link to the same destination again after several paragraphs, particularly after more than one screenful. So I think that "removal of redundant links" would be better as a pointer than as an automated tool. DES (talk) 16:46, 28 December 2005 (UTC)

Agreed. Martin 16:49, 28 December 2005 (UTC)
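
Following DES's suggestion that this should work as a pointer rather than an automatic fix, and Bobblewik's crude date rule, here is a minimal Python sketch that only reports repeated links. It is not AWB code, and the names are hypothetical. The link regex ignores nesting, such as links inside image captions. The first-letter case folding mirrors MediaWiki's default title handling. Both are simplifying assumptions.

```python
import re

# Wiki link: [[target]] or [[target|label]]; group 1 is the target.
LINK = re.compile(r"\[\[([^\[\]|]+)(?:\|[^\[\]]*)?\]\]")
# Bobblewik's crude date test: any digit, or a complete month name as a word.
MONTH = re.compile(r"\b(?:January|February|March|April|May|June|July|August|"
                   r"September|October|November|December)\b")


def looks_like_date(target):
    return bool(re.search(r"\d", target) or MONTH.search(target))


def repeated_links(text):
    """Report (offset, target) for every link to a target already linked
    earlier in the text. Nothing is removed; the editor decides."""
    seen, repeats = set(), []
    for m in LINK.finditer(text):
        target = m.group(1).strip()
        if looks_like_date(target):
            continue  # may be a date link kept for user preferences
        key = target[:1].upper() + target[1:]  # first letter is case-insensitive
        if key in seen:
            repeats.append((m.start(), target))
        else:
            seen.add(key)
    return repeats
```

The \b word boundaries are what make "complete month name" work: "Mayor of Paris" is not mistaken for a date.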

regex syntax

The regexes do not seem to be using "classic" basic regular expression syntax, as described in Regular expression, but some extension (in particular, "(?i)" for case-insensitive matching does not appear in any of the versions in our article). Exactly which version of regular expressions does AWB use, and is the syntax documented anywhere online? DES (talk) 16:52, 28 December 2005 (UTC)

According to MS it is very similar to the Perl 5 implementation. I have used this page for a few tips, but I'm not really knowledgeable about them. There is an extra parameter in C# that can specify that the regex is case-insensitive, if that helps. Martin 17:00, 28 December 2005 (UTC)
This is the documentation for the .NET regex syntax, which I believe is what AWB uses. --Mairi 04:12, 29 December 2005 (UTC)
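
(?i) is the inline form of the case-insensitive option, and Perl, .NET and Python all accept it. The sketch below uses Python's re module rather than .NET, so it illustrates the syntax, not AWB's engine. .NET documents the same inline constructs, including the scoped (?i:...) group.

```python
import re

heading = "==External Links=="

# Perl-style inline modifier at the start of the pattern.
inline = re.compile(r"(?i)==\s*external links\s*==")
# The same switch passed as an option, the analogue of RegexOptions.IgnoreCase in C#.
flagged = re.compile(r"==\s*external links\s*==", re.IGNORECASE)
# Scoped form: only the text inside (?i:...) ignores case.
scoped = re.compile(r"==\s*External (?i:links)\s*==")
```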

External links

I processed Noah Wyle but it did not identify 'External Links'. I had to make the change to 'External links' manually. Bobblewik 12:38, 29 December 2005 (UTC)

You didnt make that change in that page at all. regards Martin 12:43, 29 December 2005 (UTC)
How odd. I was convinced that I did. Bobblewik 12:59, 29 December 2005 (UTC)

Another update to the date regex

I am seeing quite a few misses related to split dates like: [[January]] [[18th]], [[January]] [[18]] and [[18th]] and [[19th century|19th centuries]]. I may want to update the regex to cope with these if that is ok. Bobblewik 14:53, 29 December 2005 (UTC)

Not a problem at all. Martin 15:11, 29 December 2005 (UTC)
Martin, the regex section on this talk page is quite complicated now. Would you be kind enough to confirm what is actually being used? Also, do you think there would be any benefit in dividing the current single date delink option into multiple options (e.g. delink solitary months, delink centuries, delink solitary days, etc.)? Bobblewik 18:34, 1 January 2006 (UTC)
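
The split forms Bobblewik lists could be handled by a few narrow patterns applied in sequence. The Python sketch below is hypothetical: it is not the regex AWB uses, which is discussed in the "Publishing the date delinking regex" section. It targets only the split forms quoted above. Fully linked dates such as [[January 18]] or [[1946]] are left untouched.

```python
import re

MONTH = r"(January|February|March|April|May|June|July|August|September|October|November|December)"
ORDINAL = r"\d{1,2}(?:st|nd|rd|th)"

# [[January]] [[18th]] or [[January]] [[18]]  ->  January 18th / January 18
SPLIT_MONTH_DAY = re.compile(r"\[\[" + MONTH + r"\]\] \[\[(\d{1,2}(?:st|nd|rd|th)?)\]\]")
# [[19th century|19th centuries]]  ->  19th centuries
PIPED_CENTURY = re.compile(r"\[\[" + ORDINAL + r" century\|([^\]|]+)\]\]")
# A bare ordinal link such as [[18th]]  ->  18th
BARE_ORDINAL = re.compile(r"\[\[(" + ORDINAL + r")\]\]")


def delink_split_dates(text):
    text = SPLIT_MONTH_DAY.sub(r"\1 \2", text)
    text = PIPED_CENTURY.sub(r"\1", text)
    return BARE_ORDINAL.sub(r"\1", text)
```

Keeping each form in its own pattern would also suit the idea of splitting the single delink option into several.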

Template subst'ing

AWB's description says it will add auto template subst'ing in the future. Using WP:SUB, I have come up with this regexp:

Replace:

{{(bio-cats)}}|{{(clear)}}|{{(clearleft)}}|{{(clearright)}}|{{(copyvio)}}|{{(lived)}}|{{(Lifetime)}}|{{(lifespan)}}|{{(Prettytable)}}|{{(sub)}}|{{(sup)}}|{{(moved)}}|{{(moved-n)}}|{{(tmfrom)}}|{{(tmto)}}|{{(unsigned)}}|{{(unsigned2)}}|{{(3RR)}}|{{(3RR2)}}|{{(3RR3)}}|{{(nn-warn)}}|{{(nothanks)}}|{{(nothanks-sd)}}|{{(obscene)}}|{{(selftest)}}|{{(test-n)}}|{test2-n)}}|{{(test2a)}}|{{(test2a-n)}}|{{(test2b)}}|{{(test3-n)}}|{{(test4a)}}|{{(test4-n)}}|{{(test)}}|{{(test0)}}|{{(test1)}}|{{(test2)}}|{{(test2a)}}|{{(test3)}}|{{(test4)}}|{{(test5)}}|{{(test6)}}|{{(blatantvandal)}}|{{(bv)}}|{{(attack)}}|{{(No personal attacks)}}|{{(Npa)}}|{{(Npa2)}}|{{(Npa3)}}|{{(Npa4)}}|{{(blanking1)}}|{{(blanking2)}}|{{(blanking3)}}|{{(blanking4)}}|{{(drmafd)}}|{{(drmafd2)}}|{{(drmafd3)}}|{{(drmafd4)}}|{{(drmafd5)}}|{{(MIPblock)}}|{{(multipleIPs)}}|{{(spam)}}|{{(spam2)}}|{{(spam2a)}}|{{(spam3)}}|{{(spam4)}}|{{(vanity)}}|{{(vblock)}}|{{(verror)}}|{{(verror2)}}|{{(verror3)}}|{{(verror4)}}|{{(Edit summary personal)}}|{{(Editsummarynew)}}|{{(sofixit)}}|{{(Summary)}}|{{(Edit summary)}}|{{(name your images)}}|{{(image source)}}|{{(image copyright)}}|{{(subst)}}|{{(SharedIP)}}|{{(ISP)}}|{{(AOL)}}|{{(repeat vandal)}}|{{(anon vandal)}}|{{(vw)}}|{{(Award)}}|{{(newvoter)}}|{{(welcome)}}|{{(welcome2)}}|{{(welcome3)}}|{{(welcome4)}}|{{(welcomeip)}}|{{(anon)}}|{{(Album Image)}}|{{(afd)}}|{{(afd2)}}|{{(afd3)}}|{{(afd|bottom)}}|{{(afd|top)}}|{{(tfd2)}}|{{(tfdnotice)}}|{{(ifd)}}|{{(ifd2)}}|{{(idw)}}|{{(idw-uo)}}|{{(idw-pui)}}|{{(idw-cp)}}|{{(cfd)}}|{{(cfd2)}}|{{(cfdu)}}|{{(cfr)}}|{{(cfr2)}}|{{(cfru)}}|{{(cfm)}}|{{(cfd-article)}}|{{(cfr-speedy)}}|{{(tfd-keep)}}|{{(Actinium)}}|{{(Aluminum)}}|{{(Americium)}}|{{(Antimony)}}|{{(Argon)}}|{{(Arsenic)}}|{{(Astatine)}}|{{(Barium)}}|{{(Berkelium)}}|{{(Beryllium)}}|{{(Bismuth)}}|{{(Bohrium)}}|{{(Boron)}}|{{(Bromine)}}|{{(Cadmium)}}|{{(Caesium)}}|{{(Calcium)}}|{{(Californium)}}|{{(Carbon)}}|{{(Cerium)}}|{{(Chlorine)}}|{{(Chromium)}}|{{(Cobalt)}}|{{(Copper)}}|{{(Curium)}}|{{(Darmstadtium)}}|{{(Dubnium)}}|{{(Dysprosium)}}|{{(Einsteinium)}}|{{(Erbium)}}|{{(Europium)}}|{{(Fermium)}}|{{(Fluorine)}}|{{(Francium)}}|{{(Gadolinium)}}|{{(Gallium)}}|{{(Germanium)}}|{{(Gold)}}|{{(Hafnium)}}|{{(Hassium)}}|{{(Helium)}}|{{(Holmium)}}|{{(Hydrogen)}}|{{(Indium)}}|{{(Iodine)}}|{{(Iridium)}}|{{(Iron)}}|{{(Lanthanum)}}|{{(Lawrencium)}}|{{(Lead)}}|{{(Lithium)}}|{{(Lutetium)}}|{{(Magnesium)}}|{{(Manganese)}}|{{(Meitnerium)}}|{{(Mendelevium)}}|{{(Mercury)}}|{{(Molybdenum)}}|{{(Neodymium)}}|{{(Neon)}}|{{(Neptunium)}}|{{(Niobium)}}|{{(Nitrogen)}}|{{(Nobelium)}}|{{(Osmium)}}|{{(Oxygen)}}|{{(Palladium)}}|{{(Phosphorus)}}|{{(Platinum)}}|{{(Plutonium)}}|{{(Polonium)}}|{{(Potassium)}}|{{(Praseodymium)}}|{{(Promethium)}}|{{(Protactinium)}}|{{(Radium)}}|{{(Radon)}}|{{(Rhenium)}}|{{(Rhodium)}}|{{(Roentgenium)}}|{{(Rubidium)}}|{{(Ruthenium)}}|{{(Rutherfordium)}}|{{(Samarium)}}|{{(Scandium)}}|{{(Seaborgium)}}|{{(Selenium)}}|{{(Silicon)}}|{{(Silver)}}|{{(Sodium)}}|{{(Strontium)}}|{{(Sulfur)}}|{{(Tantalum)}}|{{(Technetium)}}|{{(Tellurium)}}|{{(Terbium)}}|{{(Thallium)}}|{{(Thorium)}}|{{(Thulium)}}|{{(Tin)}}|{{(Titanium)}}|{{(Tungsten)}}|{{(Ununbium)}}|{{(Ununhexium)}}|{{(Ununoctium)}}|{{(Ununpentium)}}|{{(Ununquadium)}}|{{(Ununseptium)}}|{{(Ununtrium)}}|{{(Uranium)}}|{{(Vanadium)}}|{{(Xenon)}}|{{(Ytterbium)}}|{{(Yttrium)}}|{{(Zinc)}}|{{(Zirconium)}}|{{(WP:RM)}}|{{(Move2)}}|{{(TFAfooter)}}|{{(article)}}|{{(See also)}}|{{(ll)}}|{{(language link)}}|{{(ed)}}|{{(doctl)}}

With:

{{subst:$1}}

I haven't tested it but I will soon. — MATHWIZ2020 TALK | CONTRIBS 20:21, 29 December 2005 (UTC)

I just tried it. First, you have to use the g and i flags. Second, you can't use $1: $1 only works with the first subexpression, $2 with the second, etc. Is there some way to do an arbitrary "$x", where x changes? — MATHWIZ2020 TALK | CONTRIBS 20:22, 29 December 2005 (UTC)
Don't worry about it. I have already made it do substing, just not in the release version, as there isn't much need for it yet and I haven't tested it completely. Martin 21:01, 29 December 2005 (UTC)
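
The usual answer to the $1 problem is a single capture group around the whole alternation, instead of one group per name. The first group then always holds whichever name matched, and in .NET the replacement can stay {{subst:$1}}. The sketch below is in Python, where that back-reference is written \1, and uses a shortened, illustrative template list. It is not AWB's substing code. Instead of a global i flag, it makes only the first letter case-insensitive, as MediaWiki does for page titles.

```python
import re

# A handful of names from the list above; the full list would go here.
SUBST_TEMPLATES = ["clear", "clearleft", "unsigned", "test1", "welcome", "Album Image"]


def name_pattern(name):
    # MediaWiki ignores the case of a title's first letter only, so match
    # that letter either way instead of switching on a global i flag.
    first, rest = name[0], re.escape(name[1:])
    if first.isalpha():
        return "[" + first.upper() + first.lower() + "]" + rest
    return re.escape(first) + rest


# One capture group around the whole alternation: \1 is whichever name matched.
SUBST = re.compile(r"\{\{\s*(" + "|".join(name_pattern(n) for n in SUBST_TEMPLATES) + r")\s*\}\}")


def subst_templates(text):
    # re.sub replaces every match, so no separate "g" flag is needed.
    return SUBST.sub(r"{{subst:\1}}", text)
```

Because the pattern requires }} right after the name, {{cleared}} is left alone, and {{clearleft}} still matches even though "clear" is tried first.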

Interwiki link sorting

Hello. I see you've been sorting interwiki links by their language codes. That's not good, because English Wikipedia uses a different sorting order, where links are sorted alphabetically by the language's local name (for the correct order, see this page, second option).--Jyril 21:12, 29 December 2005 (UTC)

The whole reason that page exists is to show the different options. There has been no agreement that we should use any particular one of them, and it has been discussed somewhere. Why do you say that we should use the second option?
In fact, if you follow the link in the See also section on that page to Wikipedia:Language order poll, you will see that there has been no consensus on English Wikipedia, and that the second option (of the Meta page, the 5th option on the language order poll) is not the most preferred one. Gene Nygaard 21:53, 29 December 2005 (UTC)
I have no idea about this issue. I don't mind either. I will accept whatever you guys think is best. Bobblewik 22:06, 29 December 2005 (UTC)
Alphabetical was the most popular choice, and at the moment most pages are just in random order, so any order is better. Martin 22:19, 29 December 2005 (UTC)
But by two letter code—that's what was most popular—and that's the way Bobblewik was doing it, not the way Jyril was telling him he was supposed to be doing it. Gene Nygaard 22:47, 29 December 2005 (UTC)
I'm just saying that the two-letter code sort is really awkward if your language happens to be in a completely wrong place. I thought that there was an agreement about the issue, since most interwiki link lists I've seen seem to follow the order I recommended. I checked a few articles, and saw that Bobblewik's edits changed language-name sorting to two-letter code sorting. If AWB did this, I beg that this feature be removed.--Jyril 00:36, 30 December 2005 (UTC)
Yes, but what if the language doesn't use the Roman alphabet? Seems to me that sorting on the 2-letter code is the only really practical solution.--SarekOfVulcan 00:39, 30 December 2005 (UTC)
Use the transliterated name (for example, Nihongo for ja:), or check the list. And if you're not sure, use the 2-letter code and let bots handle the sorting.--Jyril 00:55, 30 December 2005 (UTC)

When I coded it, I had to choose one order, so I just chose the order that was most popular, I know there is no consensus, but I don't follow the logic that this means I should use the second most popular choice. Martin 10:45, 30 December 2005 (UTC)

While this isn't the place to have this debate, I think it's better to say it here than in a poll that hasn't been updated in four months. If AWB works on other language wikis (I don't know; AWB doesn't work on my computer), then I don't think there is any choice but to order by 2-letter code. Sometimes I edit on the Japanese wiki, and I don't want to have to remember that in Japanese, English (eigo) comes after Italian (Itariago). (Japanese ordering starts out a i u e o and progresses from there.) I doubt very much that Martin wants to make AWB aware of all the different language ordering options either. The 2-letter code was developed for a reason, and I think putting things in a globally accepted abcdefg order is not a large sacrifice. (While the aiueo order is standard for Japanese syllables, they still order the roman letters in the same way as everyone else.) Neier 11:11, 30 December 2005 (UTC)
(Just a clarification -- my comments above are about editing the source of an article. The different interwikis all sort the language links as they feel appropriate when they are displayed; and that is the way it should be.) Neier 11:14, 30 December 2005 (UTC)
Just-so-you-know, the software only works on en. thanks Martin 11:30, 30 December 2005 (UTC)
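
For reference, the two-letter-code order Martin chose from the poll is simple to implement, which is part of Neier's argument. Below is a minimal Python sketch, not AWB's code. It assumes one interwiki link per line with a lowercase language code. A real tool would check prefixes against the site's list of language codes. Sorting by each language's local name, as Jyril prefers, would need a lookup table of names, which this sketch does not attempt.

```python
import re

# Assumption: interwiki links sit one per line and use lowercase codes such as
# de, fr or zh-min-nan. A real tool would check against the site's code list.
INTERWIKI = re.compile(r"^\[\[([a-z]{2,3}(?:-[a-z]+)*):[^\]\n]+\]\]\n?", re.MULTILINE)


def sort_interwikis(text):
    """Move interwiki links to the end of the text, ordered by language code."""
    links = [(m.group(1), m.group(0).rstrip("\n")) for m in INTERWIKI.finditer(text)]
    if not links:
        return text
    body = INTERWIKI.sub("", text).rstrip("\n")
    ordered = sorted(links, key=lambda pair: pair[0])
    return body + "\n\n" + "\n".join(link for _, link in ordered) + "\n"
```

Category links are unaffected because "Category" starts with a capital letter, which the lowercase code pattern does not match.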

stub tag order

I note that AWB is sorting stub tags to the very end. I thought that WP:STUB recommended stub tags after all text, but before non-stub category links. DES (talk) 22:10, 29 December 2005 (UTC)

I have always thought they looked best at the very bottom, out of the actual article and further from the text. It's easy to change, but I don't see how it would be any better. thanks Martin 22:16, 29 December 2005 (UTC)
I believe the thought was that the tag was easier to edit or remove when appropriate if it came before the category links. This would not change the displayed appearance in any way. DES (talk) 06:14, 31 December 2005 (UTC)

Minor option

There is an option to mark all edits as minor. I have set this option as true - however, every time I close and reopen AWB, it resets my options. Can you fix this by saving all options in a .dat file that is loaded upon opening the program? Thanks. — MATHWIZ2020 TALK | CONTRIBS 19:56, 30 December 2005 (UTC)

I have already made it so it can save settings; I just haven't enabled it yet because it's not completely finished. It won't use a .dat file, though, but an XML-based config file, just to keep you all at the forefront of technology! Martin 22:28, 30 December 2005 (UTC)
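
What persisting an option such as "mark all edits as minor" in an XML file might look like, sketched with Python's ElementTree rather than .NET's XML classes. The element and option names are hypothetical; AWB's real settings format is not described on this page. Options missing from the file fall back to defaults, so an older settings file keeps working after new options are added.

```python
import xml.etree.ElementTree as ET

# Hypothetical option names; AWB's actual settings schema is not described here.
DEFAULTS = {"MarkAllAsMinor": False, "ApplyGeneralFixes": True}


def settings_to_xml(settings):
    root = ET.Element("Settings")
    for name, value in settings.items():
        ET.SubElement(root, "Option", name=name, value="true" if value else "false")
    return ET.tostring(root, encoding="unicode")


def settings_from_xml(xml_text):
    settings = dict(DEFAULTS)  # options missing from the file keep their defaults
    for opt in ET.fromstring(xml_text).iter("Option"):
        name = opt.get("name")
        if name in settings:  # ignore options this version does not know about
            settings[name] = opt.get("value") == "true"
    return settings
```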

Login issues

A problem occurred yesterday when my login cookie expired. Please increase the priority of re-checking the cookie more often, perhaps on every page edited. In the meantime I advise users to double-check that they remain logged in; the display will show the difference if you look. DES (talk) 22:13, 30 December 2005 (UTC)

Yeah I noticed that, and I have fixed it already, it checks every edit now. I'll upload it tomorrow (will be version 1.4), as I have got a couple of other things to do as well. thanks Martin 22:28, 30 December 2005 (UTC)

Headings and spacing

I've been asked twice now about a possible problem with AutoWikiBrowser's "general fixes" function. AWB removes the leading and trailing spaces in headings such as this one, but MediaWiki automatically generates new sections with one space on each side of the heading text. I haven't seen it cause any problems, but Help:Editing has it that way too. Just so you know. Titoxd(?!? - help us) 22:24, 30 December 2005 (UTC)

Yeah, I have stopped that now, it still will for "see also" and "external links" sections, as it makes it easier to check for common problems in those headings, but hopefully I'll get round to making that better. thanks Martin 22:31, 30 December 2005 (UTC)
p.s. if you look at the edit screen of Help:Editing, ironically some of its own headings don't have spaces. It really doesn't matter though. thanks Martin 22:33, 30 December 2005 (UTC)


  • Can you provide a full list of the "general fixes" sometime? I am reluctant to check that box without knowing in some detail what it will do. DES (talk) 22:35, 30 December 2005 (UTC)
Most of them are listed on the main page, however I'll update it with a few changes I have made, p.s. I have uploaded 1.4 now. Martin 22:46, 30 December 2005 (UTC)
this edit demonstrates all the functionality of the general fixes. Martin 23:33, 30 December 2005 (UTC)
Thanks. DES (talk) 06:13, 31 December 2005 (UTC)
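
Martin mentions still stripping the spaces in "See also" and "External links" headings to make other checks easier. One way to avoid that is to match those headings with optional spacing and capture the spacing so it is written back unchanged. The Python sketch below is an illustration, not AWB code. It also fixes the "External Links" capitalisation raised earlier on this page, and level-3 headings are deliberately left alone.

```python
import re

CANONICAL = {"see also": "See also", "external links": "External links"}

# Level-2 heading whose text is one of the names above, in any case and with
# or without spaces inside the == markers. Spacing is captured, not removed.
HEADING = re.compile(r"^(==[ \t]*)(see also|external links)([ \t]*==)[ \t]*$",
                     re.MULTILINE | re.IGNORECASE)


def fix_section_headings(text):
    """Normalise the capitalisation of these headings, keeping the author's spacing."""
    return HEADING.sub(
        lambda m: m.group(1) + CANONICAL[m.group(2).lower()] + m.group(3), text)
```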

Windows 98

Why has this been disabled in Windows 98? I used it before but now I can't. --Celestianpower háblame 13:21, 31 December 2005 (UTC)

98/ME don't handle Unicode fonts properly with this. sorry Martin 13:32, 31 December 2005 (UTC)
Can you not enable the old version for 98 users then? --Celestianpower háblame 13:45, 31 December 2005 (UTC)
No version worked properly with windows 98 Martin 13:54, 31 December 2005 (UTC)
Yes it did - I used it. --Celestianpower háblame 13:56, 31 December 2005 (UTC)
Not with arabic or hebrew fonts though. Martin 14:02, 31 December 2005 (UTC)
But I don't need to see Arabic fonts... --Celestianpower háblame 15:01, 31 December 2005 (UTC)
But if you edit an article with any Arabic characters in it, it will screw them up. There is one thing I might try, though, that may fix it; I will let you know when it's ready, if that's ok. thanks Martin 16:00, 31 December 2005 (UTC)

Version 1.5.1.0 error

After downloading version 1.5, I opened the AWB and clicked on Help>About to make sure it was the right version. It said it was version 1.5.1.0, not 1.5. I closed the about window and then immediately got this error:

See the end of this message for details on invoking 
just-in-time (JIT) debugging instead of this dialog box.

************** Exception Text **************
System.NullReferenceException: Object reference not set to an instance of an object.
   at AutoWikiBrowser.AboutBox.okButton_Click(Object sender, EventArgs e)
   at System.Windows.Forms.Control.OnClick(EventArgs e)
   at System.Windows.Forms.Button.OnClick(EventArgs e)
   at System.Windows.Forms.Button.OnMouseUp(MouseEventArgs mevent)
   at System.Windows.Forms.Control.WmMouseUp(Message& m, MouseButtons button, Int32 clicks)
   at System.Windows.Forms.Control.WndProc(Message& m)
   at System.Windows.Forms.ButtonBase.WndProc(Message& m)
   at System.Windows.Forms.Button.WndProc(Message& m)
   at System.Windows.Forms.Control.ControlNativeWindow.OnMessage(Message& m)
   at System.Windows.Forms.Control.ControlNativeWindow.WndProc(Message& m)
   at System.Windows.Forms.NativeWindow.Callback(IntPtr hWnd, Int32 msg, IntPtr wparam, IntPtr lparam)


************** Loaded Assemblies **************
mscorlib
    Assembly Version: 2.0.0.0
    Win32 Version: 2.0.50727.42 (RTM.050727-4200)
    CodeBase: file:///C:/WINDOWS/Microsoft.NET/Framework/v2.0.50727/mscorlib.dll
----------------------------------------
AutoWikiBrowser
    Assembly Version: 1.5.1.0
    Win32 Version: 1.5.1.0
    CodeBase: file:///C:/Documents%20and%20Settings/Jacob/Start%20Menu/Programs/Wikipedia/AutoWikiBrowser.exe
----------------------------------------
System.Windows.Forms
    Assembly Version: 2.0.0.0
    Win32 Version: 2.0.50727.42 (RTM.050727-4200)
    CodeBase: file:///C:/WINDOWS/assembly/GAC_MSIL/System.Windows.Forms/2.0.0.0__b77a5c561934e089/System.Windows.Forms.dll
----------------------------------------
System
    Assembly Version: 2.0.0.0
    Win32 Version: 2.0.50727.42 (RTM.050727-4200)
    CodeBase: file:///C:/WINDOWS/assembly/GAC_MSIL/System/2.0.0.0__b77a5c561934e089/System.dll
----------------------------------------
System.Drawing
    Assembly Version: 2.0.0.0
    Win32 Version: 2.0.50727.42 (RTM.050727-4200)
    CodeBase: file:///C:/WINDOWS/assembly/GAC_MSIL/System.Drawing/2.0.0.0__b03f5f7f11d50a3a/System.Drawing.dll
----------------------------------------

************** JIT Debugging **************
To enable just-in-time (JIT) debugging, the .config file for this
application or computer (machine.config) must have the
jitDebugging value set in the system.windows.forms section.
The application must also be compiled with debugging
enabled.

For example:

<configuration>
    <system.windows.forms jitDebugging="true" />
</configuration>

When JIT debugging is enabled, any unhandled exception
will be sent to the JIT debugger registered on the computer
rather than be handled by this dialog box.

What happened? Thanks. — MATHWIZ2020 TALK | CONTRIBS 19:59, 1 January 2006 (UTC)


hmmm, can you download the one I have just uploaded, 1.5.2 and see if it produces the same problem, thanks. Martin 20:15, 1 January 2006 (UTC)
Thanks - 1.5.2 works. What's the difference, though, between 1.5.0, 1.5.1, and 1.5.2? The latter two are not in the list of changes on WP:AWB. — MATHWIZ2020 TALK | CONTRIBS 22:50, 1 January 2006 (UTC)
I added the "find" button and textbox to search for text in the edit box. thanks Martin 23:00, 1 January 2006 (UTC)

New lines

Please look at the following edit: http://en.wikipedia.org/w/index.php?title=Bao%27an_%28Shaanxi%29&diff=33502758&oldid=33500193

The user complained about new lines. I do not know whether new lines are good or bad, but should I worry? Bobblewik 21:08, 1 January 2006 (UTC)

Well 99.9% of articles have spaces between the text and cats, but if they want to do it their own way.... Martin 21:14, 1 January 2006 (UTC)
I prefer not to have them. Rich Farmbrough. 00:11, 2 January 2006 (UTC)
Well, the de facto standard is to separate the cats from the text. Martin 00:22, 2 January 2006 (UTC)

Language links

There seems to be some confusion about the order of lang links. ja is usually between nl and no. Rich Farmbrough. 00:11, 2 January 2006 (UTC)

It puts them in alphabetical order, as this was the most popular choice at the Wikipedia:Language order poll. Martin 00:21, 2 January 2006 (UTC)

suggestion - proper Unicode conversion

Would it be possible to allow AWB to convert Unicode characters from their HTML codes to their proper symbols? See Curpsbot-unicodify if you're not sure what I'm talking about. --Ixfd64 01:16, 2 January 2006 (UTC)

I would love to do that, but I wouldn't really know how; maybe someone has some idea of how to go about this? Martin 01:18, 2 January 2006 (UTC)
I noticed that Unicode conversion appears to be possible with the find-and-replace function. However, the feature only converts one string at once. Would a possible solution be to add the ability to do multiple conversions? It will still be very tedious to convert every Unicode symbol, but we could save common conversions in the find-and-replace list. --Ixfd64 01:35, 2 January 2006 (UTC)
If we had a list of all the HTML and Unicode symbols, we could make a simple list of find-and-replace routines. It wouldn't be pretty, but it would work and be reliable. p.s. I just put on a note saying that I am perfectly willing to share the source with anyone who wants to help develop it ; ) 01:41, 2 January 2006 (UTC)
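
Rather than a hand-built list of find-and-replace pairs, Python's standard html module already carries the full HTML5 entity table. A minimal sketch of the conversion Ixfd64 describes follows. It is an illustration only, not AWB or Curpsbot code. The KEEP set is an assumption. It holds the dash and non-breaking-space entities that editors found clearer, as in the en dash discussion above. It also holds &lt;, &gt; and &amp;, whose conversion could change how the wikitext is parsed.

```python
import html
import re

# Entities left alone (an assumption): the dashes and non-breaking space that
# some editors prefer to see as entities, plus &lt; &gt; &amp;, which could
# change how the wikitext parses if turned into literal characters.
KEEP = {"&ndash;", "&mdash;", "&nbsp;", "&lt;", "&gt;", "&amp;"}
ENTITY = re.compile(r"&(?:#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")


def unicodify(text):
    """Replace HTML character references with the characters they stand for."""
    def convert(m):
        entity = m.group(0)
        return entity if entity in KEEP else html.unescape(entity)
    return ENTITY.sub(convert, text)
```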

AWB text reformatting clutters diffs

Hi there AWB developers. I've noticed that AWB-assisted edits have a habit of reformatting wikitext in addition to the noted changes in the edit summaries. It would be nice if the reformatting were performed as a separate edit before the intended change (with an edit summary like "reformatting wikitext"). This would lead to much clearer diffs and a better reflection of what was actually done to the article. None of this applies, of course, if this behavior has changed in more recent versions or if what I have seen is a result of the AWB user's actions and not the software itself. Mike Dillon 16:34, 5 January 2006 (UTC)

The edit summary is always "AWB-assisted" followed by a phrase of the user's choice. Therefore, it is the user who wrote the incorrect edit summary, not the AWB itself. — MATHWIZ2020 TALK | CONTRIBS 22:48, 5 January 2006 (UTC)
How is it "incorrect" if the user doesn't know it's happening or doing it intentionally? I don't have access to AWB since I have no Windows machine to run it, so I don't know if they see a diff before saving or if they have to request it just like on the primary web interface. Does a user really know that AWB realphabetized the categories? Is that an automatic behavior, or did the editor I observed do this intentionally and neglect to note it? Not to sound like I'm on a witch hunt or something, as the diff issue is a pretty minor inconvenience. I would say that AWB should strive toward being neutral on the original wikitext formatting, as far as possible. Reformatting is an excellent functionality to expose to the user by choice, but not if they aren't aware of it. Mike Dillon 03:59, 6 January 2006 (UTC)
Diff by default, yes.--SarekOfVulcan 04:10, 6 January 2006 (UTC)
The user is 100% aware of all changes before saving, and they can introduce any extra changes they want. Martin 09:48, 6 January 2006 (UTC)
It has been tweaked a bit more recently as well, so doesn't make quite as many changes. Martin 23:36, 5 January 2006 (UTC)
How? I know it doesn't fix dates anymore, but what other features have been removed? — MATHWIZ2020 TALK | CONTRIBS 23:49, 5 January 2006 (UTC)
I tweaked it so it doesn't need to remove the spaces before and after ==, which it previously did to make other fixes easier. Martin 00:11, 6 January 2006 (UTC)
Good to hear. Mike Dillon 03:59, 6 January 2006 (UTC)
I noticed that tweak when I was reviewing the code for the first time, but I didn't know that was new recently. In addition, when I was reviewing the code, you seemed to use * and ? differently than listed at Regex. The article says ? matches 0 or 1 occurrences of the character, and * 0 or more, but somewhere, you used a *?. This leads me to believe that, in C#, * means 1 or more, the equivalent of + in most systems. Is this correct? — MATHWIZ2020 TALK | CONTRIBS 20:48, 6 January 2006 (UTC)
The *? is a single regex atom. It means 0 or more, but it says to use stingy matching instead of the default greedy matching of *. Unfortunately, the Regex article doesn't address greediness, but basically, a greedy regex will match as many characters as possible until it fails, while a stingy regex will match only until the atom that follows can match. There is a better explanation of greediness in Chapter 4 of the canonical Mastering Regular Expressions (search for "greedy" in the text). Mike Dillon 03:23, 7 January 2006 (UTC)
P.S. Search for "laziness" and "non-greedy" to get the explanation of lazy/stingy regexes, or better yet, read the whole thing ;) Mike Dillon 03:31, 7 January 2006 (UTC)
Thanks for the link Mike! I need to read up on regexes. Martin 11:49, 7 January 2006 (UTC)
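A small illustration of the greedy/stingy difference Mike describes. It is written in Python rather than C#, on the assumption (true for these simple patterns) that Python's `re` treats `*` and `*?` the same way .NET's regex engine does. The sample wikitext is made up.

```python
import re

text = "See [[Foo]] and [[Bar]]."

# Greedy: .* grabs as much as it can, then backtracks only far enough for
# the final \]\] to match, so a single match spans both links.
greedy = re.findall(r"\[\[(.*)\]\]", text)

# Stingy (lazy): .*? grabs as little as it can and stops at the first \]\],
# so each link is matched on its own.
lazy = re.findall(r"\[\[(.*?)\]\]", text)

print(greedy)  # ['Foo]] and [[Bar']
print(lazy)    # ['Foo', 'Bar']
```

So `*?` still means "0 or more"; only the preference for how many characters to take changes.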

another thing

Another thing that should be added is the ability to change categories with a modifier after them, for example {{Category:Wikipedians in the United States|Jtkiefer}} The way AWB currently handles them if you tried to change them over to say {{Category:Wikipedians}} or {{Category:Wikipedians|Jtkiefer}} I'd end up with something like {{category|Jtkiefer}} which causes problems and which is an unusable category. JtkieferT | C | @ ---- 23:41, 5 January 2006 (UTC)

I will work on that with Martin. Thanks for notifying me of the problem! — MATHWIZ2020 TALK | CONTRIBS 23:49, 5 January 2006 (UTC)
It doesn't have any logic to remove keys, but otherwise it handles that fine; see this. Thanks Martin 23:52, 5 January 2006 (UTC)
Thanks for the sandbox demonstration, but what's a "key"? — MATHWIZ2020 TALK | CONTRIBS 00:04, 6 January 2006 (UTC)
The key is the bit after the pipe (|). If a category link has a key, the page is sorted within the category alphabetically by its key rather than by its name. Martin 00:13, 6 January 2006 (UTC)
Oh - I always just referred to that as the modified page name. — MATHWIZ2020 TALK | CONTRIBS 00:24, 6 January 2006 (UTC)
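A minimal sketch of renaming a category while keeping its sort key, which is the behaviour Jtkiefer asks for. The function name `rename_category` is hypothetical and this is not AWB's code; it is written in Python and, for brevity, ignores MediaWiki's space/underscore equivalence and first-letter case-insensitivity in category names.

```python
import re

def rename_category(text: str, old: str, new: str) -> str:
    """Rename [[Category:old]] to [[Category:new]], keeping any "|key" part.

    Simplified: underscores are not treated as spaces, and the first letter
    of the category name is matched case-sensitively.
    """
    pattern = re.compile(
        r"\[\[\s*[Cc]ategory\s*:\s*" + re.escape(old) + r"\s*(\|[^\]]*)?\]\]"
    )
    # group(1) is the optional "|key"; re-attach it unchanged after the new name.
    return pattern.sub(lambda m: "[[Category:" + new + (m.group(1) or "") + "]]", text)
```

For example, `rename_category("[[Category:Wikipedians in the United States|Jtkiefer]]", "Wikipedians in the United States", "Wikipedians")` returns `[[Category:Wikipedians|Jtkiefer]]`, so the key stays attached to the renamed category instead of ending up on a bare `category` link.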

Open Source

Due to the recent success of Firefox, OpenOffice, and other open source programs, I was wondering what the general consensus would be on making the AWB open source. I could release the source code and then make a page, maybe Wikipedia talk:AutoWikiBrowser/Open source, where anyone could request features, and developers could post code. If I implemented such a plan, I would also make available to developers an extensive list of the changes in each version of the AWB. Any ideas? — MATHWIZ2020 TALK | CONTRIBS 23:49, 5 January 2006 (UTC)

At the moment we make people register to prevent anyone abusing the software; being completely open would make that impossible. Any features can be requested here, though it would be cool to have more people developing it. Martin 23:52, 5 January 2006 (UTC)
Okay, I understand. It would make the code more susceptible to abuse. — MATHWIZ2020 TALK | CONTRIBS 00:04, 6 January 2006 (UTC)
Abuse how? It's software for editing a wiki. Hardly revolutionary. If anyone really wanted to make their own then they could do so. This thing should be open source as should anything to do with the wiki (MediaWiki, etc).
It would make the job of vandals easier (they could do lots of damage relatively quickly). BrokenSegue 17:40, 29 January 2006 (UTC)
Hmm. Just a very rough idea: would it be possible to keep a core part closed and make the rest open source? That core part would be released in binary as we have now and the rest would be open. After all, it's .NET :-) --Adrian Buehlmann 22:29, 29 January 2006 (UTC)
Of course it's up to the authors, but my vote is to open up the source. Security through obscurity really isn't that effective, and it's not that hard for someone to write a vandalbot on their own. It is also not that difficult to decompile and modify MSIL code, so someone with some .NET knowledge could bypass the registration check. Great software by the way, I just used it in order to subst a template which has been deleted. Now I just need to re-learn regexes; it's been a while since I've used them. Rhobite 20:14, 12 February 2006 (UTC)
The point is that this software would make high speed vandalism available to all. Yes, it is obviously possible that someone with a lot of spare time and decent programming skills could make their own vandal bot, but it is unrealistic. Also, consider that 1 person has already been removed from the check page for being reckless with it. Plus I am - prepare to be shocked - not a big fan of open source software in general, and the system that we have at the moment has been very successful in rapidly developing the program. Martin 21:35, 12 February 2006 (UTC)
As I said, it's your call. Thanks for writing it BTW, it seems very helpful. Rhobite 22:16, 12 February 2006 (UTC)
Not a fan of open source? The Washington Post had an interesting article today, which basically stated that flaws in open source software are fixed 60% more quickly on average than those in closed source software. (With the AWB, all fixes are quick. The article was about professional software, for example, Firefox, all of Microsoft's programs, etc.) Anyways, if anyone wants the closed source code, all you have to do is ask Martin and he'll happily give it to you. Unless you become some crazed lunatic vandal, of course. The only reason it's not open source is because there are crazed lunatic vandals out there, and keeping the code confined to people whom Martin trusts prevents them from getting their hands on it. E-mail Martin and ask for him to e-mail you the code, and then just e-mail him the code back if you make any changes. It's as simple as that. Martin's always quick to respond to his e-mails (no matter what the time zone discrepancy may be). --M@thwiz2020 22:33, 12 February 2006 (UTC)

Feature requests

I have two requests. Every now and then I notice the alert about a long article having stub status. Could there be a button (or something) that would quickly remove all the stub templates? I hate scrolling through the text to find them.

Secondly, this tool works wonderfully for fixing typos. I imagine it could do the same for disambiguating pages, but I'm really not sure how it would work. The idea I have is that the program somehow receives the different terms from the user (which he collected from the disambiguation page). After getting the list of pages linking to the DAB page, the user goes through each one. If it's the first term (say, pop music), he clicks on it (or maybe presses a hotkey) and the link changes to that term. Say pop -> pop. Clicking on option two (pop art) results in changing pop -> pop. I hope I've explained myself alright, it's difficult to describe. Let me know if you have questions or any suggestions. Oh, and welcome back! :) Gflores Talk 07:48, 6 January 2006 (UTC)

Tools for disambiguation is a really good idea. At the moment I am working on scanning the database dump, but this will probably be my next target (that and introducing a spell checker, but I am waiting on Microsoft for that). Martin 13:17, 6 January 2006 (UTC)
I would like to point out this python bot which does something similar to what I described. [7]. I think if AWB can make it a bit more user friendly than using the python bot, that would be fantastic. Gflores Talk 05:56, 9 January 2006 (UTC)
A similar process could be used to sort stubs and categories. Just something to keep in mind. :) Gflores Talk 17:15, 10 January 2006 (UTC)
If you use navigation popups, you can access a similar feature via the popup. If you hover over a link to a disambig page, the bottom of the popup lists all the links on the page. Clicking on one replaces the link Pop with, e.g., Pop (it adds the correct page while keeping the text seen the same). — MATHWIZ2020 TALK | CONTRIBS 20:48, 6 January 2006 (UTC)
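A minimal sketch of the per-link disambiguation step Gflores describes and the popup behaviour Mathwiz mentions: each link to the disambiguation page is pointed at a chosen article while the visible text stays the same. The name `repoint_links` and the `choices` list (standing in for the user's answer at each prompt) are hypothetical; this is not how AWB or navigation popups are implemented. Only the first letter of the title is matched case-insensitively, as in MediaWiki titles; underscores are not treated as spaces.

```python
import re
from typing import Iterable, Optional

def repoint_links(text: str, dab_title: str, choices: Iterable[Optional[str]]) -> str:
    """Point each link to dab_title at the target chosen for it, keeping its label.

    choices holds one entry per link, in page order; None leaves that link alone.
    """
    first, rest = dab_title[0], dab_title[1:]
    # First letter case-insensitive, the rest matched literally.
    title = "[" + re.escape(first.upper()) + re.escape(first.lower()) + "]" + re.escape(rest)
    pattern = re.compile(r"\[\[\s*(" + title + r")\s*(?:\|([^\]]*))?\]\]")
    picks = iter(choices)

    def repl(m):
        target = next(picks, None)
        if target is None:
            return m.group(0)  # user skipped this link
        # Keep the existing label if there is one, else the text as written.
        label = m.group(2) if m.group(2) is not None else m.group(1)
        return "[[" + target + "|" + label + "]]"

    return pattern.sub(repl, text)
```

For example, `repoint_links("I like [[pop]] and [[Pop|that style]].", "Pop", ["Pop music", "Pop art"])` gives `I like [[Pop music|pop]] and [[Pop art|that style]].`; links to other articles such as `[[Pop music]]` are not touched.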

Minor regex request: I'm looking for a regular expression to fix bad links. Essentially, it needs to look for links in this form... [[http://www.abc.com]] and change it to this [http://www.abc.com]. Same with [[http://www.abc.com link]]. Sometimes, links are like this [[http://www.abc.com|link]]. This needs to be changed accordingly to [http://www.abc.com link]. Any help is appreciated. I read a little about regex and have used this in AWB... \[\[([Hh]ttp:[^\]\]]+)]] However, it doesn't handle the latter case (the '|') and may find false positives. If you have time. Thanks. Gflores Talk 18:04, 6 January 2006 (UTC)

That can be completed with some regex. I'll work on the code.  — MATHWIZ2020 TALK | CONTRIBS 20:48, 6 January 2006 (UTC)
I just wanted to say thanks for working on this item specifically. Currently the bad link cleanup process takes many hours for each dump and this would speed up the task considerably. --PS2pcGAMER (talk) 22:30, 6 January 2006 (UTC)
Note to Martin - try:
replace \\[\\[http:\\/\\/(.*)\\]\\] with [http://$1]
replace \\[http:\\/\\/(.*)\\|(.*)\\] with [http://$1 $2]
This removes the double [ from http links, and changes the pipe to a space. It finds all links beginning with http://, which means it will also do this to links to articles such as [[http://]]. In addition, it will not fix links beginning with hTTP - I did this since, if you try [8], Wikipedia does not recognize it as a link. Wikipedia only recognizes external links that begin with an all-lowercase http, but the regex could be easily tweaked to fix any case. — MATHWIZ2020 TALK | CONTRIBS 17:59, 7 January 2006 (UTC)
Martin - I just tried the above regex. I added it between lines 58 and 60 in Parsers.cs, and it works. — MATHWIZ2020 TALK | CONTRIBS 19:18, 7 January 2006 (UTC)
Another note - you have to have the two regex replaces listed in the order above. For example, if you have [[9]] and do regex replace one and then two, you get Google - in the reverse order, you still have [10]. — MATHWIZ2020 TALK | CONTRIBS 22:57, 7 January 2006 (UTC)
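A sketch of the same two-step external-link fix, in Python instead of the C# string-escaped form above. One deliberate change, stated as an assumption about what is wanted: `[^\]]*` replaces the greedy `(.*)`. With `(.*)`, a line containing two bad links would produce one match running from the first `[[http://` to the last `]]`, which ties in with the greediness discussion above. Like the original, only lowercase `http://` is handled. The names are hypothetical.

```python
import re

# Step 1: [[http://...]] -> [http://...]. [^\]]* cannot run past the first "]",
# so two bad links on one line are fixed separately.
DOUBLE_BRACKETS = re.compile(r"\[\[(http://[^\]]*)\]\]")

# Step 2: [http://example.com|label] -> [http://example.com label].
PIPED_LABEL = re.compile(r"\[(http://[^\]|\s]*)\|([^\]]*)\]")

def fix_external_links(text: str) -> str:
    # Collapse the double brackets first, then turn a pipe separator into a space.
    text = DOUBLE_BRACKETS.sub(r"[\1]", text)
    return PIPED_LABEL.sub(r"[\1 \2]", text)
```

For example, `fix_external_links("See [[http://a.com]] and [[http://b.com]].")` returns `See [http://a.com] and [http://b.com].`, and `[[http://www.abc.com|link]]` becomes `[http://www.abc.com link]`.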
Wikipedia:Bad links also has some bad characters for internal links. I have developed these regexes to fix them:
fixes double space: replace \\[\\[(.*)  (.*)\\]\\] with [[$1 $2]]
fixes space at beginning: replace \\[\\[ (.*)\\]\\] with [[$1]]
fixes space before "#": replace \\[\\[(.*) #(.*)\\]\\] with [[$1#$2]]
fixes double underscore: replace \\[\\[(.*)__(.*)\\]\\] with [[$1_$2]]
fixes underscore at beginning: replace \\[\\[_(.*)\\]\\] with [[$1]]
fixes underscore before "#": replace \\[\\[(.*)_#(.*)\\]\\] with [[$1#$2]]
I just tested them, putting them after the two lines above. The reason why I have separate regexs for spaces and underscores is because I don't want to change a link such as January__1#External_links to January 1#External_links - I want the link to use all spaces or all underscores. — MATHWIZ2020 TALK | CONTRIBS 23:06, 7 January 2006 (UTC)
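A Python sketch covering the same six internal-link fixes. It assumes it runs after the external-link fix, as Mathwiz describes, so `[[http://...]]` links are already single-bracketed. Two deliberate differences from the regexes above, both assumptions about the intent: each link is cleaned on its own, so a match can never span two links on the same line, and only the link target (before the first `|`) is touched, so a label such as `item #3` keeps its visible text. Spaces and underscores are still handled separately, so `January__1#External_links` stays all-underscore. Function names are hypothetical.

```python
import re

INTERNAL_LINK = re.compile(r"\[\[([^\[\]]*)\]\]")

def _clean_target(target: str) -> str:
    target = re.sub(r" {2,}", " ", target)   # runs of spaces -> one space
    target = re.sub(r"_{2,}", "_", target)   # runs of underscores -> one underscore
    target = re.sub(r"^[ _]+", "", target)   # leading spaces/underscores
    target = re.sub(r"[ _]+#", "#", target)  # spaces/underscores before "#"
    return target

def fix_internal_links(text: str) -> str:
    def repl(m):
        # Clean only the target (before the first "|"); leave any label alone.
        target, pipe, label = m.group(1).partition("|")
        return "[[" + _clean_target(target) + pipe + label + "]]"
    # [^\[\]]* keeps each match inside a single link.
    return INTERNAL_LINK.sub(repl, text)
```

For example, `fix_internal_links("[[ Foo  bar #History]]")` returns `[[Foo bar#History]]`.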

Another request: according to Wikipedia:Manual of Style (headings), the sections should be in the following order at the end:

  • See also
  • Notes
  • References
  • External links

or

  • See also
  • References
  • Notes
  • External links

I know the AWB separates the categories, language links, FA templates, and Persondata templates and puts them in the correct order - could you do the same with the above sections, i.e., could you write code to separate them and then put them in the correct order? Thanks. — MATHWIZ2020 TALK | CONTRIBS 21:09, 7 January 2006 (UTC)
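A rough sketch of the requested heading sort, using the first of the two orders above. It assumes, as the request does, that categories, interlanguage links and similar trailing material have already been split off; otherwise they would move along with whichever section happens to be last. Only level-2 headings (`== Title ==`) are treated as section boundaries, so `===` subsections travel with their parent. The names are hypothetical and this is not the heading sorter shipped in any AWB version.

```python
import re

# First ordering from the Manual of Style list above.
APPENDIX_ORDER = ["See also", "Notes", "References", "External links"]
# Level-2 headings only: "== Title ==" but not "=== Title ===".
HEADING = re.compile(r"^==[ \t]*([^=].*?)[ \t]*==[ \t]*$", re.MULTILINE)

def sort_appendix_sections(text: str) -> str:
    """Move the standard end sections to the end of the page, in APPENDIX_ORDER."""
    matches = list(HEADING.finditer(text))
    if not matches:
        return text
    lead = text[:matches[0].start()]
    sections = []
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        chunk = text[m.start():end]
        if not chunk.endswith("\n"):
            chunk += "\n"  # keep a moved section from running into the next heading
        sections.append((m.group(1), chunk))
    body = [s for s in sections if s[0] not in APPENDIX_ORDER]
    # sorted() is stable, so other sections keep their original relative order.
    tail = sorted((s for s in sections if s[0] in APPENDIX_ORDER),
                  key=lambda s: APPENDIX_ORDER.index(s[0]))
    return lead + "".join(chunk for _, chunk in body + tail)
```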

1.6.2

FYI: There is a 1.6.2 listed under the list of changes, but the check page doesn't show this version as enabled. — MATHWIZ2020 TALK | CONTRIBS 17:28, 7 January 2006 (UTC)

In addition, I was looking through the code of 1.6, and, in AboutBox.cs, on line 131, there is a typo: guidlines should be guidelines. In AssemblyInfo.cs, the copyright date should be 2006 on line 13. Can I have the source for 1.6.2? — MATHWIZ2020 TALK | CONTRIBS 19:08, 7 January 2006 (UTC)

Sure, I'm busy at the moment, but I'll get all of the above sorted tomorrow evening. Thanks for the regexes! Martin 22:21, 7 January 2006 (UTC)
I understand - we're all busy at some time or another. Thanks for all your work on the AWB, and especially for returning! — MATHWIZ2020 TALK | CONTRIBS 22:58, 7 January 2006 (UTC)

1.6.3

Note to all: a new version (1.6.3) is on its way! I have already added the following to it:

  • Heading sorter (temporarily removed until better script written)
  • Bad link repair (both internal and external)
  • Fixed a security flaw in the script that checks whether the user is logged in
  • Other minor changes (e.g., updated copyright year, typos)

I e-mailed the source to Martin, who will clean up some of the regexes for various tasks. The new version should be out today or tomorrow. — MATHWIZ2020 TALK | CONTRIBS 19:34, 8 January 2006 (UTC)

Notice

Martin - why'd you remove "When using this software, check every single edit and try to avoid making extremely minor edits such as adding or removing a single space" from the notice? — MATHWIZ2020 TALK | CONTRIBS 23:05, 8 January 2006 (UTC)

Because I put it in the "rules", it is more appropriate there. thanks Martin 23:26, 8 January 2006 (UTC)

Village pump (policy) archive

Martin says (above) that AWB "is not broken, it's not even specifically designed for this task as you seem to suggest."

We have no way of knowing the intent of the author that it's specifically designed for any particular task.

Unfortunately, the actual effect of AWB is to mass de-link dates, and to mass de-alphabetize inter-wiki links. For example, see breakage of Wikipedia:Disambiguation and breakage of Israel. That's just two very high profile examples.

Therefore, we should assume that it's misused because of poor quality control by the program author (regardless of intent), and prohibit further use.

--William Allen Simpson 08:31, 1 January 2006 (UTC)
Huh? If you want to know the intent of the person who made those edits, since anyone using the AWB has to approve every edit they make, you could ask him (trust me, Ian is very kind and will respond to any inquiries you have), and I don't see how the cited diffs could be interpreted as "breaking" those pages.--Sean|Black 08:41, 1 January 2006 (UTC)
Thanks Sean. William, there is an option in the program that I was asked to implement that removes excess date links (I didn't even make the logic behind it); users have to consciously turn this option on for it to work. Plus, every edit has to be accepted by the user. The software can be used for a range of tasks; it is designed for no individual task in particular. Martin 11:12, 1 January 2006 (UTC)
P.s. I made the software, not Ian!) Martin 11:13, 1 January 2006 (UTC) whoops. Martin 15:28, 1 January 2006 (UTC)
Also, if you want to know what the intentions were, examples include: me stub-sorting about 50% of "Artist" stubs in just a few hours; Kbdank71 (who pretty much single-handedly takes care of WP:CFD) is now able to re-categorise the articles himself; and User:Gflores has been correcting typos. Many others have been using it as well for a variety of things. Martin 13:18, 1 January 2006 (UTC)
"Mass de-alphabetize inter-wiki links" is NOT an effect of the program. Spanish has the ISO code es. Esperanto has the ISO code eo. I see nothing about the behaviour of the program that indicates that it puts eo after es in an alphabet-based sort. The fact that you don't, or won't, understand how it is sorting things is another matter entirely. David Newton 20:53, 14 January 2006 (UTC)
I believe the confusion is that the original poster was referring to "alphabetization" of the resulting list of languages, not the underlying codes. The examples cited (Wikipedia:Disambiguation and Israel) used a sort based on alphabetization of the Latin alphabet transliteration of the local language names, but they were changed by an editor using AWB to be alphabetical based on the ISO-esque Interwiki language code. Since there is no official policy on interwiki link ordering, it is up to the editors of an article to agree on one scheme or another. They are both culturally biased, but the faux alphabetization of local names is not really any more internationally savvy because the sort order is still derived from Western languages. There is no inherent reason that the sound "A" should come before "Z" and that isn't even the case for all the languages that actually have those sounds. Mike Dillon 21:08, 14 January 2006 (UTC)
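The two orderings being argued about can be shown side by side. This Python sketch uses a hypothetical four-language sample. Sorting on the interwiki code puts `eo` (Esperanto) before `es` (Español), while sorting on the local names puts Español first, because "Español" and "Esperanto" first differ at the fourth letter, a versus e. The name sort here is plain code-point comparison, not any locale's collation, which illustrates Mike's point that neither order is culturally neutral.

```python
# Hypothetical sample data: interwiki code -> the language's name for itself.
LOCAL_NAMES = {"de": "Deutsch", "es": "Español", "eo": "Esperanto", "fr": "Français"}

def order_by_code(codes):
    # Sort on the interwiki prefix: "eo" < "es", so Esperanto precedes Español.
    return sorted(codes)

def order_by_local_name(codes):
    # Sort on the displayed name: "Español" < "Esperanto" (a < e at the 4th letter).
    return sorted(codes, key=lambda c: LOCAL_NAMES[c])

codes = ["fr", "es", "eo", "de"]
print(order_by_code(codes))        # ['de', 'eo', 'es', 'fr']
print(order_by_local_name(codes))  # ['de', 'es', 'eo', 'fr']
```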

I beg to differ:

  1. AWB does not require every edit to be approved by the user. It makes dozens or hundreds of changes with only one approval.
  2. For those who have difficulty reading the diffs that I provided, concentrate on a few things:
    • removing (previously correct) date links for 1948 and 1967 (and many others) in Israel.
    • re-sorting (previously correct) interwiki links so that "Esperanto" is alphabetized before "Español" (in both diffs).
  3. There are many such bad edits, prompting a firestorm of complaints.
  4. If you made the software, you are responsible for its output, not the poor sap that used it assuming that the rules it followed conformed to consensus. There is no disclaimer of warranty in this venue.
  5. Finally, the argument concerning intent is pretty standard in the legal domain, and I'm sorry that's too esoteric for many to understand. Here's the short version:
    • The intent doesn't matter, and is not an element to prove.
    • The actual results are enough to indict.

Please cease and desist using AWB until its results are tested and proven to conform to consensus.

--William Allen Simpson 23:47, 1 January 2006 (UTC)
That's simply not true. The AWB does require all edits to be accepted by those using it. Why would Martin lie about that? A couple of incorrectly alphabetized interwiki links do not "break" pages.--Sean|Black 23:53, 1 January 2006 (UTC)
A couple of dozen bad edits and damaged alphabetization on thousands of pages -- broken by any definition! --William Allen Simpson 11:35, 2 January 2006 (UTC)
(Thanks again Sean) William, you are simply wrong; every edit is checked by the user. If you have a problem with people removing dates then take it up with them; as long as it is a guideline it will be an option in the software. I don't know where you get these ideas from, but I hope you stop these slanderous accusations. Martin 00:00, 2 January 2006 (UTC)
Rather, the user is prompted to check every edit. This is a very different thing; you can't enforce an actual check in code. And if someone's making a change every two or three seconds, he's spending much more time waiting for the page to load than looking at what he's actually doing. I recall one article where an AWB-assisted edit changed "May 9th, 1955" into "May 9, 1955" [11], correctly fixing and linking the non-functional date, but incorrectly unlinking the year; it's hard to argue that this edit was sufficiently "checked".

My own bot has a function to convert old cut-and-pasted tables into invocations of Template:Album infobox. Because of all the crazy things folks have done with the formatting between the paste and my conversion, the function can't be 100% accurate, so I check every edit (in raw wikicode) before it's posted. Regardless, I still run it at the normal 30-second throttle, because the bot, not me, is doing most of the work. AWB should enforce a timeout as well. —Cryptic (talk) 03:46, 2 January 2006 (UTC)
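A minimal sketch of the kind of enforced timeout Cryptic suggests: block before each save until a minimum interval has passed since the previous one. The class and method names are hypothetical and nothing here describes how AWB actually behaves. The 30-second figure in the usage note is just Cryptic's bot throttle. The clock and sleep functions are injectable so the logic can be checked without actually waiting.

```python
import time

class EditThrottle:
    """Enforce a minimum gap between saves (hypothetical helper, not part of AWB)."""

    def __init__(self, min_interval: float, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last_save = None

    def wait_for_slot(self) -> None:
        """Block until min_interval seconds have passed since the previous save."""
        now = self._clock()
        if self._last_save is not None:
            remaining = self._last_save + self.min_interval - now
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_save = now
```

Usage: create `throttle = EditThrottle(30)` once and call `throttle.wait_for_slot()` immediately before each save. Time the user spends reviewing a diff counts toward the interval, so a careful reviewer is rarely slowed down.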

True. But if someone is sloppily checking their edits, then it's the fault of the user who's, well, sloppily checking their edits, not the software.--Sean|Black 04:51, 2 January 2006 (UTC)
The edit you describe must have been done manually; it couldn't have suggested doing that. Martin 10:31, 2 January 2006 (UTC)
Plus, any user abusing the software should be removed from the list so they can no longer use it. I have also disabled the date removal thing because I am tired of defending what people do with it. Martin 10:42, 2 January 2006 (UTC)

Thank you for disabling the date removal "thing", please:

  1. disable the interwiki sort "thing"; and
  2. enforce a prompt for each and every change on a page, not a blanket acceptance of dozens of changes on the same page; and
  3. enforce a 30 second PER CHANGE timeout for speed of page edits!
  • Sorting interwiki links has been carefully and conscientiously done by many international editors, and this one program damaged thousands of such pages, sorting by ISO code instead of alphabetically. That's broken by any definition of the term!
  • Moreover, there is no reason for us to have to go to each user of your software to chide them for making the mistakes programmed into the software. Many folks went to your talk page and the AWB talk page and complained, and you did nothing about it while thousands of pages were damaged! Now you (Martin) accuse us of slander?
  • It's apparent to those of us who have been both professional programmers and professional editors that your code was insufficiently tested, and did not conform to any standard of Professional Responsibility.

Please cease and desist using AWB until its results are tested and proven to conform to consensus.

--William Allen Simpson 11:35, 2 January 2006 (UTC)
It has been released as a development version. Plus, the only information I could find on inter language order was the Wikipedia:Language order poll, in which the alphabetical listing was the most popular choice, hence that's why it sorts like that. Martin 11:39, 2 January 2006 (UTC)
Will you (William) please stop going on about "blanket acceptance of lots of edits". It simply is not true. Every edit has to be accepted individually by the user and it is their fault if the edit is wrong because they didn't check it. Also, I think that sorting interwikis is very worthwhile because it makes it easier to add more in the future. --Celestianpower háblame 18:09, 10 January 2006 (UTC)

Incorrect edit

Just to let you know, on this page, NTL, it incorrectly moves the line to the bottom (it seems to think it's a translation link). It begins with...

ntl:hell following shortly after. Devalued and struggling with debts of around $18bn NTL was forced to seek Chapter 11 bankruptcy protection in May 2002 in order to organise a refinancing deal. The company did not emerge from protection until January 2003,

I'm using 1.6.5 and doing a typo fix in some other paragraph in the article. Gflores Talk 16:32, 11 January 2006 (UTC)

Ok, fixed! (1.66 available now). thanks Martin 17:25, 11 January 2006 (UTC)
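The report suggests that text beginning "ntl:" was taken for an interlanguage link. The thread does not say how AWB was actually fixed. One way to avoid that kind of false positive, sketched here under stated assumptions, is to treat only a complete `[[code:Title]]` link as interlanguage, and only when `code` is a known language code. The tiny `LANGUAGE_CODES` set and the function name are hypothetical; a real tool would need the full list of Wikipedia language prefixes.

```python
import re

# Hypothetical, deliberately tiny subset of language codes, for illustration only.
LANGUAGE_CODES = {"de", "en", "eo", "es", "fr", "ja", "nl"}
# A whole [[prefix:Title]] link with a lowercase prefix. "[[:de:X]]" (an inline
# link) and "[[Category:X]]" do not match because the prefix must start a-z.
INTERWIKI = re.compile(r"\[\[([a-z][a-z-]*):([^\]|]+)\]\]")

def split_interwikis(text):
    """Return (text without interlanguage links, list of the removed links)."""
    links = []

    def take(m):
        if m.group(1) in LANGUAGE_CODES:
            links.append(m.group(0))
            return ""
        return m.group(0)  # unknown prefix such as "ntl": leave it in place

    return INTERWIKI.sub(take, text), links
```

With this approach, plain prose such as `ntl:hell following shortly after.` is never moved, and even a bracketed `[[ntl:hell]]` stays where it is because `ntl` is not a language code.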

AutoWikiBrowser on Linux

The AutoWikiBrowser page does not answer: are there any plans to port AutoWikiBrowser to compile on Linux? (There's the Mono C# compiler and the MonoDevelop IDE available, though this page is grim about how well WinForms apps work on Linux.) It sounds like Mono's WinForms support isn't up to the same quality level as its other toolkits, like Gtk# and Qt#. --Unforgettableid | talk to me 19:44, 13 January 2006 (UTC)

I don't know much about Mono, but I highly doubt it will be portable as it uses the Internet Explorer core; if they manage to port that then maybe. Martin 19:48, 13 January 2006 (UTC)
Could it be ported to the Gecko engine for rendering pages, instead of the IE one? --Cyclopia 13:35, 30 January 2006 (UTC)
I couldn't say if it is actually possible, though it would definitely be difficult, particularly as ultimately I am going to use .NET Framework 3.0 technology. Martin 16:42, 30 January 2006 (UTC)
Since there is a portable, free (as in beer & freedom) C#/.NET runtime (Mono), and since there are portable and free C#-graphic bindings (Gtk#, Qt#), and since there are portable and free HTML rendering engines (Gecko), wouldn't it be better to focus on them instead of using highly platform-restricted technologies? Even if you really only want to stay on Windows, using WinFX will make your application available only on Windows XP and Vista. I can't understand the reasons for this choice. Is there some compelling reason for this? Please note that a Linux or MacOS developer who restricted their applications to their favourite OS in the same way would get the same criticism from me. --Cyclopia 17:26, 30 January 2006 (UTC)
Well I am a c# developer who uses VS2005 for a start, plus this highly restricted platform is the one that the vast majority of people use. I have never used mono, and quite frankly while I have more pressing issues don't intend to. As for WinFX, it will allow me to easily implement spell checking, I will provide a non-winFX version as well. Martin 17:35, 30 January 2006 (UTC)
Ok. I find it quite sad, but you're the developer. I'm sorry if I bothered you too much, I just wanted to let you know that there are ways to implement what you want to do without restricting your users to a (non-free) platform (no matter how widely used it is). For example, GNU Aspell can probably help you with spell checking in a multiplatform, free way. Thank you anyway for the good project; I'm just unhappy about being unable to use it. --Cyclopia 22:26, 30 January 2006 (UTC)

"maybe alphabetize interwiki " clutters diffs (please disable option)

Interwiki bots, when adding new links, routinely sort the interwiki links. Thus, it's probably preferable to disable the option in AWB as it makes diffs more difficult to read. -- User:Docu

I disagree, it's a useful tool and diffs are meant to be informative as to what changes were made, on more complex changes they are almost never "easy" to read JtkieferT | C | @ this user is a candidate for the arbitration committee ---- 09:18, 15 January 2006 (UTC)
It already has an option to disable interwiki sorting anyway. Martin 09:40, 15 January 2006 (UTC)

Adding datestamps to new versions?

Hey there. Just a quick request. Is it possible to add a date to when each new version is/was released? This just makes it easier if one has been away and they wish to see the changes that have taken place since their last version.

For example:

Version | Released | Release notes
1.7.0 | January 16, 2006 | Allows for openings into other areas of space and time.
Okay. I did just that! (Well, except for the fact that I haven't yet come up with a way for the AWB to engage in hyperdimensional travel, but that's on its way...) --M@thwiz2020 15:01, 16 January 2006 (UTC)
Thanks. It will make it easier for people (me) to see what has changed between versions if I've been away.--Dan (Talk)|@ 15:46, 16 January 2006 (UTC)

Auto AfD?

Would it be possible to make a feature that would help AfD articles? When I'm going through the articles needing cleanup I find the need to AfD some of them, but I don't want to stop very long or switch browsers to do that. I'd imagine that it would (after pressing a button) subst in the template, then take you to the next page where you could write the summary, and finally take you to the daily page to subst in the deletion page. Broken S 15:12, 16 January 2006 (UTC)

I could probably do that. The AWB would open three hidden windows and just add the AFD template. --M@thwiz2020 15:21, 16 January 2006 (UTC)
I thought Martin was coding this? Broken S 15:31, 16 January 2006 (UTC)
He is. WP:AWB states, "The author is perfectly willing to share the source code with anyone who wants to help in development." When Martin left Wikipedia on 2 January, I asked him for the source code so I could take over. However, he came back to Wikipedia so now he is doing the majority of the coding. Every once in a while, though, I send him some code that he includes in the AWB. For example, if you look at the versions table, I contributed to 1.6.3 and 1.6.5. --M@thwiz2020 15:40, 16 January 2006 (UTC)
Alright. I was just wondering. Broken S 15:42, 16 January 2006 (UTC)
Interesting idea, not sure how feasible though. In the meantime there is an option on the textbox context menu to open the page in your normal browser. Thanks Martin 16:38, 16 January 2006 (UTC)

Disambiguation tools

I just started using AWB today to do some disambiguation link repair. I don't have a lot of suggestions yet but figured I would add them here as I think of them:

  • Allowing creation of a custom list of edit summaries that can be changed for each page without having to re-click Start. I like to include what I actually disambiguated in the edit summary so that a) someone who looks at the history does not have to do a diff to see what I did and b) I can look at the work for a single disambiguation page and get a feel for how many links are going to which articles. I usually do something like disambiguation link repair (You can help!) United Provinces to Dutch Republic but with a different target article depending on what I did. It would also be good if you could apply the text from the dropdown, then edit it for the given instance to add, for example, two links. In any event, the edit summary is I think a good first step for doing DAB work with AWB.
  • A lot harder to implement would be allowing the find and replace to specify multiple replace-with strings, with the user prompted to choose one for each instance. Dalf | Talk 04:38, 17 January 2006 (UTC)
  • Add a filter to the list like the one that removes links outside of the main namespace to remove or select links via redirects. Dalf | Talk 05:15, 17 January 2006 (UTC)

interwiki sort

Can somebody check: [12]. Does it reveal a bug in interwiki sort? Bobblewik 19:47, 18 January 2006 (UTC)

No, some people like to sort them one way, some another; that's what happens when there is no policy. AWB does it the most popular way, plus it has an option in the menu to disable it. Thanks Martin 19:59, 18 January 2006 (UTC)
OK thanks. Bobblewik 20:15, 18 January 2006 (UTC)

AWB editing speed

Martin, one of the rules for the AWB is "Don't edit too fast." What do you consider "too fast"? I noticed that you routinely crank out edits at five or so per minute, yet User:Talrias blocked User:Bobblewik on 29 Dec for editing the same amount, saying it was not possible. Is that speed acceptable? Thanks. --M@thwiz2020 21:42, 18 January 2006 (UTC)

I would be interested in the answer to this. User:Talrias blocked me again today <sigh>. So if I go quiet again, either it has happened again or I have given up in disgust again. This problem is inherent in being a janitor.
If people with blocking powers are using speed as justification then artificial speed reduction might deal with the symptom, if not the disease. I do not know how I would target a particular edit speed. Is it possible to add a speed-brake at an acceptable rate, or a rate that could be modified by negotiation?
Incidentally, I have been accused of being a bot in the past (e.g. on my talk page and elsewhere) because my manual edits are often fast. I think I have got up to 4 edits per minute.
In addition, a suggestion for increasing overall speed while keeping individual speed low might be to share tasks. bobblewik 22:01, 18 January 2006 (UTC)
"Too fast" depends on what you are doing, so some kind of throttle becomes useless; plus, people doing disambig'ing etc. frequently edit just as fast. I haven't been using my bot account because it is flagged and won't appear in recent changes; I probably should make a separate bot account though. Bobblewik was blocked because he was making controversial edits frequently, not specifically because they were frequent. Martin 22:16, 18 January 2006 (UTC)
(originally a reply to bobblewik, now down here after EC) In deleting categories that are about to be deleted from articles, it is not uncommon for me to open 10 or 20 tabs so I can go up and down the line with my cuts. While I did a bunch of stuff this morning with little heed to my speed, I am not sure that awb made me any faster than my multitabbed method. It just made me more efficient (well, until I broke someone's complex italicizing on an article or two). In fact I just checked and here is my multitabs where I was doing 6 per minute, which is comparable to my awb speed this morning (on the first page or two of my most recent contributions). --Syrthiss 22:24, 18 January 2006 (UTC)
Thanks for pointing out the italics problem, will be fixed soon. Martin 22:42, 18 January 2006 (UTC)
I just looked at bobblewik's contribs, and all his edit summaries say "'x percent' -> 'x %' in accordance with Manual of Style" even if they don't change the percentage at all. Is this a bug or something?
PS, Bobblewik, I would check your edits a bit more carefully if I were you, for edits like this. -- jeffthejiff (talk) 23:05, 18 January 2006 (UTC)
This is very annoying; we have a check page so this software is only used responsibly. Please check your edits. Martin 23:14, 18 January 2006 (UTC)
Actually, I am going to remove your name from the list, Bobblewik; you have already received multiple complaints from just a few hours' work. Sorry. Martin 23:17, 18 January 2006 (UTC)
Huh? I do not understand. jeffthejiff wanted me to check a particular edit. No matter how carefully I check that edit, I cannot see anything wrong with it. It removed a blank line. How is that controversial? It is bizarre to be criticised for edits that are actually inherent in AWB.
If the problem is that the summary implies my own edits and all that happens is AWB-inherent edits, then that is fine. I can easily incorporate that as a constraint. Alternatively, I can modify the wording to add "possibly". But, until now, I was not aware that it was a problem. I still don't quite see the big deal. What other complaints do you think are valid? Sigh. bobblewik 23:52, 18 January 2006 (UTC)
The problems are: you haven't been checking edits properly, your edit summaries are sometimes misleading, and you have made very minor edits (which the main page specifically says not to do). If you come across a page where the task you are performing isn't necessary (i.e. you are fixing %s, but the page contains none) and/or the only edit it is making is very minor, then just ignore it. Also, err on the side of caution. May I also recommend that you generate a list of pages that definitely need something fixed, such as a common typo (I can do that for you from the data dump if you want), because that way you won't come across too many pages that should be ignored. If you are happy with all that, then I am happy for you to continue using the software. I am signing off now, so someone else can re-add your name before I come back if they see fit. Martin 00:22, 19 January 2006 (UTC)
I'm not sure the verification is working, though, since for some odd reason when I forget to log in it still lets me edit using AWB, so you'll probably want to recheck the verification code and fix it if needed. JtkieferT | C | @ ---- 23:57, 18 January 2006 (UTC)
Bobblewik, on the main page, under "rules", it says: "Avoid making extremely minor edits such as adding or removing a single space." In the edit described above, that's exactly what you did. Therefore, I am in full agreement with Martin's decision to remove you from the enabled users list.
As for the flaw discovered by Jtkiefer, I'll experiment with that, too. Thanks for bringing it up! --M@thwiz2020 01:50, 19 January 2006 (UTC)
I see that. I read it a while back but it clearly did not sink in. That must be because I find it easier to remember things that seem rational and harder to remember things that do not. My simple view of the world has been that a small improvement is better than no improvement. Unless my memory is playing tricks on me, there is encouragement for editors to improve articles in any way they can. I do recall seeing instances whereby my chosen edit did not appear but I chose to go ahead on that basis. I could easily have turned off the 'general fix option'. Believe me, I wish I had. It would even have made my work faster. I still do not understand why small improvements are a bad thing, but I won't forget the constraint now.
As far as misleading summaries are concerned, I usually attribute that term to a claim of Doing X and Y when it does A and B, or X and A. In my case, the summary said Doing X. Y. and the complaint was that it merely did X. Actually, it sometimes did X only, sometimes Y only, sometimes X+Y together. So perhaps it should have said Doing X and/or Y. Ironically, the very specific summary was added in response to a request for more detail by Talrias. I do not like being blocked by him so I took his request quite literally and probably gave too much detail.
A while back I was going to ask for an option of ignoring pages if my 'Find' string was unsuccessful but still applying the general fixes if it was successful. That would mean I could be sure all my edits are a targeted 'hit' and may also have the benefit of the general fixes. The current situation means I now have a very strong incentive to turn general fixes off for all edits, whereas I have only a very weak incentive, if any, to keep general fixes on.
Now, enough of that negative stuff and onto the positive. My aim was to bring the many instances of 'x percent' and 'x per cent' into line with MoS guidance, which says digits should be paired with '%'. I used Google to find them, but if there is a way to probe a database dump, that would be much better. Just to confirm my knowledge of the rules: small edits are bad; summaries should not exceed the actual edits. If I am still persona non grata, then that would be a shame, but there is not much I can do about it. bobblewik 02:45, 19 January 2006 (UTC)
Your tone has grown slightly caustic. The point is, all your edits have some cost: server time, wasted time of people checking RC (shouldn't that edit have been marked 'minor'?), filling up the edit history, etc. Removing that line changed the article in no way. You're supposed to ignore trivial cases because they are trivial (and should be done in conjunction with worthwhile edits, not as their own entities). If you haven't the time to change the edit summary while editing, you are going too fast (unregistered bots, even manual ones, are supposed to go slower than one edit every 30 sec), and I doubt you are actually sufficiently reviewing the edits. I agree, you shouldn't be using the bot for a while. Broken S 04:20, 19 January 2006 (UTC)
It is difficult for anyone to detect tone in text. To the extent that 'tone' is apparent, it frequently looks worse than intended. So do not judge me please. I have no idea about server time or what RC is. All I know about is that articles have a lot of rubbish in them. I was trying to fix percentages in line with MoS guidance. I succeeded in that task and thousands of instances of percent are better for it. That seems to me to be a good thing, not a bad thing.
I was not tackling excess lines but it came up as part of general fixes and it seemed a reasonable suggestion to me. My limited understanding of computers is that extra code on a page is wasteful of something. I have already explained the reason for the same summary but if you missed it, the reason it remained as Doing X. Y. is because it was indeed doing X or doing Y, or both.
If you are determined to believe that I am a bad person, or that I am damaging Wikipedia, then there is not much I can do about your opinion. If you look for defects in people you will find them. If you look for merits in people you will find them. I am not your enemy. bobblewik 05:16, 19 January 2006 (UTC)
We're not being your enemy, just trying to help wikipedia and help you help wikipedia. Extra code on a page is wasteful to some extent, but because each and every edit is saved in the history, it will just use space on Wikipedia's servers anyway. No idea how much they've actually got, but it must be some huge amount. A larger problem is the server time though. By that we mean the time the Wikipedia servers take to make the page up and send it to you - something that is costly in such large quantities (as one of the most popular sites on the internet), and the main reason why Wikipedia asks for donations. And by RC, brokensegue meant Recent Changes, a page which lists all the recent changes made to wikipedia. People check it and might look at the edits only to find that a space has been added in a place that makes no difference to the actual article. Same goes for the Watchlist.
So essentially really minor edits that make no difference to the article should be avoided because they just waste space in history lists - the actual space saved in the code is negligible. Thanks for spending a lot of time improving wikipedia though. -- jeffthejiff (talk) 08:22, 19 January 2006 (UTC)

(indent) The point of the "general fixes" is simply that while you are fixing something such as a common typo or re-categorising, it also makes other minor changes at the same time. As a rule of thumb, if a change doesn't actually affect the look of an article, then don't make it. Also, you will be pleased to hear that the software does have the feature you mention: "ignore if doesn't contain" is very helpful, as it will automatically skip any pages that do not contain whatever problem you are fixing. As for your rules: small edits are good, but not if the total edit is insignificant. Edits can exceed the edit summary (but not by anything significant); this might sound strange to write, but the vast majority of edits ever made exceed the summary in some way. Just don't let the summaries be misleading (I think this problem will be solved naturally if you skip pages that don't contain the mistake you are fixing). Thanks, Martin 10:28, 19 January 2006 (UTC)

Thanks for explaining it, I understand better now. I am delighted to hear that you have added an 'ignore if doesn't contain' feature. I appreciate the time you take in this. I would be happy to fix more of the same percent problems with a modified edit summary and avoiding insignificant edits. bobblewik 13:04, 19 January 2006 (UTC)
Bobblewik, above, you said "I used Google to find them, but if there is a way to probe a database dump, that would be much better." Well, Martin recently developed a database dump search tool. Either you can download and run it or ask Martin to scan the dump for the regex "per ?cent". That way, your fixes would be targeted to articles which (as of the last dump) contained percent or per cent, not all articles returned by a Google search of percent. --M@thwiz2020 21:52, 19 January 2006 (UTC)
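A minimal Python sketch of the targeted approach discussed in this thread, assuming a page's wikitext is already available as a string: the page is skipped unless "per ?cent" actually follows a digit (the "ignore if doesn't contain" idea), and otherwise figures such as "5 percent" or "7 per cent" become "5%" and "7%". The function name `fix_percent` is hypothetical and this is not AWB's code; every change would still need human review.

```python
import re

# Hypothetical pattern: a digit, optional whitespace, then "percent" or "per cent".
# The trailing \b keeps words such as "percentage" untouched.
PERCENT = re.compile(r"(\d)\s*per ?cent\b")

def fix_percent(wikitext):
    """Return the fixed text, or None if the page should be skipped."""
    if not PERCENT.search(wikitext):
        return None  # "ignore if doesn't contain": nothing to fix, so no edit
    return PERCENT.sub(r"\1%", wikitext)

print(fix_percent("Turnout was 5 percent, up from 3 per cent."))  # Turnout was 5%, up from 3%.
print(fix_percent("A 100-metre race."))  # None
```

Returning None rather than the unchanged text makes "skip this page" explicit, so a page with nothing to fix never produces an edit that consists only of general fixes.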

NoAutoBrowser tag?

Hi. Is there any way for telling the AWB not to change (by default) a given piece of text? In two different revisions, two editors have removed a lone underscore that was instead necessary [13][14] . Another editor has found the solution of replacing it with the HTML code "&#95", but I am not convinced that the AWB will not delete it again. Having some way for telling the AWB not to change a given piece of text (or, at least, making it alerting the user that that piece of text is not to be changed automatically) would solve the problem. Thanks. - Liberatore(T) 13:33, 19 January 2006 (UTC)

It does have a way of telling the editor what not to change, as it shows them exactly what it has done before it commits anything, unfortunately, as discussed above, Bobblewik has been less than thorough in checking, and for that reason is not able to use the software at the moment. Martin 13:40, 19 January 2006 (UTC)
For this particular change, AWB should know not to remove underscores from the right-hand side of the pipe. In other words, given [[A_B|C_D]], AWB should transform this to [[A B|C_D]]. HTH HAND —Phil | Talk 15:11, 20 January 2006 (UTC)
Sorry about this bug; I wrote the code for the bad link fixer. I'll look at the regexes again and then get back to you with an answer and, hopefully, a new regex. Thanks for bringing this to my attention! --M@thwiz2020 20:26, 20 January 2006 (UTC)
While I did write the bad link fixer, Martin added his own underscore fixer as mine had some flaws to it. I went over his code and I think I fixed it - I'm currently testing it out on ASCII. I'll post my results here and, if it works, send the code to Martin. --M@thwiz2020 18:51, 21 January 2006 (UTC)
What the script does is it takes every link and, if it does not contain ":" and it does not begin with "[[_" (my script lacked these components, which is why it wasn't used), then it replaces all "_" with " ". I tried to get it to split the link at "|" but I couldn't get it to work. So, I guess, there is no fix. (It also tries to make [[Bracket|<nowiki>[]]</nowiki> into [[Bracket|<nowiki>[[]]</nowiki>.) Editors will just be forced to double check every edit before saving - then again, since everyone is supposed to be doing that anyways, there is no problem here.
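Splitting at the pipe can be done by capturing the link target and the optional label as separate regex groups and rewriting only the target. A minimal Python sketch, assuming the same two exclusions described above (targets containing ":" and targets beginning with "_" are left alone); the names are hypothetical and this is not the code AWB actually ships:

```python
import re

# Group 1: link target (no brackets or pipe); group 2: optional "|label".
LINK = re.compile(r"\[\[([^\[\]|]*)(\|[^\[\]]*)?\]\]")

def fix_link_underscores(wikitext):
    def repl(match):
        target, label = match.group(1), match.group(2) or ""
        if ":" in target or target.startswith("_"):
            return match.group(0)  # namespaced/interwiki or leading underscore: leave as is
        # Only the target changes; the label after "|" is copied verbatim.
        return "[[" + target.replace("_", " ") + label + "]]"
    return LINK.sub(repl, wikitext)

print(fix_link_underscores("[[A_B|C_D]] and [[_NSAKEY]]"))  # [[A B|C_D]] and [[_NSAKEY]]
```

Because the label group can never contain brackets, a match always stops at the first closing "]]", so the text after the link is never touched.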
A valid underscore in a link is extremely rare. Martin 21:19, 21 January 2006 (UTC)
Just a side question: As Netoholic points out to me on my talk this seems to be a common case to be carefully considered at least in respect to template calls. Is there any consensus how to write template calls? Is it normal that calls to templates are written at random with underscores or spaces? --Adrian Buehlmann 22:06, 26 January 2006 (UTC)
Underscores are seen as spaces, so replacing them with spaces makes no difference to how templates etc. are displayed. They are removed simply because they are unnecessary mess. It does say on the project page not to make very minor edits for this reason. thanks Martin 22:17, 26 January 2006 (UTC)
Ok, thanks. I had that "unnecessary mess" feeling too and thought that while I'm cycling through the calls of Infobox Film I could clean that up in the same run. I have clarified on the project page that this also counts as an "extremely minor edit". Thanks. --Adrian Buehlmann 22:39, 26 January 2006 (UTC)
Several pages on standards and punctuation syntax, in fact: [15]. Maybe there should be a rule which checks whether a link in the page in question leads to the underscore page. At least the odds of encountering such a problem again would be slashed. Although, any page which relies on underscores (any page which deals with game map topics springs to mind) will be an issue in the future. Speaking as a statistical guy: right now there are 930,000 articles on Wikipedia, and there are probably at least 1000 pages which contain links which rely on underscores. Therefore, with an even edit spread, there's roughly a 1/1000 probability that AWB edits a page incorrectly. Realistic odds are much longer, but 1/1000 is still too high to ignore.--Dan (Talk)|@ 21:51, 21 January 2006 (UTC)
Virtually all those links to underscore do not actually contain an underscore; I'll make it ignore links that actually contain the word "underscore" anyway, though. Plus, no links rely on underscores, as they are seen as whitespace; a few links start with an underscore, but it ignores them already (such as _NSAKEY, but as you can see the underscore isn't actually part of the name for technical reasons). Plus, AWB is not automatic, so it shouldn't matter anyway. Martin 22:38, 21 January 2006 (UTC)
Actually, I had already seen the discussion on the AWB in the village pump, and I agree that the user is responsible for all changes. I suggested that the AWB could alert the user to be more careful than usual when changing some pieces of text, but I consider that a suggested improvement for the AWB, rather than a fix. Having no idea how long that would take to implement, however, I don't know if the effort of realizing this improvement would be worthwhile. - Liberatore(T) 11:43, 22 January 2006 (UTC)

no category bug

AWB fails to recognize {{1911}} as adding a category. I suspect it does not catch them from any template, as that would be hard to do. It's still probably notable, as articles with only 1911 Britannica categories probably need more. Dalf | Talk 08:25, 20 January 2006 (UTC)

That's correct, it doesn't see the category inside a template, but the 1911 one doesn't count as a proper category anyway. Martin 09:44, 20 January 2006 (UTC)
True enough. I was wondering, though, if you could add a feature to add a category to articles if they do not already have it. I was using AWB today to add Category:Cities in Romania to all of the cities listed in List of cities in Romania (alphabetical), and there did not seem to be an easy way to do it. Some of them needed other categories added too, so just pasting over and over did not work. For that task it looks like we can just add the category to Template:Romanian cities infobox, but for the future it might be a worthwhile feature. Dalf | Talk 09:51, 20 January 2006 (UTC)
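A minimal Python sketch of the requested "add category if missing" behaviour, assuming spaces and underscores in category names are interchangeable and that an existing copy may carry a sort key. The helper name is hypothetical. Like AWB as described above, it only sees categories written directly in the wikitext, not ones added through templates such as {{1911}}.

```python
import re

ANY_CATEGORY = re.compile(r"\[\[\s*[Cc]ategory\s*:[^\[\]]*\]\]")

def add_category_if_missing(wikitext, name):
    """Add [[Category:name]] unless the wikitext already contains it."""
    words = [re.escape(w) for w in re.split(r"[ _]+", name.strip())]
    # Spaces and underscores are interchangeable; an optional "|sort key" is allowed.
    existing = re.compile(r"\[\[\s*[Cc]ategory\s*:\s*" + r"[ _]+".join(words)
                          + r"\s*(?:\|[^\[\]]*)?\]\]")
    if existing.search(wikitext):
        return wikitext  # already categorised: no edit
    new_tag = "[[Category:" + name.strip() + "]]"
    categories = list(ANY_CATEGORY.finditer(wikitext))
    if categories:
        # Insert right after the last existing category, so interwiki links that follow stay last.
        end = categories[-1].end()
        return wikitext[:end] + "\n" + new_tag + wikitext[end:]
    return wikitext.rstrip("\n") + "\n" + new_tag + "\n"

page = "Arad is a city.\n[[Category:Example]]\n[[de:Arad]]\n"
print(add_category_if_missing(page, "Cities in Romania"))
```

Running it a second time on its own output returns the text unchanged, which is what makes it safe to apply to every page in a list.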

Multiple regex replaces in the same run?

I know I'm evil (and greedy and possibly stupid :-): could AWB be extended so that multiple regex replace actions could be entered and executed in the same run (a list of search/replace fields)?

Maybe I'm on a totally wrong track, so I'll try to explain what I want to do: the underlying problem I'm thinking about solving is changing calls to templates whose parameters need renaming. Specific example job I'm about to do: template:Web reference supports an old variant that used uppercase/lowercase in parameter names. I'm thinking about changing these to the new lowercase-only parameter variant.

Reason for this (sorry for my verbosity): Web reference is currently a meta template and I'm trying to convert it to use Netoholic's new CSS trick. I have trouble doing that while supporting both kinds of parameter sets, so I'm thinking about switching calls to the lowercase-only parameter variant.

Sorry for nagging with this whole chain of reasoning and many thanks for any help and ideas in advance. And thanks again to Martin for providing this great tool. (And please don't bother to tell me if I'm asking too much or the wrong thing!). --Adrian Buehlmann 12:41, 20 January 2006 (UTC)

It's a good idea, I'll get around to it one day, part of the problem is simply organising the interface. Martin 21:21, 20 January 2006 (UTC)
Yuha! Great. Thanks a lot. Maybe you could consider a config file with a list of exchange rules for simplicity, with a menu item to load the rules (for the more experienced users) or so. Just an idea to make it as easy for you as possible. UI programming can get laborious. --Adrian Buehlmann 21:34, 20 January 2006 (UTC)
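Adrian's config-file idea can be sketched in a few lines of Python. The file format here (one "pattern<TAB>replacement" rule per line, "#" for comments) and the parameter names in the example are assumptions for illustration only, not Template:Web reference's real parameters or an existing AWB feature. Rules run in order, so a later rule sees the output of an earlier one.

```python
import re

def parse_rules(config_text):
    """Parse hypothetical 'pattern<TAB>replacement' lines; blank lines and '#' lines are skipped."""
    rules = []
    for line in config_text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        pattern, replacement = line.split("\t", 1)
        rules.append((re.compile(pattern), replacement))
    return rules

def apply_rules(text, rules):
    """Apply every rule, in order, to the same page text before it is saved."""
    for pattern, replacement in rules:
        text = pattern.sub(replacement, text)
    return text

# Hypothetical rules renaming mixed-case parameters to lowercase ones.
CONFIG = "# Web reference parameters\n\\|\\s*Title\\s*=\t|title=\n\\|\\s*URL\\s*=\t|url=\n"

print(apply_rules("{{Web reference | Title = Foo | URL = http://example.org}}", parse_rules(CONFIG)))
# {{Web reference |title= Foo |url= http://example.org}}
```

Keeping the rules in a plain text file sidesteps most of the interface work: the list of search/replace pairs can be edited outside the program and loaded from a single menu item.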

Use with other MediaWikis

This is a great tool. Is it possible to configure it to work with other wikis, like other languages, or meta.wikimedia.org? Elonka 03:25, 23 January 2006 (UTC)

At the moment I am mainly working on making the code more robust; this will make using it on other sites easier, but then there are probably problems that I haven't foreseen. Martin 09:51, 23 January 2006 (UTC)

Date delinking

I thought that the date delinking feature was removed. However, Bobblewik seems to be doing a lot of date delinking. For example, see this edit. Has it been removed, or has it been added back? --M@thwiz2020 02:21, 24 January 2006 (UTC)

You can certainly run the regex replacements manually, without using the delink checkbox. This will be true as long as replacements using regex notation are supported. DES (talk) 03:14, 24 January 2006 (UTC)
However it happened, it seems that the changes are still not being reviewed in the latest round of the war on date links. In the Google edit cited above, [[January]] [[18]], [[2006]] got stripped of the brackets, rather than fixing it for the date prefs to work (manually removing the two middle brackets is all that is required). The malformed Dec 22, [[2005]] was handled in the same way. I know that reviewing changes like this is tedious, but my opinion is that quality shouldn't be sacrificed just for speed. Neier 06:05, 24 January 2006 (UTC)
I agree entirely, but Bobblewik is not using AWB. Martin 09:33, 24 January 2006 (UTC)
He's using a crafty bit of javascript User:Bobblewik/monobook.js/dates.js

AWB is not recognizing me

My name (Eagle 101) is in the list, but AWB tells me that I am not eligible to use it. What am I doing wrong? (Yes, I am logged in, on Windows SP v2, and have broadband internet.) What is wrong?

It was working when I was using v1.7.1 but v1.7.2 is giving me the error message. Thanks for any insight. Eagle (talk) (desk) 22:17, 24 January 2006 (UTC)
The user-checking script was changed a bit in 1.7.2. However, as far as I know, it was only changed to make sure the cookie is not empty. --M@thwiz2020 22:42, 24 January 2006 (UTC)
try 1.73, thanks Martin 23:35, 24 January 2006 (UTC)
Having just tried 1.73, it's not working for me either. :( --Dan (Talk)|@ 00:43, 25 January 2006 (UTC)
Perhaps it should be 1.7.3 instead??? --Dan (Talk)|@ 00:47, 25 January 2006 (UTC)
I think I have cracked it, try 1.74, problem is that I can't reproduce the problem because it works for me. Martin 01:03, 25 January 2006 (UTC)
That works for me now, thank you.--Dan (Talk)|@ 01:39, 25 January 2006 (UTC)
That works for me also. I think the problem is fixed now. Thanks.Eagle (talk) (desk) 20:00, 26 January 2006 (UTC)

Problem with redirects

I used version 1.7.4.0 and had Czestochowa in the list. When "Bypass redirects" is checkmarked (default), AWB loops forever on that redirect ("Browser status is {Loading|Complete}). I had to click the ignore button (BTW a stop button might be a good idea? Don't know). Same happens on Nowy Sacz, Znin, Elblag and others (from what links here of list of Template:Infobox Poland). Low prio problem for me. Just wanted to report it. --Adrian Buehlmann 14:38, 25 January 2006 (UTC)

It's the Internet Explorer/Unicode problem again; it will be fixed in the next release, thanks Martin 14:53, 25 January 2006 (UTC)

Fixed width font for text window?

This one is for the wish list: it would be nice if the text edit window (lower right window) of AWB could use a fixed width font (like courier or so?). I recently noticed how helpful this edit window really is (I was a bit reluctant to edit there until recently but it works just great). But this is just a "nice to have" one (not that important, no clue how nasty to implement). Thanks! --Adrian Buehlmann 15:42, 26 January 2006 (UTC)

Changing the font is very easy, I'll do that in next release, thanks Martin 15:46, 26 January 2006 (UTC)
Ahh. The new fixed width font in AWB 1.7.6 is great. Many thanks. --Adrian Buehlmann 00:43, 29 January 2006 (UTC)

Suggestion regarding images

Is there a way to have an option to turn off the loading of images? Some days, like today, I notice a huge delay after I click save, and it's mainly due to waiting for the images to load in the page. I wind up clicking Start the Process again to skip to the next article. --Kbdank71 20:52, 26 January 2006 (UTC)

I just turned off images in IE's preferences, which isn't a problem normally as I use Mozilla for RealBrowsing (tm). Since AWB basically uses IE, that stops the loading of images. --Syrthiss 20:56, 26 January 2006 (UTC)
I suppose I should have figured that out on my own. Thanks! --Kbdank71 21:39, 26 January 2006 (UTC)

not recognizing me being logged in

AWB isn't letting me do any work, since it keeps saying that I'm not logged in even though I'm logging in, and I can even check a special page in the browser (which gets pulled dynamically) and see that I'm logged in. I'm using 1.7.4.0 btw. JtkieferT | C | @ ---- 22:58, 26 January 2006 (UTC)

With the new version it still isn't working for me, it keeps saying that I'm not logged in. JtkieferT | C | @ ---- 23:49, 26 January 2006 (UTC)
Hmmm, do you have internet explorer 6? Martin 23:57, 26 January 2006 (UTC)
Yes. How did you change the way that logins are confirmed? I never had this problem before the login checking was changed. JtkieferT | C | @ ---- 00:02, 27 January 2006 (UTC)

Bug with anchored links

The general cleanup function replaces underscores in link anchors with spaces. This is incorrect behavior; the part of the link after the '#' character should not be altered. Kelly Martin (talk) 06:26, 27 January 2006 (UTC)

Ok I'll look into it thanks Martin 09:37, 27 January 2006 (UTC)
I cannot see that it is a problem; the extra underscores are just extra clutter, surely? Links work exactly the same with or without underscores, e.g. Wikipedia:Manual_of_Style#Directions_and_regions is exactly the same as Wikipedia:Manual of Style#Directions and regions, except that the latter has no underscores. I am not unwilling to change the behaviour, I just need to clarify the problem before I fix it. thanks Martin 12:10, 27 January 2006 (UTC)
Hm, MediaWiki replaces the spaces after the # as well. So you're right. Nevermind. Kelly Martin (talk) 12:16, 27 January 2006 (UTC)
Ok cool. Martin 12:22, 27 January 2006 (UTC)
Rick Block found a possibly valid reason to have an underscore in a link: it behaves like a non-breaking space, but is a bit wider than a normal space (example: Leopold_I compared to Leopold I). See also Wikipedia:Village pump (technical)#non-breaking space in links and this example. --Adrian Buehlmann 15:20, 2 February 2006 (UTC)

Not allowing me to do any work

I just registered and AWB isn't letting me do any work. It keeps telling me to log in, I keep logging in, and it keeps saying I'm not logged in. BTW, I've got version 1.7.5.0. Alr 15:55, 28 January 2006 (UTC)

In the IE window it creates at the top of the page, does it show you logged in up there? I was switching to my bot account earlier and thought all I had to do was open up IE and log in there, but AWB didn't have me showing in the upper panel. Even though it says 'don't click in the top panel', you can click to do stuff like log in and look at your watchlist. --Syrthiss 16:03, 28 January 2006 (UTC)
Yes, it does show me as being logged in. Alr 16:04, 28 January 2006 (UTC)
Hmm, then I don't know. Maybe someone else will have a suggestion. --Syrthiss 16:16, 28 January 2006 (UTC)
Do you usually browse wikipedia using Microsoft Internet Explorer? As far as I can tell, AWB relies on MSIE being configured correctly. --Netizen 16:36, 28 January 2006 (UTC)

I use Firefox and only go with IE when absolutely necessary. I did try logging in with IE and then with AWB in various combinations. Didn't work once. Alr 20:06, 28 January 2006 (UTC)

Sounds like the problem I'm currently having (see my thread above). JtkieferT | C | @ ---- 20:21, 28 January 2006 (UTC)
The newest version (1.76) may have solved the problem, if not then I can probably fix it another way. Martin 22:00, 28 January 2006 (UTC)

Seems to be working OK now. Thanks! Alr 23:20, 28 January 2006 (UTC)

Yeah, works for me too it seems. JtkieferT | C | @ ---- 09:24, 29 January 2006 (UTC)

Multiple wiki-links bug

Under "alerts", when it says multiple wiki-links, if you double-click a repeated link, it will highlight that link in the text, but will not highlight the last two square brackets: ]]. So, for example, if you double-click on a link saying Spain, it will highlight the [[Spain part in the text box to the right, but not the final brackets. I hope I have made myself clear. Thanks, FireFoxT • 17:58, 28 January 2006
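For reference, a regex match over the whole link yields a span that ends just past the closing "]]", so highlighting exactly that span would include the brackets. A minimal Python sketch, assuming link targets are compared after turning underscores into spaces; the function name is hypothetical and this is an illustration, not AWB's alert code:

```python
import re
from collections import defaultdict

LINK = re.compile(r"\[\[([^\[\]|]+)(?:\|[^\[\]]*)?\]\]")

def repeated_links(wikitext):
    """Map each link target that occurs more than once to the (start, end) spans of its links."""
    spans = defaultdict(list)
    for match in LINK.finditer(wikitext):
        target = match.group(1).strip().replace("_", " ")
        spans[target].append(match.span())  # end index is just past the closing "]]"
    return {target: found for target, found in spans.items() if len(found) > 1}

text = "[[Spain]] borders [[France]]; [[Spain|Spanish]] wine."
for start, end in repeated_links(text)["Spain"]:
    print(text[start:end])  # [[Spain]], then [[Spain|Spanish]]
```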

First off, thanks :)

Making it so I can remove categories is a big help. I was doing it by hand (with awb opening the articles for me).

There appears to be a bug with the category removal though... if the category is listed like [[Category:Child prodigies|*]] (with the asterisk) then it wants to wipe out everything after that as well (other cats, interwiki links). I've just been ignoring those articles where that happens and I'll do them by hand. --Syrthiss 04:07, 29 January 2006 (UTC)

Sorry, just confirmed that it will do it no matter what the cat construction syntax is in some cases. It just wanted to wipe out all the interwiki links in Jeremy Bentham when I was removing the child prodigy cat. --Syrthiss 04:09, 29 January 2006 (UTC)
Fixed now in 1.77, thanks for finding that. Martin 09:15, 29 January 2006 (UTC)
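One way to keep a category removal confined to a single link, whatever its sort key, is to forbid square brackets inside the optional "|sort key" part, so a match can never run past the first "]]" into other categories or interwiki links. A minimal Python sketch under that assumption; the helper name is hypothetical, and the actual cause of the bug in AWB is not known from this thread:

```python
import re

def remove_category(wikitext, name):
    """Remove [[Category:name]] or [[Category:name|sort key]], and nothing else."""
    words = [re.escape(w) for w in re.split(r"[ _]+", name.strip())]
    pattern = re.compile(
        r"\[\[\s*[Cc]ategory\s*:\s*" + r"[ _]+".join(words)
        # The sort key may not contain brackets, so the match ends at the first "]]".
        + r"\s*(?:\|[^\[\]]*)?\]\]\n?"
    )
    return pattern.sub("", wikitext)

page = "Text.\n[[Category:Child prodigies|*]]\n[[Category:Utilitarians]]\n[[de:Jeremy Bentham]]\n"
print(remove_category(page, "Child prodigies"))
```

The other category and the interwiki link survive; only the one line is removed.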

Yet another bug

I'm not good at filing bug reports, so I'll just describe step by step how to reproduce this bug and leave the brilliant prose for featured articles... :-)

  1. Opened AWB, started process.
  2. While waiting for pages to load, opened MSN messenger, started to chat.
  3. At the same moment when page finished loading in AWB, a carriage return ("enter") would be sent to my chat window.
  4. Closed MSN, but then went to IRC using XChat. The same "carriage return" event happened, cutting sentences half-way.
  5. If pages finished loading in AWB while Firefox was in focus, the last tab I opened with middle click would re-open in a new tab.

The above happened enough times as to lead me to believe it wasn't just something I was doing wrong. It seems that AWB just wouldn't sit quiet in the background. I'm using the latest version of AWB (just downloaded it today) and I have no memory of this error happening before, so it might be interesting to have a look at this. Thanks. -- Rune Welsh | ταλκ 22:36, 29 January 2006 (UTC)

it has always been like that, I keep meaning to get around to changing it, I'll do it soon. Martin 22:51, 29 January 2006 (UTC)
Fixed it now, in version 1.78. thanks Martin 00:13, 30 January 2006 (UTC)

Make list problem

AWB version 1.7.8 loops forever if I try to make list from category:if templates. --Adrian Buehlmann 08:45, 31 January 2006 (UTC)

Hmm. I think this has nothing to do with AWB. That category is really strange. It seems to never end, presenting the same entries over and over again (next is always enabled, though there seems to be no next). --Adrian Buehlmann 09:21, 31 January 2006 (UTC)
This happens if there are more than 200 entries in a category with the same sort key. Known bug in MediaWiki. Kelly Martin (talk) 19:33, 31 January 2006 (UTC)
Um. Thanks for the message. Good to know. --Adrian Buehlmann 21:21, 31 January 2006 (UTC)

Noinclude

Is it possible, if "Apply general fixes" is checked, to not move categories enclosed within noinclude tags to the end of the article? I've come across a few templates that someone had edited with AWB, and the category was moved outside of the tags, causing a number of articles that had that template to be miscategorized. --Kbdank71 19:01, 31 January 2006 (UTC)

Will be fixed next release, thanks. Martin 19:29, 31 January 2006 (UTC)
No, thank you.  :) --Kbdank71 20:59, 31 January 2006 (UTC)
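One way to honour this request is to treat each <noinclude>...</noinclude> block as untouchable and collect categories only from the text around it. A minimal Python sketch; the function name is hypothetical, it handles only simple, non-nested noinclude blocks, and it is not the fix that went into the next release:

```python
import re

NOINCLUDE = re.compile(r"<noinclude>.*?</noinclude>", re.DOTALL | re.IGNORECASE)
CATEGORY = re.compile(r"\[\[\s*[Cc]ategory\s*:[^\[\]]*\]\]\n?")

def move_categories_to_end(wikitext):
    """Move categories to the end, but leave any inside <noinclude>...</noinclude> alone."""
    pieces, found, pos = [], [], 0

    def take(segment):
        # Collect the categories in a segment outside noinclude and strip them from it.
        found.extend(m.group(0).strip() for m in CATEGORY.finditer(segment))
        return CATEGORY.sub("", segment)

    for block in NOINCLUDE.finditer(wikitext):
        pieces.append(take(wikitext[pos:block.start()]))
        pieces.append(block.group(0))  # copied verbatim, categories and all
        pos = block.end()
    pieces.append(take(wikitext[pos:]))

    if not found:
        return wikitext  # nothing outside noinclude to move
    body = "".join(pieces).rstrip()
    return body + "\n" + "\n".join(found) + "\n"

print(move_categories_to_end("[[Category:A]]\n{{{1}}}<noinclude>[[Category:T]]</noinclude>"))
```

Category:T stays inside the noinclude block, so articles using the template are not put into it; only Category:A, which was already transcluded, is moved.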

Request

Can we get an option to populate the article work list from a specified user's contributions? Kelly Martin (talk) 19:35, 31 January 2006 (UTC)

Until such a feature is built in, I think a copy from a contributions page to a text file would do the trick. DES (talk) 23:54, 31 January 2006 (UTC)
Sure. Martin 00:02, 1 February 2006 (UTC)

Request 2

Is it possible to put a timer under the save button, such that the timer would reset every time the save button is pressed? It would be nice to have, if it does not take that long to program. Thanks!!!!!Eagle (talk) (desk) 22:39, 31 January 2006 (UTC)

I'm just curious, but what would be the function of such a timer? --M@thwiz2020 23:41, 31 January 2006 (UTC)
To avoid editing so rapidly that other users claim that a bot flag is required. I have seen several complaints that any edits made at a rate of more than 2 or 3 per minute by an account without a bot flag using AWB should be considered bot edits. I don't agree with this position, but avoiding clamor may be worthwhile to some. DES (talk) 23:52, 31 January 2006 (UTC)
You have my intentions exactly; time goes reeeally slow when you are stub sorting, especially by regex. I want to make dead certain that I do not edit faster than 2 or 3 edits a minute. What I am doing is replacing ((stub-here)) with ((another related stub here)), using regex to narrow the field, and then double-checking that the regex did the right thing. More information on what I am doing can be found here Eagle (talk) (desk) 00:09, 1 February 2006 (UTC) (the posted time is out of order due to edit conflict)
It would be easy to add so I guess I may as well. Martin 00:06, 1 February 2006 (UTC)
Thanks00:10, 1 February 2006 (UTC)
Really? When I use the AWB (which I haven't, unfortunately, been doing as much recently), I edit at about five pages per minute. To me, as long as I check each edit before submitting, I can go this fast. Each page loads in about five to eight seconds, so I can then spend seven to ten seconds scrolling through the changes window. If I see anything suspicious, I'll stop and investigate - otherwise, I'll submit. I see no reason why this technique should be criticized - sure, it's a lot of edits per minute, but they are each checked! --M@thwiz2020 00:12, 1 February 2006 (UTC)
Are you sure?? I will do what you stated above. Thanks!! Most of my time now is spent watching the clock on my computer, not editing or checking... Are you sure that is OK according to the rules? Oh, just tell me if it is OK!!! Eagle (talk) (desk) 00:27, 1 February 2006 (UTC)
As I read the rules, if an edit is manually checked it is OK no matter how quick, but if someone objects to particular edits, they have been known to denounce them as "bot edits" if they are made faster than 2-3 per minute. Read WP:BOT, where things are less clear than they might be. DES (talk) 23:18, 1 February 2006 (UTC)
Is it possible to make this an option and not a mandatory timer (if it goes in at all)? I can edit faster than 2-3 per minute just by using firefox. --Kbdank71 14:16, 1 February 2006 (UTC)
Check out 1.79, has all requested features. Timer is option in menu. thanks Martin 14:21, 1 February 2006 (UTC)
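
A minimal Python sketch of the optional save timer discussed above, assuming a fixed minimum interval between saves (20 seconds, i.e. at most 3 saves per minute). `SaveTimer`, `seconds_until_allowed` and `record_save` are hypothetical names for illustration, not AWB's actual code.

```python
import time


class SaveTimer:
    """Hypothetical helper: reports how long to wait before the next save."""

    def __init__(self, min_interval=20.0, clock=time.monotonic):
        self.min_interval = min_interval  # assumed gap in seconds (20 s = 3 saves per minute)
        self.clock = clock                # injectable clock so the logic can be tested
        self.last_save = None             # no save has happened yet

    def seconds_until_allowed(self):
        # Zero if nothing has been saved yet or the interval has already passed.
        if self.last_save is None:
            return 0.0
        elapsed = self.clock() - self.last_save
        return max(0.0, self.min_interval - elapsed)

    def record_save(self):
        # Reset the timer every time the save button is pressed.
        self.last_save = self.clock()
```

Making the timer purely advisory (it reports a wait rather than blocking) fits the request above that it be an option rather than mandatory.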

Overzealous sorting of categories

Some categories are in a particular order. However AWB goes through and sorts them into alphabetical order. The result is I constantly have to partially revert AWB changes made by users. Is there a way around this? Can new rules for category sorting be added? Jdorje 05:50, 1 February 2006 (UTC)

Alphabetisation of cats and interwikis can be turned off in the menu. Martin 09:28, 1 February 2006 (UTC)
Ah. Missed that too. How about setting this to "off" by default? --Adrian Buehlmann 09:37, 1 February 2006 (UTC)
I could, but when pywiki bots change categories they alpha-sort them in a totally automatic fashion, so I don't understand how they get away with it when I don't. Martin 09:47, 1 February 2006 (UTC)
Could you possibly turn category sorting off by default? It does more harm than good. DR31 (talk) 14:27, 1 February 2006 (UTC)
Ok, but do you not mind that pywikibots do it in large quantities without even human supervision? Martin 14:57, 1 February 2006 (UTC)
I guess. But whatever can be done to avoid reverting the work on carefully sorted categories. DR31 (talk) 15:54, 1 February 2006 (UTC)
If I see a wikibot do automatic sorting, I would complain about that too. But so far I've only noticed human users doing it (though I might have missed it). Jdorje 16:40, 1 February 2006 (UTC)
Here's an example: [16]: (AWB assisted living people category -- I guess that's a person using it) and (Robot: Changing category United States soccer players)
This would be pointless; there is no reason why the specific order of the category listings matters, since they're still classified the same way no matter what their order. Bluemoose, I suggest you leave this feature the way it currently is, since there is no good reason for turning it off even if it is manually enableable. JtkieferT | C | @ ---- 19:51, 1 February 2006 (UTC)
It affects the order of categories as listed in the bottom of the article. These are in a specific order. Jdorje 19:56, 1 February 2006 (UTC)
But what I'm saying is that other than the aesthetic tastes of a few there's no reason why they need to be done that way and there's no advantage to having them sorted that way; I will continue to use the category sort feature on every edit I make using AWB. JtkieferT | C | @ ---- 19:59, 1 February 2006 (UTC)
Your argument is circular. If there is no purpose to sorting them any way, why do you want to sort them at all? Obviously there *is* a purpose to sorting. It also helps editors to have a consistent sorting: for instance, tropical cyclones are categorized by basin, season, strength, and location, in that order. Keeping the order consistent lets editors see at a glance that the correct number of each category has been applied (1 basin, 1 season, 1 strength, 0 or more locations). Putting them into alphabetical order provides no additional benefit whatsoever. Jdorje 20:05, 1 February 2006 (UTC)
A consistent sorting would mean sorting every article the same way, and since there is no consensus on which way is best, alphabetical works as well as any. JtkieferT | C | @ ---- 20:16, 1 February 2006 (UTC)
The hurricane articles are sorted consistently. Of the 550+ articles, only a small fraction (those which have been changed by AWB or a bot) do not follow the same sorting rules. You are, of course, free to continue using the AWB sort feature; in most cases it is probably appropriate to do so. I am also free to keep fixing incorrect sortings, and to keep complaining. However, I think it is not right for you to go out of your way to "fix" all tropical cyclone category sorting specifically, as you appear to be doing. Jdorje 20:19, 1 February 2006 (UTC)
Might I suggest, if this is an important issue, that you try creating a template which adds an article to these multiple categories simultaneously in the correct order: this will also aid in making sure the sort key is consistent. HTH HAND —Phil | Talk 10:25, 3 February 2006 (UTC)
On that topic... Martin, your Google Wikipedia make-list option looks amazing. JtkieferT | C | @ ---- 20:24, 1 February 2006 (UTC)

Redirect problem

Not sure if I am doing this correctly so I'll just describe what I did.

  • I wanted to fix all the double redirects to a page I just moved.
  • I generated a list using the what links here feature and told it to replace article name with newname.
  • When the program went to edit a redirect, it followed the redirect to the page (the one I just moved) and did the search and replace. Of course there was nothing to replace, since the article didn't link to itself.

Is there a way to make it not follow redirects (or do what I'm trying to do)? Oh, by the way, could you have it give a warning if someone types in the category name as "Category:This category"? Then it will look for "Category:Category:This category", which is almost never what is intended. It's not that important though. BrokenSegue 20:50, 1 February 2006 (UTC)

Yup, uncheck the "bypass redirects" option in the "general" menu. thanks Martin 20:53, 1 February 2006 (UTC)
wow, alright I'm dumb. Thanks. BrokenSegue 20:57, 1 February 2006 (UTC)

download broken

I have an unenabled version but the download seems to be broken. Dalf | Talk 07:01, 2 February 2006 (UTC)

Hm. It works for me. Just downloaded the newest version. --Adrian Buehlmann 12:34, 2 February 2006 (UTC)
The download link also works for me. --PS2pcGAMER (talk) 13:57, 2 February 2006 (UTC)
It downloads OK, but it appears to be corrupted: "no files to extract" when I try to extract it (even though it shows the files in the manifest). Dalf | Talk 02:55, 3 February 2006 (UTC)

Option request

I wouldn't call this high up on the priority list, but I think it would be nice if there was an option to turn off the "List Complete" and the "No articles in list, you need to use the make list" messages. --Kbdank71 14:09, 2 February 2006 (UTC)

Why? Martin 14:18, 2 February 2006 (UTC)
It would be easy to put the messages in the status bar, I'll just do that instead. thanks Martin 14:34, 2 February 2006 (UTC)
The status bar would be fine, thanks. I was just thinking it would be two less things to click on (I'm extremely lazy). --Kbdank71 14:54, 2 February 2006 (UTC)

Not updating after manual edit

Changes I make in the editing pane are not being reflected when I click "Preview" or "Show changes" again, although they are passed through when I click "Save". This makes it all too easy to make mistakes, I fear: could someone fix it fast, please. HTH HAND —Phil | Talk 10:26, 3 February 2006 (UTC)

I introduced this bug recently by accident, it should have been fixed in 1.8, do you have this version? thanks Martin 16:32, 3 February 2006 (UTC)

Multiple find/replace patterns?

Would it be possible to let the user add several find/replace patterns instead of just one? Thanks, AxelBoldt 21:32, 3 February 2006 (UTC)

Yup. See Wikipedia talk:AutoWikiBrowser#Multiple regex replaces in the same run? above. Martin has already signaled support for this. --Adrian Buehlmann 21:37, 3 February 2006 (UTC)
Excellent. This would also be useful in doing a certain very routine type of stub-sorting task: where a stub category is being split into a number of sub-categories, where each of those is already tagged by another stub category (so the object is basically a merger of (sets of) two existing patterns, e.g. {{writer-stub}} & {{US-bio-stub}} => {{US-writer-stub}}).
This is already doable in two passes, but that's obviously not ideal (more work, and even if set to auto-run, more server load and more RC spam). Another way would be if add/remove stub template support existed as with category at present (indeed, it's conceptually almost exactly the same thing, just inconveniently different in wiki-syntax).
I should say, superb job on this tool, BM. It's sweeping my watchlist like wildfire, and it's a great hope for many of those mountainous cleanup task backlogs. Alai 04:38, 4 February 2006 (UTC)
I have added multiple find and replace now (version 1.81). thanks Martin 01:11, 6 February 2006 (UTC)

Saving and Loading

Is there a reason we can't save or load our settings? Specifically our regex and comment field settings? This would be a nice feature. Thanks. Eagle (talk) (desk) 22:16, 4 February 2006 (UTC)

I'll get around to it one day. Martin 23:56, 5 February 2006 (UTC)

Bug?

When I run AWB on Wikipedia:AutoWikiBrowser, the preview shows that it is trying to empty out the contents of 2 <nowiki> sections (<nowiki>{{wikify}}</nowiki> and <nowiki>{{stub}}</nowiki>). Please point AWB at that page and take a look. --kingboyk 23:34, 5 February 2006 (UTC)

Doesn't do that for me, what version are you using? Martin 23:39, 5 February 2006 (UTC)
1.8, freshly downloaded. IE6. At second attempt it didn't do it for me either :~ --kingboyk 00:02, 6 February 2006 (UTC)

It would be great if AWB could recognise a subst'd AFD tag, and not suggest moving the Pages for deletion category to the bottom. That would save some time for me, as a lot of my minor edits are tweaking of deletion-listed pages. Another time saver would be if it didn't suggest changes which are no more than the insertion or removal of line breaks. --kingboyk 00:06, 6 February 2006 (UTC)

The AWB adds and removes empty lines just to "clean up" the page. If the only edits are adding or removing one line, then don't save. --M@thwiz2020 21:00, 6 February 2006 (UTC)
Yes, of course :-) But if AWB kept a count of how many changes it has made, and how many were 'trivial', it could skip to the next page if those numbers are low and equal - thereby saving me the time it takes to review. --kingboyk 00:33, 7 February 2006 (UTC)
Same with subst'd CFD tags too, if possible. --Kbdank71 21:54, 6 February 2006 (UTC)
Ok, fixed this in 1.83. Martin 22:27, 6 February 2006 (UTC)
That was quick! Thanks. --kingboyk 00:36, 7 February 2006 (UTC)
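
A minimal Python sketch of the kind of check kingboyk describes: skip a page if the edit is a null edit, or if it only changes trailing whitespace or how many blank lines separate paragraphs. The function names and the definition of "trivial" are assumptions for illustration, not AWB's actual logic.

```python
import re


def normalise_blank_lines(text):
    # Strip trailing spaces on each line, collapse runs of blank lines, trim the ends.
    lines = [line.rstrip() for line in text.splitlines()]
    return re.sub(r"\n{2,}", "\n\n", "\n".join(lines)).strip()


def is_trivial_change(old, new):
    # True when the texts differ only in the whitespace normalised above.
    return old != new and normalise_blank_lines(old) == normalise_blank_lines(new)


def should_skip(old, new):
    # Skip null edits as well as blank-line-only edits.
    return old == new or is_trivial_change(old, new)
```

Note that adding a single blank line between two lines ("a\nb" to "a\n\nb") is *not* treated as trivial here, because in wiki markup it starts a new paragraph.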

HTML comments lost when sorting categories

It seems to be common usage to annotate an Oscar category with the name of the actor involved, in an HTML comment, and I assume this happens for other categories also. However AWB unhooks these comments and leaves them dangling in mid-air. Could the category-sorting code be fixed so as preserve these comments? HTH HAND —Phil | Talk 10:05, 6 February 2006 (UTC)
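
A minimal Python sketch of category sorting that keeps an annotating HTML comment attached to its category, assuming the comment sits on the same line directly after the category tag (comments on a separate line are not handled). The function name is hypothetical and this is not AWB's actual code.

```python
import re

# Assumption: an annotating comment directly follows its category tag on the same line.
CATEGORY_WITH_COMMENT = re.compile(r"\[\[Category:[^\]]*\]\](?:[ \t]*<!--.*?-->)?")


def sort_categories_keeping_comments(text):
    """Hypothetical sketch: move category tags to the end, sorted
    case-insensitively, with any trailing comment kept attached."""
    tags = CATEGORY_WITH_COMMENT.findall(text)
    body = CATEGORY_WITH_COMMENT.sub("", text).rstrip()
    return body + "\n\n" + "\n".join(sorted(tags, key=str.lower))
```

Because each comment is matched as part of its category tag, it moves with the tag instead of being left dangling. Categories removed from the middle of the text may leave empty lines behind, which a real implementation would also tidy up.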

Request

It's nice to see the implementation of multiple find and replaces. Would it be possible to create multiline find and replaces too? For example, when you set AWB to find a sub-heading and replace it with nothing, it normally leaves an extra unwanted gap (example). This could probably be fixed with a multiline find and replace, if it is possible to do so. FireFoxT • 20:57, 6 February 2006

Try this instead: check the regex option and then use \r\n to indicate "new line". --M@thwiz2020 20:59, 6 February 2006 (UTC)
An alternative: Bluemoose, program the AWB to replace multiple blank lines with one and remove blank lines after headings after finding and replacing. --M@thwiz2020 21:01, 6 February 2006 (UTC)
Ah ok. Thanks. FireFoxT • 21:16, 6 February 2006
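
A minimal Python sketch of the multiline regex idea: remove a "== heading ==" line together with the blank lines after it. Using `\r?\n` instead of a literal `\r\n` matches both Windows and Unix line endings. The function name is hypothetical, and AWB's own regex engine may differ in detail.

```python
import re


def remove_heading(text, heading):
    """Remove a '== heading ==' line (any level) plus the blank lines after it."""
    pattern = re.compile(
        r"^==+[ \t]*" + re.escape(heading) + r"[ \t]*==+[ \t]*\r?\n"  # the heading line
        r"(?:[ \t]*\r?\n)*",                                         # following blank lines
        re.MULTILINE,  # lets ^ match at the start of any line
    )
    return pattern.sub("", text)
```

For example, `remove_heading("Intro\n\n== Trivia ==\n\nSome text\n", "Trivia")` leaves `"Intro\n\nSome text\n"`, without the extra gap.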

AWB 1.8.1: inserts * before a URL

Possible minor bug: AWB 1.8.1 wanted to do this edit to Australian Electoral Commission (Apply general fixes:yes, Auto tag:yes). I believe I did not set such a replace (I was doing things like this). BTW: really great the new multi-regex replace! --Adrian Buehlmann 23:15, 6 February 2006 (UTC)

It's because it bullets external links on a new line after the ==External links== header. This was an unusual situation, as the URL was inside a template and on a new line. Martin 23:34, 6 February 2006 (UTC)
You're right. That's a very arcane case, probably not worth bothering about much. Never mind. --Adrian Buehlmann 23:40, 6 February 2006 (UTC)

Bypass redirects and Skip articles

AWB 1.8.1: If I have set "Bypass redirects", AWB does not edit the redirect page; it instead edits the target page of the redirect (superb so far). I have now set a regex expression in "Skip if doesn't contain" under "Skip articles" (on the "Set options" tab). It seems to me that the skip expression is applied to the redirect page. I would rather expect that the skip expression is applied to the page that AWB edits, i.e. the target of the redirect if "Bypass redirects" is set. BTW, an option that would "click ignore for me" if it is a null edit would be fine ("skip null edits"?). (Sorry for not using the newest version; I have just loaded a bunch of multi-regexes into a 1.8.1 instance of AWB and I am too lazy to re-enter them into the newest version :-). And sorry for being so greedy for features. AWB is just such a wonderful thing :-) --Adrian Buehlmann 19:59, 7 February 2006 (UTC)

It does check the article it is redirected to, to see if it should "ignore if does/doesn't contain", and this seems to be working fine as far as I can tell. I'll add the skip null edits thing to the list of things I will do. thanks Martin 20:28, 7 February 2006 (UTC)
You are right. Sorry for nagging. I'm an idiot. I was distracted by the short blink of the redirect page in the browser window. Thanks for picking up the "skip null edits". BTW do you have an example settings xml file somewhere (or a doc)? The version note for 1.82 says that AWB can load it (not yet save). --Adrian Buehlmann 21:45, 7 February 2006 (UTC)
No problem, I have just uploaded version 1.84; this can save the settings as well. Hopefully you will be able to keep the same settings.xml file for any future versions, meaning you won't need to keep re-entering your settings. thanks Martin 23:40, 7 February 2006 (UTC)
Ahhh. 1.84 is amazing. I have edited my regexes into the xml. Thanks! --Adrian Buehlmann 10:52, 8 February 2006 (UTC)

URL Unicode fonts

Um, me again (don't beat me :-). Hypogeum of Hal-Saflieni is a redirect to Hypogeum of Ħal-Saflieni. I believe my AWB 1.84 loops forever on this. --Adrian Buehlmann 13:28, 8 February 2006 (UTC)

It's that problem IE has with certain unicode fonts in the URL, I am fixing them as I find them, so it is useful to know which ones it has a problem with. I have also added the option to skip pages that it hasn't made an edit on. (version 1.85). thanks Martin 13:47, 8 February 2006 (UTC)
Oh, that's bad. This must be a real pain for developing. I had no clue about that font problem. Would it be helpful to start a list of these problem redirects somewhere? I could then add them there without noising up this page here. Or I could simply email them to you. Or we could just make a special section here. Just some ideas. Re 1.85: Many thanks, I'm hurrying to download... --Adrian Buehlmann 14:20, 8 February 2006 (UTC)
Oh, dear. The new "skip articles with no changes" works like a rocket. Incredible. So much for reducing my wiki editing... BTW, a pause button (or ESC key?) would be something nice to have (very, very low priority). I'm a bit ashamed of constantly asking for new stuff and frankly baffled by your responsiveness. Many thanks. I'll try to cut down my chatter a bit now. --Adrian Buehlmann 14:42, 8 February 2006 (UTC)
Calm down, our kid! If it's skipping articles, then it isn't making edits. If you're blasting through your list at a rate of knots not changing anything, nothing is lost except your opportunity to make a cup of tea. HTH HAND —Phil | Talk 09:15, 9 February 2006 (UTC)
OH. Just to make this clear: "skip articles with no changes" is an extremely helpful feature to me. It's great to see AWB rush automatically to the stuff I want to change. The idea behind having a stop/pause button is that I do not want to go away from my computer while AWB is running. But sometimes, I do have to do so :-). --Adrian Buehlmann 09:48, 9 February 2006 (UTC)

Font problems on

  • Thanks Adrian. BTW, if you want to reproduce the problem for yourself, try copying and pasting http://en.wikipedia.org/w/index.php?title=Xagħra_Stone_Circle&action=edit into Internet Explorer; it doesn't load up properly. If you click the link it works, and if you paste it into Firefox it works, so I am certain it is simply a bug in IE. Martin 22:29, 9 February 2006 (UTC)

Bug when skipping unaltered articles

If you have "Skip articles with no changes" set, then the "Preview" button acts like "Ignore". I can only assume that the dingus which checks for "no changes" is failing to spot any changes because these are not displayed when you do a "Preview". I note that this feature is disabled when you set "Preview instead of diff".

Might I suggest that the "Skip articles with no changes" should only actually perform the skip when you first land on the article and thereafter be disabled until the next article? HTH HAND —Phil | Talk 17:37, 8 February 2006 (UTC)

OK thanks, I knew I would miss something. Martin 19:02, 8 February 2006 (UTC)
It also skips incorrectly if you suffer a failure loading the article up. This is not so helpful when you then have to go back and figure out which article you just skipped so you can try it again. HTH HAND —Phil | Talk 09:12, 9 February 2006 (UTC)

HTML entities bug

When making a list, article titles containing characters that are written as HTML entities, such as an ampersand (&), appear in the list as the HTML code for them (&amp;). Simple fix, I guess... BigBlueFish 10:43, 9 February 2006 (UTC)

I never bothered fixing it because it works just the same either way, just looks a bit ugly. Martin 10:48, 9 February 2006 (UTC)
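
A minimal Python sketch of the cosmetic fix, turning `&amp;` and similar entities back into plain characters before a title is displayed in the list. `display_title` is a hypothetical name; the standard library's `html.unescape` does the decoding.

```python
import html


def display_title(raw_title):
    """Decode HTML entities such as &amp; so list entries show the real title."""
    return html.unescape(raw_title)
```

As Martin notes, this only changes what the list displays; the page being fetched is the same either way.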

Enabling of different versions

Due to only the most recent version of the software being enabled, and being updated every few days, is it possible to enable multiple versions of the software, so that a particular download of the program lasts more than a few days, and then only require an update when a "major" update is made? The constant un-enabling of the different versions makes things quite difficult, as using AWB seems to require a new download practically every single time I use it.

A self-update feature in the program whereby it updates itself to the most recent version might make things less difficult as well.

SchuminWeb (Talk) 03:10, 10 February 2006 (UTC)

Not that I would want to speak for Martin, but you might consider that AWB is still a development version and as such we – as early users – help to develop it by using it. As such I think the process that Martin uses (release often, release early) fits quite well. This also includes the early deprecation of older versions (it is good to use and test the new versions as they come out, as this helps improve the software too). This helps to make AWB strong and keep the workload for Martin as low as possible. I understand that this is a bit ugly for the users (me included!), but given the fact that the installation of AWB is trivial (just copy it in a directory) and that the settings can now be saved/loaded, I think we do Martin (and thus ourselves and Wikipedia) a favor if we do take the time to download the newer version before letting it go over articles. A new release each day is not that bad. --Adrian Buehlmann 07:35, 10 February 2006 (UTC)
That's right, but I accept the original point and will leave older versions enabled for longer. thanks Martin 09:56, 10 February 2006 (UTC)

I don't mind that older versions are disabled, but I would prefer that the check occurred at the "Make list" point and not after the "Start the process" button press. I know I can save the results and reload them, but it would be easier to switch to the new version if I didn't have to repopulate the article list. That's a small annoyance that I can live with though. -- JLaTondre 15:17, 11 February 2006 (UTC)

Adding unnecessary bold tags?

See these edits made with the AWB (general fixes and autotag on, nothing else on): [17] [18] [19] [20]. Why did the AWB do this? --M@thwiz2020 19:48, 11 February 2006 (UTC)

It's because it bolds the first occurrence of the title if it is near the very beginning of the article, no other bold text occurs at the beginning, and the title is not bolded anywhere else in the article. The problems you point out are because I made a silly error in the code; I have uploaded 1.881 now, which fixes this. thanks Martin 19:57, 11 February 2006 (UTC)
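
A minimal Python sketch of the three conditions Martin describes for bolding the title. The 100-character `window` standing in for "near the very beginning" is an assumed value, and this is an illustration, not AWB's actual code. It also ignores cases such as the title appearing inside a wikilink.

```python
def bold_title(text, title, window=100):
    """Hypothetical sketch of the 'bold the first occurrence of the title' fix."""
    if "'''" + title + "'''" in text:
        return text  # the title is already bolded somewhere
    pos = text.find(title)
    if pos == -1 or pos > window:
        return text  # title missing, or not near the very beginning
    if "'''" in text[:window]:
        return text  # other bold text already occurs at the beginning
    return text[:pos] + "'''" + title + "'''" + text[pos + len(title):]
```

Each early return corresponds to one of the conditions, so an article already using bold near the start is left alone.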

Minor edit setting not working?

I just started work on Wikipedia:Bad links with AWB. AWB is such a great tool for this project and saves quite a bit of time. However, for some reason, the setting "mark all edits as minor" isn't working. All my edits are being marked as major edits even though the setting is checked. I've also tried closing AWB and opening it again with no luck. Is this a bug or user error? PS2pcGAMER (talk) 22:34, 11 February 2006 (UTC)

Never mind, it is working now. If you want to investigate this further, let me know. Otherwise I will just mark this as a goof by me. --PS2pcGAMER (talk) 22:38, 11 February 2006 (UTC)
I'll tweak it so it sets the checkbox on load as well, at the moment it only sets it on save, but I can't see anything wrong with it. Martin 22:50, 11 February 2006 (UTC)
That's probably why I was confused. Thanks! --PS2pcGAMER (talk) 23:03, 11 February 2006 (UTC)

Bad link repair

While going through Wikipedia:Bad links, I've been getting some weird results every 1/20 pages that I (luckily) catch in the "show diff" and fix manually. I'll review the code and let you know if I make any progress as to the cause of this. I can't give you a sample edit, though, since I fix them manually. --M@thwiz2020 00:40, 12 February 2006 (UTC)

See, for example, this edit, which I let go and then later fixed manually. --M@thwiz2020 00:43, 12 February 2006 (UTC)
It's because the regexes to fix them were being greedy; I've fixed it now. Martin 00:53, 12 February 2006 (UTC)
Thanks! I'm going to stop going through the bad links for the night pretty soon, so just upload it whenever you can, and I'll download it tomorrow. I have other work to do now (although whether or not I'll do it now is another question - I'm a bit of a procrastinator.) --M@thwiz2020 00:58, 12 February 2006 (UTC)
I've uploaded 1.89 now, with the above fixes and numerous others, including stopping links from being clicked in the browser window. I probably did other stuff too; I'm just too tired to remember now. Martin 01:09, 12 February 2006 (UTC)
Thank you. I'm working on the D's with no problems so far! --M@thwiz2020 23:00, 12 February 2006 (UTC)
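
A short Python illustration of what "greedy" means here. The actual bad-link regexes are not shown above, so the pattern and URLs below are hypothetical examples. A greedy `.*` runs to the *last* closing bracket on the line and swallows two links as one, while a lazy `.*?` or a negated class `[^\]]*` stops at the first one.

```python
import re

text = "See [http://a.example.org/x A] and [http://b.example.org/y B]."

# Greedy: ".*" matches as much as possible, so it runs to the LAST "]".
greedy = re.findall(r"\[http://.*\]", text)

# Lazy: ".*?" matches as little as possible, so it stops at the FIRST "]".
lazy = re.findall(r"\[http://.*?\]", text)

# Negated character class: cannot cross a "]" at all.
negated = re.findall(r"\[http://[^\]]*\]", text)
```

Here `greedy` is a single match covering both links, while `lazy` and `negated` each find the two links separately. The negated class is often the safer choice because it cannot run past a closing bracket even when the rest of the pattern changes.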

Auto mode

Does one need to have a bot flag on their account to use "bot mode"? I'm listed in the access list as a bot. I have an 8000-entry list to do and I'd rather automate it than click 8000 times :) Tawker 12:22, 12 February 2006 (UTC)

I couldn't say if you need a bot flag or not, but the auto mode is virtually untested at the moment, once it is reliable then I have no problem with you using it. Martin 12:31, 12 February 2006 (UTC)
From testing, it doesn't work: even with my bot username in the bots field, it still won't give me the field. (Not sure if I want to use Auto for adding subst: anyway; I don't know if I'd trust the python bot to do it.) Tawker 23:41, 12 February 2006 (UTC)
Adding it to the bots isn't meant to enable the auto mode. When it is more thoroughly tested I will enable it for some users. Martin 23:48, 12 February 2006 (UTC)
Martin, just as you have the "enabledusersbeginshere" tag, you could have a "botslistbeginshere" tag that tells the AWB if a user is a bot. --M@thwiz2020 23:56, 12 February 2006 (UTC)

Bad link repair

Martin - can you add a preset edit summary to the drop down box such as: "bad link repair. You can help!"? Thanks. --M@thwiz2020 22:46, 12 February 2006 (UTC)