Wikipedia:Bots/Requests for approval



To run a bot on the English Wikipedia, you must first get it approved. Follow the instructions below to add a request. If you are not familiar with programming consider asking someone else to run a bot for you.


Current requests for approvals

I would like to request a bot flag for the User:Wikipedia Signpost account. The account will be a backup to Ralbot (approved in May) when Ral315 is unavailable, except that it will only spam the people on the Wikipedia Signpost spamlist, using only AWB. This will be manually-assisted; in other words, while the "automatic" feature will be used in AWB, it will have to be started manually and be monitored at all times. Given that I can't program (well, excluding Java, which isn't an ideal language for bots), the account will not and cannot do much else. I am also aware that manually-assisted use of AWB doesn't require a bot flag; however, I would like to not flood RC with the nearly 200 users on the spamlist. The account will only run and spam the people on the list on distribution days (usually late Mondays or early Tuesdays) when Ral315 and Ralbot are unavailable, meaning that the account should be sparingly used. I don't anticipate a problem with using AWB for this purpose. Thanks! Flcelloguy (A note?) 23:58, 25 July 2006 (UTC)[reply]

Sounds fine... but if there are going to be many bots doing this, you might want to make a task page, to check off when distributing (shhh, can you hear the process creeping?). — xaosflux Talk 00:53, 26 July 2006 (UTC)[reply]
  • I have no objection. I have a python spambot lying around that I use for the Mediation Committee, if you'd rather have something like that than AWB. Otherwise, approved. Essjay (Talk) 03:46, 26 July 2006 (UTC)[reply]
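For readers unfamiliar with what such a delivery run involves, here is a minimal sketch of the idea, written against the current pywikibot library rather than AWB or Essjay's spambot; the recipient names and the delivery text are placeholders, not the Signpost's actual spamlist or template.

    import pywikibot

    site = pywikibot.Site('en', 'wikipedia')

    # Hypothetical recipients and message; the real spamlist is a project page
    # with nearly 200 names.
    RECIPIENTS = ['ExampleUserOne', 'ExampleUserTwo']
    DELIVERY = ('\n\n== The Wikipedia Signpost ==\n'
                'The latest issue is out: [[Wikipedia:Wikipedia Signpost]]. ~~~~')

    for name in RECIPIENTS:
        talk = pywikibot.Page(site, 'User talk:' + name)
        talk.text += DELIVERY          # append the notice to the user's talk page
        talk.save(summary='Delivering the Wikipedia Signpost (bot-assisted)')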
  • What: The purpose of this bot is to do WikiProject maintenance for whatever WikiProject thinks its services are useful. WikiProject Anime and Manga will be the first beneficiary if this bot is approved. Of course, all its actions will have to be approved by any WikiProject before it is used on that WikiProject, as while in many cases they may be very useful, some WikiProjects may have different sorting systems that this bot should not mess with.
  • Why: I have noticed that WikiProjects run into a few problems in terms of raw sorting. First of all, there are often a huge number of stubs (hundreds or thousands) that belong in the category of a WikiProject (with the appropriate template in the discussion page) but have not been included. A bot could easily do this. There are other similar things, of course, that the bot could also do.
  • Exactly What:
    • Put all "anime and manga stubs" not in WikiProject Anime and Manga into WikiProject Anime and Manga as Stub-Class.
    • If WikiProject Anime and Manga lists a stub as anything but Stub-Class, remove the stub tag as long as the article is more than 1000 characters long. If it is shorter, do nothing, as it could be an actual stub. The 1000-character cutoff isn't an arbitrary cutoff, but simply a safety feature to prevent a really short article that was accidentally listed as a non-stub on the article rating scale from losing its stub tag.
    • For all projects with their own rating system, turn all GA-class articles in those projects that aren't listed as GA or better into GA-class articles under that project (this will start with WikiProject Anime and Manga only).
  • It uses the PyWikipedia framework. It won't be a server hog because it will be run manually (i.e. only when I tell it to), so instead of patrolling recent changes for hours on end it will simply patrol the categories when told to, probably weekly or every few days.
  • There are thousands of articles here that need proper sorting, and a bot in my opinion is the best way to do it. In addition, if there is some sort of mistake (e.g. a B-Class article that's currently a stub getting unlisted as a stub when it shouldn't be), it isn't devastating: a 99% success rate would do far more good than harm, and I highly doubt it would have any real failures, unless some fool ran around rating stubs as B-Class articles. Dark Shikari 10:26, 18 July 2006 (UTC)[reply]
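As an illustration of the first task only, the following is a minimal sketch in Python, assuming the modern pywikibot library rather than the 2006 PyWikipedia framework; the banner template name is a placeholder and would need to match whatever the WikiProject actually uses.

    import pywikibot

    site = pywikibot.Site('en', 'wikipedia')
    stubs = pywikibot.Category(site, 'Category:Anime and manga stubs')
    BANNER = '{{WikiProject Anime and manga|class=Stub}}'   # placeholder banner name

    for article in stubs.articles():
        talk = article.toggleTalkPage()
        text = talk.text if talk.exists() else ''
        if 'WikiProject Anime and manga' in text:
            continue                               # already tagged; leave it alone
        talk.text = BANNER + '\n' + text           # prepend the project banner
        talk.save(summary='Tagging stub for WikiProject Anime and manga as Stub-Class')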
I don't see a problem, as long as it's being sponsored by a Wikiproject, and the exact details of what it is going to do are approved here in advance. We don't generally give a "whatever you need it to do" clearance; you tell us specifically what it will do, and we approve that. If you add something, you drop a note here saying what you're adding, and we say yea or nay. I see no reason why it can't have a trial period once it's set up and has a specific list of tasks. Essjay (Talk) 00:41, 19 July 2006 (UTC)[reply]
It's not really "sponsored" just yet: I'm asking what people think of it on the WikiProject talk page. So far I've received no disapproval--they've also made suggestions as to what I can and can't do in regards to sorting using the bot. You can find the discussion so far here. How much approval should I look to get before I set the bot into motion? Dark Shikari 01:09, 19 July 2006 (UTC)[reply]
Dear God no!!!! You don't seem to realise that stubs and Stub-Class articles are two completely different things! The terminology is admittedly extremely confusing (and the sooner something is done about it, the better). Also you clearly don't understand that length is only a minor consideration when it comes to working out what a stub is. An article with one line of text followed by a list of 50 examples, or one line of text followed by a large table, is definitely a stub and likely well over 1000 characters. This is why stubs are sorted by hand, rather than by some automated method. Having the bot run in this way could well reverse much of the work of the Stub sorting wikiproject. Grutness...wha? 01:29, 19 July 2006 (UTC)[reply]
(Just for clarity) Hence why I said "as long as it's being sponsored by a Wikiproject, and the exact details of what it is going to do are approved here in advance." My take would be that the Wikiproject people would be quick enough to know what should and shouldn't be done, and that as long as specific changes are set out here before they are done, there is plenty of time for approval. Just saying that for the sake of clarity on my earlier comments. Essjay (Talk) 01:42, 19 July 2006 (UTC)[reply]

I have similar concerns to Grutness. The whole concept of a "stub class article" is poorly defined at best: at a deletion discussion on one such category, one WP1.0ist argued to keep them separate specifically on the grounds that "Stub class" is not the same as "stub"; another wanted to keep, on the basis that they were essentially the same. This needs to be much more clearly defined before a bot goes around making sweeping changes based on an assumption one way or the other. I see no evidence of support for this at the indicated Wikiproject, and the following comment sounds like opposition, or at least a reservation, to me: "It probably should go case-by-case (which I guess isn't what you want to hear for a bot, huh)." I'm especially opposed to stub-tag removal by bot; if a B-grade article is still tagged as a stub, there's clearly been a snafu someplace, since appropriate tagging and categorisation should be required to get it to that standard: much better to detect these by category intersection sorts of report generation, and have someone fix them manually. Stub-tagging might be more reasonable, but again, it's been specifically claimed by one person that some "Stub class" articles are not stubs, so this would require clarification and refinement; it would also have to be done very carefully to make sure that the "Stub class" tag, and the "stub" tag have the same scope by topic, and that no more specific tag applied instead. (For example, doing this to apply a single stub type to the whole of WPJ Albums, or WPJ Military History, would be a disaster.) Alai 04:06, 19 July 2006 (UTC)[reply]

I hope nobody minds my ill-informed opinion, but to incorporate Alai's suggestion into the bot sounds like a benefit. Without doing any editing, the bot could see how many articles have been snafu'd in such a manner and generate a report to somebody's userpage. It almost sounds like this should be a generic bot that Project leaders could download and configure for their specific purposes, or you could keep it proprietary and write all those little sub-bots yourself. ;) Xaxafrad 04:20, 19 July 2006 (UTC)[reply]
Well I had been informed by many that stub-class articles and stubs were in fact the same thing, and that anything above a stub-class article could not possibly be a stub. I had also been told that anything under the category "anime and manga stubs" most certainly belongs in Wikiproject Anime and Manga as a stub-class article. So who's right? I'm now getting confused... Dark Shikari 09:48, 19 July 2006 (UTC)[reply]
And in addition, wouldn't the most recent assessment of an article always be trusted? It seems as if most of the stub categorization was done months or even years ago, and hundreds of articles that have been expanded considerably since then, and have much higher ratings within the WikiProject, are still marked as stubs. Are you saying that WikiProject ratings are totally useless and cannot be used to justify removing a stub tag? The WikiProject members would disagree. Dark Shikari 10:00, 19 July 2006 (UTC)[reply]
Oh, and also, is there anything wrong with simply having the bot shove all the Anime and Manga Stubs into the Wikiproject? There's a huge number that aren't part of the WikiProject, and nobody seems to have a problem with assigning them to the Project. Dark Shikari 13:03, 19 July 2006 (UTC)[reply]
If "Stub class article" and "stub" really are the same, then we're back with my original concerns, i.e., why have two sets of categories for the same thing, thereby duplicating work, and causing exactly the sort of inconsistency described? See for example this discussion, in which the matter is made... well, rather opaque, actually. One claim seems to be that a "long but useless" article would be "Stub class", but not a "stub" (though personally I would say that a textually long article can still be a stub, so even that's far from clear either way). It's possible one implies the other, but not vice versa, for example. No, I'm not saying assessment ratings are totally useless: are you saying stub tags are totally useless? Members of WP:WSS would disagree. An automated rule that simply assumes which of the two is incorrect is highly problematic, either way. An assessment that ignores the presence of a stub tag is either a) using different criteria for the two, or b) failing to leave the article in a state consistent with said assessment; a person has made them inconsistent, and a person (same or otherwise) should make a judgement as to how to make them consistent. People are stub-tagging things all the time, I see no reliable basis for assuming that the assessment rating is either more reliable, or more recent. I have no objection to the application of WPJ-tags by bot, if the associated WPJ has expressly agreed to the basis this is being done on, in their particular case. Alai 14:58, 19 July 2006 (UTC)[reply]

Thanks for the input, Alai. I've made a revised set of things the bot could do:

    • Put all "anime and manga stubs" not in WikiProject Anime and Manga into WikiProject Anime and Manga as Stub-Class. There's no way for the bot to figure out if they deserve a higher rating than Stub-Class, so it's fair to start them off there.
    • For all projects with their own rating system, turn all GA-class articles in those projects that aren't listed as GA or better into GA-class articles under that project (this will start with WikiProject Anime and Manga only). Do the same with FAC articles that aren't listed as FACs under the WikiProject system. Dark Shikari 15:15, 19 July 2006 (UTC)[reply]
      • That seems completely fine, no objections to those tasks. Obviously in each case consultation with the wikiproject, and confirmation with them of the scope of the articles they're "adopting" thereby would be indicated. Alai 02:59, 21 July 2006 (UTC)[reply]
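A sketch of the GA-class task under the same assumptions (modern pywikibot, placeholder category and banner names); it simply bumps the banner's |class= parameter for articles that are already listed Good Articles.

    import re
    import pywikibot

    site = pywikibot.Site('en', 'wikipedia')

    # Titles of all listed Good Articles (the tracking-category name is an assumption).
    good = {p.title() for p in
            pywikibot.Category(site, 'Category:Wikipedia good articles').articles()}

    project = pywikibot.Category(site, 'Category:WikiProject Anime and manga articles')  # placeholder
    for talk in project.members():
        if talk.namespace() != 1:                    # banners live on talk pages
            continue
        if talk.toggleTalkPage().title() not in good:
            continue
        m = re.search(r'\|\s*class\s*=\s*(\w*)', talk.text)
        if not m or m.group(1).upper() in ('GA', 'A', 'FA'):
            continue                                 # no banner parameter, or already GA or better
        talk.text = re.sub(r'(\|\s*class\s*=\s*)\w*', r'\1GA', talk.text, count=1)
        talk.save(summary='Article is a listed GA; setting class=GA in the project banner')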


Requests to add a task to an already-approved bot

Grafikbot, MILHIST article tagging

I would like Grafikbot (talk · contribs) to tag military history related articles with the {{WPMILHIST}} template for assessment purposes as defined in the WP:1.0 program.

A fully automatic tagging bot is still out of reach, so for the time being, limited tagging will be carried out as follows:

  • Every article with the {{Mil-hist-stub}} stub tag or a related tag (list available at WSS), as well as those with {{mil-stub}} and its subtypes (list available at WSS), is considered a military history article and is thus subject to tagging.
  • The list from Wikipedia:WikiProject Military history/New articles will also be processed.
  • The {{WPMILHIST}} template is prepended to the talk page of each article (even if the talk page is empty).
  • The run is repeated once or twice a month to make sure that new stubs get properly tagged.

Note: a rather lengthy debate took place on WP:AN a few weeks ago, and a consensus emerged that such tagging was desirable for the whole WP project. Obviously, a bot can't tag everything, but I think it can handle this one. :)

Can someone approve this please? :)

Thanks, Grafikm (AutoGRAF) 15:09, 21 July 2006 (UTC)[reply]
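For illustration, here is a minimal sketch of the tagging step using the modern pywikibot library (Grafikbot itself is not this code): it walks the transclusions of one stub template and prepends the banner to each article's talk page, creating the page if it does not yet exist.

    import pywikibot

    site = pywikibot.Site('en', 'wikipedia')
    stub_template = pywikibot.Page(site, 'Template:Mil-hist-stub')

    for article in stub_template.getReferences(only_template_inclusion=True, namespaces=[0]):
        talk = article.toggleTalkPage()
        text = talk.text if talk.exists() else ''
        if '{{WPMILHIST' in text:
            continue                                # already tagged
        talk.text = '{{WPMILHIST}}\n' + text        # prepend, per the proposal
        talk.save(summary='Tagging for WikiProject Military history assessment')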

  • This somewhat touches on some of the same issues as discussed in relation to Dark Shikari Bot, but I see no problem as such. The scope does seem rather wide, though: {{mil-hist-stub}} obviously makes intuitive sense, but does the wikiproject really want to "adopt" the whole of {{mil-stub}} (i.e. corresponding to the whole of Category:Military)? Perhaps when you're approved for a trial run, you might start off with just the former, and consult somewhat on the latter. Alai 04:48, 23 July 2006 (UTC)[reply]
    • Well, aside from fictional stuff (which people really shouldn't be using {{mil-stub}} for anyways, I would think), we've already adopted basically all of Category:Military. Kirill Lokshin 04:54, 23 July 2006 (UTC)[reply]
      • Then it's not very well-named, is it? (BTW, you rather said the opposite when a "Stub-Class articles" category was being discussed for deletion, that there was no single hierarchy to your WPJ's scope...) At any rate, if there's any scope whatsoever for "false positives", it's not the best place to start. Alai 05:10, 23 July 2006 (UTC)[reply]
        • Well, no; the project's scope is actually broader than merely what's in Category:Military ;-) As far as false positives, I don't know how best to handle that. (What's the danger of having a few extra articles tagged, though? These are only talk page tags, and tend to be removed from articles where they don't belong with minimal fuss.) Kirill Lokshin 05:14, 23 July 2006 (UTC)[reply]

For your collective information, I note that there are 11,216 articles in or under {{mil-stub}}. That even misses a few, since for some incomprehensible reason, aircraft are largely split by decade, rather than into military and non-. There are 3,374 that are specifically "military history". Let's be as sure as possible these are all within the Wikiproject's scope before getting carried away with this (especially the "non-historical" ones). Alai 07:40, 25 July 2006 (UTC)[reply]

  • Well, I am leaning against a bot. I have just been cleaning up some 75 articles in the cartridge category that had the WPMILHIST banner on the talk page, which I assume was added by a bot. Reading the articles made it clear that the cartridges were used for sporting or hunting, with no military use referenced at all. Articles should be read and assessed at the same time.--Oldwildbill 10:45, 25 July 2006 (UTC)[reply]
    • There is no tagging bot here that I'm aware of, at least not today. -- Grafikm (AutoGRAF) 17:00, 25 July 2006 (UTC)[reply]

Bots in a trial period

  • What: The purpose of this bot is simply to update the "Number of Articles Remaining" table on Category:Cleanup by month. The bot can be run manually or scheduled to run automatically (it is currently not), but needs to run no more than once a day to keep the table as up to date as it currently is. The bot is a PHP 5.1.4 command-line script, which operates on a Unix machine, and makes use of cURL and extensive regular-expression parsing.
  • Why: As the data for this table is constantly changing, it is rather tedious for a human to update it manually. Instead, this bot will retrieve all the relevant information and update the section automatically. Given that manual updates generally introduce errors (see the update history), performing this task automatically is a better approach.
  • How: The bot uses cURL/regex to pull and parse a minimal number of pages – the Cleanup by month category pages, plus Category:Cleanup by month, Category:Music cleanup by month, and Special:Statistics. An average run of the bot, which takes about three to four minutes, pulls fewer than one hundred pages. If, at any time, a page is pulled incorrectly (usually as a result of a timeout), the bot will abort, pulling no further pages and making no changes to the category page.

    On each category page, the line "There are ## pages in this section of this category." is parsed to determine how many pages are on that page of that category. As categories can span multiple pages, "(next 200)" links are also followed. Finally, the same is done for the subcategories of Category:Music cleanup by month. When all counts have been retrieved, the bot will pull the total number of articles in the English Wikipedia from Special:Statistics, and then format a new wikitable for output into the article. It is also possible to configure the bot to output the wikitable to stdout, rather than edit the page, if necessary.

    The bot keeps track of a number of statistics, including total number of pages processed, total time, etc. While functionality does not yet exist to do so, it would not be hard to extend the bot script to maintain these statistics on a subpage of the User page. —The preceding unsigned comment was added by Dvandersluis (talkcontribs) 20:43, 19 July 2006 (UTC)
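A rough sketch of the counting step follows, assuming the modern pywikibot library instead of the PHP/cURL scraper described above (the library pages through categories by itself, so no "(next 200)" links need to be followed); the subcategory handling and the target page are illustrative only.

    import pywikibot

    site = pywikibot.Site('en', 'wikipedia')
    parent = pywikibot.Category(site, 'Category:Cleanup by month')

    rows = []
    for month_cat in parent.subcategories():
        count = len(list(month_cat.articles()))          # articles awaiting cleanup
        rows.append((month_cat.title(with_ns=False), count))

    # Build a simple wikitable from the counts.
    lines = ['{| class="wikitable"', '! Month !! Articles remaining']
    for name, count in rows:
        lines.append('|-\n| %s || %d' % (name, count))
    lines.append('|}')

    target = pywikibot.Page(site, 'User:Example/Cleanup counts')   # placeholder target page
    target.text = '\n'.join(lines)
    target.save(summary='Updating cleanup-by-month article counts')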
Ok, looks ok, can you make a trial run and post a diff please -- Tawker 07:24, 21 July 2006 (UTC)[reply]
I'm not exactly sure what you're asking me to do... run the bot once? –Dvandersluis 03:31, 23 July 2006 (UTC)[reply]
Pretty much, the trial run is for a week, and you can run the bot, carefully checking its edits. During the test keep the edits to no more than 2-3 per minute. After the run(s) post the diffs here for review by the group/community. — xaosflux Talk 00:55, 26 July 2006 (UTC)[reply]


  • What: orphaning (by linking in most cases) fair use images outside of ns:0
  • How: Based on lists generated by SQL queries and reviewed to exclude some potentially OK pages (Portals, over which there is some debate; Wikipedia:Today's featured article and its subpages; Wikipedia:Recent additions; etc.), using replace.py
  • How often: Probably in batches of ~500 pages based on the type of image (such as {{albumcover}}); see the list of images to remove and the list of pages to edit. With 10,747 images and 19,947 pages it may still take a while. Once this group is done, updates will depend on the frequency of database dumps and/or whenever the toolserver works again and I can wrangle someone into running a report/get an account.

Kotepho 09:19, 12 July 2006 (UTC)[reply]
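A sketch of the "orphan by linking" idea with the modern pywikibot library, not Kotepho's actual replace.py run; the image name is a placeholder, and a real run would work from the reviewed SQL-query lists mentioned above.

    import re
    import pywikibot

    site = pywikibot.Site('en', 'wikipedia')
    image = pywikibot.FilePage(site, 'File:Example album cover.jpg')   # placeholder
    name = re.escape(image.title(with_ns=False))

    for page in image.usingPages():
        if page.namespace() == 0:
            continue                             # leave article-space uses alone
        # Turn the inclusion into a plain link by prefixing a colon.
        pattern = re.compile(r'\[\[\s*((?:File|Image)\s*:\s*' + name + ')')
        new_text = pattern.sub(r'[[:\1', page.text)
        if new_text != page.text:
            page.text = new_text
            page.save(summary='Orphaning fair-use image outside the article namespace (link instead of inclusion)')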

This one looks like it's going to be a magnet for complaints from people who don't understand image use policy, but it does sound necessary. I'd start with a very well-written FAQ page and leave a talk page message on their page saying what the bot did and why it did it before I would run/approve it -- Tawker 21:36, 15 July 2006 (UTC)[reply]
Durin's page is quite good. Kotepho 21:49, 16 July 2006 (UTC)[reply]
Isn't this similar in at least some ways to what OrphanBot does? I'd like to hear Carnildo's comments on this, given that he runs OrphanBot and is on the approvals group. Essjay (Talk) 14:52, 16 July 2006 (UTC)[reply]
It is basically the same role; the main reason I brought it up is that, looking at OrphanBot's talk over time, it does get a lot of complaints (I suspect it's mostly people not familiar with policy, however) - it's just something FRAC. I have no problems with the bot personally and I'll give it the green light for a trial run; I just wanted to make sure Kotepho knew what deep water (complaints-wise) this is :) -- Tawker 18:52, 16 July 2006 (UTC)[reply]
Well, someone has to do it, and I'd likely do at least some of them by hand so the complaints will come either way. Kotepho 21:49, 16 July 2006 (UTC)[reply]
Ahh, what the heck, run it in trial mode and let's see what happens -- Tawker 07:24, 21 July 2006 (UTC)[reply]


The bot is manually assisted, performing interwiki links and standardization, plus handling double redirects. It runs on the pywikipedia framework. It will mostly see to it that articles from the Romanian Wikipedia get linked to their counterparts on the English one. --Rebel2 19:15, 17 July 2006 (UTC)[reply]
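As a sketch of the double-redirect part of the task (the interwiki part is normally handled by the framework's own interwiki script), assuming the modern pywikibot library; the page title passed in is a placeholder, and a real run would take its input from Special:DoubleRedirects.

    import pywikibot

    site = pywikibot.Site('en', 'wikipedia')

    def fix_double_redirect(redirect):
        """If `redirect` points at another redirect, retarget it at the final page."""
        if not redirect.isRedirectPage():
            return
        middle = redirect.getRedirectTarget()
        if not middle.isRedirectPage():
            return                                  # not a double redirect
        final = middle.getRedirectTarget()
        redirect.text = '#REDIRECT [[%s]]' % final.title()
        redirect.save(summary='Fixing double redirect to [[%s]]' % final.title())

    fix_double_redirect(pywikibot.Page(site, 'Some old redirect'))   # placeholder title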

  • No problem. Give it a week's trial run and report back this time next week. Essjay (Talk) 00:43, 19 July 2006 (UTC)[reply]

Between February and April this year, I made a large number of typo-fixing edits (approximately 12,000 in total). All of these were done manually – every edit was checked before saving – although I have written software similar to AutoWikiBrowser to assist with the process. This software is designed specifically for spellchecking and so, while not as flexible as AWB, has a number of advantages. It reports the changes made in the edit summary, can check articles very quickly (in less than a second), and can easily switch between different corrections (for example, "ther" could be "there", "the" or "other") in a way that AWB cannot. Central to this is a list of over 5000 common errors that I have compiled from various sources, including our own list of common misspellings, the AutoCorrect function of Microsoft Office, other users' AWB settings, and various additions of my own. As I mentioned, I have done an extensive amount of editing with the aid of this software, using my main account. I have recently made further improvements to the software; over the last couple of days I have made a few edits to test these improvements, and I am now satisfied that everything works.

While I believe Wikipedia is now so heavily used that (a) no one person could hog the servers even if they wanted to, and (b) the Recent Changes page is more or less unusable anyway, a couple of users have expressed concerns about the speed of these edits (which reached 10 per minute during quiet periods). Most notably, Simetrical raised the issue during my RfA. As I stated in my response to his question, I was not making any spellchecking edits at that time, but I explained that I would request bot approval should I decide to make high-speed edits in the future. That time has now come; I have created User:GurchBot, and I request permission to resume exactly what I was doing in April, but under a separate account. I will leave the question of whether a bot flag is necessary to you; I am not concerned one way or the other.

Thanks – Gurch 19:45, 15 July 2006 (UTC)[reply]
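For readers curious what assisted spellchecking of this kind looks like, here is a minimal Python sketch (not Gurch's actual software, which is a custom tool); the correction list is a tiny illustrative sample and every change still requires operator confirmation.

    import re
    import pywikibot

    # A tiny illustrative correction list; the real list has over 5,000 entries.
    CORRECTIONS = {'recieve': 'receive', 'occured': 'occurred', 'untill': 'until'}

    site = pywikibot.Site('en', 'wikipedia')

    def check_page(page):
        text = page.text
        if '[sic]' in text:
            print('Skipping %s: contains [sic]' % page.title())
            return
        fixed = text
        for wrong, right in CORRECTIONS.items():
            fixed = re.sub(r'\b%s\b' % wrong, right, fixed)
        if fixed == text:
            return                                  # nothing to correct
        # Ask the operator before saving; a real tool would show a diff here.
        if input('Apply corrections to %s? [y/N] ' % page.title()).lower() == 'y':
            page.text = fixed
            page.save(summary='Spelling corrections (manually checked)')

    check_page(pywikibot.Page(site, 'Example article'))   # placeholder title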

As long as you are checking it yourself and ignoring the "sic"s, it seems good to me. Alphachimp talk 23:54, 15 July 2006 (UTC)[reply]
Yes, I check every edit before I save it, and I ignore [sic] when I see it. I have incorrectly fixed a couple of [sic]s in the past because I (the fallible human) failed to spot them; one of my improvements has been to add [sic] detection to the software so it can alert me to this, and hopefully make an error less likely in future – Gurch 10:03, 16 July 2006 (UTC)[reply]
  • I don't have any issue with this, provided you aren't doing any of the spelling corrections that tend to cause problems, such as changes from Commonwealth English to American English and vice versa. As long as it's only correcting spelling errors and doesn't touch spelling variations, it should be fine. I'd like to see a week's trial (which is standard) to get a good idea of exactly what will be taking place, and also for users to add their comments. A week's trial is approved, please report back this time next week. Essjay (Talk) 14:47, 16 July 2006 (UTC)[reply]
I have never corrected spelling variations, regional or otherwise – being from the UK, I have long since given up and accepted all variants as equally permissible anyway. If you wish, I can upload the entire list and replace the (now out-of-date) User:Gurch/Reports/Spelling; I will probably do this at some point anyway. I won't be around in a week's time, so you can expect to hear from me in a month or so. For now, you can take this to be representative of what I will be doing – Gurch 16:11, 16 July 2006 (UTC)[reply]
Just wanted to make sure I'd said it. ;) A month is fine; we normally do a week's trial, but I have no issues with something longer. Let us know how things are going this time next month. Essjay (Talk) 22:22, 16 July 2006 (UTC)[reply]

If these are manually-approved edits, I wouldn't think approval as a bot would be strictly necessary, though I could imagine the speed might be a concern, especially if errors are (or were) slipping through. Given that this is more of a "semi-bot", I suggest it not be bot-flagged, so as to reduce the likelihood of errors going undetected subsequently as well. Alai 04:24, 18 July 2006 (UTC)[reply]

In fact approval as a bot wasn't necessary – as I mentioned above, I used to do this using my main account, and would have continued to do so, except that a number of users expressed their concern and suggested I request approval for a bot. So I have done that. I freely admit that errors will inevitably slip through at some point; in fact, I've just had to apologize for correcting a British spelling which was, fortunately, spotted and reverted very quickly. Of course I never intended to do any such thing – it turns out that this (actually correct) spelling has been listed on Wikipedia:Lists of common misspellings/For machines (one of the sources for my correction list) since November 2002; somehow it was never spotted in nearly four years. My fault, of course, for assuming the list was correct; I'm now scrutinizing my list thoroughly to avoid repeating this mishap. This is the first time I've made such a miscorrection, the reason being that my old list was constructed by hand, whereas I've now tried to expand it (and so catch more errors with each edit) by including lists from other sources. In the past I have occasionally mis-corrected "sic"s and errors in direct quotations; the chance of this should be much lower now that my software can detect these itself, even if I miss them. Based on what I have done to date, though, I reckon my error rate is about 1 in every 1000 edits, which I can live with – Gurch 11:38, 18 July 2006 (UTC)[reply]
  • As I said above, you're cleared for a month-long (instead of a week, at your request) trial; check back with us then and we'll set the bot flag. Essjay (Talk) 00:47, 19 July 2006 (UTC)[reply]

I would like permission to run a bot to tag newly created copyvio articles with {{db-copyvio}} (although at first I would only tag with {{nothing}} and exit, so I can look over the edits, until I am confident in its accuracy in identifying copyvios). The bot is written in perl, although it calls replace.py (from pywikipediabot). Once I work out the bugs, I would want to have the bot running continuously. -- Where 01:44, 12 July 2006 (UTC)[reply]

How do you intend to gather the "newly created copyvio articles"? — xaosflux Talk 03:00, 12 July 2006 (UTC)[reply]
The bot watches the RC feed at browne.wikimedia.org. Every new article is downloaded, and the text is run through a Yahoo search to see if there are any matches outside of Wikipedia. -- Where 04:12, 12 July 2006 (UTC)[reply]
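To make the detection idea concrete, here is a rough Python sketch (not Where's perl bot); web_search() is a hypothetical stand-in for whatever search backend is used, and the page titles are placeholders.

    import re
    import pywikibot

    def web_search(phrase):
        """Hypothetical helper: return non-Wikipedia URLs whose pages contain phrase."""
        raise NotImplementedError

    def looks_like_copyvio(text, phrases_to_check=3, min_words=8):
        # Drop anything inside quotation marks (likely fair-use quotes), then
        # search the web for a few of the remaining longer sentences.
        text = re.sub(r'"[^"]*"', '', text)
        sentences = [s.strip() for s in re.split(r'[.!?]\s+', text)
                     if len(s.split()) >= min_words]
        return any(web_search('"%s"' % s) for s in sentences[:phrases_to_check])

    site = pywikibot.Site('en', 'wikipedia')
    page = pywikibot.Page(site, 'Some new article')        # placeholder; the real bot reads the RC feed
    if looks_like_copyvio(page.text):
        report = pywikibot.Page(site, 'User:Example/Suspected copyvios')   # placeholder report page
        report.text += '\n* [[%s]] may be a copyright violation' % page.title()
        report.save(summary='Reporting a suspected copyvio')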
But what if the text is a GFDL or PD source, or quotes a GFDL/PD source?--Konstable 04:52, 12 July 2006 (UTC)[reply]
Also, how about fair use quotes? --WinHunter (talk) 05:59, 12 July 2006 (UTC)[reply]
  • Wouldn't it be better to report potential copyvios (at an IRC channel, and at WP:AIV or a similar page for non-IRC folks) instead of just tagging them outright? Also, you could use Copyscape, similar to how the Spanish Wikipedia implemented this idea. Try talking to User:Orgullomoore for ideas. Titoxd(?!?) 06:35, 12 July 2006 (UTC)[reply]
    • Yes, I suppose since the bot is bound to have a large number of false detections of copyvios it would be best to report them in a way other than simply tagging articles for speedy deletion. I like Titoxd's idea of listing the possible copyvios on a page similar to AIV (later, perhaps, I can implement an IRC notification bot if this goes okay). I looked at Copyscape, however, and it will only allow 50 scans per month unless I pay them money, which I am not willing to do. Thanks for your time! -- Where 14:44, 12 July 2006 (UTC)[reply]
      • Again, ask Orgullomoore. He runs more than just 50 scans a month, so you two might be able to work something out. Titoxd(?!?) 05:31, 13 July 2006 (UTC)[reply]
  • What would be best is if it put a notice on the talk page "This article might be a copyvio" and added that article to a daily list (in the bot's userspace) of suspected copyvios. Then humans could use their judgement to deal with them properly... overall I think it would speed things up tremendously, since we'd have all the likely copyvios in one place. It should probably avoid testing any phrases in quotation marks, but other than that, I don't think it would pick up a huge number of false positives. In my experience with newpage patrol, for every 99 copyvios there's maybe 1 article legitimately copied from a PD/GPL site. Like I said earlier, it's rather amazing that we don't have a bot doing this already, and I'm glad someone's developing it finally. Contact me if you need any non-programming help with testing. --W.marsh 21:30, 12 July 2006 (UTC)[reply]
    • The problem with putting a notice on a talk page would be that it would create a large number of talk pages for deleted articles; that being said, if you still think it is a good idea, I will trust your judgement and implement it anyway once I am confident in the bot's accuracy. Also, just out of curiosity, what do you think is wrong with searching for exact phrases? (when I was not testing for exact phrases, the bot claimed that a page was a copyvio of a webpage that listed virtually every word in the English language). Thanks for your suggestions, and your time. -- Where 23:02, 12 July 2006 (UTC)[reply]
Oh, you're probably right about the talkpages, I hadn't thought of that. For the other thing, I mean that it shouldn't search for phrases that were originally in quotation marks in the test article, since those are probably quotations that might be fair use. But it should definitely search for other exact phrases from the article on Google/Yahoo or whatever. By the way, I think Google limits you to 3,000 searches/day, Yahoo might too... not sure if that will have an impact. --W.marsh 23:09, 12 July 2006 (UTC)[reply]
I got the impression that Yahoo was more lenient than Google. But if worst comes to worst, I will have to just use the web interface rather than the API (which should allow me unlimited searches). -- Where 23:31, 12 July 2006 (UTC)[reply]
This seems like a good idea, but the only concern I would have is that the process be supervised by a non-bot (i.e. human, hopefully). Tagging the talk page or on an IRC channel seems like a good idea; admins would simply have to remember to check those often and make sure that the bot is accurate. Thanks! Flcelloguy (A note?) 05:11, 13 July 2006 (UTC)[reply]
I agree; the bot will have a fair amount of errors because of the concerns voiced above. Thus, the bot will edit only one page, which will be outside article space. This page would contain a listing of suspected copyvios found by the bot. During the trial period, I would set the bot to edit a page in my userspace; if the bot is successful, perhaps the page could be moved to the Wikipedia namespace. Does that address your concern? If not, I'm open to suggestions :) -- Where 18:05, 13 July 2006 (UTC)[reply]
I like this idea in general. My only concern is that even with liberal filters it could create a massive, unmanageable backlog. Have you tried to estimate how many pages per day/week would this generate? Misza13 T C 19:10, 13 July 2006 (UTC)[reply]
I have not done so yet; however, based on tests so far, I would estimate that the backlog would be manageable. It is hard to tell for sure, though, without a trial. Thus, I have just started the bot so that it writes its findings to a file and does not touch Wikipedia. When I finish this trial, I will be able to give an estimate of how many suspected copyvios it finds per day. -- Where 19:29, 13 July 2006 (UTC)[reply]
I just did a 36 minute test, in which 4 potential copyvios were identified. If I did the calculations correctly, this would mean that about 160 potential copyvios would be identified on a daily basis (assuming that the rate of copyvios is constant, which is obviously not the case). This is a lot, but should be manageable (especially if A8 is amended). Also, I should be able to reduce the number of false identifications with time. Two of the items identified were not copyvios; one was from a Wikipedia mirror, and I am still examining the cause of the other one. -- Where 21:53, 13 July 2006 (UTC)[reply]
Yes, having the bot edit one page and list the alerts there would alleviate my concerns. The test is also quite interesting, though I would like to perhaps see a longer test - maybe 24 or 48 hours? 36 minutes may not be reliable data to efficiently estimate the daily output. Thanks! Flcelloguy (A note?) 23:56, 13 July 2006 (UTC)[reply]
Okay; I am starting another test and will have it run overnight. -- Where 00:08, 14 July 2006 (UTC)[reply]

The bot is currently listing possible copyvios to User:Where/cp as it finds them. -- Where 01:56, 15 July 2006 (UTC)[reply]

Suggestion: could you change the listing format (see below)?
  • That's the current format
Good idea! The bot now uses that format. Thanks! -- Where 15:14, 15 July 2006 (UTC)[reply]
New format looks better, but of the 3 items listed on there right now, none are actionable, see comments per item on that page. — xaosflux Talk 01:03, 16 July 2006 (UTC)[reply]
Thanks :). I removed the items. -- Where 01:48, 16 July 2006 (UTC)[reply]

Howdy. The bot has been running for a tad over a week. If anybody has any suggestions for improving the bot, I would be appreciative. Also, I am kind of curious how long the trial period lasts. Many thanks, -- Where 03:33, 26 July 2006 (UTC)[reply]

Proposed disambiguation bot, manually assisted, running m:Solve_disambiguation.py. I will be using this to work on the backlog at Wikipedia:Disambiguation pages with links; bot assisted disambiguation is substantially more efficient than any other method. The bot will run from the time it gets approval into the foreseeable future. --RobthTalkCleanup? 16:20, 13 June 2006 (UTC)[reply]
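For illustration, a stripped-down Python sketch of the manually assisted idea behind solve_disambiguation.py (the real script, part of the pywikipedia/pywikibot framework, is far more capable); the page title and the prompt flow here are placeholders.

    import pywikibot

    site = pywikibot.Site('en', 'wikipedia')
    dab = pywikibot.Page(site, 'Mercury')                  # placeholder disambiguation page

    for page in dab.getReferences(namespaces=[0], total=5):
        print('\n[[%s]] links to the disambiguation page [[%s]]' % (page.title(), dab.title()))
        choice = input('Replace the link with which article? (blank to skip) ')
        if not choice:
            continue
        page.text = page.text.replace('[[%s]]' % dab.title(),
                                      '[[%s|%s]]' % (choice, dab.title()))
        page.save(summary='Disambiguating link to [[%s]]' % dab.title())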

I see no reason we couldn't have a trial run, at least. robchurch | talk 20:44, 1 July 2006 (UTC)[reply]
Thanks. I'll start running it at low speed in the next couple of days. --RobthTalk 04:04, 2 July 2006 (UTC)[reply]
(In response to a request for a progress report): I've made a small run, which went quite well, but limits on my disposable time have prevented me from making any larger runs just yet--RobthTalk 01:07, 10 July 2006 (UTC)[reply]
  • No problem, trial extended, keep us informed and report back when you have enough done for us to make a final decision. Essjay (TalkConnect) 08:24, 12 July 2006 (UTC)[reply]


Approved

Approved, not flagged

Approved, flagged