User:StatisticianBot

StatisticianBot
This user is a bot
(talk · contribs)
Operator: Dvandersluis
Author: Dvandersluis
Approved? Yes
Flagged? Yes
Task(s): Maintain WP:GAN/R
Edit rate: Once per day
Edit period(s): 4am UTC
Automatic or manual? Automatic
Programming language(s): Ruby
Exclusion compliant? No
Source code published? Partially
Emergency shutoff-compliant? Yes

Bot functions

This bot automatically updates statistics-related pages on Wikipedia whose upkeep would be tedious for a human editor. Three tasks are defined for the bot and described in detail below (one current, one disabled, and one proposed):

  1. Update statistics on Category:Cleanup by month.
  2. Maintain a statistical report on Wikipedia:Good article candidates.
  3. Update Template:Copyedit progress for WikiProject League of Copyeditors.

Bot internals

  • This bot runs on Ruby 1.8.7, using a custom framework built on cURL.
  • The bot runs as a cron job.
  • Each task runs once per day, during off-peak hours. The tasks do not run concurrently.
  • If an error is detected, the bot terminates the task without updating any page.
  • The maintainer is User:Dvandersluis.
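
The "terminate without updating" behaviour can be illustrated with a minimal sketch. This is not the bot's actual framework, which is only partially published. The names TaskAborted, run_tasks and writer are hypothetical, and the sketch assumes that each task builds all of its edits in memory before anything is written.

```ruby
# Hypothetical error type raised by a task that detects a problem.
class TaskAborted < StandardError; end

# Runs the tasks one after another (never concurrently). Each task returns a
# hash of {page title => new wikitext}. Writes happen only after the task has
# finished building its edits, so a task that raises touches no page.
def run_tasks(tasks, writer)
  results = {}
  tasks.each do |name, task|
    begin
      pending = task.call
      pending.each { |title, text| writer.call(title, text) }
      results[name] = :ok
    rescue TaskAborted => e
      results[name] = "aborted: #{e.message}"
    end
  end
  results
end

# Stand-in for the real page writer: records edits in a hash instead of saving.
written = {}
writer = ->(title, text) { written[title] = text }

tasks = {
  "good_task" => -> { { "Sandbox/A" => "report text" } },
  "bad_task"  => -> { raise TaskAborted, "page could not be parsed" }
}

results = run_tasks(tasks, writer)
```

In this sketch an aborted task leaves `written` untouched, while later tasks still run. Whether the real bot continues with later tasks after an abort is not stated above.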

Current tasks

Good article candidates

This task was requested by User:Mike Christie and is outlined in detail at User:Mike Christie/GACbot. It compiles a statistical report on Wikipedia:Good article candidates to help that page's maintainers identify trends. The bot also updates the GAC backlog template with the five oldest nominations, and generates a special page that uses ParserFunctions, so that other templates can transclude it to obtain specific statistics (rather than the bot editing complex templates itself).
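
As an illustration of the ParserFunctions approach, the following sketch generates wikitext for a statistics page built around #switch. The actual markup of the bot's page is not given here, so the use of #switch, the method name stats_template, and the statistic names are all assumptions.

```ruby
# Builds wikitext that returns one statistic, selected by the first template
# parameter. The statistic names passed in are made up for illustration.
def stats_template(stats)
  lines = ["{{#switch: {{{1|}}}"]
  stats.each { |key, value| lines << " | #{key} = #{value}" }
  lines << " | #default = unknown statistic"
  lines << "}}"
  lines.join("\n")
end

wikitext = stats_template({ "total" => 120, "on hold" => 14, "under review" => 9 })
puts wikitext
```

A page needing one figure could then transclude the generated page with a single parameter, for example {{GACstats|total}} (the parameter name is hypothetical), without the bot ever editing that page.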

This task puts as little strain as possible on the Wikipedia servers. Although the bot performs a number of sub-tasks, it fetches only one page to obtain the necessary data, and it writes to a minimal number of pages (currently three).

Detailed description

The original specification can be found at User:Mike Christie/GACbot.
  • The bot starts at Wikipedia:Good article candidates and downloads that page's wikitext. Using special comments inserted into the page, the bot isolates the section of the page containing the nominations.
  • The bot aborts immediately if the page is not downloaded correctly, if the nomination section cannot be detected, or if the bot is unable to log in to Wikipedia. The most likely causes are a timeout on the bot's side or a change in the format of the GAC page.
  • Using a series of regular expressions, the bot parses the page into an object of nested nomination categories and nominations. All pertinent information to be used later is stored within the object:
    • Nominator and nomination date.
    • Length status, if available.
    • On hold status, if applicable, along with the user who placed the article on hold, and the timestamp of the status change.
    • Under review status, if applicable, along with the user who is reviewing the article, and the timestamp of the status change.
    • Any malformations to the nomination detected during the parse.
  • Once the bot has the necessary data, it compiles a report, which is written to Wikipedia:Good article candidates/Report. The report currently consists of four sections:
    1. Old nominations report: a list of the oldest 10 unreviewed nominations, sorted by age.
    2. Backlog count: a daily count of the total number of articles listed at GAC, and of how many are on hold or under review.
    3. Exception report: a list of unexpected or undesirable issues.
    4. Summary: a list by category, showing nomination statistics for each category.
  • The bot will update Good article candidates/backlog/items with the oldest five nominations, for use in the backlog template.
  • Finally, the bot updates Template:GACstats. This page allows other templates and pages to quickly obtain information from the GAC report without having to be updated by the bot themselves; instead, they transclude the page with a specific parameter for each statistic.
  • This task was approved at WP:B/RFA on 2007-05-12.
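
The isolate, parse and select steps above can be sketched as follows. The marker comments, the entry line format and all method names are hypothetical stand-ins: the real GAC page format and the bot's regular expressions are not reproduced here, and the hold, review and length fields are omitted for brevity.

```ruby
require "time"

class ParseAbort < StandardError; end

START_MARK = "<!-- NOMINATIONS BEGIN -->"  # hypothetical marker comment
END_MARK   = "<!-- NOMINATIONS END -->"    # hypothetical marker comment

# Hypothetical entry format: "# [[Title]] - Nominator, 2007-05-01T12:00:00Z"
ENTRY   = /\A# \[\[(?<title>[^\]]+)\]\] - (?<user>[^,]+), (?<date>\S+)\z/
HEADING = /\A===\s*(?<name>.+?)\s*===\z/

# Returns the text between the two markers, or aborts if either is missing.
def nomination_section(wikitext)
  start = wikitext.index(START_MARK)
  stop  = wikitext.index(END_MARK)
  raise ParseAbort, "nomination section not found" if start.nil? || stop.nil? || stop < start
  wikitext[(start + START_MARK.length)...stop]
end

# Parses the section into {category => [nominations]} plus a list of
# malformed lines that could feed an exception report.
def parse_nominations(section)
  categories = Hash.new { |hash, key| hash[key] = [] }
  malformed = []
  current = "Uncategorized"
  section.each_line do |raw|
    line = raw.strip
    next if line.empty?
    if (m = HEADING.match(line))
      current = m[:name]
    elsif (m = ENTRY.match(line))
      begin
        date = Time.iso8601(m[:date])
        categories[current] << { title: m[:title], nominator: m[:user], date: date }
      rescue ArgumentError
        malformed << line  # timestamp present but not parseable
      end
    else
      malformed << line
    end
  end
  [categories, malformed]
end

# Oldest `count` nominations across all categories, oldest first.
def oldest(categories, count)
  categories.values.flatten.sort_by { |nom| nom[:date] }.first(count)
end

page = <<~WIKI
  Intro text that the bot ignores.
  <!-- NOMINATIONS BEGIN -->
  === History ===
  # [[Battle of Example]] - Alice, 2007-04-20T10:00:00Z
  # [[Treaty of Sample]] - Bob, 2007-03-02T08:30:00Z
  === Music ===
  # [[Example Symphony]] - Carol, 2007-04-01T09:15:00Z
  # this line is broken
  <!-- NOMINATIONS END -->
  Footer text.
WIKI

categories, malformed = parse_nominations(nomination_section(page))
oldest_titles = oldest(categories, 2).map { |nom| nom[:title] }
```

Calling oldest(categories, 5) would give the input for the backlog items page, and the malformed list corresponds to the exception report.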

Disabled tasks

Cleanup by month

The purpose of this task was to keep the Number of articles remaining table updated. This task was performed between July 26, 2006 and August 31, 2009, originally by CbmBOT (also operated by Dvandersluis) and later by this bot.

The final version of this bot was 3.0.1, updated April 13, 2009.

Detailed description

  • The bot starts at Category:Cleanup by month and collects the categories (listed under the Subcategories section on that page), named "Cleanup from {MONTH} {YEAR}", that contain pages needing cleanup.
  • Each category page is inspected, and the number of pages in that category is calculated:
    • The bot looks for the string "There are ## pages in this section of this category." at the top of the "Pages in category..." section on each category page, and keeps track of that number.
    • The bot will follow "(next 200)" links on category pages in order to get the complete count for the category.
    • Pages in subcategories are not counted twice.
    • Pages of the form Wikipedia:Cleanup/<MONTH> are ignored for counting purposes, as they are not themselves in need of cleanup but are information pages about what needs cleanup.
  • The bot repeats the previous process, using the subcategories on Category:Music cleanup by month. This step is currently skipped, as no such categories exist. If they are ever recreated, the bot will resume counting them.
  • The bot will immediately abort if a count of 0 is returned for any category (as this is an impossibility and means that the bot had trouble parsing a page, or, more likely, timed out while trying to do so).
  • If the bot successfully retrieved information from each category, it will pull the total number of articles from Special:Statistics.
  • The bot will then format the information gleaned into wikicode, and update the section.
  • The bot keeps track of the elapsed time and number of pages processed. On average, a successful run takes about three minutes and processes fewer than one hundred pages.
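
The per-category counting loop described above can be sketched as follows. This is a simplified illustration, not the bot's code: the "[next:...]" token is a hypothetical stand-in for the "(next 200)" link, `fetch` is any callable that maps a page key to its text, and the subcategory and Wikipedia:Cleanup/<MONTH> rules are left out.

```ruby
class CountAbort < StandardError; end

# The count string that the bot looks for on each category page.
COUNT_RE = /There are (\d+) pages in this section of this category\./
# Hypothetical stand-in for the "(next 200)" link to the following page.
NEXT_RE  = /\[next:([^\]]+)\]/

# Sums the per-page counts for one category, following "next" links, and
# aborts if a page cannot be read or the total comes out as zero.
def count_category(first_page, fetch)
  total = 0
  page = first_page
  seen = {}
  while page && !seen[page]
    seen[page] = true  # guards against a link loop
    text = fetch.call(page)
    raise CountAbort, "could not fetch #{page}" if text.nil?
    match = COUNT_RE.match(text)
    raise CountAbort, "no count found on #{page}" if match.nil?
    total += match[1].to_i
    next_link = NEXT_RE.match(text)
    page = next_link && next_link[1]
  end
  raise CountAbort, "count of 0 for #{first_page}" if total.zero?
  total
end

# Canned pages standing in for real category listings.
pages = {
  "Cleanup from May 2006"   => "There are 200 pages in this section of this category. [next:Cleanup from May 2006/2]",
  "Cleanup from May 2006/2" => "There are 37 pages in this section of this category.",
  "Cleanup from June 2006"  => "The request timed out."
}
fetch = ->(key) { pages[key] }

may_count = count_category("Cleanup from May 2006", fetch)
```

With the canned pages above, the May category spans two listing pages whose counts are added together, while the June category raises CountAbort because no count string is present.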

Proposed future tasks

League of Copyeditors progress template

WikiProject League of Copyeditors maintains a template, Template:Copyedit progress, that tracks the project's progress in copyediting tagged articles. At present, it is updated manually, which is a long process. As performed by the bot, this task would parse the proofreading page, count the completed proofreads, and update the template.

The League of Copyeditors has changed its name, and its needs are changing a bit too. We desperately need a process that does almost exactly what this bot does for GAN. I have written specifications based on the original specs here: User:Noraft/GOCEbot. Maybe it wouldn't be too much tweaking to get this bot going on that project. We're doing a backlog elimination drive May 1, and it would be awesome if it was running by then (don't know if your schedule permits for that, though). Anyway, thanks for the bot at GAN. Works awesome! ɳorɑfʈ Talk! 14:31, 14 April 2010 (UTC)