User:SkylinBot

From Wikipedia, the free encyclopedia

(Note: this bot has nothing to do with the user 'Skylin'. The name 'Skylin' comes from Scalable Kylin; Kylin is a research project at the University of Washington.)

The Skylin bot is built to help keep infobox data up to date.

This bot builds on the research described in the paper "Autonomously Semantifying Wikipedia", which is available at http://turing.cs.washington.edu/papers/cikm07.pdf

In short, the paper describes a system that is trained, using various machine learning techniques, to extract from an article's text the data that belongs in the infobox for a given class of articles.

For the rest of this discussion, we will consider US County as our reference class, though the bot will ultimately operate across a wider set of categories.

This bot will look for and correct a specific set of situations:

The first case we will handle is one where a standard infobox field is missing a property, but the information can be extracted from the text of the article; in that case, the bot will add the property to the infobox. (For example, if the article text contains the sentence "The county seat of King County, WA is Seattle" but the County_Seat property is not set, the bot will set it automatically. Note that this is not a specifically coded rule, but rather an example of what one of the trained extractors may learn.)
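The missing-field case can be sketched as follows. This is a minimal illustration under stated assumptions, not the bot's actual code: the infobox is modeled as a plain dictionary rather than parsed wikitext, and the regular expression in the hypothetical `extract_county_seat` merely stands in for a trained extractor, imitating the kind of pattern one might learn for County_Seat. The helper names `fill_missing_fields` and `extract_county_seat` are illustrative only.

```python
import re
from typing import Callable, Dict, Optional

# Hypothetical extractor type: takes article text, returns a value or None.
Extractor = Callable[[str], Optional[str]]

# Hypothetical stand-in for a learned extractor. The real system trains its
# extractors; this regex only imitates one pattern such an extractor might learn.
_SEAT_RE = re.compile(r"county seat of [^.]*? is ([A-Z][\w' -]*?)\.")


def extract_county_seat(text: str) -> Optional[str]:
    """Return the county seat named in the text, or None if not found."""
    match = _SEAT_RE.search(text)
    return match.group(1) if match else None


def fill_missing_fields(infobox: Dict[str, str], text: str,
                        extractors: Dict[str, Extractor]) -> Dict[str, str]:
    """Propose values only for fields that are absent or empty."""
    proposals: Dict[str, str] = {}
    for field, extract in extractors.items():
        if infobox.get(field, "").strip():
            continue  # never overwrite a value that is already set
        value = extract(text)
        if value:
            proposals[field] = value
    return proposals
```

For an infobox whose County_Seat is empty and an article containing "The county seat of King County, WA is Seattle.", `fill_missing_fields` proposes `{"County_Seat": "Seattle"}`; if the field is already filled, it proposes nothing. Returning proposals instead of mutating the infobox keeps extraction separate from the decision to edit.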

We will consider additional scenarios after this has been completed. In all cases, we will only use extractors that have demonstrated extremely high precision.
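One way to enforce the "extremely high precision" requirement is to measure each extractor on a held-out, human-labeled validation set and keep only those above a threshold. The sketch below assumes such labeled data exists; the 0.98 threshold and the `min_support` minimum number of predictions are hypothetical values chosen for illustration, not figures from the project.

```python
from typing import Dict, List, Optional, Tuple

# Each pair is (value the extractor proposed, value a human judged correct).
# A proposed value of None means the extractor abstained.
Labeled = List[Tuple[Optional[str], Optional[str]]]


def precision(results: Labeled) -> float:
    """Fraction of non-abstaining predictions that match the gold value."""
    predicted = [(p, g) for p, g in results if p is not None]
    if not predicted:
        return 0.0
    correct = sum(1 for p, g in predicted if p == g)
    return correct / len(predicted)


def select_extractors(validation: Dict[str, Labeled],
                      threshold: float = 0.98,
                      min_support: int = 20) -> List[str]:
    """Keep fields whose extractor is precise enough on enough predictions."""
    keep = []
    for field, results in validation.items():
        support = sum(1 for p, _ in results if p is not None)
        if support >= min_support and precision(results) >= threshold:
            keep.append(field)
    return sorted(keep)
```

The `min_support` check matters because an extractor that made one correct prediction has a precision of 1.0 without having demonstrated anything; abstentions are excluded from precision since the bot simply makes no edit in those cases.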

In early runs of the bot, each edit will be verified by a human before it is committed. The initial permission being requested covers only these 'assisted' runs of the bot. If and when we are confident enough in its results, we will submit a second request for permission to run the bot in its unassisted form.
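The assisted workflow amounts to a review gate between extraction and saving. The sketch below assumes a hypothetical `ProposedEdit` record that carries the evidence sentence so the reviewer can check it; actually saving accepted edits would go through the bot framework and is deliberately not shown, since this sketch does not model that framework's API.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class ProposedEdit:
    article: str
    field: str
    value: str
    evidence: str  # the sentence the value was extracted from


def review_edits(proposals: Iterable[ProposedEdit],
                 approve: Callable[[ProposedEdit], bool]) -> List[ProposedEdit]:
    """Return only the edits the reviewer accepted; nothing is saved here."""
    return [edit for edit in proposals if approve(edit)]


def ask_on_console(edit: ProposedEdit) -> bool:
    """Hypothetical interactive reviewer: accept only on an explicit 'y'."""
    prompt = (f"{edit.article}: set {edit.field} = {edit.value!r}\n"
              f"  evidence: {edit.evidence}\nAccept? [y/N] ")
    return input(prompt).strip().lower() == "y"
```

Passing the approval step in as a function keeps the same pipeline usable in assisted mode (`review_edits(proposals, ask_on_console)`) and, if permission were ever granted, with a different policy, without changing the extraction code. Defaulting to rejection on anything but "y" errs on the side of making no edit.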

Also, unless we seek additional permissions, the bot will be kicked off manually, rather than automatically.

In the long run, the frequency of runs will be determined by the effectiveness we can achieve. For at least the next month, runs will be managed manually and done primarily to validate the bot.

The maintainer of the bot is CUDub.

This bot is written in .NET, using the DotNetWikiBot project. It also relies on Java and the MALLET toolkit for machine learning.

UPDATE: This bot has been denied because the owner lacks a documentable editing history. This account will still be used for assisted editing, but every edit will be verified by a human before it is committed. (The bot will suggest edits to a human editor, who will accept or reject each one.) This is acceptable under the terms of Wikipedia's bot policy.