Wikipedia:STiki/Sandbox

STiki
Developer(s): Andrew G. West (west.andrew.g)
Initial release: June 2010
Stable release: 2.1 (May 22, 2012)
Written in: Java
Platform: Cross-platform
Available in: English
Type: Vandalism detection on Wikipedia
License: GNU General Public License
Website: www.cis.upenn.edu/~westand


STiki is a tool, available to trusted users, for detecting and reverting vandalism on Wikipedia. STiki uses state-of-the-art detection methods to determine which edits should be shown to end users. If a displayed edit is vandalism, STiki then streamlines the reversion and warning process. Critically, STiki is a collaborative approach to reverting vandalism, not a user-centric one; the list of edits to be inspected is consumed in a crowd-sourced fashion. STiki is not a Wikipedia bot: it is an intelligent routing tool that directs human users to potential vandalism for definitive classification.

To date, STiki has been used to revert 745,075 instances of vandalism, spam, and other unconstructive editing on Wikipedia (see the leaderboard and editor milestones).

Multiple approaches (scoring systems), some authored by STiki's developers and others by third parties, are used to determine which edits to display. An end user can choose which scoring system they pull edits from. Currently implemented scoring systems include:

STiki (metadata): The "original" queue used by STiki, using metadata features and a machine-learning algorithm to arrive at vandalism predictions. More detail about this technique is available in the "Metadata scoring and origins" section below.
Cluebot-NG: The ClueBot NG approach uses an artificial neural network (ANN) to score edits. The worst-scoring edits are undone automatically. However, there are many edits that CBNG is quite confident are vandalism, but cannot revert due to a low false-positive tolerance, the one-revert-rule, or other constraints. These scores are consumed from an IRC feed.
WikiTrust: The WikiTrust system of Adler et al. is built upon editor reputations calculated from content-persistence. More details are available at their website. WikiTrust scores are consumed via their API.
Anti-Spam: Now active! Parses new external links from revisions and measures their external link spam potential. See the "Link spam scoring" section below for additional details.

Download

Front-end GUI, distributed as an executable *.JAR. After unzipping, double-click the *.JAR file to launch (Windows, OS X), or issue the terminal command "java -jar STiki_exec_[date].jar" (Unix).
STiki remains in active development, both the front-end GUI and back-end scoring systems. Check back frequently for updated versions. Note that due to a significant code change, versions dated 2010-11-28 and older are non-functional; an upgrade is required.
Full source for the GUI and back-end server. Library dependencies (IRC and JDBC) are not included.
Note that this also contains the source for the WikiAudit tool.

Using STiki

A Wikipedia account is required to use STiki. Additionally, that account must meet some qualifications to ensure STiki's powerful capabilities are not abused. The account must have at least one of the following: (1) the rollback permission/right, (2) at least 1000 article edits (in the article namespace, not talk/user pages), or (3) special permission via the talk page. We emphasize that users must take responsibility for their actions with STiki.

After login, users primarily interact with the GUI tool by classifying edits into one of four categories:

1.  Vandalism: If an edit is blatantly unconstructive and intentional in its malice, then it constitutes vandalism. Pressing the "vandalism" button will revert the edit, and the "warn offending editor" box should be checked so the guilty party is notified of their transgression. Multiple warnings will result in reporting at AIV and subsequent blocking. However, you may wish to avoid templating the regulars, as some construe this as poor wiki-etiquette.
2.  Good-faith Revert: Sometimes edits are clearly unconstructive, but lack the intent and malice that characterize vandalism. In these cases, one should assume good faith by undoing the changes using a "good-faith revert." Here, the change is undone but the offending editor is not issued a warning.
3.  Pass: If a STiki user is unsure whether an edit is vandalism or innocent, they can skip or "pass" the edit. The revision will remain live on Wikipedia and the edit will be shown to another STiki user. Use pass only when you believe there is some chance the edit is vandalism, but you lack the subject expertise to firmly make that decision.
4.  Innocent: If an edit is primarily constructive and not vandalism, it should be labeled as "innocent." This does not mean the edit must be perfect in nature. Indeed, STiki is anti-vandal focused and cannot remedy many issues, which should be handled outside the tool (using the provided hyperlinks).
The STiki user interface showing an instance of vandalism. The buttons for classifying edits are on the left. The links for deeper investigation are near the bottom in "last revert" and "edit properties".
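
The four classifications above differ mainly in three consequences: whether the edit is reverted, whether the editor may be warned, and whether the edit is shown to someone else. The following is a minimal sketch of that mapping. The enum and field names are hypothetical and are not taken from the STiki source.

```java
// Illustrative sketch only: names are hypothetical, not STiki's actual code.
enum EditVerdict {
    // revertsEdit, mayWarnEditor, shownToAnotherUser
    VANDALISM(true, true, false),          // warning is sent only if the "warn offending editor" box is checked
    GOOD_FAITH_REVERT(true, false, false), // undone, but no warning is issued
    PASS(false, false, true),              // stays live and is shown to another STiki user
    INNOCENT(false, false, false);         // stays live and needs no further review

    final boolean revertsEdit;
    final boolean mayWarnEditor;
    final boolean shownToAnotherUser;

    EditVerdict(boolean revertsEdit, boolean mayWarnEditor, boolean shownToAnotherUser) {
        this.revertsEdit = revertsEdit;
        this.mayWarnEditor = mayWarnEditor;
        this.shownToAnotherUser = shownToAnotherUser;
    }
}
```

Encoding the consequences as data rather than as branching logic keeps the four cases easy to compare side by side.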

Uncertainty over constructiveness: If a user is uncertain about whether an edit is constructive, the quickest solution is often to perform a web search (e.g., with Google); this may reveal whether some "fact" is true. Of course, STiki users should consider the reliability of the source in question. If no reliable source can be found, the correct response may be to add a {{Citation needed}} or {{Verify credibility}} tag, using the normal wiki interface. Where content has been removed, common sense is usually the best guide. Does the removed text have citations? (Note that checking the citations themselves may be necessary in content regarding living people.) What is the edit summary? Does that explanation make sense? Is it discussed on the talk page? Regardless of the issue, anything that requires domain-specific expertise to resolve is probably best classified as "innocent" or "pass".

Uncertainty over malice: It can be tricky to differentiate between vandalism and good-faith edits that are nonetheless unconstructive. Test edits should be classified as "vandalism", as initial warnings and edit comments accommodate this case. Explicit comments that indicate Wikipedia inexperience are probably best labelled "good-faith". Beyond that, common sense is usually the best guide. Consider the article in question. Is it something that young editors might be interested in? Is there any truth in what is being said (absent formatting, language, and organizational issues)? Also see The duck test.

Deeper investigation: Sometimes a revert ("vandalism" or "good-faith") will not repair all the issues presented in a diff, or the diff does not contain enough evidence to make a definitive classification. In these cases, use the hyperlinks (blue underlined text) to open relevant pages in the default web browser. This is helpful, for example, to: (1) View the article talk page to see if some issue was discussed, (2) Make changes using the normal interface, and (3) Use other tools like Popups, Twinkle, and wikEdDiff. When you return to the STiki tool you will still need to classify the edit (note that if you used the browser interface to make changes, pressing "vandalism" or "good-faith revert" will *not* revert your changes). See also: Wikipedia:Recent changes patrol.

Interface tips: STiki has hotkeys to ease user interaction with the tool. After a single edit has been classified with the mouse (giving the button panel "focus"), the keys "V", "G", "P", and "I" will mark edits as "vandalism", "good-faith", "pass", and "innocent" respectively. While in the same mode, the Page Up, Page Down, Up Arrow (↑), and Down Arrow (↓) keys will also scroll the diff browser. Also note that hyperlinks which appear in diffs can be opened in your web-browser, assuming that the "Activate Ext-Links" option (under the "Options" tab) is turned on.
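
The hotkey bindings described above amount to a small lookup from key to classification. The sketch below is only an illustration: the class and method names are hypothetical, and treating the keys as case-insensitive is an assumption, not documented STiki behavior.

```java
import java.util.Optional;

// Illustrative sketch only; not taken from the STiki GUI source.
final class Hotkeys {
    private Hotkeys() {}

    /** Returns the classification bound to a key, or empty if the key is unbound. */
    static Optional<String> labelFor(char key) {
        // Assumption: upper- and lower-case keys behave the same.
        switch (Character.toUpperCase(key)) {
            case 'V': return Optional.of("vandalism");
            case 'G': return Optional.of("good-faith");
            case 'P': return Optional.of("pass");
            case 'I': return Optional.of("innocent");
            default:  return Optional.empty();
        }
    }
}
```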

Architecture

STiki uses a server/client architecture:

(1) Backend processing: An engine watches all recent changes to Wikipedia and calculates or fetches the probability that each is vandalism. This engine calculates scores for the Metadata Scoring System, and uses APIs/feeds to retrieve the scores calculated by third-party systems. Edits populate a series of inter-linked priority queues, where the vandalism scores are the priority for insertion. Queue maintenance ensures that only the most-recent edit to an article is eligible to be viewed. Backend work is done on STiki's servers (hosted at the University of Pennsylvania), relying heavily on a MySQL database.

(2) Frontend-GUI: The user-facing GUI is a Java desktop application. It displays diffs that likely contain vandalism (per the backend) to human-users and asks for definitive classification. STiki streamlines the process of reverting poor edits and issuing warnings/AIV-notices to guilty editors. The interface is designed to enable quick review. Moreover, the classification process establishes a feedback loop to improve detection algorithms.
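
The backend queue behavior can be illustrated with a minimal in-memory sketch. It assumes a single queue rather than the series of inter-linked, database-backed queues STiki actually maintains, and every class and field name here is hypothetical. Edits are ordered by vandalism score, and an edit is skipped at dequeue time if a newer revision of the same article has since arrived.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

// Illustrative sketch only; STiki's real queues live on its server and rely on MySQL.
final class ScoredEdit {
    final String article;
    final long revisionId;
    final double vandalismScore;

    ScoredEdit(String article, long revisionId, double vandalismScore) {
        this.article = article;
        this.revisionId = revisionId;
        this.vandalismScore = vandalismScore;
    }
}

final class VandalismQueue {
    // Highest vandalism score is dequeued first.
    private final PriorityQueue<ScoredEdit> heap = new PriorityQueue<>(
            Comparator.comparingDouble((ScoredEdit e) -> e.vandalismScore).reversed());
    // Newest revision ID seen so far for each article.
    private final Map<String, Long> latestRevision = new HashMap<>();

    void offer(ScoredEdit edit) {
        latestRevision.merge(edit.article, edit.revisionId, Long::max);
        heap.add(edit);
    }

    /** Returns the highest-scoring edit that is still the newest revision of its article, or null. */
    ScoredEdit poll() {
        ScoredEdit candidate;
        while ((candidate = heap.poll()) != null) {
            // Stale entries (superseded by a newer revision) are discarded lazily.
            if (latestRevision.get(candidate.article) == candidate.revisionId) {
                return candidate;
            }
        }
        return null;
    }
}
```

Discarding stale entries lazily at dequeue time avoids searching the heap on every insert; this is one possible design, not necessarily the one STiki uses.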

Metadata scoring and origins

STiki work-flow diagram

Here we highlight a particular scoring system, based on machine-learning over metadata properties. This system was developed by the same authors as the STiki frontend GUI, was the only system shipped with the first versions, and shares a code-base/distribution with the STiki GUI. This system also gave the entire software package its name (derived from Spatio Temporal processing on Wikipedia), though this acronymic meaning is now downplayed.

The "metadata system" examines only 4 fields of an edit when scoring: (1) timestamp, (2) editor, (3) article, and (4) revision comment. These fields are used to calculate features pertaining to the editor's registration status, edit time-of-day, edit day-of-week, geographical origin, page history, category memberships, revision comment length, etc. These signals are given to an ADTree classifier to arrive at vandalism probabilities. The ML models are trained over classifications provided on the STiki frontend. A more rigorous discussion of the technique can be found in a EUROSEC 2010 publication.
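
To make the feature list concrete, here is a minimal sketch that derives a few of the simpler features from the raw fields. It is not STiki's feature extractor. The feature names are hypothetical, the time features use UTC as an assumption, and registration status is guessed with an IPv4 pattern on the assumption that unregistered editors appear under their IP address. Article-based features such as page history and category membership, as well as geographical origin, need external lookups and are omitted, as is the ADTree classifier itself.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Illustrative sketch only; feature names and heuristics are assumptions.
final class MetadataFeatures {
    // Heuristic assumption: an editor name that is an IPv4 address is unregistered.
    private static final Pattern IPV4 = Pattern.compile("^\\d{1,3}(\\.\\d{1,3}){3}$");

    private MetadataFeatures() {}

    static Map<String, Double> extract(Instant timestamp, String editor, String comment) {
        ZonedDateTime utc = timestamp.atZone(ZoneOffset.UTC);
        Map<String, Double> features = new LinkedHashMap<>();
        features.put("editorIsRegistered", IPV4.matcher(editor).matches() ? 0.0 : 1.0);
        features.put("hourOfDayUtc", (double) utc.getHour());
        features.put("dayOfWeek", (double) utc.getDayOfWeek().getValue()); // 1 = Monday ... 7 = Sunday
        features.put("commentLength", (double) comment.length());
        return features;
    }
}
```

A map of named numeric features like this is the kind of input a tree-based classifier would consume.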

An API has been developed to give other researchers/developers access to the raw metadata features and the resulting vandalism probabilities. A README describes API details.

The paper was an academic attempt to show that language properties were not necessary to detect Wikipedia vandalism. It succeeded in this regard, but since then the system has been relaxed for general-purpose use. For example, the engine now includes some simple language features. Moreover, other scoring systems were later integrated into the GUI frontend.

Link spam scoring

As the core STiki engine processes revisions for vandalism, it also parses diffs for the addition of new external links. When one is found, it is passed to the link processor to have its spam potential analyzed. For each link, a feature vector of ~50 elements is constructed and given to a machine-learning classifier. Those features fall into one of three categories:

  1. Wikipedia metadata: In addition to re-using features from the anti-vandalism classifier, these also include link-specific signals such as the length of the URL, whether it is a citation, the top-level domain (TLD) of the URL, and metrics capturing the addition history of the URL's domain.
  2. Landing site analysis: For each site linked, the processor visits the page and obtains the source code (usually, HTML). In turn, this is analyzed to measure a site's commercial intention, offensive content, and the use of search engine optimization (SEO) tactics.
  3. Third-party data: First, Alexa data is used to learn about historical traffic patterns at the URL. This also provides interesting features pertaining to the age of the website, whether it contains adult content, and its quantity of backlinks. Second, the Google Safe Browsing project is queried. This allows URLs that distribute malware or engage in phishing to be identified.
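
A few of the link-specific signals in the first category can be computed from the URL alone. The sketch below is a hedged illustration, not the STiki link processor: the class name and fields are hypothetical, the citation flag is assumed to be supplied by whatever parses the diff, and the TLD is taken naively as the last dot-separated label of the host.

```java
import java.net.URI;
import java.util.Locale;

// Illustrative sketch only; covers 3 of the roughly 50 features mentioned above.
final class LinkFeatures {
    final int urlLength;
    final String topLevelDomain;
    final boolean isCitation;

    private LinkFeatures(int urlLength, String topLevelDomain, boolean isCitation) {
        this.urlLength = urlLength;
        this.topLevelDomain = topLevelDomain;
        this.isCitation = isCitation;
    }

    /** Throws IllegalArgumentException if the URL is malformed or has no host. */
    static LinkFeatures of(String url, boolean isCitation) {
        String host = URI.create(url).getHost();
        if (host == null) {
            throw new IllegalArgumentException("URL has no host: " + url);
        }
        // Naive TLD: the text after the last dot of the host name.
        String tld = host.substring(host.lastIndexOf('.') + 1).toLowerCase(Locale.ROOT);
        return new LinkFeatures(url.length(), tld, isCitation);
    }
}
```

Landing-site analysis and third-party lookups require network access and are outside the scope of this sketch.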

A more formal description of the technique can be found in a WikiSym'11 paper, motivated in part by vulnerabilities and observations from a CEAS'11 paper. A Wikimania 2011 presentation discussed the live implementation of that technique (i.e., the software described in this section). Just as with anti-vandalism, the feedback from GUI use will help in refining the future accuracy of this technique. Orthogonal to the spam-detection task, the processor also reports dead links it encounters to WP:STiki/Dead links, where they can be patrolled by humans to help address the issue of link rot on Wikipedia.

Comparison to other tools

The following features make STiki distinctive:

1.  Sophisticated algorithms: STiki uses multiple algorithms to identify potential vandalism. All are rooted in machine-learning. The most effective approach ("queue") is that of CBNG, which is able to achieve a 50%–60% hit rate (percentage of vandalism for all edits displayed in the GUI). Random search will result in hit rates below 5%.
2.  The server coordinates: STiki users are shown edits from a centrally-maintained queue. When a user is shown an edit, they have a "reservation" so that no other STiki users are viewing the edit simultaneously. Moreover, if a user marks an edit as "innocent", no one will be forced to review this edit in the future. In both cases, redundant work (edit conflicts, multiple reviews of good edits) is avoided.
3.  The server remembers: The STiki server is always watching changes and computing vandalism probabilities, even if no one is currently using the GUI tool. When edits are popped to end-users, this is done purely based on vandalism probabilities, not on how recent the changes are. STiki has reverted several instances of vandalism that are months old.
4.  Simple interface: STiki's interface is a minimal one. This is due to a belief that STiki should focus exclusively on vandalism/spam removal, rather than becoming a general-purpose framework for a diversity of unconstructive edits. When more information is needed, the interface provides links to relevant pages, which open in a normal web browser.
5.  Cross-platform: Developed in Java, STiki is cross-platform.
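
The coordination described in the second item can be sketched as a small reservation table. This is an assumption-laden illustration, not the STiki server's implementation: it keeps state in memory, uses hypothetical names, and ignores reservation timeouts.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only; the real server coordinates users through its database.
final class ReviewCoordinator {
    private final Map<Long, String> reservations = new HashMap<>();
    private final Set<Long> markedInnocent = new HashSet<>();

    /** Returns true if the reviewer now holds the only reservation for this revision. */
    synchronized boolean reserve(long revisionId, String reviewer) {
        if (markedInnocent.contains(revisionId) || reservations.containsKey(revisionId)) {
            return false;
        }
        reservations.put(revisionId, reviewer);
        return true;
    }

    /** "Innocent": release the reservation and never offer the revision again. */
    synchronized void markInnocent(long revisionId, String reviewer) {
        if (reservations.remove(revisionId, reviewer)) {
            markedInnocent.add(revisionId);
        }
    }

    /** "Pass": release the reservation so another reviewer can be shown the revision. */
    synchronized void pass(long revisionId, String reviewer) {
        reservations.remove(revisionId, reviewer);
    }
}
```

Synchronizing every method keeps the check-then-reserve step atomic, which is what prevents two reviewers from holding the same edit.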

Related works and cooperation

STiki's authors are committed to working towards collaborative solutions to vandalism. To this end, an API provides access to STiki's internally calculated scores. A live feed of scores is also published to channel "#arm-stiki-scores" on IRC server "armstrong.cis.upenn.edu". Moreover, all STiki code is open-sourced.

In the course of our research, we have collected large amounts of data, both passively regarding Wikipedia, and through users' active use of the STiki tool. We are interested in sharing this data with other researchers. Finally, STiki distributions contain a program called the Offline Review Tool (ORT), which allows a user-provided set of edits to be quickly reviewed and annotated. We believe this tool will prove helpful to corpus-building researchers.

Credits and more information

STiki was written by Andrew G. West (west.andrew.g), a doctoral student in computer science at the University of Pennsylvania. The academic paper which shaped the STiki methodology was co-authored by Sampath Kannan and Insup Lee. The work was supported in part by ONR-MURI-N00014-07-1-0907.

In addition to the already discussed academic paper, there have been several STiki-specific write-ups/publications that may prove useful to anti-vandalism developers. The STiki software was presented in a WikiSym 2010 demonstration, and a WikiSym 2010 poster visualizes this content and provides some STiki-revert statistics. STiki was also presented at Wikimania 2010, with the following presentation slides. An additional write-up (not peer reviewed) examines STiki and anti-vandalism techniques as they relate to the larger issue of trust in collaborative applications.

Beyond STiki in isolation, a CICLing 2011 paper examined STiki's metadata scoring technique relative to (and in combination with) NLP and content-persistence features (the top 2 finishers from the 2010 PAN Competition), setting new performance baselines in the process. A 2011 edition of the PAN-CLEF competition was also held and required multiple natural languages to be processed; our entry won at all tasks. A Wikimania 2011 presentation surveyed the rapid anti-vandalism progress (both academic and on-wiki) of the 2010–2011 time period. Finally, a research bulletin published by EDUCAUSE looks at the issue of Wikipedia/wiki damage from an organizational and higher-education perspective, with particular emphasis on the protection of institutional welfare.

Queries not addressed by these writings should be addressed to STiki's authors.

Userboxes, awards, and miscellanea