Wikipedia:WikiAudit

WikiAudit
Screenshot: Example WikiAudit report
Developer(s): Andrew G. West (west.andrew.g)
Initial release: January 2012
Stable release: 0.1a / July 2, 2015
Written in: Java
Platform: Cross-platform
Available in: English
Type: Wikipedia/wiki analysis
License: GNU General Public License
Website: http://www.andrew-g-west.com

WikiAudit is a utility that, given a set of IP addresses as input, outputs a report (see the screenshot image) summarizing the contributions and behavior of those IPs on some wiki (e.g., English Wikipedia). In particular, heuristics direct attention to malicious or unconstructive behavior.

We envision WikiAudit being useful for:

  1. Institutional/organizational network administrators who want to monitor the contributions coming from their IP space. From the organization's perspective, this can help protect its reputation and prevent misuse of organizational resources. In turn, organizations that take steps to prevent future misbehavior benefit the wiki.
  2. Casual readers who use the tool to conduct security investigations and reveal organizational bias in authoring. For example, an edit to Wikipedia article [x] from the IP space of organization [x] might be inappropriately promotional or might scrub factual criticism.

Download

  • Executable *.JAR: after unzipping, the README file describes command-line/terminal operation. WikiAudit does not include a GUI.
  • Source code for WikiAudit is included with the STiki source-code distribution (see the STiki page). The projects share a code base; core WikiAudit code is in the [audit_tool/] subdirectory.

Operation and intended usage

To control the content of the output report, WikiAudit exposes several parameters:

  • IP addresses - The most basic input is a list of IP addresses for analysis (of course, we can only report on unregistered users, who are responsible for most unconstructive behavior). IP addresses can be provided individually (127.0.0.0), as a range (127.0.0.0-127.255.255.255), or in CIDR notation (127.0.0.0/8).
  • Connection string - Users provide a path to the API of the MediaWiki installation they wish to analyze. This allows one to analyze any MediaWiki wiki (Wikinews, Wikiversity, etc.). English Wikipedia is used by default, and some template-driven analysis is specific to en.wiki. Compatibility with non-English installations is untested.
  • Time boundaries - A start date can be given so that only events occurring after that date are reported. This is useful for periodic updates of IP activity.
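
To make the three accepted IP formats concrete, the following is a minimal sketch of how each could be normalized to an inclusive first/last address pair. It is an illustration only, not WikiAudit's actual parsing code; the class name IpRangeSketch and its methods are hypothetical, and only IPv4 is handled.

```java
/** Hypothetical helper (not part of WikiAudit): normalizes IPv4 input specs. */
final class IpRangeSketch {

    /** Converts a dotted-quad IPv4 string to its unsigned 32-bit value, held in a long. */
    static long toLong(String ip) {
        String[] parts = ip.trim().split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("Not an IPv4 address: " + ip);
        }
        long value = 0;
        for (String part : parts) {
            int octet = Integer.parseInt(part);
            if (octet < 0 || octet > 255) {
                throw new IllegalArgumentException("Octet out of range: " + ip);
            }
            value = (value << 8) | octet;
        }
        return value;
    }

    /** Converts an unsigned 32-bit value back to dotted-quad form. */
    static String toDotted(long value) {
        return ((value >> 24) & 0xFF) + "." + ((value >> 16) & 0xFF) + "."
                + ((value >> 8) & 0xFF) + "." + (value & 0xFF);
    }

    /** Returns {first, last} (inclusive) for a single IP, an "a-b" range, or a CIDR block. */
    static long[] parse(String spec) {
        String s = spec.trim();
        if (s.contains("/")) {
            String[] parts = s.split("/");
            if (parts.length != 2) {
                throw new IllegalArgumentException("Bad CIDR: " + spec);
            }
            int prefix = Integer.parseInt(parts[1].trim());
            if (prefix < 0 || prefix > 32) {
                throw new IllegalArgumentException("Bad prefix length: " + spec);
            }
            long size = 1L << (32 - prefix);             // number of addresses in the block
            long first = toLong(parts[0]) & ~(size - 1); // clear the host bits
            return new long[] {first, first + size - 1};
        }
        if (s.contains("-")) {
            String[] parts = s.split("-");
            if (parts.length != 2) {
                throw new IllegalArgumentException("Bad range: " + spec);
            }
            long first = toLong(parts[0]);
            long last = toLong(parts[1]);
            if (first > last) {
                throw new IllegalArgumentException("Range start exceeds end: " + spec);
            }
            return new long[] {first, last};
        }
        long single = toLong(s);
        return new long[] {single, single};
    }
}
```

Under this sketch, 127.0.0.0/8 and 127.0.0.0-127.255.255.255 normalize to the same pair, and a /16 block spans 65,536 addresses.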


These parameters produce a report (a simple HTML document) with the following features:

  • Aggregate statistics: quantity of IPs that edited, number of contributions, revert quantity, blocked users, etc.
  • A high-level look at user/IP participation: (1) Whether the IP has a talk page. When an IP address has one, it most often indicates that the user has received warning templates for transgressions. The presence of some template types (spam and vandalism) is made explicit in the audit report. (2) Whether shared IP templates exist on the user-talk page (like Template:Shared_IP or Template:Shared_IP_edu). These give network administrators an opportunity to establish "abuse contacts" with the Wikipedia administration. (3) Whether a block history exists for the IP.
  • An exhaustive enumeration of the contributions from the IP space: Basic metadata is provided, along with helpful links. More distinctively, a simple heuristic is used to determine whether the subsequent edit reverted or rolled back the contribution (indicating its poor quality).
  • Where malicious activity is suspected, the edit/user is colored red to draw attention.
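
The exact revert heuristic is not specified here. As one plausible illustration only, a check of this kind could inspect the edit summary of the next revision on the same page for common revert wording (for example "Reverted edits by" or "Undid revision"). The class name, method name, and marker list below are assumptions for this sketch and are not taken from the WikiAudit source.

```java
import java.util.Locale;
import java.util.regex.Pattern;

/** Hypothetical sketch of a summary-based revert check (not WikiAudit's code). */
final class RevertHeuristicSketch {

    // Assumed markers: wording commonly seen in revert/undo/rollback summaries,
    // plus the informal abbreviations "rv" and "rvv". Matched as whole words.
    private static final Pattern REVERT_MARKER =
            Pattern.compile("\\b(reverted|reverting|undid revision|rvv?)\\b");

    /** Returns true if the summary of the following edit suggests it was a revert. */
    static boolean looksLikeRevert(String nextEditSummary) {
        if (nextEditSummary == null) {
            return false; // no following edit, or no summary available
        }
        String lower = nextEditSummary.toLowerCase(Locale.ROOT);
        return REVERT_MARKER.matcher(lower).find();
    }
}
```

A summary-only check like this is cheap but imperfect: it misses reverts made with custom summaries and can flag edits that merely mention reverting, so its result is better read as a hint than as a verdict.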


Efficiency & responsible use: WikiAudit operates by making batch calls to the wiki's API. The speed of operation depends on the network connection and the density of IP editing activity in the input range. For perspective, on a residential network it is generally possible to produce a report for ~65,000 IPs (i.e., a /16 CIDR) in about 5 minutes. Producing a report for a massive quantity of IPs at once (say, a /8 CIDR) is not recommended and may lead to API throttling or temporary loss of API access.
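
To illustrate what batching against the API might look like, the sketch below groups IPs into requests against MediaWiki's list=usercontribs query module and applies an optional start date. It is not WikiAudit's implementation: the class ContribQuerySketch, the method buildQueries, and the batch size of 50 are assumptions for illustration, and the sketch only builds request URLs (it sends nothing and ignores result continuation).

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch (not WikiAudit's code): builds batched usercontribs queries. */
final class ContribQuerySketch {

    // Assumed batch size: the number of user values placed in one request.
    static final int BATCH_SIZE = 50;

    /**
     * @param apiUrl   path to the wiki's API, e.g. https://en.wikipedia.org/w/api.php
     * @param ips      individual IP addresses to look up
     * @param sinceIso optional ISO 8601 timestamp; only later edits are requested
     */
    static List<String> buildQueries(String apiUrl, List<String> ips, String sinceIso) {
        List<String> urls = new ArrayList<>();
        for (int i = 0; i < ips.size(); i += BATCH_SIZE) {
            List<String> batch = ips.subList(i, Math.min(i + BATCH_SIZE, ips.size()));
            // Multiple users are separated by "|", which must be URL-encoded.
            String users = URLEncoder.encode(String.join("|", batch), StandardCharsets.UTF_8);
            StringBuilder url = new StringBuilder(apiUrl)
                    .append("?action=query&list=usercontribs&format=json")
                    .append("&uclimit=max&ucdir=newer") // oldest first
                    .append("&ucuser=").append(users);
            if (sinceIso != null) {
                // With ucdir=newer, ucstart is the earliest timestamp returned.
                url.append("&ucstart=")
                   .append(URLEncoder.encode(sinceIso, StandardCharsets.UTF_8));
            }
            urls.add(url.toString());
        }
        return urls;
    }
}
```

With this batch size, a /16 block (65,536 addresses) would need roughly 1,300 requests if every address were queried individually, which gives a sense of why much larger inputs risk throttling.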

Motivation: WikiAudit's creation was inspired by the contributions tool of MediaWiki. That tool does allow the input of an IP range or CIDR prefix, but the capability is not installed by default; it must be enabled as an extension (see point #2 of the linked instructions). WikiAudit extends this functionality by: (1) enabling programmatic operation, (2) allowing for the input of multiple IP ranges, and (3) providing heuristics for unproductive users/contributions, so administrators do not have to engage in brute-force investigations.

Credits and more information

WikiAudit was written by Andrew G. West (west.andrew.g), a doctoral student in computer science at the University of Pennsylvania under the advisement of Insup Lee. The work was supported in part by ONR-MURI-N00014-07-1-0907. Queries not answered by the documentation should be directed to WikiAudit's author.

Screenshots