Wiki labels is both the name of a software suite and of a WikiProject. Within the WikiProject, we produce datasets of labeled wiki artifacts, and the software suite is designed to make that work easier. The name can be read either as a noun
- We work together on Wikipedia to produce wiki labels for important data.
or as a verb (similar to "Wiki loves...")
- In order to get the data we need, wiki labels edit quality.
Goals & Scope
Our goal in this project is to produce labeled datasets that address pressing needs of the Wikipedia community. Labeled datasets have a variety of uses, including research (e.g. qualitative analyses of newcomer quality and editor interactions) and the development of advanced wiki tools (e.g. the models used by User:ClueBot NG and WP:STiki). Gathering these datasets is generally difficult because it requires a substantial investment of time and effort by a small group of people to "hand-code" a suitably large dataset.
We are concerned with (1) identifying opportunities to produce important labeled datasets, (2) distributing the work as broadly as possible, and (3) making it easy and efficient to "hand-code" large datasets. See our list of campaigns for what we're currently working on. If you would like to help out, sign the member list. If you have an idea for a labeled dataset you'd like to produce, inquire on the talk page.
How can I help?
There are a few ways that you can contribute to this project.
- This project is all about adding labels to artifacts in Wikipedia. For most labeling campaigns, a very large number of observations must be labeled before the dataset is useful, so one of this project's goals is to distribute that work as effectively as possible. If you're interested in contributing, add your name to the list of participants.
- Fixing bugs, implementing new features and improving system performance. Pull requests are welcome! See the repository.
- Loading campaigns, dealing with system issues and helping newcomers get started with labeling work. If you're interested in helping out with Wiki labels janitorial work, contact EpochFail or He7d3r.
Revision scoring as a service
Many of Wikipedia's most powerful tools rely on machine classification of edit quality. In this project, we'll construct a publicly queryable API of machine-classified scores for revisions. It's our belief that providing such a service will make it much easier to build powerful new wiki tools and to extend current tools to new wikis. To build powerful machine classifiers, we must start with high-quality labeled data. That's where Wiki labels comes in. See WP:Labels/Edit quality.
The primary way that wiki tool developers will take advantage of this project is via a RESTful web service and scoring system we call ORES (Objective Revision Evaluation Service). ORES generates scores for revisions on request. For example, http://ores.wmflabs.org/scores/enwiki?revids=34854258&models=reverted requests the score from the "reverted" model for revision #34854258 in English Wikipedia.
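The URL pattern above can be sketched as a tiny client. This is a minimal sketch, assuming only the endpoint shape shown in the example request; the helper names (`build_scores_url`, `fetch_scores`) are hypothetical, not part of ORES itself:

```python
# Minimal sketch of querying ORES for revision scores, based on the
# example URL above. The endpoint and response shape are assumptions
# drawn from that example, not a definitive client.
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ORES_BASE = "http://ores.wmflabs.org/scores"

def build_scores_url(wiki, revids, models):
    """Build a scores request URL matching the example above."""
    query = urlencode({
        # multiple values are assumed to be "|"-separated
        "revids": "|".join(str(r) for r in revids),
        "models": "|".join(models),
    })
    return "{0}/{1}?{2}".format(ORES_BASE, wiki, query)

def fetch_scores(wiki, revids, models):
    """Request scores from ORES and decode the JSON response."""
    with urlopen(build_scores_url(wiki, revids, models)) as response:
        return json.loads(response.read().decode("utf-8"))

# Example: the "reverted" model score for revision 34854258 on English Wikipedia
# scores = fetch_scores("enwiki", [34854258], ["reverted"])
```

A tool could call `fetch_scores` once per batch of revisions rather than per revision, keeping request volume low for the service.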