User:WP 1.0 bot/Web/FAQ

This page has answers to some questions that might be frequently asked about the new Release Version Tools website. For questions about Release Versions in general, please see Wikipedia:Version 1.0 Editorial Team/FAQs.

Background

What was wrong with the old system?

Issues encountered with the old system include:

The old code stored all its data in long lists in wiki pages, using the wiki as a makeshift database. Updating this data required an enormous number of page edits for a single run of the script, and a full update of the old system took over four days.
The code was not configurable on a per-project basis. Requests to add a special rating for a WikiProject are common, but the bot code was not written with that in mind.
Although there was a lot of assessment data, it wasn't possible to use this data to make dynamic queries. For example:
1. There was no easy way to generate a list of articles rated by both the Military History and Australia projects, although the data needed for this was already collected.
2. There was no easy way to get a log of all assessment changes for a particular article. When a log page got too long, the old information had to be removed, leaving it available only in the log page's history. More rarely, if there was too much data to fit onto the log page, the old system lost some log data entirely by truncating the log page before saving it.
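
As an illustration of the kind of dynamic query that a real database makes easy, here is a minimal sketch using an in-memory SQLite database. The `ratings` table, its columns, and the sample rows are all hypothetical and made up for this example; they are not the bot's actual schema or data.

```python
import sqlite3

# Hypothetical schema: one row per (project, article) assessment.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ratings (project TEXT, article TEXT, quality TEXT)"
)
conn.executemany(
    "INSERT INTO ratings VALUES (?, ?, ?)",
    [
        # Made-up sample rows for illustration only.
        ("Military history", "Article A", "B-Class"),
        ("Australia", "Article A", "B-Class"),
        ("Australia", "Article B", "GA-Class"),
        ("Military history", "Article C", "A-Class"),
    ],
)


def rated_by_both(conn, project_a, project_b):
    """Return the sorted titles of articles rated by both projects."""
    rows = conn.execute(
        """
        SELECT a.article
        FROM ratings AS a
        JOIN ratings AS b ON a.article = b.article
        WHERE a.project = ? AND b.project = ?
        ORDER BY a.article
        """,
        (project_a, project_b),
    )
    return [title for (title,) in rows]


both = rated_by_both(conn, "Military history", "Australia")
print(both)
```

With wiki pages as the only data store, answering the same question would have meant parsing and cross-referencing two long list pages by hand or by script.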

How often does the new system update its data?

The new system updates every project daily. You can also run a forced update at any time; see User:SelectionBot/Web/Guide#Update project data.

Where is the new system hosted?

It was on the toolserver for many years, but is now on a server maintained by User:Kelson.

Is the code available?

The code is available at https://github.com/openzim/wikimedia_wp1_bot . It is released under the GNU General Public License, version 2.

Using the web tools

How to set things up on the wiki

What changes does my project need to make to use the new system?

The new web system uses the same Wikipedia categories as the old system, so no changes are needed.

The new system does allow more configuration by each WikiProject. This is done with the {{ReleaseVersionParameters}} template. Please consult that template's documentation for details.

Assessment statistics

How is the global table generated?

The highest quality and importance score for each article is used. Non-article classes, such as Category-class, are ignored.

Projects that use non-standard ratings (such as B+-quality or Bottom-importance) have the ability to configure which of the standard ratings their custom ratings are equivalent to. The equivalent standard ratings are then used in generating the global table.
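
The rules above can be sketched roughly as follows. This is an illustration under stated assumptions, not the bot's actual code: the scale orderings, the `CUSTOM_EQUIVALENTS` mapping, and the sample assessments are hypothetical, and the real scales contain more classes than are listed here.

```python
from collections import Counter

# Assumed orderings, lowest to highest (simplified for this sketch).
QUALITY_ORDER = ["Stub", "Start", "C", "B", "GA", "A", "FA"]
IMPORTANCE_ORDER = ["Low", "Mid", "High", "Top"]

# Hypothetical project configuration: custom rating -> standard equivalent.
CUSTOM_EQUIVALENTS = {"B+": "B", "Bottom": "Low"}


def highest(values, order):
    """Return the highest-ranked standard value, or None if none is standard."""
    standard = [CUSTOM_EQUIVALENTS.get(v, v) for v in values]
    ranked = [v for v in standard if v in order]
    return max(ranked, key=order.index) if ranked else None


def global_table(assessments):
    """Count articles by their highest (quality, importance) pair.

    `assessments` maps a page title to a list of (quality, importance)
    pairs, one pair per assessing project.
    """
    table = Counter()
    for pairs in assessments.values():
        quality = highest([q for q, _ in pairs], QUALITY_ORDER)
        importance = highest([i for _, i in pairs], IMPORTANCE_ORDER)
        # Pages with only non-article classes (e.g. Category) are ignored.
        if quality is not None:
            table[(quality, importance)] += 1
    return table


assessments = {
    "Article A": [("GA", "Low"), ("A", "High")],
    "Article B": [("B+", "Bottom"), ("Start", "Mid")],
    "Category:Example": [("Category", "Low")],
}
table = global_table(assessments)
print(dict(table))
```

In this sketch "Article A" is counted once as A-quality and High-importance, "Article B" is counted as B-quality because its custom B+ rating maps to B, and the category page is left out.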

Why are the counts for FA and GA in the global table inaccurate?

There are many different ways to count the number of Featured articles:

1. The number given at Wikipedia:Featured articles (currently 6,493), which is updated based on the list on that page [But only 2195 entries are listed in that page?]
2. The number of articles in Category:Featured articles
3. The number of talk pages in Category:Wikipedia featured articles
4. The number of pages transcluding {{featured article}}
5. The number of talk pages that have a featured article banner (for example, from {{ArticleHistory}})
6. The number of pages for which at least one project has marked the quality FA-Class

These different methods will often give conflicting results. Although there are bots that try to keep the first five numbers equal, they often differ slightly.

The WP 1.0 bot uses method 6 (article assessment data) to generate its counts. This depends on WikiProjects assessing articles accurately, so the count generated by the bot will often differ from the count generated by the FA project due to minor errors.

Additionally, some projects do not use FL-class and instead rate all their featured lists as FA-class. This causes the counts in the table to differ from what is expected.

The count of Good Articles has another issue: a Good Article may be assessed as A-class, which is a higher assessment than GA. The global summary table uses only the highest rating for each article.

A final reason for the counts being different is that some projects may have custom ratings that are configured to show up as GA (or FA, or FL) in the global statistics table.

Release versions

How are the hitcount scores generated?

The daily hitcount data for several months is used to compute a trimmed average daily hitcount. The top 20 percent and bottom 20 percent of the data points are ignored, and the remainder are averaged. The goal of this process is to get a number that should not be sensitive to which articles have been on the main page or in the news. Daily hitcounts for articles on the main page are much higher than the usual hitcounts for those articles, but spot-checking suggests that these return to the usual hitcounts within several days.
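
A trimmed average of this kind can be sketched as below. The function name, the rounding of the trim size, and the sample numbers are assumptions made for illustration; the actual code may handle these details differently.

```python
def trimmed_mean(daily_hits, trim_fraction=0.2):
    """Average the data after dropping the top and bottom trim_fraction."""
    values = sorted(daily_hits)
    # Number of points dropped from each end (rounded down; an assumption).
    k = int(len(values) * trim_fraction)
    kept = values[k:len(values) - k]
    if not kept:
        raise ValueError("no data points to average")
    return sum(kept) / len(kept)


# Made-up daily hitcounts, including one main-page spike of 5000.
hits = [100, 110, 90, 105, 95, 5000, 102, 98, 101, 99]
score = trimmed_mean(hits)
print(round(score, 2))
```

Here the two lowest and two highest values are dropped, so the spike of 5000 has no effect on the result, whereas a plain average of the same ten values would be 590.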

What is the automated selection?

There are too many assessed articles for us to manually compile a list of articles to include in release versions. Therefore, a semi-automated system is used to make a list of articles that should be included. This system generates a "selection score" for each article (see below). To make a release version, we start by setting a cutoff so that the number of articles with scores above that cutoff is approximately the number of articles we wish to include in the release.
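
The cutoff step can be illustrated with a short sketch. The function name and the sample scores are hypothetical, and counting articles whose score equals the cutoff as included is an assumption of this example.

```python
def choose_cutoff(scores, target_count):
    """Return a cutoff so that roughly target_count scores are at or above it."""
    ranked = sorted(scores, reverse=True)
    if not ranked or target_count <= 0:
        raise ValueError("need at least one score and a positive target")
    if target_count >= len(ranked):
        return ranked[-1]
    # The score of the target_count-th best article becomes the cutoff.
    return ranked[target_count - 1]


# Made-up selection scores for illustration only.
scores = {"A": 1500, "B": 1200, "C": 1200, "D": 900, "E": 400}
cutoff = choose_cutoff(scores.values(), target_count=2)
selected = sorted(title for title, s in scores.items() if s >= cutoff)
print(cutoff, selected)
```

Because "B" and "C" are tied at the cutoff, three articles are selected for a target of two, which is why the number of articles included is only approximately the number wished for.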

How are selection scores generated?

See Wikipedia:Version 1.0 Editorial Team/SelectionBot

What is the manual selection form for?

On rare occasions, some articles should be included in a release even though their scores are not above the cutoff threshold; see Wikipedia:Version 1.0 Editorial Team/Release Version Criteria for the list of criteria. The manual selection tool will be used to keep track of these articles for future releases.

Can projects ask that an article be left out of the release?

Yes. This was possible in 0.7 but was handled manually. The manual selection system will be extended to help us keep a list of articles that projects suggest should not be included in release versions.

On-wiki tables and logs

The global summary table

There is a global summary table shown at Wikipedia:Version 1.0 Editorial Team/Statistics. Since 2010, this table has been automatically recomputed each day by taking the highest quality and importance rating for each assessed article in the main namespace.

Issues

This table should be read with a grain of salt, because ratings are WikiProject-specific.

There are no ratings of the general importance of articles, only ratings of their importance to specific WikiProjects. Because the highest importance rating is used, if one project rates an article "High-importance" and another rates it "Low-importance", the article will count as High-importance in the global table. This is true even if the WikiProject assigning the higher rating is a "smaller" or "narrower" WikiProject than the one assigning the lower rating. Some projects choose not to assign importance ratings at all.

Similarly, if one project assesses an article as A-Class using a WikiProject-specific rating system, and the article is not assessed as A-Class (or higher) by any other project, then the article will count as A-Class for the global table, even if some or all other projects have rated the article B-Class.