Wikipedia:WikiProject Women in Red/Metrics/Wikidata

From Wikipedia, the free encyclopedia

WikiProject Women in Red's metrics depend largely on Wikidata having a record for the en.wikipedia article, and on that record specifying that the article is about a human of female gender. Gender gap statistics provided by Denelezh's tool and by WHGI are similarly dependent on Wikidata, and to get accurate percentages we need articles about males to have their gender specified, too.
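A toy calculation shows why uncoded male articles matter. This is a hypothetical sketch, not how Denelezh or WHGI actually compute their figures: it assumes the women's share is simply female-coded biographies divided by all gender-coded biographies, so any biography with no P21 drops out of the count entirely. The numbers are invented for illustration.

```python
def female_share(female: int, male: int, other: int = 0) -> float:
    """Percentage of gender-coded biographies that are coded female.

    Hypothetical metric: biographies with no gender code (no P21) are not
    counted at all, because the tool cannot tell which group they belong to.
    """
    coded = female + male + other
    if coded == 0:
        raise ValueError("no gender-coded biographies to count")
    return 100.0 * female / coded


# Toy numbers: 20 women and 80 men, all correctly coded.
print(female_share(20, 80))            # 20.0
# Same articles, but 30 of the men lack a P21 code and drop out.
print(round(female_share(20, 50), 2))  # 28.57
```

With those 30 men uncoded, the women's share appears roughly 8.6 points higher than it really is.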

There is no automatic process by which Wikidata records (known as items) are created to match the creation of Wikipedia articles. Wikidata items are added by hand, or by individuals running reports and bots that create and populate items.

This page provides links to a number of Petscan and Listeria reports which can be used to identify and remedy missing and incomplete Wikidata records. The reports list Wikipedia articles that lack Wikidata items, or whose item lacks a gender property or lacks any properties at all.

To use all the facilities of Petscan, it may be useful to authorise WiDaR to allow Petscan to make edits, under your control, on your behalf.

All of the reports below take a considerable time to run. Note that Petscan (or the toolserver on which it runs) often fails, either hanging (it simply does not produce a result) or returning the error "502 - Bad Gateway". The solution is to run the report again, or to run it later.
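If you fetch Petscan results from a script rather than the browser, "run it again, or run it later" can be automated as a retry loop with growing waits. This is a minimal sketch under stated assumptions: `run_report` is a hypothetical callable you supply (for example one that opens a Petscan query URL with `urllib.request.urlopen(url, timeout=...)`), so that both a hang (via the timeout) and a 502 error surface as exceptions. The delays are arbitrary choices, not Petscan recommendations.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retries(
    run_report: Callable[[], T],
    attempts: int = 3,
    first_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call run_report(); if it raises, wait and try again, doubling the wait.

    run_report is a hypothetical callable that fetches one report, raising
    an exception on a timeout or an HTTP error such as 502 Bad Gateway.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    delay = first_delay
    for attempt in range(1, attempts + 1):
        try:
            return run_report()
        except Exception:
            if attempt == attempts:
                raise  # give up for now: run the report later
            sleep(delay)
            delay *= 2
    raise AssertionError("unreachable")


# Usage with a fake report that fails twice before succeeding.
calls = {"n": 0}


def flaky_report():
    calls["n"] += 1
    if calls["n"] < 3:
        raise OSError("502 - Bad Gateway")
    return ["Ada Lovelace"]


waits = []
print(run_with_retries(flaky_report, sleep=waits.append))  # ['Ada Lovelace']
print(waits)                                               # [60.0, 120.0]
```

Passing `sleep` as a parameter lets the example record the waits instead of actually pausing.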

Petscan works by looking at articles in categories and sub-categories: the deeper you ask Petscan to look, the longer it takes to do its job. The deeper you descend the Category:Women tree, the more male and non-biography articles you find, because the Wikipedia category tree is somewhat idiosyncratic.
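The effect of depth can be seen on a tiny, made-up category tree. This sketch assumes, as with Petscan's depth setting, that depth 0 means the category itself with no sub-categories; the category and page names are a hypothetical toy example, not real Wikipedia data or Petscan's own code.

```python
from collections import deque

# Toy category tree (hypothetical, not real Wikipedia data):
# category -> {"subcats": [...], "pages": [...]}
TREE = {
    "Women": {"subcats": ["Women scientists"], "pages": ["Ada Lovelace"]},
    "Women scientists": {"subcats": ["Curie family"], "pages": ["Marie Curie"]},
    "Curie family": {"subcats": [], "pages": ["Marie Curie", "Pierre Curie"]},
}


def pages_within_depth(tree: dict, root: str, depth: int) -> set:
    """Pages in `root` plus sub-categories at most `depth` levels below it.

    Breadth-first; each category is visited once, since real category
    trees can contain loops.
    """
    seen = {root}
    pages = set()
    queue = deque([(root, 0)])
    while queue:
        category, level = queue.popleft()
        node = tree.get(category, {})
        pages.update(node.get("pages", []))
        if level == depth:
            continue  # do not descend any further
        for sub in node.get("subcats", []):
            if sub not in seen:
                seen.add(sub)
                queue.append((sub, level + 1))
    return pages


for d in range(3):
    print(d, sorted(pages_within_depth(TREE, "Women", d)))
# 0 ['Ada Lovelace']
# 1 ['Ada Lovelace', 'Marie Curie']
# 2 ['Ada Lovelace', 'Marie Curie', 'Pierre Curie']
```

Each extra level means more categories to visit, and at depth 2 a man appears in a "women" report purely because of how the tree is wired.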

Articles with Wikidata items having neither a P31 nor a P279 property

This is the biggest problem area right now. There are perhaps 3,000-5,000 items for biography articles which have no P31 or P21 codes, and so do not show up in statistics. A top tip for navigating these lists is to produce a list of all articles starting with a particular letter: select one of the manual run options, go to the Output tab, and in the Regexp filter enter ^A.*$ which means "give me articles that start with the capital letter A". Adjust for whichever letter of the alphabet you choose.
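Here is how the pattern `^A.*$` behaves, illustrated with Python's `re` module. Petscan's own regular-expression engine is not Python's, but for a simple pattern like this it should select the same titles: `^` anchors at the start, `A` is a literal capital letter (the match is case-sensitive), and `.*$` accepts the rest of the title. The function name and sample titles are hypothetical.

```python
import re


def titles_starting_with(titles: list, letter: str) -> list:
    """Keep titles matching ^X.*$, i.e. titles that begin with capital X."""
    pattern = re.compile("^" + re.escape(letter) + ".*$")
    return [t for t in titles if pattern.match(t)]


print(titles_starting_with(["Ada Lovelace", "Marie Curie", "Anne Bonny"], "A"))
# ['Ada Lovelace', 'Anne Bonny']
```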

Wikidata property P31 ("instance of") is used to code an item as human (P31=Q5); property P279 codes an item as a subclass of something. All items should have a P31 or a P279, and an item must have P31=Q5 to be counted in our statistics. It's a fair bet that a lack of P31 implies a lack of property P21, which is used to code for gender. Most of these reports will list items that are not biographies, but there are biographies to be found, which need both P31 and P21 added. And to the extent you can be bothered, coding non-biographies with an appropriate P31 or P279 helps to clear the wood so that we can see the trees.
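The rules above can be expressed as a small triage function over Wikidata entity JSON, in the shape returned by the `wbgetentities` API or `Special:EntityData` (trimmed here to the fields used). This is a hedged sketch of the logic, not the actual Petscan or Listeria queries, and the bucket names are hypothetical. In this sketch any P21 statement counts as a gender code, including one with "unknown value".

```python
HUMAN = "Q5"


def item_values(entity: dict, prop: str) -> list:
    """Q-ids of the item-valued statements for `prop` in entity JSON."""
    ids = []
    for statement in entity.get("claims", {}).get(prop, []):
        snak = statement.get("mainsnak", {})
        if snak.get("snaktype") == "value":  # skip "unknown"/"no value"
            ids.append(snak["datavalue"]["value"]["id"])
    return ids


def triage(entity: dict) -> str:
    """Sort an item into the buckets these reports look for."""
    claims = entity.get("claims", {})
    if not claims.get("P31") and not claims.get("P279"):
        return "no P31 or P279"        # not counted; probably no P21 either
    if HUMAN not in item_values(entity, "P31"):
        return "not coded as human"    # not counted in the statistics
    if not claims.get("P21"):
        return "human, no gender"      # needs a P21 added
    return "human with gender"         # counted


example = {"claims": {"P31": [{"mainsnak": {"snaktype": "value",
                                            "datavalue": {"value": {"id": "Q5"}}}}]}}
print(triage(example))  # human, no gender
```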

  • Articles from category:Women - will return 10,000 articles, the vast majority of which will not be biographies.
  • Articles from

Articles with no Wikidata item

The backlog of articles with no items is fairly small, perhaps only 200-300. Most of these articles have names for which items with the same label already exist, so the challenge is to check whether an existing item matches the article (in which case add a sitelink, and check that there is a P21 gender code) and, if not, to create a new item.
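The first step of that check, finding which existing items are even worth inspecting, can be sketched as a label match. Assumptions, all hypothetical: candidate items are already fetched as entity JSON, a trailing disambiguator such as "(singer)" is stripped from the article title before comparing, and items that already carry an enwiki sitelink are skipped, since an item can hold only one sitelink per wiki. Deciding whether a remaining candidate really is the same person is still a human judgement.

```python
import re


def base_label(title: str) -> str:
    """Strip a trailing disambiguator: 'Jane Smith (singer)' -> 'Jane Smith'."""
    return re.sub(r"\s*\([^)]*\)$", "", title)


def candidates(title: str, items: list) -> list:
    """Q-ids of items whose English label matches the article title and
    which have no enwiki sitelink yet."""
    label = base_label(title)
    found = []
    for item in items:
        en = item.get("labels", {}).get("en", {}).get("value")
        if en == label and "enwiki" not in item.get("sitelinks", {}):
            found.append(item["id"])
    return found


# Hypothetical items; the IDs are placeholders, not real Wikidata items.
items = [
    {"id": "Q100", "labels": {"en": {"value": "Jane Smith"}},
     "sitelinks": {"enwiki": {"title": "Jane Smith (politician)"}}},
    {"id": "Q200", "labels": {"en": {"value": "Jane Smith"}}, "sitelinks": {}},
]
print(candidates("Jane Smith (singer)", items))  # ['Q200']
```

An empty result suggests creating a new item; otherwise inspect each candidate before adding the sitelink.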

  • to 0 levels of depth - manually run - auto-run - probably the most productive Petscan report, albeit it mainly returns men.
  • Articles from Special pages
  • Pages not connected to items - a contemporaneous listing of articles with no Wikidata items, most recent articles listed first. Useful for spotting women biogs as they arrive, so that a Wikidata item can be added.

Articles with Wikidata items coded as human, but with no gender code

This area is largely under control, although from day to day new items will be created with P31=Q5 (i.e. the article is about a human) but with no P21 gender code. Putting the Listeria reports on your watchlist and dealing with new additions as they happen is probably the easiest way to approach this problem area, but running Petscan reports is also helpful since, for a variety of reasons, not all genderless biography items will appear on the Listeria reports.

  • Listeria reports for biographical items with no gender
These lists should be quite short, filling only as quickly as new items with no gender are added to Wikidata; the backlog was cleared in February 2019. A few lists include a small number of people whose gender it has not yet been possible to find.
Note that Listeria does not always completely empty the lists, so you may find lists with one or a few names, all of which turn out to have genders. You can, optionally, hand-edit the table to empty it.

Females given male gender coding

Again, this area is probably mostly under control, although new instances of wrongly coded items can appear at any time.

The following reports list biographies which are found beneath women categories in Wikipedia, but which have gender = male in Wikidata. Most listed items are false positives: go deep enough into any women category in Wikipedia and you find men. Note that the reports list the earliest items first, and incorrectly coded items will tend to be found at the bottom of the lists.
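The cross-check these reports perform can be sketched as follows. Assumptions: the articles under women categories have already been collected and paired with their entity JSON, "earliest" is taken to mean the lowest Q-number, and `Q6581097` is the Wikidata item for male used as a P21 value. The article titles and IDs are hypothetical; this is not the reports' actual query.

```python
MALE = "Q6581097"  # Wikidata item "male", as used in P21 (sex or gender)


def male_coded(articles: dict) -> list:
    """Titles whose item has P21 = male, lowest Q-number (earliest item) first.

    `articles` maps an article title found under a women category to its
    Wikidata entity JSON.
    """
    hits = []
    for title, entity in articles.items():
        genders = [
            s["mainsnak"]["datavalue"]["value"]["id"]
            for s in entity.get("claims", {}).get("P21", [])
            if s["mainsnak"].get("snaktype") == "value"
        ]
        if MALE in genders:
            hits.append((int(entity["id"][1:]), title))
    return [title for _, title in sorted(hits)]


def gender(qid):  # builds a minimal P21 claim for the toy data
    return {"P21": [{"mainsnak": {"snaktype": "value",
                                  "datavalue": {"value": {"id": qid}}}}]}


articles = {
    "Article A": {"id": "Q300", "claims": gender("Q6581072")},  # female
    "Article B": {"id": "Q250", "claims": gender(MALE)},
    "Article C": {"id": "Q40", "claims": gender(MALE)},
}
print(male_coded(articles))  # ['Article C', 'Article B']
```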

Four likely outcomes are suggested after inspecting an item / article: 1) change the gender coding on Wikidata; 2) remove inappropriate women categories from the article; 3) remove inappropriate women categories from categories found on the article; 4) shake your head in wonder at the vagaries of the Wikipedia category tree.

  • Women by Occupation (excluding Women in sports)