Wikipedia:WikiProject Women in Red/Metrics/Wikidata

WikiProject Women in Red's article creation metrics, and the article gender-gap statistics provided by Denelezh's tool and by WHGI, all depend on Wikidata having a record (termed an 'item') that has a sitelink to the en.wikipedia article, and on that item specifying that its subject is a human with female gender. Accurate gender-gap percentages depend on such items existing for all male, as well as all female, biography articles.

There is no automatic process by which Wikidata items are created to match newly created Wikipedia articles. Items are added by hand, or by individuals running reports and bots that lead to item creation and population.

This page provides links to a number of Petscan and Listeria reports that can be used to identify and remedy missing and incomplete Wikidata records. The reports list Wikipedia articles that lack Wikidata items, or Wikidata items that lack a gender property or lack any properties at all.

To use all the facilities of Petscan, it may be useful to authorise WiDaR to allow Petscan to make edits, under your control, on your behalf. It is also highly recommended that you add the Wikidata framework to your Preferences / Appearance / Shared CSS/JavaScript for all skins: / Custom JavaScript page. The framework provides new left-side menu options enabling the creation and editing of Wikidata pages from Wikipedia articles.

All of the reports below take a considerable time to run. Note that Petscan (or the toolserver on which it runs) often fails, either hanging (producing no result at all) or returning the error "502 - Bad Gateway". The solution is to run the report again, sometimes repeatedly, or to run it later.
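
For anyone scripting report runs rather than clicking through the web interface, the "run it again" advice can be sketched as a simple retry loop. This is only a minimal sketch and not part of Petscan: `run_with_retries` and `fetch_report` are hypothetical names, and the caller has to supply whatever function actually fetches the report.

```python
import time


def run_with_retries(fetch_report, attempts=5, wait_seconds=60.0, sleep=time.sleep):
    """Call fetch_report() until it succeeds or the attempts run out.

    fetch_report is a hypothetical caller-supplied function that returns the
    report, or raises an exception on a failure such as a 502 or a timeout.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fetch_report()
        except Exception as error:  # e.g. HTTP 502, or a hung request timing out
            last_error = error
            if attempt < attempts:
                sleep(wait_seconds)  # pause before trying again
    raise RuntimeError(f"report failed after {attempts} attempts") from last_error
```

A generous pause between attempts is probably kinder to the server than retrying immediately; the 60-second default here is an arbitrary choice.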

Petscan works by looking at articles in categories and sub-categories: the deeper you ask Petscan to look, the longer it will take to do its job. The deeper down the Category:Women tree you descend, the more male and non-biography articles you find, because the Wikipedia category tree is somewhat idiosyncratic.

High priority

Probably the highest-priority reports of all are the following, which track new articles without Wikidata items.

  • Articles with no wikidata item:

Articles with wikidata items having no P31 nor P279 property

This is the next biggest problem area right now. There are perhaps more than 2,000 items for biography articles that have no P31 or P21 codes, and so do not show up in statistics. A useful tip for navigating these lists is to produce a list of all articles starting with a particular letter. To do this, select one of the manual run options, go to the Output tab, and in the Regexp filter enter ^A.*$ which means "give me articles that start with the capital letter A". Adjust for whichever letter of the alphabet you choose.
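
To see what the ^A.*$ filter keeps and drops, the same pattern can be tried locally. This is a minimal sketch using Python's `re` module, whose regular-expression flavour may differ in detail from the one Petscan uses; the helper name `titles_starting_with` and the sample titles are made up for illustration.

```python
import re


def titles_starting_with(titles, letter):
    """Return the titles matching ^<letter>.*$, as in the Regexp filter tip."""
    pattern = re.compile("^" + re.escape(letter) + ".*$")
    return [title for title in titles if pattern.match(title)]


# Illustrative titles only; the match is case-sensitive, so "ada ..." is dropped.
sample_titles = ["Ada Lovelace", "Marie Curie", "Annie Jump Cannon", "ada (disambiguation)"]
print(titles_starting_with(sample_titles, "A"))
# prints ['Ada Lovelace', 'Annie Jump Cannon']
```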

Wikidata property P31 is used to code an item as human (P31=Q5). (Property P279 codes an item as a subclass of something.) All items should have a P31 or a P279, and must have P31=Q5 to be counted in our statistics. It's a fair bet that a lack of P31 implies a lack of property P21, which is used to code for gender. Most of these reports will list items that are not biographies, but there are biographies to be found, which need both P31 and P21 adding. And to the extent you can be bothered, coding non-biographies with an appropriate P31 or P279 helps to clear the wood so that we can see the trees.
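
The rules in the paragraph above can be written out as a small decision function. This is a sketch under stated assumptions: the `claims` mapping is a simplified, hypothetical stand-in for an item's statements (not the real Wikidata JSON layout), and `report_bucket` and its return labels are invented for illustration.

```python
HUMAN = "Q5"  # Wikidata item for "human"


def report_bucket(claims):
    """Say which problem area, if any, an item falls into.

    claims is a simplified, hypothetical mapping of property ID to a list of
    value IDs, e.g. {"P31": ["Q5"], "P21": ["Q6581072"]}.
    """
    has_p31 = bool(claims.get("P31"))
    has_p279 = bool(claims.get("P279"))
    if not has_p31 and not has_p279:
        return "no P31 or P279"        # needs classifying before anything else
    if HUMAN not in claims.get("P31", []):
        return "not coded as human"    # not counted as a biography
    if not claims.get("P21"):
        return "human, no gender"      # a biography, but missing P21
    return "human with gender"         # can be counted in the statistics


print(report_bucket({"P31": ["Q5"]}))
# prints human, no gender
```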

  • Articles from category:Women - will return 10,000 articles, the vast majority of which will not be biographies.
  • Wikidata MWAPI search for items with no P31 nor P21
  • search - amend the query to search for specific given or family names

Articles with no wikidata item

The backlog of articles with no items is fairly small, perhaps only 400-600, although new itemless articles are added daily. Most of these articles have names that already appear as the label of one or more existing items, so the challenge is to check whether an existing item matches the article (in which case add a sitelink, and check that there is a P21 gender code) and, if not, to create a new item.
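
One way to look for existing items with a matching label is the Wikidata API's `wbsearchentities` action. The sketch below only builds the search URL and makes no request; the helper name `label_search_url` and the example name "Jane Smith" are hypothetical, and deciding whether any returned item really is the article's subject still has to be done by a person.

```python
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def label_search_url(article_title, language="en", limit=10):
    """Build a wbsearchentities URL for items whose label or alias matches."""
    params = {
        "action": "wbsearchentities",
        "search": article_title,
        "language": language,
        "type": "item",
        "limit": limit,
        "format": "json",
    }
    return WIKIDATA_API + "?" + urlencode(params)


# "Jane Smith" is a made-up example title.
print(label_search_url("Jane Smith"))
```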

  • Articles from Special pages
  • Articles not connected to items - an up-to-date listing of articles with no Wikidata items, most recent articles listed first. Useful for spotting women's biographies as they arrive, so that a Wikidata item can be added.
  • Articles from the Duplicity tool
  • Articles not connected to items, of all sorts, more than 2 weeks old, listed in order of age (oldest first), plus (fwiw) stats on the number of en.wiki articles with no Wikidata item; provided by Magnus Manske's Duplicity tool, updated once daily.

Articles with wikidata items coded as human, but with no gender code

This area is largely under control, although from day to day new items will be created with P31=Q5 (i.e. the article is about a human) but with no P21 gender code. Putting the Listeria reports on your watchlist and dealing with new additions as they happen is probably the easiest way to approach this problem area, but running Petscan reports is also helpful since, for a variety of reasons, not all genderless biography items will appear on the Listeria reports.

  • Listeria reports for biographical items with no gender
These lists should be quite short, filling only as quickly as new items with no gender are added to Wikidata; the backlog was dealt with in February 2019. Some lists have a small number of people for whom it has not yet been possible to find a gender.
Note that Listeria does not completely empty the lists, so you may find lists with one or a few names, all of which turn out to have genders. You can, optionally, hand-edit the table to empty it.
  • Articles from category:Living people - note, these reports look for no P21, but do not check either way for a P31.

Females given male gender coding

Again, this area is probably mostly under control, although new instances of wrongly coded items can appear at any time.

The following reports list biographies which are found beneath Women categories in Wikipedia, but which have gender = male in Wikidata. Most listed items are false positives: go deep enough into any women category in Wikipedia and you find men. Note that the reports are ordered with the earliest items listed first, so incorrectly coded items will tend to be found at the bottom of lists.

Four likely outcomes are suggested after inspection of an item / article: 1) change the gender coding on Wikidata; 2) remove inappropriate women categories from the article; 3) remove inappropriate women categories from categories found on the article; 4) shake your head in wonder at the vagaries of the Wikipedia category tree.

  • Women by Occupation (excluding Women in sports)