Wikipedia:WikiProject Women in Red/Wikidata redlist guide

From Wikipedia, the free encyclopedia

This Wikidata redlist guide provides step-by-step guidance for creating Women in Red redlists. Although the guide focuses on Women in Red, it may also be useful for creating Wikidata-based lists for other purposes.

Preliminaries

In order to create a Wikidata-based redlist, you will need:

You will use the following tools:

Basics

Simple example

Let's start with a trivial Wikidata list. It will have a single entry for Ada Lovelace and we'll use the following query:

SELECT ?item WHERE {
  ?item wdt:P31 wd:Q5 .
  ?item wdt:P21 wd:Q6581072 .
  ?item wdt:P735 wd:Q346047 .
  ?item wdt:P734 wd:Q1260681 .
}

This query can be run in the Wikidata Query Service.

The above query returns every Wikidata item that fulfills these conditions:

  1. Is a human: instance of (P31) human (Q5).
  2. Is female: sex or gender (P21) female (Q6581072).
  3. Has given name Ada: given name (P735) Ada (Q346047).
  4. Has family name Byron: family name (P734) Byron (Q1260681).

Now that we have a SPARQL query that returns the entries we want, we can create the redlist using {{Wikidata list}}, remembering to close it with a {{Wikidata list end}} template:

ListeriaBot will take care of updating it automatically, producing the following output:

Notice that the query returns only ?item. The columns of the generated table are specified in the |columns= parameter of the {{Wikidata list}} template. See Template:Wikidata list for more information on Wikidata list parameters.

Missing articles

In order to list only items without a corresponding article in the English Wikipedia, every redlist needs the following SPARQL fragment:

OPTIONAL { ?w schema:about ?item; schema:isPartOf <https://en.wikipedia.org/>. }
FILTER(!(BOUND(?w)))

You will also see the following equivalent form:

FILTER NOT EXISTS { ?w schema:about ?item; schema:isPartOf <https://en.wikipedia.org/> . }
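
To show where the fragment sits in a complete query, here is a minimal sketch that reuses the first three conditions of the simple example (human, female, given name Ada) and keeps only items without an English Wikipedia article. It assumes the query is run on the Wikidata Query Service, where the wd:, wdt: and schema: prefixes are predefined; elsewhere they would need PREFIX declarations.

```sparql
# Sketch: women with given name Ada who have no English Wikipedia article.
SELECT ?item WHERE {
  ?item wdt:P31 wd:Q5 .          # instance of: human
  ?item wdt:P21 wd:Q6581072 .    # sex or gender: female
  ?item wdt:P735 wd:Q346047 .    # given name: Ada
  # Drop every item that already has an English Wikipedia sitelink
  FILTER NOT EXISTS {
    ?w schema:about ?item ;
       schema:isPartOf <https://en.wikipedia.org/> .
  }
}
```

The family name condition is left out on purpose so that the filter has more than one candidate to work on.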

Number of sites

When looking for notable subjects, it is often useful to look at how many Wikimedia projects have a page for a given item. This number can be retrieved with the following SPARQL fragment:

?item wikibase:sitelinks ?linkcount .

Here's a version of the simple example modified to add a column with the link count:
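
On the query side, the change could look like the following sketch. It is the simple example with the sitelinks fragment added and ?linkcount included in the SELECT clause so that the value is returned with each item. It assumes the Wikidata Query Service, where the wikibase: prefix is predefined; how the variable is mapped to a table column is up to the |columns= parameter and is not shown here.

```sparql
# Sketch: the simple example, extended to return the number of sitelinks.
SELECT ?item ?linkcount WHERE {
  ?item wdt:P31 wd:Q5 .                  # instance of: human
  ?item wdt:P21 wd:Q6581072 .            # sex or gender: female
  ?item wdt:P735 wd:Q346047 .            # given name: Ada
  ?item wdt:P734 wd:Q1260681 .           # family name: Byron
  ?item wikibase:sitelinks ?linkcount .  # number of Wikimedia pages linked to the item
}
```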

Handling large results

The number of results for a SPARQL query can often be in the thousands or tens of thousands. That is far more than we can handle in a wiki redlist, so we need to cut it down. The number of results of a query can be limited by adding a LIMIT clause to the end. For example, LIMIT 1000 caps the results at 1000.

However, if we use LIMIT alone, the results that make it into the list will be arbitrary, and they might not be the most relevant. So it is a good idea to always apply order criteria. A limit with our recommended order follows:

ORDER BY DESC(?linkcount) ASC(?item)
LIMIT 1000

This limits the results to the top 1000 by number of sites. If two items have the same number of sites, the tie is broken by item identifier (?item) in ascending order. This makes the result deterministic, meaning that in the absence of actual data changes, the query will always return the same set of 1000 results. If we didn't do this, the bot would repeatedly remove and add back items in subsequent updates.

Occupation

One of the most common criteria for redlists is occupation (P106). Check out current redlists by occupation. We specify one or more occupations as follows:

?item wdt:P106 ?occ .
VALUES ?occ {
  wd:Q5468707  # forensic entomologist
  wd:Q27645949 # paleoentomologist
  wd:Q3055126  # entomologist
}

This will include items where occupation (P106) is either forensic entomologist (Q5468707), paleoentomologist (Q27645949), or entomologist (Q3055126). The comments in the query (e.g. # entomologist) are optional, but they can make the query more readable to humans.

Here's a full example of a redlist of 5 women entomologists (see also the actual Entomologists redlist):
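
As a rough sketch, the query behind such a list could combine the fragments introduced so far: the occupation values, the missing-article filter, the link count, and the recommended ordering with a limit of 5. This is assembled from the fragments in this guide and is not necessarily the query used by the actual redlist. DISTINCT is an addition here, so that an item holding more than one of the listed occupations is returned only once.

```sparql
# Sketch: top 5 women entomologists without an English Wikipedia article.
SELECT DISTINCT ?item ?linkcount WHERE {
  VALUES ?occ {
    wd:Q5468707  # forensic entomologist
    wd:Q27645949 # paleoentomologist
    wd:Q3055126  # entomologist
  }
  ?item wdt:P106 ?occ .                  # occupation: one of the values above
  ?item wdt:P31 wd:Q5 .                  # instance of: human
  ?item wdt:P21 wd:Q6581072 .            # sex or gender: female
  ?item wikibase:sitelinks ?linkcount .  # number of sites
  # Keep only items with no English Wikipedia article
  FILTER NOT EXISTS {
    ?w schema:about ?item ;
       schema:isPartOf <https://en.wikipedia.org/> .
  }
}
ORDER BY DESC(?linkcount) ASC(?item)
LIMIT 5
```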

Country

See our country redlists. A simple approach would be to use the country of citizenship (P27) property alone. However, Wikidata may be missing the country of citizenship for an item while having other geographical properties that are good enough for our purposes. So we can use a combination of country of citizenship (P27), country (P17), country of origin (P495), country for sport (P1532), and the country (P17) of the place of birth (P19). We can do this with the following SPARQL fragment:

VALUES ?country {
  wd:Q189 # Iceland
}
{
  { ?item (wdt:P27|wdt:P17|wdt:P495|wdt:P1532) ?country. }
  UNION
  { ?item (wdt:P19/wdt:P17) ?country. }
}

Here's a full example of a redlist of 5 women from Honduras (see also the actual Honduras redlist):
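
A minimal sketch of such a query follows. It is assembled from the fragments in this guide and is not necessarily the query used by the actual redlist. It reuses the Iceland item (Q189) from the fragment above as a stand-in; replace it with the item for the country you want. DISTINCT is an addition here, since an item can match both branches of the UNION.

```sparql
# Sketch: top 5 women linked to a country, without an English Wikipedia article.
SELECT DISTINCT ?item ?linkcount WHERE {
  VALUES ?country {
    wd:Q189 # Iceland (stand-in; substitute the country item you need)
  }
  ?item wdt:P31 wd:Q5 .                  # instance of: human
  ?item wdt:P21 wd:Q6581072 .            # sex or gender: female
  {
    # Direct country properties on the person
    { ?item (wdt:P27|wdt:P17|wdt:P495|wdt:P1532) ?country . }
    UNION
    # Country of the place of birth
    { ?item (wdt:P19/wdt:P17) ?country . }
  }
  ?item wikibase:sitelinks ?linkcount .  # number of sites
  # Keep only items with no English Wikipedia article
  FILTER NOT EXISTS {
    ?w schema:about ?item ;
       schema:isPartOf <https://en.wikipedia.org/> .
  }
}
ORDER BY DESC(?linkcount) ASC(?item)
LIMIT 5
```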

Troubleshooting

Killed by OS for overloading memory

A list may fail to update because the bot ran out of memory. This is signaled by the error "Killed by OS for overloading memory" on a manual update. This is a known problem with ListeriaBot, usually caused by many links to large entities. A workaround is to reduce the number of links to geographical entities, for example by removing the place of death (P20) column.