Wikipedia:Wikipedia Signpost/2020-04-26/In focus

Multilingual Wikipedia: How to better integrate articles across language editions.
Denny Vrandečić was the Wikidata director until September 2013 and was a member of the Wikimedia Foundation board of trustees from July 2015 to April 2016. He earned a PhD at the Karlsruhe Institute of Technology. He now works at Google. -S

Wikipedia’s mission is to allow everyone to share in the sum of all knowledge. Wikipedia is in its twentieth year, and it has been a success in many ways. And yet, it still has large knowledge gaps, particularly in language editions with smaller active communities. But not only there – did you know that only a third of all topics that have Wikipedia articles have an article on the English Wikipedia? Did you know that only about half of articles in the German Wikipedia have a counterpart on the English Wikipedia? There are huge amounts of knowledge out there that are not accessible to readers who can read only one or two languages.

And even where an article exists, content is often very unevenly distributed: where one Wikipedia has a long article with several sections, another might have just a stub. And sometimes, articles contain very outdated knowledge. Nine months after London Breed became mayor of San Francisco, only twenty-four language editions listed her as such. Sixty-two editions listed out-of-date mayors – and not only Ed Lee, who was mayor from 2011 to 2017, but also Gavin Newsom, who was mayor from 2004 to 2011, and Willie Brown, who was mayor from 1996 to 2004. The Cebuano Wikipedia even lists Dianne Feinstein, who was mayor from 1978 to 1988, more than a decade before Wikipedia was even created.

This is no surprise, as half of the Wikipedia language editions have fewer than ten active contributors. It is challenging to write and maintain a comprehensive and current encyclopedia with ten people in their spare time. It cannot be expected that those ten contributors keep track of all the cities in the world and update their mayors in Wikipedia. In many cases those contributors would prefer to work on other articles.

Wikidata to the rescue?

This is where Wikidata can help. And in fact, it does: of the twenty-four Wikipedia language editions that listed London Breed as mayor, eight got that information from Wikidata, and were up-to-date because of that. But Wikidata cannot really tell the full story.

Ed Lee, then mayor of San Francisco, died of cardiac arrest in December 2017. London Breed, as the president of the board of supervisors, became acting mayor, but in order to deny her the advantage of the incumbent, the board voted in January 2018 to replace her with Mark Farrell as interim mayor until the special elections to finish the term of Ed Lee were held in June. London Breed won that election and became mayor in July, serving until the next regular elections a year later, which she also won.

Now there are many facts in there that can be represented in Wikidata: that there was a special election for the position of the mayor of San Francisco, that it was held in June, that London Breed won that election. That there was an election in 2019. That Mark Farrell held the office from January to July. That Ed Lee died of cardiac arrest in December 2017.

But all of these facts don’t tell a story. Although Wikidata records these facts, they are spread throughout the wiki, and it is very hard to string them together in a way that allows a reader to make sense of them. Even worse, these facts are just a very small subset of the billions of such facts in Wikidata, and for a reader it is hard to figure out which are relevant and which are not. Wikidata is great for answering questions, creating graphs, allowing data exploration, or making infobox-like overviews of a topic, but it is really bad at telling even the rather simple story presented above.

We have a solution for this problem, and it’s quite marvelous: language. Language is expressive, it can tell stories, it is predestined for knowledge transfer. But also, there are many languages in the world, and most of us only speak a few of them. This is a barrier for the transfer of knowledge. Here I suggest an architecture to lower this barrier, deeply inspired by the way language works.

Imagine for a moment that we start abstracting the content of a text. Instead of saying "in order to deny her the advantage of the incumbent, the board voted in January 2018 to replace her with Mark Farrell as interim mayor until the special elections", imagine we say something more abstract such as elect(elector: Board of Supervisors, electee: Mark Farrell, position: Mayor of San Francisco, reason: deny(advantage of incumbency, London Breed)) – and even more, all of these would be language-independent identifiers, so that the expression would actually look more like Q40231(Q3658756, Q6767574, Q1343202(Q6015536, Q6669880)). At first glance, this looks much like a statement in Wikidata, but merely by placing it in a series of other such abstract statements, and having some connecting tissue between these bare-bones statements, we are inching much closer to what a full-bodied text needs.
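To make the idea of nested abstract content concrete, here is a minimal Python sketch. The `Constructor` class and the `serialize` helper are hypothetical names chosen for illustration, not part of any existing Wikidata or Wikimedia API; the identifiers are copied from the example above.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Constructor:
    """Hypothetical node of abstract content: an identifier plus arguments."""
    name: str
    args: Tuple[Union[str, "Constructor"], ...]


def serialize(node: Union[str, Constructor]) -> str:
    """Write a node in the compact notation used in the text."""
    if isinstance(node, str):
        return node  # a plain identifier such as "Q6767574"
    inner = ", ".join(serialize(arg) for arg in node.args)
    return f"{node.name}({inner})"


# The example from the text: an outer constructor whose last
# argument is itself a constructor (the nested "reason").
content = Constructor(
    "Q40231",
    ("Q3658756", "Q6767574", Constructor("Q1343202", ("Q6015536", "Q6669880"))),
)

print(serialize(content))
```

Because an argument can itself be a constructor, a reason, a condition, or a quotation can be embedded inside a statement, which is what moves this beyond a flat list of facts.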

A new project: a wiki for functions

But obviously, we wouldn’t show this abstract content to the readers. We still need to translate the abstract content to natural language. So we would need to know that the elect constructor mentioned above takes the parameters shown in the example, and that we need to make a template such as {elector} elected {electee} to {position} in order to {reason} (something that looks much easier in this example than it is for most other cases). And since such translators have to be created for every supported language, we need a place where a community can create and maintain them.
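A rough sketch of such per-language translators, in Python, might look as follows. Everything here is an assumption made for illustration: the `TEMPLATES` and `LABELS` tables, the short keys such as "board", the parameter names of deny, and the German template. The labels are stored already inflected ("den Vorteil", "der Stadtrat"), which sidesteps exactly the grammatical work that makes real cases much harder.

```python
# Hypothetical per-language templates, keyed by constructor name.
TEMPLATES = {
    "en": {
        "elect": "{elector} elected {electee} to {position} in order to {reason}",
        "deny": "deny {recipient} the {thing}",
    },
    "de": {
        "elect": "Um {reason}, wählte {elector} {electee} zum {position}",
        "deny": "{recipient} {thing} zu verweigern",
    },
}

# Hypothetical labels; real content would use language-independent identifiers.
LABELS = {
    "en": {
        "board": "the Board of Supervisors",
        "farrell": "Mark Farrell",
        "mayor": "Mayor of San Francisco",
        "advantage": "advantage of incumbency",
        "breed": "London Breed",
    },
    "de": {
        "board": "der Stadtrat",
        "farrell": "Mark Farrell",
        "mayor": "Bürgermeister von San Francisco",
        "advantage": "den Vorteil des Amtsträgers",
        "breed": "London Breed",
    },
}


def render(node, lang):
    """Render a (constructor, parameters) pair, or a label key, as text."""
    if isinstance(node, str):
        return LABELS[lang][node]
    constructor, params = node
    rendered = {key: render(value, lang) for key, value in params.items()}
    return TEMPLATES[lang][constructor].format(**rendered)


content = ("elect", {
    "elector": "board",
    "electee": "farrell",
    "position": "mayor",
    "reason": ("deny", {"thing": "advantage", "recipient": "breed"}),
})

print(render(content, "en"))
print(render(content, "de"))
```

The same abstract content yields different word orders in the two languages, which is why each language needs its own translator rather than a word-for-word substitution.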

For this I propose a new Wikimedia project, preliminarily called Wikilambda (and I am terrible with names, so I do not expect the project to be actually called this). Wikilambda would be a new project to create, maintain, manage, catalog, and evaluate a new form of knowledge assets: functions. Functions are algorithms, pieces of code, that translate input into output in a deterministic and repeatable way. A simple function, such as the square function, could take the number 5 and return 25. The length function could take a string such as "Wikilambda" and return the number 10. Another function could translate a date in the Gregorian calendar to a date in the Julian calendar. And yet another could translate inches to centimeters. Finally, one other function, more complex than any of those examples, could take abstract content such as Q40231(Q3658756, Q6767574, Q1343202(Q6015536, Q6669880)) and a language code, and give back the text "In order to deny London Breed the incumbency advantage, the Board of Supervisors elected Mark Farrell Mayor of San Francisco." Or, for German, "Um London Breed den Vorteil des Amtsträgers zu verweigern, wählte der Stadtrat Mark Farrell zum Bürgermeister von San Francisco."
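The simpler functions mentioned here could be written, for instance, as the following Python sketch. The function names are illustrative assumptions, and the calendar conversion goes through the Julian day number using a standard integer-arithmetic formula; it is one possible implementation, not the one the project would necessarily use.

```python
def square(x: int) -> int:
    """Return x multiplied by itself."""
    return x * x


def length(text: str) -> int:
    """Return the number of characters in a string."""
    return len(text)


def inches_to_centimeters(inches: float) -> float:
    """Convert inches to centimeters (one inch is exactly 2.54 cm)."""
    return inches * 2.54


def gregorian_to_julian(year: int, month: int, day: int):
    """Convert a Gregorian calendar date to the Julian calendar.

    Step 1 computes the Julian day number (JDN) of the Gregorian date;
    step 2 converts that JDN to a Julian calendar date.
    Intended for dates with a non-negative JDN.
    """
    # Step 1: Gregorian date -> Julian day number.
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    jdn = (day + (153 * m + 2) // 5 + 365 * y
           + y // 4 - y // 100 + y // 400 - 32045)

    # Step 2: Julian day number -> Julian calendar date.
    c = jdn + 32082
    d = (4 * c + 3) // 1461
    e = c - (1461 * d) // 4
    n = (5 * e + 2) // 153
    julian_day = e - (153 * n + 2) // 5 + 1
    julian_month = n + 3 - 12 * (n // 10)
    julian_year = d - 4800 + n // 10
    return julian_year, julian_month, julian_day


print(square(5))                        # 25
print(length("Wikilambda"))             # 10
print(gregorian_to_julian(2020, 4, 26)) # (2020, 4, 13)
```

Each of these maps the same input to the same output every time, which is what makes such functions easy to test and to share across projects.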

Wikilambda will allow contributors to create and maintain functions, their implementations and tests, in a collaborative way. These include the available constructors used to create the abstract content. The functions can be used in a variety of ways: users can call them from the Web, but also from local machines or from an app. By allowing the functions in Wikilambda to be called from wikitext, we also create a global space to maintain global templates and modules, another long-standing wish of the Wikimedia communities. This will allow more communities to share expertise and make the life of other projects such as the Content Translation tool easier.
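One way to picture a function with several implementations and shared tests is the small Python sketch below. The catalog structure and every name in it (`FunctionEntry`, `CATALOG`, `add_implementation`, `add_test`, `evaluate`, `call`) are hypothetical and only illustrate the idea; the proposal itself does not prescribe this design.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class FunctionEntry:
    """A catalog entry: named implementations plus (arguments, expected) tests."""
    implementations: Dict[str, Callable[..., Any]] = field(default_factory=dict)
    tests: List[Tuple[tuple, Any]] = field(default_factory=list)


CATALOG: Dict[str, FunctionEntry] = {}


def add_implementation(name: str, label: str, impl: Callable[..., Any]) -> None:
    CATALOG.setdefault(name, FunctionEntry()).implementations[label] = impl


def add_test(name: str, args: tuple, expected: Any) -> None:
    CATALOG.setdefault(name, FunctionEntry()).tests.append((args, expected))


def evaluate(name: str) -> Dict[str, bool]:
    """Report, per implementation, whether it passes every test."""
    entry = CATALOG[name]
    return {
        label: all(impl(*args) == expected for args, expected in entry.tests)
        for label, impl in entry.implementations.items()
    }


def call(name: str, *args: Any) -> Any:
    """Call the first implementation that passes all tests."""
    for label, passed in evaluate(name).items():
        if passed:
            return CATALOG[name].implementations[label](*args)
    raise LookupError(f"no passing implementation for {name!r}")


# Contributors add implementations and tests independently of each other.
add_implementation("square", "buggy_double", lambda x: x + x)
add_implementation("square", "multiply", lambda x: x * x)
add_test("square", (2,), 4)   # the buggy version happens to pass this one
add_test("square", (5,), 25)  # but fails here

print(evaluate("square"))
print(call("square", 5))
```

Keeping tests separate from implementations lets one contributor catch another's mistake, and lets callers rely only on implementations that currently pass.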

This will allow the individual language communities to use text generated from the abstract content, and fill some of their knowledge gaps. The hope is that writing the functions that translate abstract content, albeit more complex, is also much less work than writing and maintaining a full-fledged encyclopedia. This will also allow smaller communities to focus on the topics they care about – local places, culture, food – and yet to have an up-to-date coverage of globally relevant topics.

What do you think?

To make it absolutely clear: this proposal does not call for the replacement of the current Wikipedias. It is meant as an offer to the communities to fill in the gaps that currently exist. It would be presumptuous to assume that a text generated by Wikilambda would ever achieve the brilliance and subtlety that let many of our current Wikipedia articles shine. And although there are several advantages for many parts of the English Wikipedia as well (say for global templates or content that is actually richer in a local language), I would be surprised if the English Wikipedia community started to widely adopt what Wikilambda offers early on. But it seems hard to overestimate the effect this proposal could have on smaller communities, and eventually on our whole movement, in getting a bit closer to our vision of a world in which everyone can share in the sum of all knowledge.

I invite you to read my recently published paper detailing the technical aspects and an upcoming chapter discussing the social aspects of this proposal. I have discussed this proposal with several researchers in many related research areas, with members of different Wikimedia communities, and with folks at the Wikimedia Foundation, to figure out the next steps. I also invite you to discuss this proposal, in the Comments section below, or on Meta, on Wikimedia-l, or with me directly. I am very excited to work toward it and I hope to hear your reservations and your ideas.

Update (May 8, 2020): An official proposal for Wikilambda is now up on Meta. Discussion and support can be expressed there.