Wikipedia:Wikipedia Signpost/2018-08-30/In the media

Quicksilver AI writes articles: But unfortunately its output is incompatible with open licensing.

Does AI level the playing field for underrepresented subjects? Or perpetuate systemic bias?

The logo of Women in Red, whose WikiProject has been heavily involved in creating articles based on the technology

Wired, Popular Science, The Verge, and others published a story on Quicksilver, a new artificial intelligence tool that identifies notable people missing from Wikipedia and writes short summaries about them. Users can head to Quicksilver's website to find a list of the 100 notable people it has released.

In a blog post, the people behind the technology described how it works:

[The software can] read 500 million news articles, 39 million scientific papers, all of Wikipedia, and then write 70,000 biographical summaries of scientists. ... We are publicly releasing free-licensed data about scientists that we’ve been generating along the way, starting with 30,000 computer scientists. Only 15% of them are known to Wikipedia. The data set includes 1 million news sentences that quote or describe the scientists, metadata for the source articles, a mapping to their published work in the Semantic Scholar Open Research Corpus, and mappings to their Wikipedia and Wikidata entries.

The technology can also be used to help prevent Wikipedia articles from going "stale" and lagging behind the pace of events. In February 2018, Google announced that it was embarking on a similar project, but the passages it produced were described by The Register as "a bit difficult to read without clear capital letters at the start of new sentences, and most sentences have the same rigid structure", and the model was criticized for reliability issues. Quicksilver itself presents only short clippings from news articles strung together, and it focuses heavily on the people with the most mentions in the news, but it is a good place to start.

Bias in the big-data sources selected to feed the AI has been pointed out as a potential pitfall. Haaretz published a story titled "The Real Reason Sheldon Adelson's Wife Deserves a Wikipedia Page" about Miriam Adelson, who was among the original 100 figures listed. The story included this observation:

The initial data fed into the program was that of academics from the world of computer science, skewing the results in favor of that field from the outset. More so, a large number of those Quicksilver proposed for articles were American figures from the world of IT, suggesting that the initial dataset provided by the San Francisco-based company reflected its own location as much as their own backgrounds as engineers.

The sentiment was echoed in an on-wiki "reflection" (permanent link), which included this comment from Xcia0069: "Many of the sources are cheap news sites that aren't the most reliable interpretations of the research undertaken [and] a surprisingly high majority of the sampled of 100 scientists are from the USA".

The worries might be moot for us, though, if the output is incompatible with open licensing. The license for the work states: "The data contains sentences from news articles provided for the purpose of computational research and development. Copyright law applies to the text of these sentences which may limit its use."

In brief

Photograph of Sarah Jeong speaking at the XOXO Festival in 2016
Sarah Jeong


Do you want to contribute to "In the media" by writing a story or even just an "in brief" item? Edit next week's edition in the Newsroom or leave a tip on the suggestions page.