Wikipedia:Wikipedia Signpost/2016-09-06/Recent research

{{Use dmy dates|date=September 2016}}
<noinclude>
{{Signpost draft


===AI-generated Wikipedia articles give rise to debate about research ethics===
At the [[International Joint Conference on Artificial Intelligence]] (IJCAI) – one of the prime AI conferences, if not the pre-eminent one – Banerjee and Mitra from [[Pennsylvania State University|Penn State]] published the paper "WikiWrite: Generating Wikipedia Articles Automatically".<ref>Siddhartha Banerjee, Prasenjit Mitra, [http://www.ijcai.org/Abstract/16/389 "WikiWrite: Generating Wikipedia Articles Automatically"].</ref>


The system described in the paper looks for [[WP:Red link|red links]] in Wikipedia and classifies them based on their context. To find section titles, it then looks for similar existing articles. With these titles, the system searches the web for information, and eventually uses content summarization and a paraphrasing algorithm. The researchers uploaded 50 of these automatically created articles to Wikipedia, and found that 47 of them survived. Some were heavily edited after upload, others not so much.
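
In outline, the pipeline can be pictured as follows – a toy sketch for illustration only, in which every function, section template and data item is invented here rather than taken from the paper (the real system uses trained classifiers, web retrieval and a paraphrasing model):

<syntaxhighlight lang="python">
# Toy sketch of the WikiWrite pipeline as described above. Every stage is a
# deliberately simplified stand-in; names and data are invented for illustration.

# Stand-in data: section layouts borrowed from similar existing articles.
SECTION_TEMPLATES = {
    "person": ["Early life", "Career", "Legacy"],
    "concept": ["Definition", "History", "Criticism"],
}

def classify_red_link(context: str) -> str:
    """Stand-in for the context classifier (here: a crude keyword rule)."""
    return "person" if "born" in context else "concept"

def retrieve_summarize_paraphrase(topic: str, section: str) -> str:
    """Stand-in for web search, content summarization and paraphrasing."""
    return f"(paraphrased summary of web content on '{topic}: {section}')"

def wikiwrite(red_link: str, context: str) -> dict:
    """Draft an article for one red link: classify it from its context,
    borrow a section layout from similar articles, then fill each section."""
    sections = SECTION_TEMPLATES[classify_red_link(context)]
    return {s: retrieve_summarize_paraphrase(red_link, s) for s in sections}

print(wikiwrite("Jane Example", "... was born in 1900 ..."))
</syntaxhighlight>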


While I was enthusiastic about the results, I was surprised by the suboptimal quality of the three articles I reviewed that were mentioned in the paper. After a brief discussion with the authors, a wider discussion was initiated on the [https://lists.wikimedia.org/pipermail/wiki-research-l/2016-August/005324.html Wiki Research list]. This was followed by an entry on the [[Wikipedia:Administrators' noticeboard/IncidentArchive931#Moving discussion from wikimedia research mailing list|English Wikipedia administrators' noticeboard]] (which includes a list of all accounts used for this particular research paper). The discussion led to the removal of most of the remaining articles; only a very few that had been improved over time by other editors were kept.


The discussion concerned the ethical implications of the research, and using Wikipedia for such an experiment without the consent of Wikipedia contributors or readers. The first author of the paper was an active member of the discussion; he showed a lack of awareness of these issues, and appeared to learn a lot from the discussion. He promised to take these lessons to the relevant research community – a positive outcome.


In general, this sets an example for engineers and computer scientists, who often show a lack of awareness of certain ethical issues in their research. Computer scientists are typically trained to think about bits and complexities, and rarely discuss in depth how their work impacts human lives. Whether it's social networks experimenting with the mood of their users, current discussions of biases in machine-learned models, or the experimental upload of automatically created content in Wikipedia without community approval, computer science has generally not reached the level of awareness of some other sciences for the possible effects of their research on human subjects.


Even in Wikipedia, there's no clear-cut, succinct policy I could have pointed the researchers to. The use of sockpuppets – an incidental side-effect of the research – was a clear violation of policy. [[WP:POINT]] was a stretch to cover the situation at hand. In the end, what we can suggest to researchers is to check back with the Wikimedia Research list. A lot of people there have experience with designing research plans with the community in mind, and it can help to avoid uncomfortable situations.


''See also our 2015 review of a related paper coauthored by the same authors: "[[m:Research:Newsletter/2015/January#Bot_detects_theatre_play_scripts_on_the_web_and_writes_Wikipedia_articles_about_them|Bot detects theatre play scripts on the web and writes Wikipedia articles about them]]" and other similarly themed papers they have published since then: "WikiKreator: Automatic Authoring of Wikipedia Content"<ref>{{Cite journal| doi = 10.1145/2813536.2813538| issn = 2372-3483| volume = 2| issue = 1| pages = 4–6| last1 = Banerjee| first1 = Siddhartha| last2 = Mitra| first2 = Prasenjit| title = WikiKreator: Automatic Authoring of Wikipedia Content| journal = AI Matters| date = October 2015| url = http://doi.acm.org/10.1145/2813536.2813538}} {{closed access}}</ref>, "WikiKreator: Improving Wikipedia Stubs Automatically"<ref>Banerjee, Siddhartha and Mitra, Prasenjit: [http://www.aclweb.org/anthology/P15-1084 "WikiKreator: Improving Wikipedia Stubs Automatically"], Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), July 2015, Beijing, China, Association for Computational Linguistics, pages 867–877.</ref>, "Filling the Gaps: Improving Wikipedia Stubs"<ref>{{Cite conference| publisher = ACM| doi = 10.1145/2682571.2797073| isbn = 9781450333078| pages = 117–120| last1 = Banerjee| first1 = Siddhartha| last2 = Mitra| first2 = Prasenjit| title = Filling the Gaps: Improving Wikipedia Stubs| booktitle = Proceedings of the 2015 ACM Symposium on Document Engineering| location = New York, NY, USA| series = DocEng '15| date = 2015| url = http://doi.acm.org/10.1145/2682571.2797073}} {{closed access}}</ref>.'' <small>[[User:Denny|DV]]</small>


===Ethics researcher: Vandal fighters should not be allowed to see whether an edit was made anonymously===
A paper<ref>{{Cite journal| doi = 10.1007/s10676-016-9399-8| issn = 1388-1957, 1572-8439| pages = 1–18| last = Laat| first = Paul B.| title = Profiling vandalism in Wikipedia: A Schauerian approach to justification| journal = Ethics and Information Technology| date = 30 April 2016| url = http://link.springer.com/article/10.1007/s10676-016-9399-8}}</ref> in the journal ''Ethics and Information Technology'' examines the "system of surveillance" that the English Wikipedia has built up over the years to deal with vandalism edits. The author, Paul B. de Laat from the University of Groningen, presents an interesting application of a theoretical framework by US law scholar [[Frederick Schauer]] that focuses on the concepts of rule enforcement and profiling. While providing justification for the system's efficacy and largely absolving it of some of the objections that are commonly associated with the use of profiling in e.g. law enforcement, the paper ultimately argues that in its current form, it violates an alleged "social contract" on Wikipedia by not treating anonymous and logged-in edits equally. While generally well-informed about both the practice and the academic research of vandalism fighting, the paper unfortunately fails to connect to an existing discussion about very much the same topic – potential biases of artificial intelligence-based anti-vandalism tools against anonymous edits – that was begun last year by the researchers developing ORES (an edit review tool that was just made available to all English Wikipedia users; see this week's [[Wikipedia:Wikipedia_Signpost/Next_issue/Technology_report|Technology report]]) and most recently presented in the [[mw:Wikimedia_Research/Showcase#August_2016|August 2016 WMF research showcase]].


The paper first gives an overview of the various [[Wikipedia:Cleaning up vandalism/Tools|anti-vandalism tools]] and bots in use, recapping an earlier paper<ref>{{Cite journal| doi = 10.1007/s10676-015-9366-9| issn = 1388-1957, 1572-8439| volume = 17| issue = 3| pages = 175–188| last = Laat| first = Paul B. de| title = The use of software tools and autonomous bots against vandalism: eroding Wikipedia’s moral order?| journal = Ethics and Information Technology| date = 2 September 2015| url = http://link.springer.com/article/10.1007/s10676-015-9366-9}}</ref> where de Laat had already asked whether these are "eroding Wikipedia’s moral order" (following an even earlier [[m:Research:Newsletter/2014/May#cite_ref-14|2014 paper]] in which he argued that new-edit patrolling "raises a number of moral questions that need to be answered urgently"). There, de Laat's concerns included the fact that some stronger tools (rollback, Huggle, and STiki) are available only to trusted users, that they "cause a loss of the required moral skills in relation to newcomers", and that there is a lack of transparency about how the tools operate (in particular when more sophisticated artificial intelligence/machine learning algorithms such as neural networks are used). The present paper expands on a separate but related concern: the use of "profiling" to pre-select which recent edits will be subject to closer human review. The author emphasizes that on Wikipedia this usually does not mean person-based [[offender profiling]] (building profiles of individuals committing vandalism), citing only one exception in the form of a 2015 academic paper (cf. our review: "[[m:Research:Newsletter/2015/August#Early_warning_system_identifies_likely_vandals_based_on_their_editing_behavior|Early warning system identifies likely vandals based on their editing behavior]]"). Rather, "the anti-vandalism tools exemplify the broader type of profiling" that focuses on actions. Based on Schauer's work, the author asks the following questions:
# "Is this profiling profitable, does it bring the rewards that are usually associated with it?"
# "Is this profiling profitable, does it bring the rewards that are usually associated with it?"
# "... is this profiling approach towards edit selection justified? In particular, do any of the dimensions in use raise moral objections? If so, can these objections be met in a satisfactory fashion, or do such controversial dimensions have to be adapted or eliminated?"
# "is this profiling approach towards edit selection justified? In particular, do any of the dimensions in use raise moral objections? If so, can these objections be met in a satisfactory fashion, or do such controversial dimensions have to be adapted or eliminated?"


To answer the first question, the author turns to Schauer's work on rules, in a brief [https://link.springer.com/article/10.1007/s10676-016-9399-8#Sec5 summary] that is worth reading for anyone interested in Wikipedia policies and guidelines in general, although de Laat instead applies the concept to the "procedural rules" implicit in vandalism profiling (such as that anonymous edits are more likely to be worth scrutinizing).
First, Schauer "resolutely pushes aside the argument from fairness: decision-making based on rules can only be less just than deciding each case on a particularistic basis". (For example, a restaurant's "No Dogs Allowed" rule will unfairly exclude some dogs so well-behaved that their presence would create no problems, while not prohibiting much more dangerous animals such as snakes.) Instead, the existence of rules has to be justified by other arguments, of which Schauer presents four:
*Rules "create ''reliability/predictability'' for those affected by the rule: rule-followers as well as rule-enforcers".
*Rules "create ''reliability/predictability'' for those affected by the rule: rule-followers as well as rule-enforcers".
*Rules "promote more efficient use of resources by rule-enforcers" (e.g. in case of a speeding car driver, traffic police and judges can apply a simple speed limit instead having to prove in detail that dangerous driving happened).
*Rules "promote more efficient use of resources by rule-enforcers" (e.g. in case of a speeding car driver, traffic police and judges can apply a simple speed limit instead having to prove in detail that an instance of driving was dangerous).
*Rules, if they are simple enough, reduce the problem of "risk-aversion" by enforcers, who are much more likely to make mistakes and face repercussions if they have to make case by case decisions.
*Rules, if simple enough, reduce the problem of "risk-aversion" by enforcers, who are much more likely to make mistakes and face repercussions if they have to make case by case decisions.
*Rules create stability, which however also presents "an impediment to change; it entrenches the status-quo. If change is on a society’s agenda, the stability argument turns into an argument against having (simple) rules."
*Rules create stability, but also present "an impediment to change; they entrench the status-quo. If change is on a society’s agenda, the stability argument turns into an argument against having (simple) rules."


The author cautions that these arguments have to be reinterpreted when applying them to the aforementioned vandalism profiling, because it consists of "procedural rules" (which edits should be selected for inspection) rather than "substantive rules" (which edits should be reverted as vandalism, which animals should be disallowed from the restaurant). While in the case of substantive rules, their absence would mean having to judge everything on a case-by-case basis, the author asserts that procedural rules arise in a situation where the alternative would be not to judge at all in many cases, because "we have no means at our disposal to check and pass judgment on all of them; a selection of a kind has to be made. So it is here that profiling comes in". With that qualification, Schauer's second argument provides justification for "Wikipedian profiling [because it] turns out to be amazingly effective", starting with the autonomous bots that auto-revert with an (aspired) 1:1000 false-positive rate.
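
The (aspired) 1:1000 figure can be read as a calibration target: the bot's revert threshold is set so that only about one in a thousand legitimate edits would be auto-reverted. Here is a minimal sketch of how such a threshold can be picked from labelled data – the scores below are synthetic, and this is not ClueBotNG's actual calibration code:

<syntaxhighlight lang="python">
"""Sketch: choose a revert threshold so that roughly 1 in 1000 good-faith
edits would be auto-reverted. Synthetic scores; not ClueBotNG's real code."""
import random

random.seed(0)
good = [random.betavariate(2, 8) for _ in range(100_000)]  # non-vandalism scores
bad = [random.betavariate(8, 2) for _ in range(5_000)]     # vandalism scores

def threshold_for_fp_rate(good_scores, max_fp_rate=0.001):
    """Threshold at the (1 - max_fp_rate) quantile of good-edit scores,
    so about max_fp_rate of good edits score at or above it."""
    ranked = sorted(good_scores)
    return ranked[int(len(ranked) * (1 - max_fp_rate))]

t = threshold_for_fp_rate(good)
recall = sum(s >= t for s in bad) / len(bad)
print(f"revert threshold {t:.3f} catches {recall:.1%} of vandalism")
</syntaxhighlight>
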
De Laat also interprets "the Schauerian argument of ''reliability/predictability'' for those affected by the rule" in favor of vandalism profiling. Here, though, he fails to explain the benefits of vandals being able to predict which kind of edits will be subject to scrutiny. This also calls into question his subsequent remark that "it is unfortunate that the anti-vandalism system in use remains opaque to ordinary users". The remaining two of Schauer's four arguments are judged as less pertinent. But overall the paper concludes that it is possible to justify the existence of vandalism profiling rules as beneficial via Schauer's theoretical framework.


Next, de Laat turns to question 2, on whether vandalism profiling is also morally justified. Here he relies on later work by Schauer, from a 2003 book, ''Profiles, Probabilities, and Stereotypes'', that studies such matters as profiling by tax officials (selecting which taxpayers have to undergo an audit), airport security (selecting passengers for screening) and police officers (e.g. selecting cars for traffic stops). While profiling of some kind is a necessity for all these officials, the particular characteristics (dimensions) used for profiling can be highly problematic (see e.g. [[Driving While Black]]). For de Laat's study of Wikipedia profiling, "two types of complications are important: (1) possible ‘overuse’ of dimension(s) (an issue of profile effectiveness) and (2) social sensibilities associated with specific dimension(s) (a social and moral issue)." Overuse can mean relying on stereotypes that have no basis in reality, or over-reliance on some dimensions that, while having a non-spurious correlation with the deviant behavior, are over-emphasized at the expense of other relevant characteristics because they are more visible or salient to the profile. For example, while Schauer considers that it may be justified for "airport officials looking for explosives [to] single out for inspection the luggage of younger Muslim men of Middle Eastern appearance", it would be overuse if "officials ask ''all'' Muslim men and ''all'' men of Middle Eastern origin to step out of line to be searched", thus reducing their effectiveness by neglecting other passenger characteristics. This is also an example of the second type of complication, where the selected dimensions are socially sensitive – indeed, for the specific case of luggage screening in the US, "the factors of race, religion, ethnicity, nationality, and gender have expressly been excluded from profiling" since 1997.


Applying this to the case of Wikipedia's anti-vandalism efforts, de Laat first observes that complication (1) (overuse) is not a concern for fully automated tools like ClueBotNG: their algorithm applies the existing profile directly, without a human intervention that could introduce this kind of bias. For Huggle and STiki, however, "I see several possibilities for features to be overused by patrollers, thereby spoiling the optimum efficacy achievable by the profile embedded in those tools." This is because both tools not only use these features in their automatic pre-selection of edits to be reviewed, but also expose at least one of them – whether an edit was made anonymously – to the human patroller in the edit review interface. (The paper examines this in detail for both tools, observing that Huggle presents more opportunities for this kind of overuse, while STiki is more restricted. However, there seems to have been no attempt to study empirically whether this overuse actually occurs.)
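
To make concrete what the "profile embedded in those tools" might look like, here is a minimal sketch of action-based scoring over edit features. The feature names and weights are invented for illustration – this is not Huggle's or STiki's actual model – and de Laat's proposal, discussed below, would amount to dropping the anonymity term:

<syntaxhighlight lang="python">
"""Illustrative sketch of action-based vandalism profiling. Features and
weights are invented; this is not Huggle's or STiki's actual model."""
from dataclasses import dataclass

@dataclass
class Edit:
    summary: str
    is_anonymous: bool
    comment_is_blank: bool
    chars_removed: int
    country_risk: float  # stand-in for per-country vandalism rates

# Hypothetical weights; a real tool learns these from labelled reverts.
WEIGHTS = {"anon": 2.0, "blank": 1.0, "removed": 0.002, "country": 1.5}

def suspicion(edit: Edit, use_anon_dimension: bool = True) -> float:
    """Score an edit for the review queue; higher scores are inspected first."""
    score = (WEIGHTS["blank"] * edit.comment_is_blank
             + WEIGHTS["removed"] * edit.chars_removed
             + WEIGHTS["country"] * edit.country_risk)
    if use_anon_dimension:  # de Laat's proposed ban: set this to False
        score += WEIGHTS["anon"] * edit.is_anonymous
    return score

queue = sorted(
    [Edit("blanked a section", True, True, 800, 0.7),
     Edit("fixed a typo", False, False, 3, 0.1)],
    key=suspicion, reverse=True)
print([e.summary for e in queue])
</syntaxhighlight>
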


Regarding complication (2), whether some of the features used for vandalism profiling are socially sensitive, de Laat highlights that they include some amount of discrimination by nationality: IP edits geolocated to the US, Canada, and Australia have been found to contain vandalism more frequently and are thus more likely to be singled out for inspection. However, he does not consider this concern "strong enough to warrant banning the country-dimension and correspondingly sacrifice some profiling efficacy", chiefly because there do not appear to be a lot of nationalistic tensions within the English Wikipedia community that could be stirred up by this.


In contrast, de Laat argues that "the targeting of contributors who choose to remain ''anonymous'' ... is fraught with danger since anons already constitute a controversial group within the Wikipedian community." Still, he acknowledges the "undisputed fact" that the ratio of vandalism is much higher among anonymous edits. Also, he rejects the concern that they might be more likely to be the victim of false positives:
{{Signpost quote|normally [IP editors] do not experience any harm when their edits are selected and inspected as a result of anon-powered profiling; they will not even notice that they were surveilled since no digital traces remain of the patrolling. ... The only imaginable harm is that patrollers become over focussed on anons and indulge in what I called above 'overinspection' of such edits and wrongly classify them as vandalism ... As a consequence, they might never contribute to Wikipedia again. ... Nevertheless, I estimate this harm to be small. At any rate, the harm involved would seem to be small in comparison with the harassment of racial profiling—let alone that an 'expressive harm hypothesis' applies.}}


With this said, de Laat still makes the controversial call "that the anonymous-dimension should be banned from all profiling efforts" – including removing it from the scoring algorithms of Huggle, STiki and ClueBotNG. Instead of concerns about individual harm,
:''"my main argument for the ban is a decidedly moral one. From the very beginning the Wikipedian community has operated on the basis of a ‘social contract’ that makes no distinction between anons and non-anons—all are citizens of equal stature. [...] In sum, the express profiling of anons turns the anonymity dimension from an access condition into a social distinction; the Wikipedian community should refrain from institutionalizing such a line of division. Notice that I argue, in effect, that the Wikipedian community has only two choices: either accept anons as full citizens or not; but there is no morally defensible social contract in between."''
:''"my main argument for the ban is a decidedly moral one. From the very beginning the Wikipedian community has operated on the basis of a 'social contract' that makes no distinction between anons and non-anons—all are citizens of equal stature. ... In sum, the express profiling of anons turns the anonymity dimension from an access condition into a social distinction; the Wikipedian community should refrain from institutionalizing such a line of division. Notice that I argue, in effect, that the Wikipedian community has only two choices: either accept anons as full citizens or not; but there is no morally defensible social contract in between."''
While the paper is otherwise rich in citations and details, it completely fails to provide evidence for the existence of this alleged contract. While it is true that "the ability of almost anyone to edit (most) articles without registration" forms part of Wikipedia's [[m:founding principles|founding principles]] (a principle that this reviewer strongly agrees with), the "equal stature" part seems to be de Laat's own invention: there is a [[Wikipedia:IPs_are_human_too#What_an_unregistered_user_can.27t_do_by_themselves_.28directly.29|long list]] of things that, by longstanding community consensus, require the use of an account (which, after all, is freely available to everyone, without even requiring an email address). Most of these restrictions – say, the inability to create new articles, or being prevented from participating in project governance during admin or arbcom votes – seem much more serious than the vandalism profiling that is the topic of de Laat's paper.


===Briefly===
===Other recent publications===
''A list of other recent publications that could not be covered in time for this issue—[[m:Research:Newsletter#How to contribute|contributions are always welcome]] for reviewing or summarizing newly published research.''
*'''"Large SMT Data-sets Extracted from Wikipedia"'''<ref>{{Cite conference| isbn = 978-2-9517408-8-4| conference = TUFI 14.103| last1 = Tufiş| first1 = Dan| last2 = Ion| first2 = Radu| last3 = Dumitrescu| first3 = Ştefan| last4 = Ştefănescu2| first4 = Dan| title = Large SMT Data-sets Extracted from Wikipedia| booktitle = Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)| date = 2014-05-26| url = http://www.lrec-conf.org/proceedings/lrec2014/pdf/103_Paper.pdf}}</ref> From the abstract: "The article presents experiments on mining Wikipedia for extracting SMT [ [[statistical machine translation]] ] useful sentence pairs in three language pairs. [...] The optimized SMT systems were evaluated on unseen test-sets also extracted from Wikipedia. As one of the main goals of our work was to help Wikipedia contributors to translate (with as little post editing as possible) new articles from major languages into less resourced languages and vice-versa, we call this type of translation experiments 'in-genre' translation. As in the case of 'in-domain' translation, our evaluations showed that using only 'in-genre' training data for translating same genre new texts is better than mixing the training data with 'out-of-genre' (even) parallel texts."
*'''"Large SMT Data-sets Extracted from Wikipedia"'''<ref>{{Cite conference| isbn = 978-2-9517408-8-4| conference = TUFI 14.103| last1 = Tufiş| first1 = Dan| last2 = Ion| first2 = Radu| last3 = Dumitrescu| first3 = Ştefan| last4 = Ştefănescu2| first4 = Dan| title = Large SMT Data-sets Extracted from Wikipedia| booktitle = Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)| date = 26 May 2014| url = http://www.lrec-conf.org/proceedings/lrec2014/pdf/103_Paper.pdf}}</ref> From the abstract: "The article presents experiments on mining Wikipedia for extracting SMT [ [[statistical machine translation]] ] useful sentence pairs in three language pairs. [...] The optimized SMT systems were evaluated on unseen test-sets also extracted from Wikipedia. As one of the main goals of our work was to help Wikipedia contributors to translate (with as little post editing as possible) new articles from major languages into less resourced languages and vice-versa, we call this type of translation experiments 'in-genre' translation. As in the case of 'in-domain' translation, our evaluations showed that using only 'in-genre' training data for translating same genre new texts is better than mixing the training data with 'out-of-genre' (even) parallel texts."
*'''"Recognizing Biographical Sections in Wikipedia"'''<ref>{{Cite conference| pages = 811-816| last1 = Aprosio| first1 = Alessio Palmero| last2 = Tonelli| first2 = Sara| title = Recognizing Biographical Sections in Wikipedia| booktitle = Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing| location = Lisbon, Portugal| date = 2015-09-17| url = http://www.aclweb.org/anthology/D15-1095}}</ref> From the abstract: "Thanks to its coverage and its availability in machine-readable format, [Wikipedia] has become a primary resource for large scale research in historical and cultural studies. In this work, we focus on the subset of pages describing persons, and we investigate the task of recognizing biographical sections from them: given a person’s page, we identify the list of sections where information about her/his life is present [as opposed to nonbiographical sections, e.g. 'Early Life' but not 'Legacy' or 'Selected writings']."
*'''"Recognizing Biographical Sections in Wikipedia"'''<ref>{{Cite conference| pages = 811–816| last1 = Aprosio| first1 = Alessio Palmero| last2 = Tonelli| first2 = Sara| title = Recognizing Biographical Sections in Wikipedia| booktitle = Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing| location = Lisbon, Portugal| date = 17 September 2015| url = http://www.aclweb.org/anthology/D15-1095}}</ref> From the abstract: "Thanks to its coverage and its availability in machine-readable format, [Wikipedia] has become a primary resource for large scale research in historical and cultural studies. In this work, we focus on the subset of pages describing persons, and we investigate the task of recognizing biographical sections from them: given a person’s page, we identify the list of sections where information about her/his life is present [as opposed to nonbiographical sections, e.g. 'Early Life' but not 'Legacy' or 'Selected writings']."
*'''"'A Spousal Relation Begins with a Deletion of engage and Ends with an Addition of divorce': Learning State Changing Verbs from Wikipedia Revision History."''' <ref>{{Cite conference| conference = Proceedings of EMNLP 2015.| pages = 518-523| last1 = Nakashole| first1 = Ndapa| last2 = Mitchell| first2 = Tom| last3 = Wijaya| first3 = Derry| title = "A Spousal Relation Begins with a Deletion of engage and Ends with an Addition of divorce": Learning State Changing Verbs from Wikipedia Revision History.| location = Lisbon, Portugal| date = 2015| url = http://www.emnlp2015.org/proceedings/EMNLP/pdf/EMNLP059.pdf}}</ref> From the abstract: "We propose to learn state changing verbs [such as 'born', 'died', 'elected', 'married'] from Wikipedia edit history. When a state-changing event, such as a marriage or death, happens to an entity, the infobox on the entity’s Wikipedia page usually gets updated. At the same time, the article text may be updated with verbs either being added or deleted to reflect the changes made to the infobox. [...] We observe in our experiments that when state-changing verbs are added or deleted from an entity’s Wikipedia page text, we can predict the entity’s infobox updates with 88% precision and 76% recall." <small>[[User:Tbayer (WMF)|TB]]</small>
*'''"'A Spousal Relation Begins with a Deletion of engage and Ends with an Addition of divorce': Learning State Changing Verbs from Wikipedia Revision History."'''<ref>{{Cite conference| conference = Proceedings of EMNLP 2015.| pages = 518–523| last1 = Nakashole| first1 = Ndapa| last2 = Mitchell| first2 = Tom| last3 = Wijaya| first3 = Derry| title = "A Spousal Relation Begins with a Deletion of engage and Ends with an Addition of divorce": Learning State Changing Verbs from Wikipedia Revision History.| location = Lisbon, Portugal| date = 2015| url = http://www.emnlp2015.org/proceedings/EMNLP/pdf/EMNLP059.pdf}}</ref> From the abstract: "We propose to learn state changing verbs [such as 'born', 'died', 'elected', 'married'] from Wikipedia edit history. When a state-changing event, such as a marriage or death, happens to an entity, the infobox on the entity's Wikipedia page usually gets updated. At the same time, the article text may be updated with verbs either being added or deleted to reflect the changes made to the infobox. [...] We observe in our experiments that when state-changing verbs are added or deleted from an entity's Wikipedia page text, we can predict the entity's infobox updates with 88% precision and 76% recall." <small>[[User:Tbayer (WMF)|TB]]</small>
*'''"Extracting Representative Phrases from Wikipedia Article Sections"'''<ref>Shan Liu, Mizuho Iwaihara:
*'''"Extracting Representative Phrases from Wikipedia Article Sections"'''<ref>Shan Liu, Mizuho Iwaihara:
Extracting Representative Phrases from Wikipedia Article Sections, DEIM Forum 2016 C3-6. http://db-event.jpn.org/deim2016/papers/314.pdf</ref> From the abstract: "Since [Wikipedia's] long articles are taking time to read, as well as section titles are sometimes too short to capture comprehensive summarization, we aim at extracting informative phrases that readers can refer to."
Extracting Representative Phrases from Wikipedia Article Sections, DEIM Forum 2016 C3-6. http://db-event.jpn.org/deim2016/papers/314.pdf</ref> From the abstract: "Since [Wikipedia's] long articles are taking time to read, as well as section titles are sometimes too short to capture comprehensive summarization, we aim at extracting informative phrases that readers can refer to."


Recent research

AI-generated articles and research ethics; anonymous edits and vandalism fighting ethics

A monthly overview of recent academic research about Wikipedia and other Wikimedia projects, also published as the Wikimedia Research Newsletter.