Wikipedia:Wikipedia Signpost/2019-11-29/From the archives
- MER-C was interviewed by Mabeenot for The Signpost's WikiProject report originally published July 18, 2011. We invited them to revisit the report and comment on any changes that have happened since 2011. In 2014 MER-C was given the mop in a unanimous RfA. We will publish MER-C's reactions, followed by the original report. –B
Well, this interview aged quickly. So what has changed? What does spam look like nowadays on Wikipedia?
Firstly, I don't know if linkspam in all its forms has increased or not since then. It is no longer economical for me to spend time pursuing it.
The most obvious form of undisclosed paid editing (UPE) involves the creation of articles that would not otherwise warrant inclusion. Long-term contributors may remember when Wikipedia:Conflict of interest was titled Wikipedia:Vanity page. This is exactly the function these "articles" serve. Ghostwritten vanity pages are designed explicitly to show up as the first item and in the sidebar of a Google search, but are difficult for Wikipedians to find and, if found, to evaluate the notability of their subject. Spam is less about Viagra or Cialis, and more about early-stage startups, businesspeople, motivational speakers, cryptocurrencies and so forth.
There are numerous companies that offer ghostwritten vanity pages for a small amount of money, typically a few hundred dollars. These companies employ freelancers in English-speaking Third World countries who have very few opportunities for legitimate employment. In fact, similar dishonest activities such as running a fake news website or writing for an essay mill turn out to be quite lucrative, in purchasing power parity terms, for the freelancers concerned.
The level of abuse is systematic, pervasive, and of increasing sophistication. The worst spammers have taken on characteristics of advanced persistent threats, including the use of compromised computers, VPNs and cloud computing infrastructure to post spam. There are no effective admin tools. Two new page patrollers, who screen newly created articles for notability and other problems, were blocked last week for corruptly reviewing spam. It is only a matter of time before paid editors systematically infiltrate the admin corps.
Much of the increase in spamming is a consequence of Wikipedia's own success. However, a large portion of the blame lies squarely with the Wikimedia Foundation. The WMF places significant emphasis in materials targeted at donors on crude metrics of content quantity and community size simply because that is what the WMF thinks donors want to hear. The WMF therefore faces incentives very similar to Facebook and Google. Social media sites tolerate a high level of bots, Russian trolls and spammers because fake accounts pad their key metrics of monthly active users and ad impressions, giving the illusion of growth and making them look good in the eyes of their customers (advertisers) and investors. Similar emphasis is put by the WMF (and Facebook) on outreach efforts in the poor countries that are the source of much of the spam, despite multiple past high-profile failures, again because the WMF thinks donors want to see desperate, impoverished people in sub-Saharan Africa being helped. A few extra vanity pages and sockpuppets certainly help the WMF look good in their pitch to donors.
The WMF does not sufficiently care about our admin tools being fit for purpose. As at Facebook, YouTube and Google before recent scandals, investments in content moderation are seen as purely a cost, while "initiatives" that provide feel-good anecdotes for donors or increase donor-targeted metrics and hence increase donations are heavily prioritized. The WMF deserves nothing but utter condemnation and scorn for the complete lack of maintenance, let alone investment, in the code underlying the administrator toolset. A seemingly simple task such as adding a checkbox to the delete form that deletes the associated talk page requires nothing less than a fundamental rewrite of the relevant code.
The fight against spam is nothing short of an existential battle against the degeneration of this encyclopedia into a large set of vanity pages about attention-seeking subjects. And we're losing.
- "Meeting Kosovo's clickbait merchants". BBC News. 10 November 2018. Retrieved 31 May 2019.
- "The Kenyan ghost writers doing 'lazy' Western students' work". BBC News. 22 October 2019. Retrieved 23 November 2019.
- "Wikimedia Foundation 2017-18 Annual Report". Wikimedia Foundation. Retrieved 23 November 2019.
- Wikipedia:India Education Program
- "Angola's Wikipedia Pirates Are Exposing the Problems With Digital Colonialism". Vice News. 23 March 2016. Retrieved 6 June 2019.
- Don't take my word for it.
- "Underpaid and overburdened: the life of a Facebook moderator". The Guardian. 27 May 2017. Retrieved 6 June 2019.
- "Christchurch shootings: Social media races to stop attack footage". BBC News. 16 March 2019. Retrieved 6 June 2019.
Original WikiProject report – Earn $$$ free pharm4cy WORK FROM HOME replica watches ViAgRa!!!
- By Mabeenot, 18 July 2011
This week, we spent some time with WikiProject Spam. The project describes itself as a "voluntary Spam-fighting brigade" which seeks to eliminate the three types of Wikispam: advertisements masquerading as articles, external link spam, and references that serve primarily to promote the author or the work being referenced. WikiProject Spam applies policies regarding what Wikipedia is not and guidelines for external links. The project received some help in February 2007 when the English Wikipedia tagged external links as "NOFOLLOW", preventing search engines from indexing external links and limiting the incentive for many spammers to use Wikipedia as a search engine optimization tool. The project maintains outreach strategies, detailed steps for identifying and removing spam, a variety of search tools, several bots for detecting spam, and a big red button to report spam and spammers. The project was started by Jdavidb in September 2005 and has grown to include 371 members. One of the project's most active members, MER-C, agreed to show us around.
How much time do you typically devote each week to fighting spam?
- I find the time commitment required for anti-spam work to be extremely variable. Monitoring the IRC feed isn't particularly taxing, and it isn't too difficult to clean up a few possible copyright problems, edit a few articles or perform non-WP related work or leisure concurrently.
- This is an illusion. 98% of those edits are from User:COIBot, a spam reporting bot. The remaining 2% are to the project's talk page, which serves as a noticeboard for reporting spam campaigns. A good chunk of the edits to the talk page are from a handful of anti-spam specialists. I can't explain the number of watchers though.
What type of wikispam do you come across most often? Do you use any special tools to detect spam or do you simply remove spam you notice while reading and editing articles?
- While reading articles and cleaning out the spam contained within haphazardly works, it doesn't address the cause of the problem. I target the spammers themselves, i.e. identifying domains owned by the spammer and systematically removing spammed links to said domains. To do it properly requires heavy use of tools beyond the usual contribution analysis:
- Special:Linksearch and its cross-wiki counterpart
- Cross-wiki contributions
- User:Versageek and User:Beetstra maintain a database of link additions to all Wikimedia projects. New links are reported to the IRC channel wikipedia-en-spam (don't go there yet, it's not currently working) and others. User:XLinkBot, a spam reversion bot, and User:COIBot use this channel as their source of link additions. Reports are triggered when a small group of users are responsible for a large fraction of link additions to a particular site or can be requested through IRC or User:COIBot/Poke (administrators and trusted users only).
- Various external tools, including Whois, reverse DNS lookups, HTML analysis, Google AdSense and Google Analytics databases and a bit of Google-fu.
- The Firefox extensions NoScript and RequestPolicy to detect redirects to other domains and protect against the mystery meat nature of spammed sites.
- A text editor that has fuzzy find and replace functionality, usually implemented using regular expressions.
- I target external link additions, so I encounter vanilla external link spam most frequently. The most annoying and widespread spam campaigns, however, involve multiple spam tactics. That said, I've noticed the following recent spam trends -- note the tendency towards avoiding scrutiny from RC patrollers:
- The spreading of spam edits over multiple IP addresses and user accounts; one spam link per IP address/account isn't uncommon.
- Spam masquerading as citations. This typically involves the repeated addition of a certain "reference" by a given person; the spammy nature isn't apparent until you look at the big picture.
- Replacement of existing links and/or citations
- Inline spamming, the insertion of external links into article prose purely for search engine optimization
- Misleading edit summaries
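The reporting rule mentioned above, where a report is triggered when a small group of users is responsible for a large fraction of link additions to a site, can be sketched roughly as follows. This is only an illustration, not COIBot's actual code: the function name `flag_concentrated_domains`, the thresholds and the sample data are all assumptions made for the example.

```python
from collections import Counter, defaultdict
from urllib.parse import urlparse


def flag_concentrated_domains(additions, min_links=5, max_users=3, min_share=0.8):
    """Return domains where a few users account for most link additions.

    additions: iterable of (user, url) pairs, one per added external link.
    The thresholds are illustrative assumptions, not any bot's real settings.
    """
    per_domain = defaultdict(Counter)
    for user, url in additions:
        host = (urlparse(url).hostname or "").lower()
        if host.startswith("www."):
            host = host[4:]  # treat www.example.org and example.org alike
        if host:
            per_domain[host][user] += 1

    flagged = {}
    for host, by_user in per_domain.items():
        total = sum(by_user.values())
        if total < min_links:
            continue  # too few additions to judge
        top = by_user.most_common(max_users)
        share = sum(count for _, count in top) / total
        if share >= min_share:
            flagged[host] = {
                "total": total,
                "share": share,
                "users": [user for user, _ in top],
            }
    return flagged


# Hypothetical link-addition log: two accounts pushing one site,
# and six unrelated editors each citing a news site once.
SAMPLE_ADDITIONS = [
    ("SpamAcct1", "http://www.example-shop.test/a"),
    ("SpamAcct1", "http://example-shop.test/b"),
    ("SpamAcct2", "http://example-shop.test/c"),
    ("SpamAcct2", "http://example-shop.test/d"),
    ("SpamAcct1", "http://example-shop.test/e"),
    ("SpamAcct1", "http://example-shop.test/f"),
] + [(f"Editor{i}", f"https://news.example.org/story{i}") for i in range(6)]

if __name__ == "__main__":
    for host, info in flag_concentrated_domains(SAMPLE_ADDITIONS).items():
        print(host, info)
```

With the sample data only example-shop.test is flagged, since two accounts added all six of its links. A campaign that spreads one link per IP address or account, as in the first trend listed above, would slip past a check this simple, which is why the other tools matter.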
Have you had any heated conversations with spammers after removing spam from an article? What are some strategies you've used to resolve these conflicts?
- Personal attacks, edit warring and vandalism are surefire ways to expedite blacklisting of the spammer's sites. A couple of months ago, I dealt with a spammer who edit warred to include links to his website. He responded by vandalising my userpage, and so the relevant sites were promptly blacklisted. Apart from a bad faith delisting request, we haven't heard from them since. This is typical; blacklisting is a very effective way of removing spammers from Wikipedia. (Unlike blocks, blacklisting requires money to evade—the spammer needs to purchase new Internet domains.)
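To make the point about evasion concrete, here is a minimal sketch of domain blacklisting by regular expression. It is loosely modelled on the idea of a list of regex fragments matched against added URLs and is not the actual MediaWiki implementation; the helper names, the URL pattern and the domains are assumptions chosen for the example.

```python
import re


def build_blacklist(fragments):
    """Combine regex fragments into one pattern that matches blacklisted URLs."""
    joined = "|".join(f"(?:{fragment})" for fragment in fragments)
    return re.compile(r"https?://[a-z0-9_.\-]*(?:" + joined + ")", re.IGNORECASE)


def blocked_links(text, blacklist):
    """Return the URLs in text that match the blacklist."""
    # Simplified URL matcher for the sketch; real wikitext parsing is more involved.
    urls = re.findall(r'https?://[^\s\]<>"|]+', text)
    return [url for url in urls if blacklist.search(url)]


# Hypothetical blacklist entry and edit text.
BLACKLIST = build_blacklist([r"\bexample-pills\.test\b"])
EDIT_TEXT = (
    "See [http://www.example-pills.test/buy cheap deals] and "
    "[https://example.org/ok a legitimate source]. "
    "Later: [http://example-pills2.test/buy same shop, new domain]."
)

if __name__ == "__main__":
    print(blocked_links(EDIT_TEXT, BLACKLIST))
```

The entry catches the blacklisted domain under any account or IP address, but not example-pills2.test: getting around the block means registering a new domain, which is the cost to the spammer described above.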
Has your experience fighting spam resulted in any humorous stories? Have you heard any amusing excuses and special pleading from spammers trying to defend their edits?
- See Wikipedia:Grief for details on the usual routine of spammers.