Wikipedia:Requests for comment/Reliability of sources and spam blacklist

The following discussion is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.


RFC

Also advertised on WT:RS and WP:CENT Gigs (talk) 19:18, 10 December 2009 (UTC)[reply]
WP:RSN, WP:WPSPAM, m:WM:SBL and WP:AN have been notified also--Hu12 (talk) 20:39, 10 December 2009 (UTC)[reply]

Questions for community consideration

Two primary, related questions have recently been raised in discussions regarding the spam blacklist:

  • Should we blacklist sites that pay authors for contributions, based on traffic, such as ehow.com, examiner.com, triond.com and associatedcontent.com?
  • Should the reliability of a source be a factor in blacklisting or whitelisting decisions?

Background facts

  • The blacklist in question is a page in the MediaWiki namespace. Unlike the Meta blacklist, this blacklist affects pages on the English Wikipedia only.
  • Blacklisting prevents editors from saving a revision that adds a hyperlink or a reference to a blacklisted site (a sketch of this check appears after this list).
  • Blacklisting is abuse/evidence-based and should not be used preemptively.
  • Recent practice has been to deny de-blacklisting based on the reliability or suitability of the source, as judged by the editor(s) who handle the request. Examples: 1,2
  • Recent practice has been to consider the reliability of a site as one of the factors in blacklisting decisions. Examples: 1,2
  • Blacklisting is based on abuse, though reliability may be considered, while whitelisting is sometimes denied outright because the document in question is not a reliable source.
  • The blacklist extension has been in use on the English Wikipedia since July 2007; recently there has been an ArbCom ruling:

As blacklisting is a method of last resort, methods including blocking, page protection, or the use of bots such as XLinkBot are to be used in preference to blacklisting. Blacklisting is not to be used to enforce content decisions.

— ArbCom, Passed 10 to 0, 16:39, 18 May 2009 (UTC)
  • It is general practice to consider the status of the account requesting de-blacklisting or whitelisting (among other things, to avoid conflict-of-interest issues)
  • It is general practice, in declined de-blacklisting requests, to ask editors to consider requesting whitelisting of the specific documents that are needed
  • It is general practice in whitelist requests to ask editors whether the linked documents are deemed necessary, and/or whether there is consensus among editors that the specific document is needed
  • Until recently, XLinkBot could not revert WP:REFSPAM
  • XLinkBot can revert the addition of links to sites that are listed in it; however, it will not re-revert if an editor adds them again or undoes the bot's revert, and it will not revert established editors (a sketch of this revert logic also appears after this list).
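
To make the mechanism concrete, here is a minimal sketch (in Python, not the actual extension code) of how the blacklist check described above works: each entry is a regular-expression fragment, and a pending revision is rejected if any external link it adds matches one of them. The domain names below are made up for illustration only.

    import re

    # Hypothetical entries; the real list lives at MediaWiki:Spam-blacklist,
    # one regular-expression fragment per line (e.g. "\bexample\.com\b").
    BLACKLIST_ENTRIES = [
        r"\bpaidcontentfarm\.example\.com\b",
        r"\bsomespamdomain\.example\.net\b",
    ]

    def links_blocked_by_blacklist(added_links, entries=BLACKLIST_ENTRIES):
        """Return the added external links that match any blacklist entry.

        Sketch only: the real extension combines the entries into one large
        regex and checks every external link added by the pending revision,
        refusing to save the page if anything matches.
        """
        combined = re.compile("|".join(entries), re.IGNORECASE)
        return [url for url in added_links if combined.search(url)]

    # The second link would cause the whole save to be rejected.
    print(links_blocked_by_blacklist([
        "http://en.wikipedia.org/wiki/Main_Page",
        "http://blog.somespamdomain.example.net/my-article",
    ]))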
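
A similarly rough sketch of XLinkBot's revert decision as described in the list above; the edit-count threshold and field names are assumptions for illustration, not the bot's actual configuration or code.

    def xlinkbot_should_revert(edit, revert_history, established_edit_count=100):
        """Decide whether XLinkBot reverts an edit that adds a listed link.

        Simplified sketch of the behaviour described above: revert once,
        never re-revert the same editor/link on the same page, and leave
        established editors alone. The threshold of 100 edits is an assumption.
        """
        if edit["user_edit_count"] >= established_edit_count:
            return False  # established editors are not reverted
        key = (edit["user"], edit["page"], edit["link"])
        if key in revert_history:
            return False  # already reverted once; do not revert again
        revert_history.add(key)
        return True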

Typical examples which show the persistence of the problem

Here are some typical examples of the lengths spammers go to in order to escape blacklisting (I will look for examples related to the sites under scrutiny, but these show how the incentive of being linked can drive editors):

  • E.g. see User:Japanhero, who has made a large number of socks, using an easily changed IP, to spam a large number of subdomains on webs.com. This example shows that specific blacklisting and blocking are insufficient; numerous (full) page protections would be needed to prevent the editor from adding the links (the accounts are quickly autoconfirmed!). Blacklisting the whole of webs.com would solve the problem, but the site contains too much good information.
  • MediaWiki_talk:Spam-blacklist#brownplanet.com. Besides spamming this domain, the spammer also linked to a large number of redirect sites (and seems to have ready access to quite a few of them). Again, a clear example of how specific blacklisting is not sufficient.
  • An experiment in de-blacklisting a site is the homepage of the University of Atlanta. It was batch-blacklisted together with a set of domains which were under an SEO push; the homepage of the University had not yet been added at that point. Some time later, a new account requested de-blacklisting, upon which the link was de-blacklisted (I did not find the exact reason, and saw no evidence of spamming). That resulted in a significant, continuous push by several SPA accounts whose only purpose was to advertise the diploma mill. Several blocks were given, editors and ranges were blocked by the AbuseFilter, and in the end the domain was again blacklisted, with specific whitelistings for the two documents which were needed.
  • Another 'experiment' was the site aboutmyarea.co.uk, an advertising site for local communities, maintained by the community. The domain was spammed to many articles by a significant sockfarm. Many users were blocked, but to no avail. XLinkBot did not produce a significant drop in the spam (its edits were simply undone), and the link was blacklisted. Some time later an editor informally requested de-blacklisting from another admin, and the domain was de-blacklisted (the editor needed the site to source an article). Shortly afterwards the first socks appeared again and the domain was again being spammed. Several socks later, the blacklisting was reinstated and the two specific sources whitelisted.
  • Showing that spammers don't care about XLinkBot, they just revert: Special:Contributions/81.101.45.83, over and over. They are not here to improve Wikipedia; there is no 'oh, I did not know that documents on this server may not be suitable for sourcing'. Earning money is the ultimate goal!
  • AbuseFilter 247 is set to catch email address additions to mainspace. Though people are warned, many persist in adding their email address. Those are generally the ones who are there for spam reasons (diff, diff, diff). A sketch of what such a filter checks for appears after this list.
  • Example whitelisting request for an examiner.com link, where the requesting editor candidly says "Of course I am trying to generate traffic, wouldn't you?".
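
As a rough illustration of what a filter like AbuseFilter 247 checks for: the actual filter is written in the AbuseFilter rule language and may be configured to warn rather than block; the Python sketch and email pattern below are assumptions for illustration.

    import re

    # Rough approximation of an email address; the real filter's pattern may differ.
    EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

    def triggers_email_filter(namespace, added_text):
        """Return True if an edit to an article (namespace 0) adds an email address."""
        return namespace == 0 and bool(EMAIL_RE.search(added_text))

    # Example: this edit to an article would trigger the filter and warn the editor.
    print(triggers_email_filter(0, "For bookings contact someone@example.com"))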

Should we blacklist sites that pay authors for contributions, based on traffic, such as ehow.com, examiner.com, triond.com and associatedcontent.com?

Arguments for

The arguments for this are that such sites represent a unique and significant conflict of interest, which will motivate authors to spam them to an extent that low-threshold, or even preemptive, blacklisting could be justified. "Low-threshold" here refers to blacklisting the entire site once abuse of one article or subdomain is detected, because other subdomains or documents are easily created to circumvent specific blacklisting, and blocking accounts or protecting pages is insufficient, as the drive to earn money will simply bring the editors back. XLinkBot listings are unsuitable because the financial motivation to spam will override the small speed bump of being reverted once or having to establish an account. When there have already been several discussions on WP:RS / WP:RS/N about a site, with the general outcome, reached by several editors, that the site is generally unreliable, then the decision not to use these links has already been made by editors; it is therefore still an editorial decision and not an administrative one.

Arguments against

The arguments against this are that these sites host a wide range of content from a wide range of sources, with varying reliability, and the decision whether or not to use such links and references should be an editorial one, not an administrative one. XLinkBot listings are appropriate as a reminder to inexperienced users that they should think twice before using such links, while still leaving the ultimate decision on suitability to the editor and article talk page consensus.

Discussion

  • The decision to use a site is still an editorial one; it is only made harder to apply, since whitelisting has to be requested first. When such sites get abused, blacklisting prevents further abuse, so yes, such sites should be blacklisted when they are abused. Remember, practically every domain has a proper use, even those of porn sites and certain drug companies. Nonetheless, such sites are also abused, and many are also blacklisted. Spamming of these sites is exactly the same as spamming of those sites, and when consensus shows that editors need the site, parts of it can be whitelisted. --Dirk Beetstra T C 16:32, 10 December 2009 (UTC)[reply]
  • This needs to be handled on a case-by-case basis, and as with all blacklisting, as a last resort. I would recommend blacklisting only if other methods have been exhausted and the problem is still ongoing, with a review every few months to procedurally de-blacklist sites that haven't triggered the edit filter recently. I do recommend bot-reverting such edits in articles and having a link to the diff bot-posted to the talk page saying something like
davidwr/(talk)/(contribs)/(e-mail) 17:59, 10 December 2009 (UTC)[reply]
  • Re to davidwr: There is no edit filter being triggered; we don't know whether it is abused. And an edit filter for this would be too heavy on the servers. --Dirk Beetstra T C 18:02, 10 December 2009 (UTC)[reply]
  • Re2: reverting does not help. For real spammers, XLinkBot is generally just a hurdle. They get reverted, warned, and they re-add the link. Don't forget, you earn money spamming these links. So the only way to stop it is to blacklist it. Spammers use multiple domains (here you can use multiple documents, rename your document, or make a fresh account and start over), and spammers use redirect sites (which upon detection go immediately on the meta blacklist; an en-only example is now on the blacklist requests, brownplanet). It is their job. Or you hire someone to do it for you. The bot-revert suggested above is nice for the good-faith misplaced additions, or even for good-faith good additions, but it does NOT help against those who link because they want to earn money. --Dirk Beetstra T C 18:33, 10 December 2009 (UTC)[reply]
  • A few well-placed WP:LEGAL actions in a real courtroom might do the trick, but that's a solution that is out of our hands. davidwr/(talk)/(contribs)/(e-mail) 19:25, 10 December 2009 (UTC)[reply]
  • Not sure what you want to take legal action against, who it actually is, and in which courtroom (which country) ... --Dirk Beetstra T C 20:03, 10 December 2009 (UTC)[reply]
  • For any person that is banned who happens to be in the United States, a cease-and-desist order followed by a suit is possible. However, it would take a lot of resources and likely not be cost-effective. Criminal charges for accessing a computer without permission are theoretically possible but I doubt a DA would go for it unless there was something in it for him politically. davidwr/(talk)/(contribs)/(e-mail) 20:34, 10 December 2009 (UTC)[reply]
  • I agree with the case-by-case strategy. I assume all of these sites have been abused in the past and as such support continued blacklisting. However, I do not support preemptive blacklisting. It would also be nice if entries could be removed once they are no longer needed, although I'm not sure how that would be determined. I believe every entry adds some overhead to every edit, so entries shouldn't be added without good reason. --ThaddeusB (talk) 00:45, 11 December 2009 (UTC)[reply]
  • I oppose blacklisting solely on the basis that the author is paid by usage. I agree that even the decision to blacklist should be made on a case-by-case basis, and only when there is abuse. --Bejnar (talk) 03:10, 11 December 2009 (UTC)[reply]
  • The question "should we blacklist sites that pay authors" is misleading because it contains an implication that such sites are blacklisted simply because authors are paid (that is not correct; actual spamming is what leads to blacklisting). This RFC arose from a discussion (permanent link) where it was explained that links to examiner.com have been spammed and the site was blacklisted. After a suggestion to un-blacklist examiner.com, the point was made that the site fails WP:RS because essentially anyone can post an article (the site has 12,000 contributors), and there is no point removing the blacklist to see if the site will be spammed because we know that it will be since the contributors are "paid a very competitive rate based on standard Internet variables including page views, unique visitors, session length, and advertising performance" (quote from examiner.com/assets/examinerfaq.html). Hmmm, I could write articles for examiner.com, then link to those articles from Wikipedia, and the links may give me a stream of income for the next few years – that's a powerful spam incentive and has to be considered as a factor when discussing whether to un-blacklist the site. Johnuniq (talk) 07:02, 11 December 2009 (UTC)[reply]
  • I would say that there is a strong prima facie argument for blacklisting sites where the author is paid based on site views (etc.). However, the final decision whether to blacklist or not comes down to multiple factors, including reliability, history of use, and adequacy of alternative measures. You will note that affiliate links, e.g. to Amazon.com, are already blocked. Stifle (talk) 09:29, 11 December 2009 (UTC)[reply]
  • Any sort of "pay for play" site is susceptible to soft spamming, where many authors without any common interest all have an interest in pushing "just one or two links" to their revenue-generating content. I don't think we need to be preemptively chasing down such sites when they aren't linked on Wikipedia, but if there's evidence of use on Wikipedia, the bar should be pretty low and take the whole pattern into account. I'm the person who requested Examiner.com be blacklisted (which seems to be the underlying contentious issue), and part of my reasoning was that they allow themselves to be confused with the San Francisco Examiner and Washington Examiner. That's a "reliable source" issue, but it's the combination of that and promotional linking that caused me to request blacklisting. Gavia immer (talk) 16:10, 11 December 2009 (UTC)[reply]
    • Examiner isn't really the primary contentious issue at this point. In our discussions leading up to this, there was a suggestion that Examiner actually does have some editorial standards similar to about.com, which is not blanket-blacklisted. We didn't really continue to discuss Examiner after that point, instead concentrating on the larger (and more clearly defined) issues of policy that came up. Examiner seems to be one of those edge cases. Gigs (talk) 16:18, 11 December 2009 (UTC)[reply]
  • Case by case. Rather: article by article. The problem is when editors try to say certain sites as a whole are great places to start from, and one day you can wake up to a 100k article filled with what looked like varied sources but all originated somewhere dubious. This reminds me of the progressive WP:PROF matter, which is essentially how far we can go with OR or works from the author as sources. Even if something screams at me in a duck test as something that's not being added with the best of intentions, guidelines are guidelines, and if there are no reasons to object, I agree things can stay. Same here. Even if we know someone was paid a million dollars to write an article as sole motivation, if it works it works. I'd poke at these a bit further on POV and might adjust WP:WEIGHT a bit, but how it came to exist matters not if it matches everything it needs to. Is there some way to just flag resource locations as "dubious" or similar that would put them in a queue of likely moderate importance if found, kind of like how Huggle magically picks out levels of suspicion in edits far beyond a "which revert level?" feature? daTheisen(talk) 05:37, 12 December 2009 (UTC)[reply]
    XLinkBot can warn a user when they insert a link that is listed in its database. It reverts the user once, but will not revert them if they reinsert it. That's the closest we have to a "middle ground" at this point, and it's useless for controlling determined spammers, but useful for warning good-faith editors. Gigs (talk) 18:46, 12 December 2009 (UTC)[reply]
    can the bot keep a log of its activity? DGG ( talk ) 20:14, 17 December 2009 (UTC)[reply]
    It's at Special:Contributions/XLinkBot. Stifle (talk) 09:55, 23 December 2009 (UTC)[reply]
  • No, we should err on the side of openness. --Apoc2400 (talk) 17:58, 18 December 2009 (UTC)[reply]
  • No, we shouldn't. The blacklist already covers stuff it shouldn't, just because someone spammed a site years ago and you can't get stuff taken off. - Peregrine Fisher (talk) (contribs) 20:25, 27 December 2009 (UTC)[reply]
    May I ask which sites you think it shouldn't cover, and/or which sites you can't get off? --Beetstra (public) (Dirk BeetstraT C on public computers) 18:58, 30 December 2009 (UTC)[reply]
  • I think the current functionality of XLinkBot is sufficient provided people keep an eye on pages on which it has reverted. I favour as slim a blacklist as is possible. Brilliantine (talk) 01:19, 1 January 2010 (UTC)[reply]

Should the reliability of a source be a factor in blacklisting or whitelisting decisions?

Arguments for

The reliability of a source is not the primary motivation for blacklisting a site; the fact that it is being spammed is. That said, even a few links being spammed to an unreliable source can justify blanket blacklisting. The reliability of a source, as determined by the community, is considered for whitelisting decisions and decisions about the scope of a blacklist listing. If an unreliable source is blacklisted, the community doesn't lose as much as if a reliable source is blacklisted. The community often judges the reliability of these sources through discussions at WT:RS.

Arguments against

The nature of a wiki is that editors are encouraged and allowed to be bold. Decisions about the suitability of content are editorial, not administrative. Judgment about the reliability of a source should not be happening on whitelist request pages, it should be happening on article talk pages and through the normal process of bold, revert, discuss. Blacklisting is a last resort; it shouldn't be used as a pre-emptive measure. Blanket banning entire sites by using site-wide blacklist listings in response to limited spam is effectively pre-emption.

Discussion

I definitely do not think that sources should be blacklisted because there is potential for abuse... they should only be blacklisted because of actual abuse. So, no, I do not agree with the proposal. Blueboar (talk) 15:44, 10 December 2009 (UTC)[reply]
  • Sites are blacklisted because of abuse. Sites should not be blacklisted with failing WP:RS as the sole argument; they can be blacklisted when they are abused and also (generally) fail WP:RS. However, reliability should be a factor in deciding requests for de-blacklisting and in whitelisting requests. --Dirk Beetstra T C 16:35, 10 December 2009 (UTC)(expanded --Dirk Beetstra T C 16:55, 10 December 2009 (UTC))[reply]
  • Ideally, WP:RS would be irrelevant. The reality is that if the level of spam is low-level and widespread (i.e. not just a few pages on a site) across an unreliable source, the site is much less likely to be de-blacklisted or have its listing narrowed than a reliable source generating similar low-level and widespread spam; it's simply not worth the administrative effort. Assuming all other options have been exhausted, the reliability of a site can be used to determine how narrow the initial blacklist entry should be and how future adjustments to the list are made. Likewise, COI issues, such as blacklisting "pay for promotion" sites that happen to be reliable sources, should also be a factor in setting the initial entry and in future adjustments. davidwr/(talk)/(contribs)/(e-mail) 18:05, 10 December 2009 (UTC)[reply]
    Re: reliable sources that get spammed are a pain; they don't get blacklisted unless it is absolutely over the limit, and then just temporarily to get the message through. About.com is an example of the reasonably reliable sites that pay authors; it is not blacklisted. --Dirk Beetstra T C 18:36, 10 December 2009 (UTC)[reply]
  • I think it's a bad idea to blacklist sources only based on a general lack of reliability. Some sources are mostly unreliable, but may be used in accordance with WP:RS in specific cases. I ran into one such borderline case, dealing with the same site and very similar to the whitelist request here: [1]. When I looked at the blacklist log I only found evidence of a single admin adding the site to the list, and no evidence of abuse. In my opinion XLinkBot should be the first option in cases like this, giving room for experienced editors to override it, with blacklisting only for the most egregious cases of actual abuse. Siawase (talk) 19:11, 10 December 2009 (UTC)[reply]
  • At most, reliability should be a minor factor. The level of spam problems should be the main factor, and no site should ever be blacklisted just because it is unreliable. I believe every entry adds some overhead to every edit, so entries shouldn't be added without good reason. I don't have a problem with considering reliability as a reason to de-list, though. --ThaddeusB (talk) 00:50, 11 December 2009 (UTC)[reply]
  • Lack of reliability should not be the basis for blacklisting; abuse should be the primary basis. Other control methods should be fully utilized before blacklisting. Would it be technically possible to create an article-by-article exception for specific blacklisted URLs on sites that are highly reliable but are being spammed to inappropriate articles? --Bejnar (talk) 03:28, 11 December 2009 (UTC)[reply]
    • I don't believe so, not with the black list. A bot could do it, however. I don't know if XLinkBot already contains this capability or not. I don't believe it does. Gigs (talk) 03:42, 11 December 2009 (UTC)[reply]
      • This has nothing to do with the technical capabilities of XLinkBot. No, page-specific blacklisting is not possible; and you run into the next problem: eliminating the spammy hyperlink on article X while allowing the non-spammy one on the same article, or doing specific blacklisting of specific hyperlinks to specific articles and waiting for the spammer to change his document name and proceed (this is not assuming bad faith of spammers; this is representing what they do to get their links in)? --Dirk Beetstra T C 06:51, 11 December 2009 (UTC)[reply]
  • Should the reliability of a source be a factor in blacklisting or whitelisting decisions? Yes, of course. We would never blacklist the New York Times, no matter how many times it is being spammed inappropriately. Likewise, if a particular site has no value to the encyclopedia (i.e. it does not meet WP:RS and WP:EL) it should be easier to blacklist than sites with valuable information. ThemFromSpace 04:19, 11 December 2009 (UTC)[reply]
    • What about whitelisting decisions? Gigs (talk) 04:46, 11 December 2009 (UTC)[reply]
      • The criteria should be even stricter for whitelisting than for blacklisting. Only if a site can be shown to meet WP:RS or WP:EL should it be whitelisted. Once a link has been abused to the point of blacklisting, we should be very reluctant to ever let it back in, unless a specific case can be shown that the link meets either WP:RS or WP:EL or some other exception. ThemFromSpace 05:06, 11 December 2009 (UTC)[reply]
  • It has been explained that a site is never blacklisted because it fails WP:RS. However, in some cases, judgement is required concerning whether a site should be blacklisted (is it being spammed significantly? is the problem ongoing? would blacklisting cause damage because some non-spammed pages are helpful as references?) and it is natural to consider WP:RS because there is no point worrying about blacklisting a site if the site is simply unsuitable for use on Wikipedia (and because pages from the site which are required, perhaps because they are an "official site", can be whitelisted). Johnuniq (talk) 07:04, 11 December 2009 (UTC)[reply]
  • I would say that unreliability is generally a contributory factor to a decision to blacklist, but should not be the only factor. The final decision whether to blacklist or not comes down to multiple factors, including whether there is a financial incentive to generate links to the site, history of use, and adequacy of alternative measures. Stifle (talk) 09:30, 11 December 2009 (UTC)[reply]
  • Reliability may be a consideration: persistent addition of an unreliable source is an abuse which is more likely to require control than persistent addition of a reliable source, and a reliable source may be worth additional measures to prevent the abuse before resorting to the blacklist. Abuse must still be a presumption for blacklisting to be appropriate. Guy (Help!) 15:12, 11 December 2009 (UTC)[reply]
  • While sites should not be blacklisted on the sole grounds of unreliability, apparent uselessness as a source weighs in favor of blacklisting a site being spammed, since the collateral damage associated with this action would be attenuated. Andrea105 (talk) 01:19, 13 December 2009 (UTC)[reply]
  • Blacklisting sites because of their lack of reliability as a source causes unnecessary collateral damage. I've had several instances of my edit being eaten because I tried to cite such a site in a discussion or add it as an external link when reliability wasn't in the least bit important. I also believe the blacklist should be cleaned up. Some sites could instead be put in the AbuseFilter, which allows people to merely get a warning rather than have their contribution blocked immediately. - Mgm|(talk) 10:36, 14 December 2009 (UTC)[reply]
  • Sites are not blacklisted because of a lack of reliability, sites are blacklisted because they were abused, and that abuse is uncontrollable (see examples). Warning spammers does not stop them in the least (again, see examples). --Dirk Beetstra T C 11:08, 14 December 2009 (UTC)[reply]
In deciding if a site is being abused, legitimate use is or at least should be a factor, and reliability is relevant to that. So, indirectly, reliability does make a difference. DGG ( talk ) 20:16, 17 December 2009 (UTC)[reply]
  • I have, before I was aware of this discussion, asked for the blacklisting of two sites (old Wikipedia mirrors) because they are always unreliable, have been and are used dozens of times per month, and are as such a nuisance and a net negative. However, these two sites are (as far as I am aware) not spammed, not abused: they are added by misguided, unknowing, careless editors, nothing more, nothing less. While the spam blacklist may not be the correct venue for stopping them, I am convinced that some method of disallowing these links should be implemented. If you want a much worse example: we have over 10,000 (!) links to fallingrain.com[2], despite this being an utterly unreliable source. This site is not being spammed; its use is just being copied over and over again. The only way to stop this is to put it on some blacklist. I don't care what list or method is used for this, but there are a number of sites or subsites which should never be used as a source but are frequently linked now. Some method of dealing with those is needed. Fram (talk) 12:47, 23 December 2009 (UTC)[reply]
  • To blacklist a source because of reliability concerns is simple laziness. Again, can XLinkBot not suffice for keeping an eye on potentially dubious sources provided people do their bit by keeping an eye out? Brilliantine (talk) 01:23, 1 January 2010 (UTC)[reply]
    Hmm .. laziness? Brilliantine, it pays to have your links here; it is not a matter of keeping out one or two editors a day. And it is not potentially dubious sources; it is a majority of useless sources, with some potentially good sources in between. No, XLinkBot does not suffice (though it should certainly be a first choice): reverting is too easy, and XLinkBot only works on IPs and not-established accounts. --Beetstra (public) (Dirk BeetstraT C on public computers) 14:32, 1 January 2010 (UTC)[reply]
  • It makes sense to me that if a site is repeatedly being used as a source, but is for all purposes an unreliable one, it should be blacklisted. It doesn't make sense to indefinitely waste editors' time removing it as a source and trying to figure out how to handle the info for which it was a source when this can be easily handled prospectively by preventing it from being added as a source. See e.g. Wikipedia:Reliable_sources/Noticeboard#Challenging_DEFSOUNDS.COM_as_a_biographical_source. Is there an easy way of telling which users have added links to a given site, to see if they are all coming from a single or just a few accounts, or from a variety of users? If the former, spam is more of an issue; if the latter, more steps possibly need to be taken to educate people about RS. Шизомби (Sz) (talk) 21:38, 6 January 2010 (UTC)[reply]
The discussion above is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.