MediaWiki talk:Spam-blacklist

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by JJMC89 (talk | contribs) at 21:37, 18 August 2019 (→‎Kyrgyz medical school phishing sites: Added to Blacklist using SBHandler). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

    MediaWiki:Spam-blacklist is meant to be used by the spam blacklist extension. Unlike the meta spam blacklist, this blacklist affects pages on the English Wikipedia only. Any administrator may edit the spam blacklist. See Wikipedia:Spam blacklist for more information about the spam blacklist.


    Instructions for editors

    There are 4 sections for posting comments below. Please make comments in the appropriate section. These links take you to the appropriate section:

    1. Proposed additions
    2. Proposed removals
    3. Troubleshooting and problems
    4. Discussion

    Each section has a message box with instructions. In addition, please sign your posts with ~~~~ after your comment.

    Completed requests are archived. Additions and removals are logged; reasons for blacklisting can be found there.

    Adding the templates {{Link summary}} (for domains), {{IP summary}} (for IP editors) and {{User summary}} (for users with accounts) causes the COIBot reports to be refreshed. See User:COIBot for more information on the reports.


    Instructions for admins
    Any admin unfamiliar with this page should probably read this first, thanks.
    If in doubt, please leave a request and a spam-knowledgeable admin will follow-up.

    Please consider using Special:BlockedExternalDomains instead, powered by the AbuseFilter extension. This is faster and more easily searchable, though it supports only whole domains and does not support whitelisting.

    1. Does the site have any validity to the project?
    2. Have links been placed after warnings/blocks? Have other methods of control been exhausted? Would referring this to our anti-spam bot, XLinkBot be a more appropriate step? Is there a WikiProject Spam report? If so, a permanent link would be helpful.
    3. Please ensure all links have been removed from articles and discussion pages before blacklisting. (They do not have to be removed from user or user talk pages.)
    4. Make the entry at the bottom of the list (before the last line). Please do not do this unless you are familiar with regular expressions — the disruption that can be caused is substantial.
    5. Close the request entry on here using either {{done}} or {{not done}} as appropriate. The request should be left open for about a week, as there will often be further related sites or an appeal in that time.
    6. Log the entry. Warning: if you do not log any entry you make on the blacklist, it may well be removed if someone appeals and no valid reasons can be found. To log the entry, you will need this number – 911442561 after you have closed the request. See here for more info on logging.
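
    Step 4 warns that a badly written regular expression can cause substantial disruption. A minimal sketch of a sanity check an admin might run before saving an entry is shown below. The URL prefix is an assumption about how blacklist lines are wrapped into one pattern, not something stated on this page, and the test URLs (including example.org) are hypothetical.

```python
import re

# Assumed wrapper: an approximation of how blacklist lines might be combined
# into a single URL pattern. It is not quoted from this page.
URL_PREFIX = r"https?://+[a-z0-9_\-.]*"


def check_entry(fragment, must_block, must_allow):
    """Return a list of problems found for a candidate blacklist fragment."""
    try:
        pattern = re.compile(URL_PREFIX + "(?:" + fragment + ")", re.IGNORECASE)
    except re.error as exc:
        return [f"does not compile: {exc}"]
    problems = []
    for url in must_block:
        if not pattern.search(url):
            problems.append(f"fails to block {url}")
    for url in must_allow:
        if pattern.search(url):
            problems.append(f"wrongly blocks {url}")
    return problems


# A tightly anchored fragment: blocks the spammed domain, leaves others alone.
good = check_entry(
    r"\bonezorse\.com\b",
    must_block=["http://www.onezorse.com/page"],
    must_allow=["https://example.org/onezorse"],
)

# An overly loose fragment: also catches an unrelated (hypothetical) host.
bad = check_entry(
    r"zorse",
    must_block=["http://www.onezorse.com/page"],
    must_allow=["https://zorse.example.org/"],
)

print(good)
print(bad)
```

    An empty list means the fragment behaved as intended on the supplied URLs; it does not prove the entry is safe against every URL on the wiki.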


    Proposed additions

    drive.google.com

    Cannot think of a responsible use of this other than for the Google Drive article. I see this being used to cite original research or otherwise unreliable sources, or worse, for malware/spam distribution.

    Unfortunately, a number of articles are using Google Drive links as references or otherwise. I picked a random article to see what kind of content was being used - Hyperinflation in Brazil seems to link to original research in the Google Drive link used there.

    I would like to see additional input - I think it isn't a problem to use these in project or userspace, but I would say that 90% of mainspace usage would be problematic. Does the community have any other thoughts? Jon Kolbert (talk) 19:46, 5 July 2019 (UTC)[reply]

    Upon further reflection this would probably be best as an edit filter to limit the blacklist to mainspace and allow extended-confirmed users to use it elsewhere since spam-blacklist is for every namespace. Jon Kolbert (talk) 20:12, 5 July 2019 (UTC)[reply]
    Some of these seem to be historic documents - could they, should they, be transferred to archive.org ? Case in point, the NSA interview transcripts from "Rasterschlüssel 44"? DS (talk) 21:09, 5 July 2019 (UTC)[reply]
    I feel like that would be a more stable, reliable solution than a link to a Google Drive folder whose owner can change the contents at any point in time. Jon Kolbert (talk) 21:48, 5 July 2019 (UTC)[reply]
    'Some of these seem to be historic documents' .. is this google drive the only place where they are available? I would argue that although it is certainly valuable to have a link to an online copy, it is not absolutely needed (as long as you uniquely describe the document). And if they are out of copyright (so really historic), they could easily be incorporated into WikiSource. --Dirk Beetstra T C 13:32, 7 July 2019 (UTC)[reply]
    Comment I have alerted WP:RSN to this discussion as the above comment relates to reliability. I myself support this proposal for the following reasons:
    1. It probably fails WP:ELNO#11.
    2. Some pages also fail WP:ELNO#3.
    3. It unambiguously fails the reliable sources criteria as user-generated content, and better sources are almost always available. –LaundryPizza03 (d) 22:47, 15 July 2019 (UTC)[reply]
    The assumption that there is a specific kind of source involved is misguided. A public, general-purpose file storage service is ambiguous. Like YouTube, it can be used for reliable sources (primary or secondary) and appropriate external links, or inappropriate ones. This is why WP:YOUTUBE gives caution but says that "Links should be evaluated for inclusion with due care on a case-by-case basis." Just the other day I cited a magazine which distributes its back issues through Google Drive. Is there widespread abuse, compared to similar sites, that would justify the drastic step of blacklisting it? Kim Post (talk) 00:29, 16 July 2019 (UTC)[reply]
    @Kim Post: gauging abuse here is a difficult one. If 10% turns out to be (likely) copyright violations then yes, there is abuse. Abuse in the term of spamming, I don't think so (but then we would not discuss this if that was the case). I agree that the case seems similar to Youtube, but I don't know about the ratios - how many are copyright violations, how many are convenience, how many are not replaceable, etc. (noting that of the material on Youtube that is useful to Wikipedia the percentage of (likely) copyright violations is higher than the overall percentage on Youtube). --Dirk Beetstra T C 04:38, 16 July 2019 (UTC)[reply]
    Special:Search/insource:"drive.google.com" shows 2,550 articles currently citing Google Drive. If only 90% of mainspace usage is problematic, it means 255 articles are using Google Drive as a legitimate source, which is too high for blacklisting. If the content of the sources is appropriate, though not in the best format, an edit filter showing a warning message, or having a bot to undo additions by new users, is a better approach than blacklisting the link and requiring all uses to be whitelisted. feminist (talk) 01:58, 16 July 2019 (UTC)[reply]
    @Feminist: 'If only 90% of mainspace usage is problematic' .. only? If that 90% of the cases has roughly 20% (likely) copyright violations (the first link I clicked on was link to a personal copy of an article copyrighted by Elsevier where I would consider that this is likely/maybe out of scope of what Elsevier allows, and, obviously, there is a proper link to the proper, albeit paywalled, article) then we are talking hundreds of copyright violations. That is way too high to allow unlimited inclusion (and hence, blacklist might be appropriate). (in short: you would need a full analysis of all, not just eyeballing 10% is fine, for all you know, it is only 1% that is fine, which is something that the whitelist can easily handle). I could however agree with adding this to XLinkBot or an edit filter to step this up and reconsider blacklisting after a couple of months. --Dirk Beetstra T C 04:38, 16 July 2019 (UTC)[reply]
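
    The two estimates above rest on the same arithmetic with different guesses. A rough sketch, assuming the 2,550-article count from the search and treating the 90% and 20% figures purely as the guesses offered in the thread (the function name is made up for illustration):

```python
def estimate(total_articles, problematic_pct, copyvio_pct_of_problematic):
    """Turn percentage guesses into approximate article counts."""
    problematic = total_articles * problematic_pct / 100
    legitimate = total_articles - problematic
    copyvio = problematic * copyvio_pct_of_problematic / 100
    return round(legitimate), round(copyvio)


# 2,550 articles from the insource search; 90% problematic and 20% of those
# being likely copyright violations are the guesses made in the discussion.
legit, copyvio = estimate(2550, 90, 20)
print(legit, copyvio)  # 255 459

# If 99% of uses were problematic, the legitimate set would be far smaller.
legit_low, _ = estimate(2550, 99, 20)
print(legit_low)
```

    The counts move a great deal with the assumed percentages, which is why a full analysis of the links would be needed before either figure is relied on.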

    Very much in two minds: yes, it is no different from any other storage medium, but (as others have pointed out) it might also (as a storage medium) have stuff that would pass RS. At this time I lean to no. Slatersteven (talk) 09:24, 16 July 2019 (UTC)[reply]

    @GRuban: by the Devil's advocate: so it is just as likely to contain bad content as the website of the BBC, youtube, Elsevier, or blogger? --Dirk Beetstra T C 20:30, 16 July 2019 (UTC)[reply]
    Respectively, no, yes, no, and yes. The point is that the BBC and Elsevier exercise editorial control. Blogger and YouTube and Google Drive do not. So, yes, most stuff on YouTube and Blogger and Google Drive doesn't meet our criteria as reliable sources; but some does, so we shouldn't throw the expert self-published opinions out with the overwhelming majority of self-published opinion. --GRuban (talk) 21:38, 16 July 2019 (UTC)[reply]
    The BBC are not a storage medium, they are a creator. Slatersteven (talk) 08:54, 17 July 2019 (UTC)[reply]
    The BBC is a creator that stores its content on its own site; many creators who do not have their own site store it somewhere else, such as on youtube, blogspot or drive.google.com.
    Exactly, GRuban. Blogger, YouTube and Google Drive do not have editorial control, and are generally unreliable. With the first 2 of those we exhibit quite strong editorial control. They are on XLinkBot, and we generally do not hesitate when abuse is so bad that material needs blacklisting (there are several blogger sites on the blacklist, and specific Youtube videos/channels). Others of those 'free storage sites' we have blacklisted, like Hulu and examiner, based on a similar discussion as this one. The question is whether the good material (material that is really needed) outweighs the bad material (rubbish, copyright violations, 'spam', etc.). As for the point 'no more or less likely to contain good or bad content than any other arbitrary website': this one falls well in the range of blogger, Youtube, Hulu and examiner, and I wonder whether it is just as likely to have bad material as YouTube, or just as likely to have bad material as Examiner (to pick 2). --Dirk Beetstra T C 15:46, 17 July 2019 (UTC)[reply]
    • Support Content on Google Drive is user-generated. Yes, some reliable sources are offline and/or behind a paywall, but that is no reason to re-publish them on Google Drive as pirate copies. Also, Wikipedia should not use URLs that point to such pirated resources. Are there any genuine sources hosted on Google Drive? Please point one out, as a black swan, among the 8,932 Google Drive entries currently in Wikipedia. Matthew hk (talk) 11:57, 17 July 2019 (UTC)[reply]
    • If you really want to collect anecdotes: out-of-print issues of C3i Magazine, a publication about wargames, are hosted on Google Drive. The official website provides these links. More to the point, the bare fact that a website is open to the public, and so could be used for bad sources or external links, is not a reason to put it on the spam blacklist. Kim Post (talk) 03:14, 18 July 2019 (UTC)[reply]
    • @Kim Post: It is not a question of could. The link used on Zohar e.g. leads to a pdf which looks like the printout of another website. That is very likely a copyright violation, also because I can find the text elsewhere. I saw another example like that earlier but it seems to have been removed. It remains a question of balance between use and abuse, and how much abuse you want to take. And in the area of spam, anything that could be abused eventually will be. --Dirk Beetstra T C 03:45, 18 July 2019 (UTC) (found the links to publications, see below --Dirk Beetstra T C 05:00, 18 July 2019 (UTC))[reply]

    It gets more interesting,

    in e.g. diff, diff, diff &c. links consistently to work by the same authors and in all cases links to a google-drive copy of the work of the authors (not, as is more normal, using doi or a link to the publishing papers). Now looking at the IP (which is in New York University, New York) that does overlap with the stated location of one of the three editors. Looking at the personal copy document, it states:

    • This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and education use, including for instruction at the authors institution and sharing with colleagues.
      Other uses, including reproduction and distribution, or selling or licensing copies, or posting to personal, institutional or third party websites are prohibited.
      In most cases authors are permitted to post their version of the article (e.g. in Word or Tex form) to their personal website or institutional repository.

    I agree that the copyright status there is a grey area, but this is likely a case of someone promoting their work (i.e. spamming) using this website as the medium for the spam. --Dirk Beetstra T C 05:00, 18 July 2019 (UTC)[reply]

    • At the very least this should be on an edit filter, but actually I think it should be blacklisted. Think about it: a Google Drive document can be updated by the owner at any time, is typically not archived by the archiving services, may be in violation of copyright, and we have no proof that it's an authentic copy of the material even if it's not. Oh, and several document types allow for infection with malware, which typically gets screened out by reputable online sources. Any original papers should be identified by DOI reference not by links to personal copies on file sharing platforms, and academics almost always have the ability to upload to space within their institution's own website. Linking to Google Drive, OneDrive and the like seems like an open invitation for abuse. Guy (Help!) 09:56, 2 August 2019 (UTC)[reply]

    onezorse.com

    Spam additions including reference substitutions. Guy (Help!) 09:50, 2 August 2019 (UTC)[reply]

    @JzG: Added to MediaWiki:Spam-blacklist. — JJMC89(T·C) 21:55, 4 August 2019 (UTC)[reply]

    youngstownwater.com

    Spammy links added to external links sections and as references - Eureka Lott 20:56, 4 August 2019 (UTC)[reply]

    @EurekaLott: Added to MediaWiki:Spam-blacklist. — JJMC89(T·C) 21:54, 4 August 2019 (UTC)[reply]

    Spam for advisory websites

    Recurring spam for two - possibly related - advisory and registration websites (see COIBot reports for more details). Several warnings have been ignored. GermanJoe (talk) 09:50, 5 August 2019 (UTC)[reply]

    The IPs of the servers that the websites seem to run on are very similar. --Dirk Beetstra T C 10:26, 5 August 2019 (UTC)[reply]
    @GermanJoe: Added to MediaWiki:Spam-blacklist. — JJMC89(T·C) 23:52, 11 August 2019 (UTC)[reply]

    selfgrowth.com

    selfgrowth.com: Linksearch en (insource) - meta - de - fr - simple - wikt:en - wikt:fr • Spamcheck • MER-C X-wiki • gs • Reports: Links on en - COIBot - COIBot-Local • Discussions: tracked - advanced - RSN • COIBot-Link, Local, & XWiki Reports - Wikipedia: en - fr - de • Google: search • meta • Domain: domaintools • AboutUs.com

    Appears to have been extensively spammed by many users (too many to list here, but COIBot lists them, of course). No extant use in mainspace. However, the COIBot report stops at 2015, but shows a number of possibly empty sections (a possible bug?) Thanks, —PaleoNeonate – 22:48, 5 August 2019 (UTC)[reply]

    @PaleoNeonate: Added to MediaWiki:Spam-blacklist. — JJMC89(T·C) 23:48, 11 August 2019 (UTC)[reply]

    BLP spam - self-published and dangerous source

    Spam on a BLP with some determination, also turns up in other BLPs. Guy (Help!) 15:35, 7 August 2019 (UTC)[reply]

    @JzG/help: Added to MediaWiki:Spam-blacklist. --Guy (Help!) 15:40, 7 August 2019 (UTC)[reply]

    walkultimate.com

    Seems relatively recent, but these contemptible people are manipulating existing references for spammish purposes. See here, here, here and here for some examples. Thanks, Cyphoidbomb (talk) 23:32, 11 August 2019 (UTC)[reply]

    @Cyphoidbomb: Added to MediaWiki:Spam-blacklist. — JJMC89(T·C) 23:36, 11 August 2019 (UTC)[reply]
    @JJMC89: Thanks! Cyphoidbomb (talk) 23:37, 11 August 2019 (UTC)[reply]

    biglybt.com

    Long-running campaign, mostly by IPs or SPAs, to get this program into the encyclopedia despite multiple removals as a separate article, external link or section. For the section, see the examples here, [1]. For a link, see the example here. The developer claims that the demand for inclusion in "Comparison of BitTorrent clients" is unfair: Talk:Comparison of BitTorrent clients. The Banner talk 00:00, 12 August 2019 (UTC)[reply]

    @The Banner: Added to MediaWiki:Spam-blacklist. — JJMC89(T·C) 00:24, 12 August 2019 (UTC)[reply]

    Kyrgyz medical school phishing sites

    The official websites (.edu.kg) of Medical Institute, Osh State University and International School of Medicine Kyrgyzstan are being repeatedly changed to these almost identical phishing websites (containing the same @gmail.com contact information). – Thjarkur (talk) 10:23, 18 August 2019 (UTC)[reply]

    @Þjarkur: Added to MediaWiki:Spam-blacklist. — JJMC89(T·C) 21:37, 18 August 2019 (UTC)[reply]

    Proposed removals


    econlib.org

    econlib.org: Linksearch en (insource) - meta - de - fr - simple - wikt:en - wikt:fr • Spamcheck • MER-C X-wiki • gs • Reports: Links on en - COIBot - COIBot-Local • Discussions: tracked - advanced - RSN • COIBot-Link, Local, & XWiki Reports - Wikipedia: en - fr - de • Google: search • meta • Domain: domaintools • AboutUs.com

    This link appears to have been grouped in with spammy immigration law additions? [[2]] in a March 2017 addition to the Local blacklist: [[3]]

    Maybe the spamvertizers used this link also in some of their posts...but this link is to the real website for this organization: The Library of Economics and Liberty. It may be a biased reference, but it is currently cited in one article: Wage share.

    I'm guessing this was a false positive? ---Avatar317(talk) 23:11, 5 August 2019 (UTC)[reply]

    @Avatar317: no, the spamvertizers (declared paid editing ring) also had a close connection to the subject econlib and edited and created pages related to econlib. Maybe not part of the paid editing, but definite conflict of interest. FYI, except for the encyclopedia (which is whitelisted) all information is full text available from other libraries, and in many cases even from WikiSource. --Dirk Beetstra T C 04:09, 6 August 2019 (UTC)[reply]
    • Avatar317, I fixed this. It was an absolutely standard example of the genre: a public domain work, linked to the right-wing think tank, and listed as being published by them. That last bit of deceptive attribution is very common; I have found and removed hundreds, often to very well known works like Gibbon's Decline and Fall of the Roman Empire. In this case the full text is available at Gutenberg, and the publisher is not the Orwellian-titled "Library of Economics and Liberty" but John Murray, of London. I don't think there were many right-wing think tanks operating in 1817, and this one certainly wasn't. Guy (Help!) 07:18, 6 August 2019 (UTC)[reply]
    • Declined for the record. — JJMC89(T·C) 23:43, 11 August 2019 (UTC)[reply]

    business-sale.com

    business-sale.com: Linksearch en (insource) - meta - de - fr - simple - wikt:en - wikt:fr • Spamcheck • MER-C X-wiki • gs • Reports: Links on en - COIBot - COIBot-Local • Discussions: tracked - advanced - RSN • COIBot-Link, Local, & XWiki Reports - Wikipedia: en - fr - de • Google: search • meta • Domain: domaintools • AboutUs.com

    Is this what is referred to as a 'regex' issue?

    When including a link to this domain, contributors get the message: Your edit was not saved because it contains a new external link to a site registered on Wikipedia's blacklist. ...... --->

    "The following link has triggered a protection filter: -sale.com"

    It appears that because the domain has '-sale.com' contained therein, it is by default being classed as a spam domain. In fact it is an industry-recognized Google News-listed authority site at least 15 years old with journalists employed to research news about companies falling into insolvency or larger companies that have been put up for sale or have been divested. -- Montymoore (talk) 01:05, 11 August 2019 (UTC)[reply]

    Declined. If WP:RSN establishes that it is a reliable source, then Defer to Whitelist. To delist, Defer to Global blacklist. — JJMC89(T·C) 23:42, 11 August 2019 (UTC)[reply]
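
    The behaviour described in this request can be reproduced with a short sketch. The fragment below is taken from the error message quoted above; the exact line on the global blacklist is not shown on this page, so the escaped form, the URL prefix and the whitelist entry are all assumptions made for illustration.

```python
import re

# Fragment as named in the error message; the escaping is an assumption.
FRAGMENT = r"-sale\.com"
BLACKLIST = re.compile(r"https?://+[a-z0-9_\-.]*(?:" + FRAGMENT + ")", re.IGNORECASE)

# Hypothetical whitelist entry of the kind a "Defer to Whitelist" could produce.
WHITELIST = [re.compile(r"\bbusiness-sale\.com\b", re.IGNORECASE)]


def is_blocked(url):
    """True if the assumed blacklist pattern matches anywhere in the URL."""
    return BLACKLIST.search(url) is not None


def is_allowed(url):
    """A whitelist match overrides a blacklist hit in this sketch."""
    if any(entry.search(url) for entry in WHITELIST):
        return True
    return not is_blocked(url)


url = "https://www.business-sale.com/news"
print(is_blocked(url))   # True: the host merely ends in "-sale.com"
print(is_allowed(url))   # True once the hypothetical whitelist entry exists
print(is_blocked("https://www.businesssale.com/"))  # False: no hyphen, no match
```

    Because the fragment is not anchored to a full host name, any domain ending in "-sale.com" is caught, which is why the reply points to the whitelist rather than to a local removal.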

    CGAP.org

    Is it possible to remove CGAP.org from the blacklist? It is a reliable source on financial inclusion. — Preceding unsigned comment added by Noel92140 (talk • contribs) 14:45, 15 August 2019 (UTC)[reply]

    cgap.org is a reliable source on financial inclusion. It is a trust fund housed by the World Bank and it provides relevant publications on financial inclusion. Adding this link would be beneficial for all pages providing information on financial inclusion, digital credit, policy, customer protection. — Preceding unsigned comment added by Noel92140 (talk • contribs) 14:54, 15 August 2019 (UTC)[reply]

    Here's the previous discussion from 2012. – Thjarkur (talk) 17:52, 15 August 2019 (UTC)[reply]
    Have you discussed this on WP:RSN? Guy (Help!) 17:28, 17 August 2019 (UTC)[reply]

    census2011.co.in

    census2011.co.in: Linksearch en (insource) - meta - de - fr - simple - wikt:en - wikt:fr • Spamcheck • MER-C X-wiki • gs • Reports: Links on en - COIBot - COIBot-Local • Discussions: tracked - advanced - RSN • COIBot-Link, Local, & XWiki Reports - Wikipedia: en - fr - de • Google: search • meta • Domain: domaintools • AboutUs.com

    This is the official website containing the 2011 census data of India. Why is it blocked? It is used as a source on countless articles. SD0001 (talk) 10:04, 17 August 2019 (UTC)[reply]

    Logging / COIBot Instructions

    Blacklist logging

    Full instructions for admins


    Quick reference

    For Spam reports or requests originating from this page, use template {{/request|0#section_name}}

    • {{/request|213416274#Section_name}}
    • Insert the oldid 213416274, a hash "#", and the Section_name (Underscoring_spaces_where_applicable).
    • Use within the entry log here.

    For Spam reports or requests originating from Wikipedia_talk:WikiProject_Spam use template {{WPSPAM|0#section_name}}

    • {{WPSPAM|182725895#Section_name}}
    • Insert the oldid 182725895, a hash "#", and the Section_name (Underscoring_spaces_where_applicable).
    • Use within the entry log here.
    Note: If you do not log your entries, they may be removed if someone appeals the entry and no valid reasons can be found.
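
    The quick reference above amounts to simple string assembly: oldid, a hash, then the section name with spaces replaced by underscores. A small sketch follows; the helper name is made up for illustration, and the oldids are the placeholder values shown in the quick reference.

```python
def log_reference(oldid, section_name, template="/request"):
    """Build a log template call from a revision id and a section heading."""
    anchor = section_name.strip().replace(" ", "_")
    return "{{" + template + "|" + str(oldid) + "#" + anchor + "}}"


# Request that originated on this page.
print(log_reference(213416274, "Section name"))
# {{/request|213416274#Section_name}}

# Request that originated at Wikipedia talk:WikiProject Spam.
print(log_reference(182725895, "Section name", template="WPSPAM"))
# {{WPSPAM|182725895#Section_name}}
```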

    Addition to the COIBot reports

    The lower list in the COIBot reports now has four numbers in brackets after each link (e.g. "www.example.com (0, 0, 0, 0)"):

    1. first number: how many links this user added (the same after each link)
    2. second number: how many times this link was added to Wikipedia (as far as the linkwatcher database goes back)
    3. third number: how many times this user added this link
    4. fourth number: to how many different Wikipedias this user added this link.

    If the third number or the fourth number is high relative to the first or the second, the user has at least a preference for using that link. Be careful with other statistics from these numbers (e.g. a good user who adds a lot of links). If there are more statistics that would be useful, please notify me, and I will see whether I can get the info out of the database and report it. This data is available in real-time on IRC.
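
    As a rough illustration of reading those four numbers, the sketch below parses one report entry and computes the two ratios the paragraph alludes to. The entry format is taken from the example above; the field names, the sample figures and the choice of ratios are assumptions, not part of COIBot.

```python
import re

# Matches entries shaped like: www.example.com (0, 0, 0, 0)
ENTRY = re.compile(r"^(\S+) \((\d+), (\d+), (\d+), (\d+)\)$")


def parse_entry(text):
    """Split a report entry into the link and its four counters."""
    match = ENTRY.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised entry: {text!r}")
    user_links, link_adds, user_link_adds, wikis = (int(n) for n in match.groups()[1:])
    return {
        "link": match.group(1),
        "user_links": user_links,          # first number
        "link_adds": link_adds,            # second number
        "user_link_adds": user_link_adds,  # third number
        "wikis": wikis,                    # fourth number
    }


def preference(entry):
    """Third number relative to the first and to the second (0.0 if undefined)."""
    of_user = entry["user_link_adds"] / entry["user_links"] if entry["user_links"] else 0.0
    of_link = entry["user_link_adds"] / entry["link_adds"] if entry["link_adds"] else 0.0
    return of_user, of_link


# Hypothetical figures: 9 of this user's 10 links, and 9 of 12 additions overall.
sample = parse_entry("www.example.com (10, 12, 9, 3)")
print(preference(sample))  # (0.9, 0.75)
```

    High ratios only indicate a preference for the link; as noted above, a prolific good-faith editor can produce similar figures.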

    Poking COIBot

    When {{LinkSummary}}, {{UserSummary}} and/or {{IPSummary}} templates are added to WT:WPSPAM, WT:SBL, WT:SWL or User:COIBot/Poke (the latter for privileged editors), COIBot will generate linkreports for the domains, and userreports for users and IPs.


    Discussion

    Have we arrived at a point where we can conduct a monthly or quarterly review of blacklist hits? The list is huge. I would suggest anything with no hits in 12 months could be removed. But can we get the stats? Guy (Help!) 23:55, 15 August 2019 (UTC)[reply]
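
    If per-entry hit statistics were available, the suggested review could be sketched roughly as follows. The data structure, the entries and the dates are hypothetical, and twelve months is approximated as 365 days; whether such statistics can actually be obtained is the open question above.

```python
from datetime import date, timedelta


def stale_entries(last_hit, today, max_age_days=365):
    """Return entries whose most recent hit is missing or older than the cutoff."""
    cutoff = today - timedelta(days=max_age_days)
    return sorted(
        entry for entry, hit in last_hit.items()
        if hit is None or hit < cutoff
    )


# Hypothetical statistics: blacklist entry -> date of its last recorded hit.
hits = {
    r"\bexample-a\.com\b": date(2019, 7, 1),
    r"\bexample-b\.net\b": date(2017, 3, 5),
    r"\bexample-c\.org\b": None,  # never triggered
}

candidates = stale_entries(hits, today=date(2019, 8, 15))
print(candidates)
```

    Entries returned here would only be candidates for review; a domain with no recent hits may simply be one that the blacklist has successfully deterred.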