
MediaWiki talk:Spam-blacklist

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by JzG (talk | contribs) at 20:59, 27 July 2019 (→‎AlexMacArthur spam: Added to Blacklist using SBHandler). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

    MediaWiki:Spam-blacklist is meant to be used by the spam blacklist extension. Unlike the meta spam blacklist, this blacklist affects pages on the English Wikipedia only. Any administrator may edit the spam blacklist. See Wikipedia:Spam blacklist for more information about the spam blacklist.


    Instructions for editors

    There are 4 sections for posting comments below. Please make comments in the appropriate section. These links take you to the appropriate section:

    1. Proposed additions
    2. Proposed removals
    3. Troubleshooting and problems
    4. Discussion

    Each section has a message box with instructions. In addition, please sign your posts with ~~~~ after your comment.

    Completed requests are archived. Additions and removals are logged, and the reasons for blacklisting can be found in the log.

    Adding the templates {{Link summary}} (for domains), {{IP summary}} (for IP editors) or {{User summary}} (for users with accounts) causes the COIBot reports to be refreshed. See User:COIBot for more information on the reports.


    Instructions for admins
    Any admin unfamiliar with this page should probably read this first, thanks.
    If in doubt, please leave a request and a spam-knowledgeable admin will follow-up.

    Please consider using Special:BlockedExternalDomains instead, powered by the AbuseFilter extension. It is faster and more easily searchable, though it only supports whole domains and does not support whitelisting.

    1. Does the site have any validity to the project?
    2. Have links been placed after warnings/blocks? Have other methods of control been exhausted? Would referring this to our anti-spam bot, XLinkBot, be a more appropriate step? Is there a WikiProject Spam report? If so, a permanent link would be helpful.
    3. Please ensure all links have been removed from articles and discussion pages before blacklisting. (They do not have to be removed from user or user talk pages.)
    4. Make the entry at the bottom of the list (before the last line). Please do not do this unless you are familiar with regular expressions — the disruption that can be caused is substantial.
    5. Close the request entry here using either {{done}} or {{not done}} as appropriate. Consider leaving the request open for about a week, as further related sites or an appeal often turn up in that time.
    6. Log the entry. Warning: if you do not log an entry you make on the blacklist, it may well be removed if someone appeals and no valid reasons can be found. To log the entry after you have closed the request, you will need this number: 908157040. See here for more info on logging.
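    Step 4 warns that entries are regular expressions. The Python sketch below shows why the dot should be escaped and why word boundaries matter. It makes a simplifying assumption: each entry is searched case-insensitively against the full URL text. The extension itself compiles the entries into its own PHP (PCRE) regexes, so the details differ. The domain, the URLs and the `blocked_urls` helper are hypothetical illustrations, not part of any MediaWiki tool.

```python
import re

def blocked_urls(entry: str, urls: list[str]) -> list[str]:
    """Return the URLs that a single blacklist entry would match.

    Simplification (assumption): the entry is searched case-insensitively
    anywhere in the URL text, not via the extension's real regex assembly.
    """
    pattern = re.compile(entry, re.IGNORECASE)
    return [u for u in urls if pattern.search(u)]

# Hypothetical sample URLs.
sample = [
    "https://example.com/page",
    "http://www.example.com/",
    "https://example.community/",  # a different domain
    "https://exampleXcom.org/",    # matches only if the dot is left unescaped
]

good = r"\bexample\.com\b"  # escaped dot plus word boundaries
sloppy = r"example.com"     # unescaped dot matches any character; no boundaries

print(blocked_urls(good, sample))
print(blocked_urls(sloppy, sample))
```

    With the escaped entry only the first two URLs match. The unescaped version also catches example.community and exampleXcom.org, which is the kind of collateral blocking step 4 warns about.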


    Proposed additions

    drive.google.com

    Cannot think of a responsible use of this other than for the Google Drive article. I see this being used to cite original research or otherwise unreliable sources, or worse, for malware/spam distribution.

    Unfortunately, a number of articles are using Google Drive links as references or otherwise. I picked a random article to see what kind of content was being used: Hyperinflation in Brazil seems to link to original research in the Google Drive link used there.

    I would like to see additional input. I think it isn't a problem to use these in project or user space, but I would say that 90% of mainspace usage would be problematic. Does the community have any other thoughts? Jon Kolbert (talk) 19:46, 5 July 2019 (UTC)

    Upon further reflection, this would probably be best as an edit filter, which could limit the restriction to mainspace and allow extended-confirmed users to use the links elsewhere, since the spam blacklist applies to every namespace. Jon Kolbert (talk) 20:12, 5 July 2019 (UTC)
    Some of these seem to be historic documents; could they, should they, be transferred to archive.org? Case in point, the NSA interview transcripts from "Rasterschlüssel 44"? DS (talk) 21:09, 5 July 2019 (UTC)
    I feel like that would be a more stable, reliable solution than a link to a Google Drive folder whose owner can change the contents at any point in time. Jon Kolbert (talk) 21:48, 5 July 2019 (UTC)
    'Some of these seem to be historic documents': is this Google Drive the only place where they are available? I would argue that although it is certainly valuable to have a link to an online copy, it is not absolutely needed (as long as you uniquely describe the document). And if they are out of copyright (so really historic), they could easily be incorporated into Wikisource. --Dirk Beetstra T C 13:32, 7 July 2019 (UTC)
    Comment I have alerted WP:RSN to this discussion as the above comment relates to reliability. I myself support this proposal for the following reasons:
    1. It probably fails WP:ELNO#11.
    2. Some pages also fail WP:ELNO#3.
    3. It unambiguously fails the reliable sources criteria as user-generated content, and better sources are almost always available. –LaundryPizza03 (d) 22:47, 15 July 2019 (UTC)
    The assumption that there is a specific kind of source involved is misguided. A public, general-purpose file storage service is ambiguous. Like YouTube, it can be used for reliable sources (primary or secondary) and appropriate external links, or for inappropriate ones. This is why WP:YOUTUBE gives caution but says that "Links should be evaluated for inclusion with due care on a case-by-case basis." Just the other day I cited a magazine which distributes its back issues through Google Drive. Is there widespread abuse, compared to similar sites, that would justify the drastic step of blacklisting it? Kim Post (talk) 00:29, 16 July 2019 (UTC)
    @Kim Post: gauging abuse here is difficult. If 10% turns out to be (likely) copyright violations, then yes, there is abuse. Abuse in terms of spamming, I don't think so (but then we would not be discussing this if that were the case). I agree that the case seems similar to YouTube, but I don't know about the ratios: how many are copyright violations, how many are convenience, how many are not replaceable, etc. (noting that of the material on YouTube that is useful to Wikipedia, the percentage of (likely) copyright violations is higher than the overall percentage on YouTube). --Dirk Beetstra T C 04:38, 16 July 2019 (UTC)
    Special:Search/insource:"drive.google.com" shows 2,550 articles currently citing Google Drive. If only 90% of mainspace usage is problematic, that means 255 articles are using Google Drive as a legitimate source, which is too many for blacklisting. If the content of the sources is appropriate, though not in the best format, an edit filter showing a warning message, or a bot that undoes additions by new users, is a better approach than blacklisting the link and requiring all uses to be whitelisted. feminist (talk) 01:58, 16 July 2019 (UTC)
    @Feminist: 'If only 90% of mainspace usage is problematic': only? If that 90% includes roughly 20% (likely) copyright violations (the first link I clicked on was a personal copy of an article copyrighted by Elsevier, which I would consider likely, or at least possibly, out of scope of what Elsevier allows, while there is obviously a proper link to the proper, albeit paywalled, article), then we are talking about hundreds of copyright violations. That is far too many to allow unlimited inclusion (and hence the blacklist might be appropriate). In short: you would need a full analysis of all uses, not just an eyeballed estimate that 10% is fine; for all you know, only 1% is fine, which is something the whitelist can easily handle. I could, however, agree with adding this to XLinkBot or an edit filter to step this up, and reconsidering blacklisting after a couple of months. --Dirk Beetstra T C 04:38, 16 July 2019 (UTC)

    Very much in two minds. Yes, it is no different from any other storage medium, but (as others have pointed out) as a storage medium it might also host material that would pass RS. At this time I lean to no. Slatersteven (talk) 09:24, 16 July 2019 (UTC)

    @GRuban: by the Devil's advocate: so it is just as likely to contain bad content as the website of the BBC, YouTube, Elsevier, or Blogger? --Dirk Beetstra T C 20:30, 16 July 2019 (UTC)
    Respectively: no, yes, no, and yes. The point is that the BBC and Elsevier exercise editorial control. Blogger, YouTube and Google Drive do not. So, yes, most material on YouTube, Blogger and Google Drive doesn't meet our criteria for reliable sources; but some does, so we shouldn't throw expert self-published opinions out with the overwhelming majority of self-published opinion. --GRuban (talk) 21:38, 16 July 2019 (UTC)
    The BBC are not a storage medium; they are a creator. Slatersteven (talk) 08:54, 17 July 2019 (UTC)
    The BBC is a creator that stores its material on its own site; many creators who do not have their own site store their material somewhere else, such as YouTube, Blogspot or drive.google.com.
    Exactly, GRuban. Blogger, YouTube and Google Drive do not have editorial control, and are generally unreliable. With the first two of those we exercise quite strong editorial control ourselves: they are on XLinkBot, and we generally do not hesitate when abuse is so bad that material needs blacklisting (there are several Blogger sites on the blacklist, as well as specific YouTube videos/channels). We have blacklisted other such 'free storage sites', like Hulu and Examiner, based on discussions similar to this one. The question is whether the good material (material that is really needed) outweighs the bad material (rubbish, copyright violations, 'spam', etc.). As for the point 'no more or less likely to contain good or bad content than any other arbitrary website': this one falls well within the range of Blogger, YouTube, Hulu and Examiner, and I wonder whether it is as likely to have bad material as YouTube, or as likely as Examiner (to pick two). --Dirk Beetstra T C 15:46, 17 July 2019 (UTC)
    • Support. On Google Drive, content is user-generated. Yes, some reliable sources are offline and/or behind paywalls, but that is not a reason to republish them on Google Drive as pirated copies. Also, Wikipedia should not use URLs that point to those pirated resources. Are there any genuine sources hosted on Google Drive? Please point one out, as a black swan, among the 8,932 Google Drive entries currently on Wikipedia. Matthew hk (talk) 11:57, 17 July 2019 (UTC)
    • If you really want to collect anecdotes: out-of-print issues of C3i Magazine, a publication about wargames, are hosted on Google Drive. The official website provides these links. More to the point, the bare fact that a website is open to the public, and so could be used for bad sources or external links, is not a reason to put it on the spam blacklist. Kim Post (talk) 03:14, 18 July 2019 (UTC)
    • @Kim Post: It is not a question of could. The link used on Zohar, for example, leads to a PDF that looks like a printout of another website. That is very likely a copyright violation, not least because I can find the text elsewhere. I saw another example like that earlier, but it seems to have been removed. It remains a question of balance between use and abuse, and how much abuse you want to take. And in the area of spam, anything that could be abused eventually will be. --Dirk Beetstra T C 03:45, 18 July 2019 (UTC) (found the links to publications, see below --Dirk Beetstra T C 05:00, 18 July 2019 (UTC))

    It gets more interesting:

    In e.g. diff, diff, diff &c., the editors consistently link to work by the same authors, and in all cases to a Google Drive copy of that work (not, as is more normal, a DOI or a link to the publisher). The IP (which belongs to New York University, New York) overlaps with the stated location of one of the three editors. The personal copy document states:

    • This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and education use, including for instruction at the authors institution and sharing with colleagues.
      Other uses, including reproduction and distribution, or selling or licensing copies, or posting to personal, institutional or third party websites are prohibited.
      In most cases authors are permitted to post their version of the article (e.g. in Word or Tex form) to their personal website or institutional repository.

    I agree that the copyright status there is a grey area, but this is likely a case of someone promoting their work (i.e. spamming) using this website as the medium for the spam. --Dirk Beetstra T C 05:00, 18 July 2019 (UTC)

    businesstelegraph.co.uk

    This domain publishes scraped articles from other publications. Editors are accidentally citing this domain instead of the original source. Examples:

    1. https://www.businesstelegraph.co.uk/airbnb-rentals-in-london-block-sparks-call-for-action (from Special:Diff/888209120) is a copyright violation of a Financial Times article
    2. https://www.businesstelegraph.co.uk/andy-palmer-revs-up-iconic-british-sports-car-firm-aston-martin-for-a-5bn-float (from Special:Diff/872611028) is a copyright violation of a This is Money article
    3. https://www.businesstelegraph.co.uk/icon-of-icons-autocar-awards-readers-champion-bmw-3-series (from Special:Diff/886530753) is a copyright violation of an Autocar article

    I have already removed or replaced all links to businesstelegraph.co.uk in articles. — Newslinger talk 01:37, 12 July 2019 (UTC)

    The noticeboard discussion has been archived to Wikipedia:Reliable sources/Noticeboard/Archive 269 § businesstelegraph.co.uk. — Newslinger talk 03:22, 18 July 2019 (UTC)

    jsbmarketresearch.com

    Spam for a marketing website, 30+ additions by various SPAs (and a few erroneous good-faith edits). Several warnings have been ignored. As a promotional website with questionable credentials, the source is not reliable and has no foreseeable encyclopedic usages (and even then, whitelisting would be an option). GermanJoe (talk) 10:00, 12 July 2019 (UTC)

    @GermanJoe: Added to MediaWiki:Spam-blacklist. — JJMC89(T·C) 06:01, 13 July 2019 (UTC)

    futuresharks.com

    Futuresharks is a website that will manufacture an article about you for $500. (If such websites are not added to the blacklist, please remove my listing.) I've seen it used on three new articles in the last day (G50X, Top Tier, Shaun Lee), which cite fake articles such as [1], and I've removed the source from 4 other older articles. It has been used by "entrepreneurs" here since at least 2017. --89.153.64.16 (talk) 23:26, 12 July 2019 (UTC)

    Added to MediaWiki:Spam-blacklist. — JJMC89(T·C) 06:05, 13 July 2019 (UTC)

    usaherald.com

    USAHerald is a website that purports to be a news site, but is a blog used to get links for negative reputation management and black-hat SEO. Claims go without citation, and fake articles targeting people the site owner/administrator simply does not like are common, with the goal of ranking them in search engines; see [2]. USAHerald backlink analyses show numerous comment spam campaigns; see Comment 207 here [3]. USAHerald.com uses Wikipedia links and citations to improve its domain authority. Sponsored content is also linked on Wikipedia; see [4]. 207.144.76.106 (talk) 17:24, 15 July 2019 (UTC)

    I don't see any obvious spam campaigns from the link summary. Whether or not this is a reliable source is certainly questionable. This probably belongs on the reliable sources noticeboard instead. OhNoitsJamie Talk 17:52, 15 July 2019 (UTC)
    Not sure I understand. A site that is used almost exclusively to trash neighbors while using Wikipedia as a link authority isn't spam? I will add it to the reliable sources noticeboard, but this seems very obviously problematic from its HTTPS link reference. Just look at “Healthcare in Costa Rica”. 75.138.97.214 (talk) 06:04, 16 July 2019 (UTC)

    p2pmarketdata.com

    Spamming for a new P2P finance blog (since January 2019) by various SPAs/IPs. The blog includes a referral scheme for cashback bonuses. Four previous warnings have been ignored. GermanJoe (talk) 05:02, 16 July 2019 (UTC)

    @GermanJoe: Added to MediaWiki:Spam-blacklist. --Guy (Help!) 20:57, 27 July 2019 (UTC)

    AlexMacArthur spam

    Link spam. Anarchyte (talk | work) 15:32, 17 July 2019 (UTC)

    @Anarchyte: Added to MediaWiki:Spam-blacklist. --Guy (Help!) 20:59, 27 July 2019 (UTC)

    wikiwaparz.com

    Recurring spam for a Nigerian download site. Deceptive overwriting of existing valid source links (for example: IP 105.112.39.67). Several warnings have been ignored. GermanJoe (talk) 07:57, 18 July 2019 (UTC)

    @GermanJoe: Added to MediaWiki:Spam-blacklist. --Guy (Help!) 20:57, 27 July 2019 (UTC)

    blockchain.news

    Low-quality cryptocurrency blog. JohnnyBCN started adding reference links, then tried to remove themselves from the WP:GS/Crypto notifications list. Wmbc918 started up immediately after. All instances reverted, but there's no circumstance in which this will ever be a useful reference - David Gerard (talk) 10:36, 18 July 2019 (UTC)

    @David Gerard: Added to MediaWiki:Spam-blacklist. --Guy (Help!) 20:51, 27 July 2019 (UTC)

    dnbamerica.com

    Mirror domain of dnbnumber.com (already blacklisted), spammed in Data Universal Numbering System. Likely fraud (DUNS has an authorized page to apply for such numbers on their own official website), certainly spam - please blacklist. GermanJoe (talk) 06:22, 25 July 2019 (UTC)

    @GermanJoe: Added to MediaWiki:Spam-blacklist. --Guy (Help!) 20:50, 27 July 2019 (UTC)

    Proposed removals


    Vamanan's wordpress

    This is the official wordpress site of music historian Vamanan. Kailash29792 (talk) 11:39, 17 July 2019 (UTC)

    indianholiday.com

    I found that Wikipedia has blacklisted indianholiday.com. When I checked the blacklist, I found no reason given for the blacklisting by your administrator or contributor. Indian Holiday has been a very reputable website in India for tour and travel services since 1990. The company has twice won national tourism awards. Indian Holiday Pvt. Ltd. is approved by India's tourism authorities. The site contains unique and useful information, and tours for foreign tourists who come to India to explore Indian culture, history, monuments, and cuisine.

    Perhaps someone did a spammy thing; that's why a Wiki administrator blocked the website. Also, I found that Indian Holiday has been blacklisted since March 2018. So I think indianholiday.com should be delisted, because the site provides useful answers to foreign tourists' queries. Please look into the matter and help me out with this problem. Adityainfoboy (talk) 09:08, 25 July 2019 (UTC)

    See Wikipedia_talk:WikiProject_Spam/2008_Archive_Mar_3#Assorted_Indian_spamming and MediaWiki_talk:Spam-blacklist/archives/January_2012#indianholiday.com. ¶ How do links to package tour companies benefit encyclopedia articles? -- Hoary (talk) 10:02, 25 July 2019 (UTC)
     Not done. "Wikipedia does not benefit from links to package tour companies" is the correct answer. Someone did indeed do a "spammy thing." OhNoitsJamie Talk 14:17, 25 July 2019 (UTC)

    I take your point. But because of the blacklist, I am unable to create a company profile on Wikipedia like those for yatra.com and makemytrip.com. Please provide a way for me to add company information on Wikipedia. I believe company information on Wikipedia could help people find accurate information about the company and its services. Adityainfoboy (talk) 06:21, 26 July 2019 (UTC)

    No. The other companies you mention are large, NASDAQ-traded companies, and are therefore likely to satisfy WP:CORP notability, unlike indianholiday, which you are obviously affiliated with. OhNoitsJamie Talk 14:01, 26 July 2019 (UTC)
    Some small companies merit articles. Are there independent sources for an article on this company? If so, then somebody can use them to make an article on the company. (And if not, then nobody can make an article on the company.) -- Hoary (talk) 14:07, 26 July 2019 (UTC)

    Troubleshooting and problems

    Logging / COIBot Instructions

    Blacklist logging

    Full instructions for admins


    Quick reference

    For spam reports or requests originating from this page, use the template {{/request|0#section_name}}

    • {{/request|213416274#Section_name}}
    • Insert the oldid 213416274, a hash "#", and the Section_name (Underscoring_spaces_where_applicable).
    • Use within the entry log here.

    For spam reports or requests originating from Wikipedia_talk:WikiProject_Spam, use the template {{WPSPAM|0#section_name}}

    • {{WPSPAM|182725895#Section_name}}
    • Insert the oldid 182725895, a hash "#", and the Section_name (Underscoring_spaces_where_applicable).
    • Use within the entry log here.
    Note: If you do not log your entries, they may be removed if someone appeals and no valid reasons can be found.
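    The quick-reference recipe (the oldid, a hash, then the section name with spaces turned into underscores) is mechanical enough to sketch. The `log_reference` helper below is hypothetical. It performs only the space-to-underscore substitution described above and does not attempt any other anchor encoding MediaWiki may apply to headings containing special characters.

```python
def log_reference(oldid: int, section: str, template: str = "/request") -> str:
    """Build a log reference such as {{/request|213416274#Section_name}}.

    Hypothetical helper: only replaces spaces with underscores, per the
    quick reference; no other anchor encoding is attempted.
    """
    anchor = section.strip().replace(" ", "_")
    # %-formatting leaves the literal template braces untouched.
    return "{{%s|%d#%s}}" % (template, oldid, anchor)

print(log_reference(213416274, "Section name"))
print(log_reference(182725895, "Section name", template="WPSPAM"))
```

    The first call yields {{/request|213416274#Section_name}} and the second {{WPSPAM|182725895#Section_name}}, matching the two template forms listed above.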

    Addition to the COIBot reports

    The lower list in the COIBot reports now has four numbers in brackets after each link (e.g. "www.example.com (0, 0, 0, 0)"):

    1. first number: how many links this user added (the same after each link)
    2. second number: how many times this link was added to Wikipedia (as far back as the linkwatcher database goes)
    3. third number: how many times this user added this link
    4. fourth number: how many different Wikipedias this user added this link to.

    If the third or fourth number is high relative to the first or second, the user at least has a preference for that link. Be careful drawing other conclusions from these numbers (e.g. a good user who adds a lot of links). If other statistics would be useful, please notify me, and I will see whether I can get the information out of the database and report it. This data is available in real time on IRC.
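    The rule of thumb above can be sketched in a few lines. This assumes that report lines have exactly the "domain (n1, n2, n3, n4)" shape shown in the example. The 0.5 threshold and all names (`LinkStats`, `parse`, `shows_preference`) are hypothetical, and, as the paragraph cautions, a high ratio signals a preference, not proof of spamming.

```python
import re
from typing import NamedTuple

class LinkStats(NamedTuple):
    domain: str
    user_total: int  # 1st number: links this user added
    link_total: int  # 2nd number: times this link was added (linkwatcher history)
    user_link: int   # 3rd number: times this user added this link
    user_wikis: int  # 4th number: wikis this user added this link to

# Assumed report-line shape, e.g. "www.example.com (0, 0, 0, 0)".
LINE = re.compile(r"^\s*(\S+)\s*\((\d+),\s*(\d+),\s*(\d+),\s*(\d+)\)\s*$")

def parse(line: str) -> LinkStats:
    m = LINE.match(line)
    if m is None:
        raise ValueError(f"unrecognised report line: {line!r}")
    domain, *nums = m.groups()
    return LinkStats(domain, *(int(n) for n in nums))

def _ratio(num: int, den: int) -> float:
    return num / den if den else 0.0

def shows_preference(s: LinkStats, threshold: float = 0.5) -> bool:
    """True if the 3rd or 4th number is high relative to the 1st or 2nd.

    A signal worth a closer look, not evidence of spamming on its own.
    """
    ratios = [_ratio(a, b)
              for a in (s.user_link, s.user_wikis)
              for b in (s.user_total, s.link_total)]
    return max(ratios) >= threshold

stats = parse("www.example.com (10, 12, 9, 3)")
print(stats, shows_preference(stats))
```

    A user who added a link 9 times out of 10 total additions is flagged, while a prolific editor who added a link twice out of 200 additions is not, which mirrors the "good user who adds a lot of links" caveat.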

    Poking COIBot

    When {{LinkSummary}}, {{UserSummary}} and/or {{IPSummary}} templates are added to WT:WPSPAM, WT:SBL, WT:SWL or User:COIBot/Poke (the latter for privileged editors), COIBot will generate link reports for the domains and user reports for users and IPs.
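    For a batch of domains and accounts, the poke text can be generated with a short script. This is a sketch under stated assumptions. The template names come from the paragraph above, and the single-parameter form (e.g. {{LinkSummary|example.com}}) is assumed. The `poke_lines` helper is hypothetical. Python's standard `ipaddress` module decides between {{IPSummary}} and {{UserSummary}}. Domains are passed separately because a username can look like a domain.

```python
import ipaddress

def poke_lines(domains: list[str], accounts: list[str]) -> list[str]:
    """Build COIBot summary templates: one per domain, IP, or username.

    Hypothetical helper; assumes the single-parameter template form.
    """
    lines = ["{{LinkSummary|%s}}" % d for d in domains]
    for name in accounts:
        try:
            ipaddress.ip_address(name)  # raises ValueError unless IPv4/IPv6
        except ValueError:
            lines.append("{{UserSummary|%s}}" % name)  # registered account
        else:
            lines.append("{{IPSummary|%s}}" % name)    # IP editor
    return lines

print("\n".join(poke_lines(["wikiwaparz.com"], ["105.112.39.67", "ExampleUser"])))
```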


    Discussion