MediaWiki talk:Spam-blacklist: Difference between revisions

From Wikipedia, the free encyclopedia
+1
Line 140: Line 140:


Appears to have extensively been spammed by many users (too many to list here, but COIBot lists them, of course). No extent use in mainspace. However, the COIBot report stops at 2015, but shows a number of possibly empty sections (a possible bug?) Thanks, —[[User:PaleoNeonate|<span style="font-variant:small-caps;color:#44a;text-shadow:2px 2px 3px DimGray;">Paleo</span>]][[User talk:PaleoNeonate|<span style="font-variant:small-caps;color:#272;text-shadow:2px 2px 3px DimGray;">Neonate</span>]] – 22:48, 5 August 2019 (UTC)

==BLP spam - self-published and dangerous source==
* {{link summary|twitter.com/schneiderleonid}}
* {{link summary|Forbetterscience.com}}

Spam on a BLP with some determination, also turns up in other BLPs. <b>[[User Talk:JzG|Guy]]</b> <small>([[User:JzG/help|Help!]])</small> 15:35, 7 August 2019 (UTC)


=Proposed removals=

Revision as of 15:35, 7 August 2019

    Mediawiki:Spam-blacklist is meant to be used by the spam blacklist extension. Unlike the meta spam blacklist, this blacklist affects pages on the English Wikipedia only. Any administrator may edit the spam blacklist. See Wikipedia:Spam blacklist for more information about the spam blacklist.


    Instructions for editors

    There are 4 sections for posting comments below. Please make comments in the appropriate section. These links take you to the appropriate section:

    1. Proposed additions
    2. Proposed removals
    3. Troubleshooting and problems
    4. Discussion

    Each section has a message box with instructions. In addition, please sign your posts with ~~~~ after your comment.

    Completed requests are archived. Additions and removals are logged; reasons for blacklisting can be found there.

    Adding the templates {{Link summary}} (for domains), {{IP summary}} (for IP editors) and {{User summary}} (for users with an account) causes the COIBot reports to be refreshed. See User:COIBot for more information on the reports.


    Instructions for admins
    Any admin unfamiliar with this page should probably read this first, thanks.
    If in doubt, please leave a request and a spam-knowledgeable admin will follow up.

    Please consider using Special:BlockedExternalDomains instead, powered by the AbuseFilter extension. This is faster and more easily searchable, though it supports only whole domains and does not support whitelisting.

    1. Does the site have any validity to the project?
    2. Have links been placed after warnings/blocks? Have other methods of control been exhausted? Would referring this to our anti-spam bot, XLinkBot, be a more appropriate step? Is there a WikiProject Spam report? If so, a permanent link would be helpful.
    3. Please ensure all links have been removed from articles and discussion pages before blacklisting. (They do not have to be removed from user or user talk pages.)
    4. Make the entry at the bottom of the list (before the last line). Please do not do this unless you are familiar with regular expressions — the disruption that can be caused is substantial.
    5. Close the request entry here using either {{done}} or {{not done}} as appropriate. The request should be left open for about a week, as there will often be further related sites or an appeal in that time.
    6. Log the entry. Warning: if you do not log any entry you make on the blacklist, it may well be removed if someone appeals and no valid reasons can be found. To log the entry, you will need this number – 909781401 after you have closed the request. See here for more info on logging.
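
    The warning in step 4 about regular expressions can be illustrated with a minimal sketch. It assumes, for illustration only, that every blacklist line is a regex fragment joined into one case-insensitive pattern behind a URL prefix; the prefix and the helper name `build_blacklist_regex` are hypothetical, and the extension's real pattern may differ.

```python
import re


def build_blacklist_regex(entries):
    """Join blacklist lines into one case-insensitive URL pattern.

    Assumption: each entry is a regex fragment matched against the host and
    path of an http(s) URL. The prefix used here is illustrative only.
    """
    joined = "|".join(f"(?:{entry})" for entry in entries)
    return re.compile(r"https?://+[a-z0-9_\-.]*(?:" + joined + ")", re.IGNORECASE)


# A careless entry: the dot matches any character and nothing anchors the name.
careless = build_blacklist_regex([r"example.com"])
# A more careful entry: escaped dot plus word boundaries around the domain.
careful = build_blacklist_regex([r"\bexample\.com\b"])

urls = [
    "http://www.example.com/page",  # the intended target
    "https://notexample.com/",      # unrelated domain ending in the same text
    "https://exampleXcom.org/",     # unrelated domain hit by the bare dot
]

for url in urls:
    print(url, bool(careless.search(url)), bool(careful.search(url)))
```

    In this sketch the careless entry blocks all three URLs, while the careful entry blocks only the intended one, which is the kind of collateral disruption the instruction warns about.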


    Proposed additions

    drive.google.com

    Cannot think of a responsible use of this other than for the Google Drive article. I see this being used for original research or otherwise unreliable sources, or worse, for malware/spam distribution.

    Unfortunately, a number of articles are using Google Drive links as references or otherwise. I picked a random article to see what kind of content was being used: Hyperinflation in Brazil seems to link to original research in the Google Drive link used there.

    I would like to see additional input - I think it isn't a problem to use these in project or userspace, but I would say that 90% of mainspace usage would be problematic. Does the community have any other thoughts? Jon Kolbert (talk) 19:46, 5 July 2019 (UTC)

    Upon further reflection this would probably be best as an edit filter to limit the blacklist to mainspace and allow extended-confirmed users to use it elsewhere since spam-blacklist is for every namespace. Jon Kolbert (talk) 20:12, 5 July 2019 (UTC)
    Some of these seem to be historic documents - could they, should they, be transferred to archive.org? Case in point, the NSA interview transcripts from "Rasterschlüssel 44"? DS (talk) 21:09, 5 July 2019 (UTC)
    I feel like that would be a more stable, reliable solution than a link to a Google Drive folder whose owner can change the contents at any point in time. Jon Kolbert (talk) 21:48, 5 July 2019 (UTC)
    'Some of these seem to be historic documents' .. is this Google Drive the only place where they are available? I would argue that although it is certainly valuable to have a link to an online copy, it is not absolutely needed (as long as you uniquely describe the document). And if they are out of copyright (so really historic), they could easily be incorporated into WikiSource. --Dirk Beetstra T C 13:32, 7 July 2019 (UTC)
    Comment I have alerted WP:RSN to this discussion as the above comment relates to reliability. I myself support this proposal for the following reasons:
    1. It probably fails WP:ELNO#11.
    2. Some pages also fail WP:ELNO#3.
    3. It unambiguously fails the reliable sources criteria as user-generated content, and better sources are almost always available. –LaundryPizza03 (d) 22:47, 15 July 2019 (UTC)
    The assumption that there is a specific kind of source involved is misguided. A public, general-purpose file storage service is ambiguous. Like YouTube, it can be used for reliable sources (primary or secondary) and appropriate external links, or inappropriate ones. This is why WP:YOUTUBE gives caution but says that "Links should be evaluated for inclusion with due care on a case-by-case basis." Just the other day I cited a magazine which distributes its back issues through Google Drive. Is there widespread abuse, compared to similar sites, that would justify the drastic step of blacklisting it? Kim Post (talk) 00:29, 16 July 2019 (UTC)
    @Kim Post: gauging abuse here is a difficult one. If 10% turns out to be (likely) copyright violations then yes, there is abuse. Abuse in the term of spamming, I don't think so (but then we would not discuss this if that was the case). I agree that the case seems similar to Youtube, but I don't know about the ratios - how many are copyright violations, how many are convenience, how many are not replaceable, etc. (noting that of the material on Youtube that is useful to Wikipedia the percentage of (likely) copyright violations is higher than the overall percentage on Youtube). --Dirk Beetstra T C 04:38, 16 July 2019 (UTC)
    Special:Search/insource:"drive.google.com" shows 2,550 articles currently citing Google Drive. If only 90% of mainspace usage is problematic, it means 255 articles are using Google Drive as a legitimate source, which is too high for blacklisting. If the content of the sources is appropriate, though not in the best format, an edit filter showing a warning message, or having a bot to undo additions by new users, is a better approach than blacklisting the link and requiring all uses to be whitelisted. feminist (talk) 01:58, 16 July 2019 (UTC)
    @Feminist: 'If only 90% of mainspace usage is problematic' .. only? If that 90% of the cases has roughly 20% (likely) copyright violations (the first link I clicked on was a link to a personal copy of an article copyrighted by Elsevier where I would consider that this is likely/maybe out of scope of what Elsevier allows, and, obviously, there is a proper link to the proper, albeit paywalled, article) then we are talking hundreds of copyright violations. That is way too high to allow unlimited inclusion (and hence, blacklist might be appropriate). (in short: you would need a full analysis of all, not just eyeballing 10% is fine, for all you know, it is only 1% that is fine, which is something that the whitelist can easily handle). I could however agree with adding this to XLinkBot or an edit filter to step this up and reconsider blacklisting after a couple of months. --Dirk Beetstra T C 04:38, 16 July 2019 (UTC)

    Very much in two minds: yes, it is no different from any other storage medium, but (as others have pointed out) it might also (as a storage medium) have stuff that would pass RS. At this time I lean to no. Slatersteven (talk) 09:24, 16 July 2019 (UTC)

    @GRuban: by the Devil's advocate: so it is just as likely to contain bad content as the website of the BBC, youtube, Elsevier, or blogger? --Dirk Beetstra T C 20:30, 16 July 2019 (UTC)
    Respectively, no, yes, no, and yes. The point is that the BBC and Elsevier exercise editorial control. Blogger and YouTube and Google Drive do not. So, yes, most stuff on YouTube and Blogger and Google Drive don't meet our criteria as reliable sources; but some does, so we shouldn't throw the baby out with the bathwater expert self-published opinions out with the overwhelming majority of self published opinion. --GRuban (talk) 21:38, 16 July 2019 (UTC)
    The BBC are not a storage medium, they are a creator. Slatersteven (talk) 08:54, 17 July 2019 (UTC)
    The BBC is a creator who stores their info on their own site; many people who are creators and do not have their own site store it somewhere else, like on youtube, blogspot or on drive.google.com.
    Exactly, Gruban. Blogger, YouTube and Google Drive do not have editorial control, and are generally unreliable. With the first 2 of those we exhibit quite strong editorial control. They are on XLinkBot, and we generally do not hesitate when abuse is so bad that material needs blacklisting (there are several blogger sites on the blacklist, and specific Youtube videos/channels). Other of those 'free storage sites' we have blacklisted, like Hulu, examiner, based on a similar discussion as this one. The question is whether the good material (material that is really needed) outweighs the bad material (rubbish, copyright violations, 'spam', etc.). The point 'no more or less likely to contain good or bad content than any other arbitrary website', this one falls well in the range of blogger, Youtube, Hulu and examiner, and I wonder whether it is just as likely to have bad material as YouTube, or just as likely to have bad material like Examiner (to pick 2). --Dirk Beetstra T C 15:46, 17 July 2019 (UTC)
    • Support At Google drive they are user-generated content. Yes, some reliable source are offline source and/or behind paywall, but it is not the reason to re-publish them under google drive as pirate copy. Also, wikipedia should not use url that point to those pirate resource. Are there any genuine source that were hosted as Google Drive? Please point it out among 8,932 entries of Google Drive currently in wikipedia as a black swan. Matthew hk (talk) 11:57, 17 July 2019 (UTC)
    • If you really want to collect anecdotes: out-of-print issues of C3i Magazine, a publication about wargames, are hosted on Google Drive. The official website provides these links. More to the point, the bare fact that a website is open to the public, and so could be used for bad sources or external links, is not a reason to put it on the spam blacklist. Kim Post (talk) 03:14, 18 July 2019 (UTC)
    • @Kim Post: It is not a question of could. The link used on Zohar e.g. leads to a pdf which looks like the printout of another website. That is very likely a copyright violation, also because I can find the text elsewhere. I saw another example like that earlier but it seems to have been removed. It remains a question of balance between use and abuse, and how much abuse you want to take. And in the area of spam, anything that could be abused eventually will be. --Dirk Beetstra T C 03:45, 18 July 2019 (UTC) (found the links to publications, see below --Dirk Beetstra T C 05:00, 18 July 2019 (UTC))

    It gets more interesting,

    in e.g. diff, diff, diff &c. links consistently to work by the same authors and in all cases links to a google-drive copy of the work of the authors (not, as is more normal, using doi or a link to the publishing papers). Now looking at the IP (which is in New York University, New York) that does overlap with the stated location of one of the three editors. Looking at the personal copy document, it states:

    • This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and education use, including for instruction at the authors institution and sharing with colleagues.
      Other uses, including reproduction and distribution, or selling or licensing copies, or posting to personal, institutional or third party websites are prohibited.
      In most cases authors are permitted to post their version of the article (e.g. in Word or Tex form) to their personal website or institutional repository.

    I agree that the copyright status there is a grey area, but this is likely a case of someone promoting their work (i.e. spamming) using this website as the medium for the spam. --Dirk Beetstra T C 05:00, 18 July 2019 (UTC)

    • At the very least this should be on an edit filter, but actually I think it should be blacklisted. Think about it: a Google Drive document can be updated by the owner at any time, is typically not archived by the archiving services, may be in violation of copyright, and we have no proof that it's an authentic copy of the material even if it's not. Oh, and several document types allow for infection with malware, which typically gets screened out by reputable online sources. Any original papers should be identified by DOI reference not by links to personal copies on file sharing platforms, and academics almost always have the ability to upload to space within their institution's own website. Linking to Google Drive, OneDrive and the like seems like an open invitation for abuse. Guy (Help!) 09:56, 2 August 2019 (UTC)

    p2pmarketdata.com

    Spamming for a new P2P finance blog (since January 2019) by various SPAs/IPs. The blog includes a referral scheme for cashback bonuses. Four previous warnings have been ignored. GermanJoe (talk) 05:02, 16 July 2019 (UTC)

    @GermanJoe: Added to MediaWiki:Spam-blacklist. --Guy (Help!) 20:57, 27 July 2019 (UTC)

    AlexMacArthur spam

    Link spam. Anarchyte (talk | work) 15:32, 17 July 2019 (UTC)

    @Anarchyte: Added to MediaWiki:Spam-blacklist. --Guy (Help!) 20:59, 27 July 2019 (UTC)

    wikiwaparz.com

    Recurring spam for a Nigerian download site. Deceptive overwriting of existing valid source links (for example: IP 105.112.39.67). Several warnings have been ignored. GermanJoe (talk) 07:57, 18 July 2019 (UTC)

    @GermanJoe: Added to MediaWiki:Spam-blacklist. --Guy (Help!) 20:57, 27 July 2019 (UTC)

    blockchain.news

    Low-quality cryptocurrency blog. JohnnyBCN started adding reference links, then tried to remove themselves from the WP:GS/Crypto notifications list. Wmbc918 started up immediately after. All instances reverted, but there's no circumstance in which this will ever be a useful reference - David Gerard (talk) 10:36, 18 July 2019 (UTC)

    @David Gerard: Added to MediaWiki:Spam-blacklist. --Guy (Help!) 20:51, 27 July 2019 (UTC)

    dnbamerica.com

    Mirror domain of dnbnumber.com (already blacklisted), spammed in Data Universal Numbering System. Likely fraud (DUNS has an authorized page to apply for such numbers on their own official website), certainly spam - please blacklist. GermanJoe (talk) 06:22, 25 July 2019 (UTC)

    @GermanJoe: Added to MediaWiki:Spam-blacklist. --Guy (Help!) 20:50, 27 July 2019 (UTC)

    bioexposed.com

    Unreliable, spammy site for BLP information. The user's only edits are adding material sourced to this site. See also Wikipedia:Reliable_sources/Noticeboard#bioexposed.com_as_a_source_in_a_BLP. –LaundryPizza03 (d) 23:30, 27 July 2019 (UTC)

    @LaundryPizza03: Added to MediaWiki:Spam-blacklist. --Guy (Help!) 21:38, 30 July 2019 (UTC)

    onezorse.com

    Spam additions including reference substitutions. Guy (Help!) 09:50, 2 August 2019 (UTC)

    @JzG: Added to MediaWiki:Spam-blacklist. — JJMC89(T·C) 21:55, 4 August 2019 (UTC)

    youngstownwater.com

    Spammy links added to external links sections and as references - Eureka Lott 20:56, 4 August 2019 (UTC)

    @EurekaLott: Added to MediaWiki:Spam-blacklist. — JJMC89(T·C) 21:54, 4 August 2019 (UTC)

    Spam for advisory websites

    Recurring spam for two - possibly related - advisory and registration websites (see COIBot reports for more details). Several warnings have been ignored. GermanJoe (talk) 09:50, 5 August 2019 (UTC)

    The IPs of the servers that the websites seem to run on are very similar. --Dirk Beetstra T C 10:26, 5 August 2019 (UTC)

    selfgrowth.com

    Appears to have extensively been spammed by many users (too many to list here, but COIBot lists them, of course). No extent use in mainspace. However, the COIBot report stops at 2015, but shows a number of possibly empty sections (a possible bug?) Thanks, —PaleoNeonate – 22:48, 5 August 2019 (UTC)

    BLP spam - self-published and dangerous source

    Spam on a BLP with some determination, also turns up in other BLPs. Guy (Help!) 15:35, 7 August 2019 (UTC)

    Proposed removals


    indianholiday.com

    I found that Wikipedia has blacklisted indianholiday.com. When I checked the blacklist, I found that no reason for the blacklisting was given by your administrator or contributor. Indian Holiday has been a very reputable website in India for tour and travel services since 1990. The company has twice won national tourism awards. Indian Holiday Pvt. Ltd. is approved by the tourism authority of India. The site contains unique and useful information and tours for foreign tourists who come to India to explore Indian culture, history, monuments, and cuisine.

    Perhaps someone did a spammy thing, and that is why a Wiki administrator blocked the website. Also, I found that Indian Holiday has been blacklisted since March 2018. So I think indianholiday.com should be delisted, because this site provides useful information to foreign tourists. Please look into the matter and help me out with this problem. Adityainfoboy (talk) 09:08, 25 July 2019 (UTC)

    See Wikipedia_talk:WikiProject_Spam/2008_Archive_Mar_3#Assorted_Indian_spamming and MediaWiki_talk:Spam-blacklist/archives/January_2012#indianholiday.com. ¶ How do links to package tour companies benefit encyclopedia articles? -- Hoary (talk) 10:02, 25 July 2019 (UTC)
     Not done "Wikipedia does not benefit from links to package tour companies" is the correct answer. Someone did indeed do a "spammy thing." OhNoitsJamie Talk 14:17, 25 July 2019 (UTC)

    I got your point. But due to the blacklist, I am unable to make a company profile on Wikipedia like those for yatra.com and makemytrip.com. Please provide a way so that I could add company information on Wikipedia. I believe company information on Wikipedia could help people find accurate information about the company and its services. Adityainfoboy (talk) 06:21, 26 July 2019 (UTC)

    No. The other companies you mention are large, NASDAQ traded companies, and therefore are likely to satisfy WP:CORP notability, unlike indianholiday, which you are obviously affiliated with. OhNoitsJamie Talk 14:01, 26 July 2019 (UTC)
    Some small companies merit articles. Are there independent sources for an article on this company? If so, then somebody can use them to make an article on the company. (And if not, then nobody can make an article on the company.) -- Hoary (talk) 14:07, 26 July 2019 (UTC)

    beautytohealth.com

    Site is featured on scholar.google.com but blacklisted on Wikipedia. The links do not exist on the website and seem to be added by competitors (negative SEO). Consider removing the blacklist. I tried adding genuine links but they keep getting auto removed — Preceding unsigned comment added by 2406:3400:30F:B9D0:74E2:58C:AAA4:944D (talk • contribs)

     Defer to Global blacklist Being "featured" on another site doesn't carry much weight here. Furthermore, it's not locally blacklisted, it's globally blacklisted due to extensive spamming. You'll have to take up the matter there (good luck, as I don't see how that site would meet WP:MEDRS). OhNoitsJamie Talk 14:34, 31 July 2019 (UTC)
    Oh, I notice that your IP address geolocates to the same Australian city as one of the IPs that previously spammed the site. Maybe a coincidence, but you may also want to read WP:COI. OhNoitsJamie Talk 14:37, 31 July 2019 (UTC)

    We are in 2019 and IP addresses can be spoofed.

    Thanks for that update. Good luck on your endeavors. I'll keep an eye out for requests on meta in case anyone needs further information. OhNoitsJamie Talk 14:58, 31 July 2019 (UTC)

    econlib.org

    This link appears to have been grouped in with spammy immigration law additions? [[1]] in a March 2017 addition to the Local blacklist: [[2]]

    Maybe the spamvertizers used this link also in some of their posts...but this link is to the real website for this organization: The Library of Economics and Liberty It may be a biased reference, but it is currently cited in one article: Wage share.

    I'm guessing this was a false positive? ---Avatar317(talk) 23:11, 5 August 2019 (UTC)

    @Avatar317: no, the spamvertizers (declared paid editing ring) also had a close connection to the subject econlib and edited and created pages related to econlib. Maybe not part of the paid editing, but definite conflict of interest. FYI, except for the encyclopedia (which is whitelisted) all information is full text available from other libraries, and in many cases even from WikiSource. --Dirk Beetstra T C 04:09, 6 August 2019 (UTC)
    • Avatar317, I fixed this. It was an absolutely standard example of the genre: a public domain work, linked to the right-wing think tank, and listed as being published by them. That last bit of deceptive attribution is very common, I have found and removed hundreds, often to very well known works like Gibbon's Decline and Fall of the Roman Empire. In this case the full text is available at Gutenberg, and the publisher is not the Orwellian-titled "Library of Economics and Liberty" but John Murray, of London. I don't think there were many right-wing think tanks operating in 1817, and this one certainly wasn't. Guy (Help!) 07:18, 6 August 2019 (UTC)

    Logging / COIBot Instructions

    Blacklist logging

    Full instructions for admins


    Quick reference

    For Spam reports or requests originating from this page, use template {{/request|0#section_name}}

    • {{/request|213416274#Section_name}}
    • Insert the oldid 213416274, a hash "#", and the Section_name (underscoring spaces where applicable).
    • Use within the entry log here.

    For Spam reports or requests originating from Wikipedia_talk:WikiProject_Spam use template {{WPSPAM|0#section_name}}

    • {{WPSPAM|182725895#Section_name}}
    • Insert the oldid 182725895, a hash "#", and the Section_name (underscoring spaces where applicable).
    • Use within the entry log here.
    Note: If you do not log your entries, they may be removed if someone appeals the entry and no valid reasons can be found.
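
    The quick-reference format above (oldid, a hash, then the section name with spaces underscored) could be assembled with a small helper. This is only a sketch: the function name `log_reference` is hypothetical, and it does not check that the oldid or section actually exists.

```python
def log_reference(oldid, section_name, template="/request"):
    """Build a log template call such as {{/request|213416274#Section_name}}.

    Hypothetical helper: it only formats the string as described in the
    quick reference and performs no validation against the wiki.
    """
    anchor = section_name.strip().replace(" ", "_")
    return "{{" + f"{template}|{oldid}#{anchor}" + "}}"


# Request that originated on this talk page.
print(log_reference(213416274, "Section name"))
# Request that originated at Wikipedia_talk:WikiProject_Spam.
print(log_reference(182725895, "Section name", template="WPSPAM"))
```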

    Addition to the COIBot reports

    The lower list in the COIBot reports now has four numbers in brackets after each link (e.g. "www.example.com (0, 0, 0, 0)"):

    1. First number: how many links this user added (the same after each link).
    2. Second number: how many times this link was added to Wikipedia (as far back as the linkwatcher database goes).
    3. Third number: how many times this user added this link.
    4. Fourth number: to how many different Wikipedias this user added this link.

    If the third or fourth number is high relative to the first or second, the user has at least a preference for using that link. Be careful when drawing other conclusions from these numbers (e.g. a good user who adds a lot of links). If there are more statistics that would be useful, please notify me, and I will see whether I can get the info out of the database and report it. This data is available in real time on IRC.
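
    As a rough illustration of reading those four numbers, the sketch below parses an entry in the "www.example.com (0, 0, 0, 0)" form and compares the third number with the first and second. The names `LinkStats`, `parse_entry` and `preference_ratios` are hypothetical, the sample figures are invented for the example, and COIBot itself may compute things differently.

```python
import re
from typing import NamedTuple


class LinkStats(NamedTuple):
    link: str
    user_total: int  # first number: links added by this user
    link_total: int  # second number: times this link was added by anyone
    user_link: int   # third number: times this user added this link
    wikis: int       # fourth number: wikis where this user added this link


_ENTRY = re.compile(r"^\s*(\S+)\s*\((\d+),\s*(\d+),\s*(\d+),\s*(\d+)\)\s*$")


def parse_entry(text):
    """Parse one report entry into a LinkStats tuple."""
    match = _ENTRY.match(text)
    if match is None:
        raise ValueError(f"not a report entry: {text!r}")
    link, *numbers = match.groups()
    return LinkStats(link, *map(int, numbers))


def preference_ratios(stats):
    """Return (share of the user's additions that are this link,
    share of this link's additions that came from this user)."""
    of_user = stats.user_link / stats.user_total if stats.user_total else 0.0
    of_link = stats.user_link / stats.link_total if stats.link_total else 0.0
    return of_user, of_link


# Invented sample values, for illustration only.
stats = parse_entry("www.example.com (12, 15, 10, 3)")
print(stats, preference_ratios(stats))
```

    High ratios only suggest a preference for the link; as noted above, a good-faith user who adds many links can produce similar figures.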

    Poking COIBot

    When adding {{LinkSummary}}, {{UserSummary}} and/or {{IPSummary}} templates to WT:WPSPAM, WT:SBL, WT:SWL and User:COIBot/Poke (the latter for privileged editors) COIBot will generate linkreports for the domains, and userreports for users and IPs.


    Discussion

    yupptv.com: I think this is not a spam link. This is the official website of YuppTV. So, please consider removing it from the spam list. --Themanlk (talk) 01:35, 28 July 2019 (UTC)