MediaWiki talk:Spam-blacklist

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by GermanJoe (talk | contribs) at 10:37, 10 March 2020 (→‎spdload.com / webspero.com: Added to Blacklist using SBHandler). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

    Mediawiki:Spam-blacklist is meant to be used by the spam blacklist extension. Unlike the meta spam blacklist, this blacklist affects pages on the English Wikipedia only. Any administrator may edit the spam blacklist. See Wikipedia:Spam blacklist for more information about the spam blacklist.


    Instructions for editors

    There are 4 sections for posting comments below. Please make comments in the appropriate section. These links take you to the appropriate section:

    1. Proposed additions
    2. Proposed removals
    3. Troubleshooting and problems
    4. Discussion

    Each section has a message box with instructions. In addition, please sign your posts with ~~~~ after your comment.

    Completed requests are archived. Additions and removals are logged; the reasons for blacklisting can be found there.

    Addition of the templates {{Link summary}} (for domains), {{IP summary}} (for IP editors) and {{User summary}} (for users with an account) results in the COIBot reports being refreshed. See User:COIBot for more information on the reports.


    Instructions for admins
    Any admin unfamiliar with this page should probably read this first, thanks.
    If in doubt, please leave a request and a spam-knowledgeable admin will follow up.

    Please consider using Special:BlockedExternalDomains instead, powered by the AbuseFilter extension. This is faster and more easily searchable, though it only supports whole domains and does not support whitelisting.

    1. Does the site have any validity to the project?
    2. Have links been placed after warnings/blocks? Have other methods of control been exhausted? Would referring this to our anti-spam bot, XLinkBot, be a more appropriate step? Is there a WikiProject Spam report? If so, a permanent link would be helpful.
    3. Please ensure all links have been removed from articles and discussion pages before blacklisting. (They do not have to be removed from user or user talk pages.)
    4. Make the entry at the bottom of the list (before the last line); a sample entry is shown after this list. Please do not do this unless you are familiar with regular expressions; the disruption that can be caused is substantial.
    5. Close the request entry on here using either {{done}} or {{not done}} as appropriate. The request should usually be left open for about a week, as there will often be further related sites or an appeal in that time.
    6. Log the entry. Warning: if you do not log any entry you make on the blacklist, it may well be removed if someone appeals and no valid reasons can be found. To log the entry, you will need this number – 944858872 after you have closed the request. See here for more info on logging.
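
    As a rough illustration of step 4 (example.com is a placeholder, not an actual request), a typical entry escapes the dots in the domain and wraps it in word-boundary anchors:

    \bexample\.com\b

    Because \b also matches at a dot, an entry like this catches subdomains such as www.example.com as well.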


    Proposed additions

    credit(s)karma

    Too close for comfort. --Dirk Beetstra T C 09:57, 17 February 2020 (UTC)[reply]

    Hmm ...

    And then:

    Seems all very spammy. --Dirk Beetstra T C 10:01, 17 February 2020 (UTC)[reply]

    @Beetstra: plus Added to MediaWiki:Spam-blacklist. Yes, this has a strong whiff of processed meat products. Bloody Vikings. --Guy (help!) 20:15, 19 February 2020 (UTC)[reply]

    taylorinterventions.com

    Typical hijacked domain/malware redirect.— Preceding unsigned comment added by Pkurzweil (talkcontribs) 01:09, 18 February 2020 (UTC)[reply]

    @Pkurzweil: plus Added to MediaWiki:Spam-blacklist. Confirmed, domain hijacked, malware. --Guy (help!) 20:19, 19 February 2020 (UTC)[reply]

    1youm.com

    Spammers

    Five sockpuppets so far today. - MrOllie (talk) 15:55, 19 February 2020 (UTC)[reply]

    @MrOllie: plus Added to MediaWiki:Spam-blacklist - obvious spam and socking. Last 2 accounts blocked too. --GermanJoe (talk) 17:46, 19 February 2020 (UTC)[reply]
    And the first three are globally locked, so that removes any doubt. Should this also be on the Meta blacklist? Guy (help!) 20:17, 19 February 2020 (UTC)[reply]
    14 of 16 additions have been on en-Wiki. Unless I am missing some background history behind the global locks (an LTA or something similar), I don't think it's necessary. GermanJoe (talk) 20:12, 20 February 2020 (UTC)[reply]

    dailyhunt.in

    There is virtually no reason this site should be used, as it's almost exclusively an aggregator and it very often picks up items from unreliable, blackhat SEO "news" sources. For an example, see the disclaimer at the bottom. In the event that they do publish something as an aggregator that is from a reliable source, that source should just be used. Praxidicae (talk) 19:52, 19 February 2020 (UTC)[reply]
    @Praxidicae: plus Added to MediaWiki:Spam-blacklist. Agreed, a net negative to Wikipedia, sufficiently so that addition invites questions about the good faith of the user linking it. --Guy (help!) 20:13, 19 February 2020 (UTC)[reply]

    "duleweboffice"

    Shamelessly nicked from User:Praxidicae/fakenews


    This set all belongs to a gmail account "duleweboffice@gmail.com", and several of these sites, including foreignpolicyi.org, were originally legitimate sites; however, they sniped the domain and it has since become an unreliable and frankly garbage spam site (as is the case for the rest, too). Legitimate uses of this link look like http://www.foreignpolicyi.org/node/17539, and we should see if there is an archived version somewhere. The illegitimate uses look like this and are rather easy to spot (basically anywhere this is used on entertainment, media personalities and media in general is the spam version). The spam variant looks like this: https://foreignpolicyi.org/tanya-nolan-is-becoming-a-hit-with-new-single-love-ya/


    I did some checking. These sites have been abused on Wikipedia, in some cases severely so. It's hard to conclude anything other than SEO involvement. I salute Praxidicae for this hard work. If we blacklist then at least no new links will be added, and old ones will be nuked as the articles are edited. It's a huge job removing them entirely. Lustiger seth, is there any way to write a bot to copy the contents of the blacklist and compile a table with the number of active links on enWP, ideally just in mainspace? We might be able to use that as the basis for a reward system for Wikignomes. Guy (help!) 20:35, 19 February 2020 (UTC)[reply]

    • Just a quick note that I archived a bunch of foreignpolicyi.org links (to the original site) and deleted all traces to the original, so those are fine but should be blacklisted going forward. Same for vermontrepublic. The rest are just plain old spam and can be blacklisted unless we would rather filter as a honeypot. There's another set by the same person/email (duleweboffice) under the name "santosmilewa"example and "kravitzcj" example. I'll make a list of these shortly. They're all operated by the same 3 blackhat SEO firms along with another handful that are using a dead woman's identity (I filed actual reports with the proper agencies about this FWIW), a fake phone number and a fake real life address (it's public, so i'm not disclosing anything out of the ordinary.) Anyhow, my lists are kind of a mess right now so I'll throw some together over the next few hours/days that'll make it all easier. Praxidicae (talk) 21:13, 19 February 2020 (UTC)[reply]
      Praxidicae, heroic work, thanks. Guy (help!) 21:19, 19 February 2020 (UTC)[reply]
    @Praxidicae/fakenews: plus Added to MediaWiki:Spam-blacklist. per User:Praxidicae/fakenews/sbl. --Guy (help!) 21:28, 19 February 2020 (UTC)[reply]
    hi!
    regarding the question about the table: it would be possible, but it would take a long time (weeks or months), i guess. (and i would need some time to adapt my scripts. that's probably the bottleneck.) -- seth (talk) 16:56, 23 February 2020 (UTC)[reply]
    The domain scholarlyoa.com hosted Beall's list. Articles on dodgy academic publishing practices are likely to point to archived copies of it. I discovered the problem when trying to revert section-blanking at World Academy of Science, Engineering and Technology; the last good version had links that are now spam-blacklisted. Not being too familiar with how spam-blacklisting works around here, I'm not sure of the best course of action. (Ping JzG.) XOR'easter (talk) 13:29, 25 February 2020 (UTC)[reply]
    XOR'easter, request whitelisting of specific URLs. Are you familiar with that process? I can help if not. Guy (help!) 15:52, 25 February 2020 (UTC)[reply]
    JzG, I'm not sure the links should just be whitelisted, since the site itself is down, probably permanently, and the actual content we should be pointing readers to is in the archived copies. XOR'easter (talk) 20:11, 25 February 2020 (UTC)[reply]
    @JzG: scholarlyoa being blacklisted is really annoying, and prevents many discussions about predatory journals. It should also be limited to non-archived links, since those are the problematic ones, rather than archived links, which refer to the legitimate site back when Jeffrey Beall ran it. Headbomb {t · c · p · b} 07:19, 27 February 2020 (UTC)[reply]
    Headbomb, No, it does not prevent that, it makes it more difficult as you now cannot link to it directly but have to disable the links when discussing them (which is highly annoying). Unfortunately the AbuseFilter is not a reasonable alternative either; it is too heavy-handed for this. That is indeed a shortcoming of the spam-blacklist and of the AbuseFilter. Do these discussions happen so often? Dirk Beetstra T C 10:04, 27 February 2020 (UTC)[reply]
    I think that such problems (hijacked domains that once had legitimate use) will become more common as spammers become more sophisticated and the use of the spam blacklist expands. I dunno, is it possible to selectively whitelist archived versions of a blacklisted URL? I know that in its current form the spam blacklist catches archived versions of a blacklisted link as well. Jo-Jo Eumerus (talk) 11:52, 27 February 2020 (UTC)[reply]
    "It does not prevent that." It does. And they happen often enough on journals-related pages, given Beall's importance in that area. This is best implemented as an edit filter, which would not interfere with non-article space, such as talk spaces. Headbomb {t · c · p · b} 14:45, 27 February 2020 (UTC)[reply]
    Headbomb, it does not prevent discussing, it prevents linking to the material directly (which, I totally agree, is completely annoying). The problem here is that there is no other solution: allowing only the archive links also allows archive links to current material, and does so everywhere (which is exactly what you don't want to be linking to). Removing it from the blacklist altogether also allows the current material, and also everywhere (as above). It is a shortcoming of the spam blacklist. It needs to be changed (well, it needed to be changed years ago). Currently your only way forward is either using it in a 'broken' form (if it is talkpage only, likely the best way forward), or getting it whitelisted (not very likely to be granted for use on a talkpage). Everything else is something that needs a change to the software. Note also that the edit filter needs to be rather heavy to be less restrictive than the spam-blacklist. Dirk Beetstra T C 12:32, 1 March 2020 (UTC)[reply]
    Headbomb, Note: the link you were all trying to add (which is the only one ever hitting the filter in the whole so many years of history on that page) is one that is used in the article. Hence, it should be whitelisted just to make sure that we do not run into problems in the future. Dirk Beetstra T C 12:39, 1 March 2020 (UTC)[reply]

    rocketrobinsoccerintoronto.com

    Per RSN, requesting blacklisting of a frequently abused blog to prevent further additions. Wikipedia appears to be the victim of what is effectively resume padding by this site, and it would seem that we should stop it. Guy (help!) 20:12, 19 February 2020 (UTC)[reply]

    plus Added to MediaWiki:Spam-blacklist in the absence of any dissent. --Guy (help!) 13:16, 23 February 2020 (UTC)[reply]

    WikiLeaks

    I came across some citations to WikiLeaks. That seems like a really bad idea: pretty much by definition the material they host is in violation of copyright. Guy (help!) 13:05, 20 February 2020 (UTC)[reply]

    As I understand it, the material they host was produced by governments and is not copyrighted.
    There is a possible problem in linking to information that governments consider classified. When I was a defense department employee, we couldn't even look at Wikileaks (even on personal time) due to the danger of being exposed to classified information we weren't cleared to know, which is a serious thing if you're in government. It was a weird situation where the public could do what they wanted but those of us in government service had restrictions. That was years ago; I don't know how they handle it these days.
    To the extent that government documents are reliable sources, citing such documents on Wikileaks should not be a problem if that's the only venue where they can be seen. ~Anachronist (talk) 05:56, 24 February 2020 (UTC)[reply]
    Anachronist, I'm not sure I understand .. is material from a government not copyrighted? I would expect that the organisation (not the individual that wrote it) holds the copyright.
    Though I agree that some of the material can be a reliable source, there is also not a necessity to have a working link to the information (if too much of the info is problematic linking to). Dirk Beetstra T C 06:01, 24 February 2020 (UTC)[reply]
    @Beetstra: I'll answer your question with the lead sentence of our article Copyright status of works by the federal government of the United States. If the communique, document, or other work was written by a government employee, it isn't subject to domestic copyright, but if the work was written by a contractor the situation is muddier. I'd wager that most of the documents on Wikileaks are generated by governments (largely the US government) and therefore not subject to copyright.
    I oppose blacklisting Wikileaks, but if we don't, then citations to it would have to be examined on a case-by-case basis. ~Anachronist (talk) 17:02, 24 February 2020 (UTC)[reply]
    Anachronist, not unless you consider material stolen from the DNC's email servers by the Russians to be "produced by governments". Also British government materials are Crown copyright. So there's absolutely no guarantee. And work product is exempt, I believe. Guy (help!) 19:40, 24 February 2020 (UTC)[reply]
    The DNC stuff isn't produced by governments, of course. I'm thinking more of US military messages, diplomatic communiques, stuff that Chelsea Manning released, and so on. I'm skeptical that government work products are exempt. There's legitimate material in there, and as I said, the citations would need to be examined on a case-by-case basis.
    I note that [link search] reveals an extremely low percentage of Wikileaks links in main article space. Most of them appear to be on talk pages and Wikipedia namespace. I wish the linksearch feature had a filter to show only mainspace pages. Glancing through it, there don't seem to be many articles actually citing Wikileaks. ~Anachronist (talk) 04:33, 25 February 2020 (UTC)[reply]
    Anachronist, did you look through wikileaks.org HTTPS links HTTP links? Guy (help!) 15:54, 25 February 2020 (UTC)[reply]
    Cool. I didn't know about that search parameter. I stand corrected. :) ~Anachronist (talk) 17:13, 25 February 2020 (UTC)[reply]

    iwmbuzz

    This site has been the subject of several discussions - it's a wiki, it has no editorial standards, and its primary use is on Indian films, where the Indian Film Task Force has determined it should never be used. However, due to a lack of enforcement, there are hundreds of instances of it. It's nothing more than spam and should be blacklisted. Praxidicae (talk) 13:12, 20 February 2020 (UTC)[reply]
    Praxidicae, I can only find Talk:Silsila_Badalte_Rishton_Ka#About_reference_(iwmbuzz.com) as a reference for ‘never’ ... is there a wider discussion (RSN?) regarding this source? Dirk Beetstra T C 19:57, 24 February 2020 (UTC)[reply]
    I'll do some digging but I don't know that one is even needed. It's just a wiki (IndianWikiMedia + buzz) so there's really no legitimate reason it should be used. Praxidicae (talk) 19:59, 24 February 2020 (UTC)[reply]
    Blacklisting based on reliability only is generally not done lightly. But then, there is this ..... —Dirk Beetstra T C 20:40, 24 February 2020 (UTC)[reply]

    alltoppro.com / easyshoptips.com

    Review blog spam, deceptive overwriting of existing source links. Continued after a first block. GermanJoe (talk) 16:15, 21 February 2020 (UTC)[reply]

    @GermanJoe: plus Added to MediaWiki:Spam-blacklist. --GermanJoe (talk) 16:16, 21 February 2020 (UTC)[reply]

    stadimiz.com

    Batman Arena is the article I found this in. It opens under a tab; you have to enter the address in a clean window. Messed up stuff. They force you to go to Google Play and install an app, or to change your search engine. This is a dangerous way to do business and has no place as a "reliable source", or any place in the encyclopedia. Dennis Brown - 20:46, 23 February 2020 (UTC)[reply]

    Dennis Brown, this seems to be cross wiki .. Dirk Beetstra T C 05:26, 24 February 2020 (UTC)[reply]
    posted on meta: m:Talk:Spam_blacklist#stadimiz.com. --Dirk Beetstra T C 05:47, 24 February 2020 (UTC)[reply]
    Thank you. Dennis Brown - 02:29, 25 February 2020 (UTC)[reply]

    gk4fast.in

    Mass spam by 42.106.100.37/19. See Wikipedia:WikiProject_Spam/LinkReports/gk4fast.in. ~ ToBeFree (talk) 23:56, 25 February 2020 (UTC)[reply]

    plus Added to MediaWiki:Spam-blacklist. --~ ToBeFree (talk) 23:59, 25 February 2020 (UTC)[reply]

    gjonmarkagjoni.com

    plus Added to MediaWiki:Spam-blacklist. Useless website, lots of additions by this user. Honestly, going through the process is only going to result in wasted effort. Guy (help!) 21:52, 29 February 2020 (UTC)[reply]

    belgaumtrend.site

    Spam-only website. ~ ToBeFree (talk) 14:20, 1 March 2020 (UTC)[reply]

    plus Added to MediaWiki:Spam-blacklist. ~ ToBeFree (talk) 14:22, 1 March 2020 (UTC)[reply]

    Datanet India Pvt. Ltd

    A long-term spamming problem (see also Wikipedia talk:WikiProject Spam/2007 Archive Jul 2), systematic recurring spamming (and occasional good-faith misuse) of a non-reliable data aggregator and "research" website. GermanJoe (talk) 17:55, 2 March 2020 (UTC)[reply]

    @GermanJoe: plus Added to MediaWiki:Spam-blacklist. --GermanJoe (talk) 17:56, 2 March 2020 (UTC)[reply]

    ymail.info

    Garbage site recently spammed by multiple IPs, three logged so far. plus Added OhNoitsJamie Talk 13:26, 3 March 2020 (UTC)[reply]

    @Ohnoitsjamie:  Defer to Global blacklist, cross-wiki problem. --Dirk Beetstra T C 13:55, 3 March 2020 (UTC)[reply]
    @Ohnoitsjamie: Handled on meta. --Dirk Beetstra T C 13:58, 3 March 2020 (UTC)[reply]
    @Beetstra: Should it be removed from en? OhNoitsJamie Talk 14:02, 3 March 2020 (UTC)[reply]
    @Ohnoitsjamie: up to you, it can be as it is now global'd. — billinghurst sDrewth 02:02, 8 March 2020 (UTC)[reply]

    tripbibo.com

    Spam by confirmed and suspected sockpuppets. ~ ToBeFree (talk) 04:41, 4 March 2020 (UTC)[reply]

    plus Added to MediaWiki:Spam-blacklist. ~ ToBeFree (talk) 04:42, 4 March 2020 (UTC)[reply]

    Vietnamese websites

    New links are appearing as fast as I can remove them. NinjaRobotPirate (talk) 23:26, 7 March 2020 (UTC)[reply]

    NinjaRobotPirate
    Let's see if we have all links by these users. Dirk Beetstra T C 05:21, 8 March 2020 (UTC)[reply]

    spdload.com / webspero.com

    Recurring spam for marketing sites / PR blogs, multiple warnings for each. GermanJoe (talk) 10:34, 10 March 2020 (UTC)[reply]

    @GermanJoe: plus Added to MediaWiki:Spam-blacklist. --GermanJoe (talk) 10:37, 10 March 2020 (UTC)[reply]

    Proposed removals

    U.S. tax code edit

    Hello there! I was looking at some articles on the U.S. tax code and stumbled upon an article that needed to be edited, so I edited it, right? It turns out someone other than our site administrators is using our links on various websites, including adult and spammy websites, for negative SEO purposes to make it look bad to Google. I don't know how it got into Wikipedia, but I believe it was on articles related to either tax or finance. I checked the log, but our website "futufan.com" was nowhere to be found. We are by no means related to the addition of our website's pages on Wikipedia. While we do not plan on including our links on Wikipedia unless it is 100 percent necessary, it would be the just thing to do if it was removed from the blacklist. Aligraying (talk) 07:52, 2 March 2020 (UTC)[reply]

    @Aligraying:  Not done, actually, it is not even blacklisted. Do note that it is not our business who is adding those links; our business is to protect Wikipedia against unsolicited additions. --Dirk Beetstra T C 10:54, 2 March 2020 (UTC)[reply]
    @Dirk Beetstra: Are you sure though? I edited the Standard Deduction article for the 2020 updates and added our dedicated article to it, which explains the changes to the taxes that will be paid in 2020, but it wouldn't let me because the site was blacklisted. I'm confused now that you're telling me it isn't blacklisted. I took a screenshot of it when I saw the error to show my staff. Can I provide the link here from Imgur? Aligraying (talk) 07:23, 3 March 2020 (UTC)[reply]
    @Beetstra: Actually it is blacklisted here, and rather recently too, judging by its position a half-dozen lines from the bottom of the list.
    @Aligraying: We generally don't consider delisting requests from individuals associated with a listed site. If a trusted high-volume editor deems the link worthy of including in an article, we will consider it. ~Anachronist (talk) 07:37, 3 March 2020 (UTC)[reply]
    @Anachronist: ST47 added it, and seeing how it was added, that was the proper action to protect the encyclopedia (though maybe they should be trouted for not logging it :-) ). Aligraying I find your argument highly unconvincing: 7 editors have been adding this over 19 days, at roughly 3-day intervals on average. Then you come, 3 days later, and try to add it. That is too much of a coincidence.
    Actually, I do think that ST47 has been a bit hasty here, this should have been blacklisted globally: Rejected and  Defer to Global blacklist. --Dirk Beetstra T C 11:06, 3 March 2020 (UTC)[reply]

    Can't you guys check the IP addresses of the people who added our links to the site? Whoever put our links on Wikipedia clearly isn't associated with us. This was obviously for negative SEO purposes. But yeah. Anyhow, this isn't right but it is understandable on you guys' end. Aligraying (talk) 08:26, 3 March 2020 (UTC)[reply]

    Aligraying, no. And it would not matter anyway. It was spammed, and the usual source is the SEO companies employed by the website. Choose more wisely in future. Guy (help!) 10:29, 3 March 2020 (UTC)[reply]

    stmarks-cardiff.co.uk

    I don't understand why this is blacklisted unless it is because other sites with cardiff.co.uk in the name are blacklisted. ActiveRetired (talk) 13:36, 3 March 2020 (UTC)[reply]

    May be a false positive caught by the blacklisting for cardiff.co.uk. Dirk Beetstra, per this comment, looks like we may need to modify the listing. OhNoitsJamie Talk 13:44, 3 March 2020 (UTC)[reply]
    @ActiveRetired: It is easier to whitelist this one:  Defer to Whitelist. Do you mind requesting it there? --Dirk Beetstra T C 14:00, 3 March 2020 (UTC)[reply]

    10bet.com

    10bet.com: Linksearch en (insource) - meta - de - fr - simple - wikt:en - wikt:fr • Spamcheck • MER-C X-wiki • gs • Reports: Links on en - COIBot - COIBot-Local • Discussions: tracked - advanced - RSN • COIBot-Link, Local, & XWiki Reports - Wikipedia: en - fr - de • Google: search • meta • Domain: domaintools • AboutUs.com

    I've been working on articles related to casinos and games and recently created this article (10bet). It turns out the official site is blacklisted. I couldn't find the reason in the block log. Lunar Clock (talk) 22:07, 6 March 2020 (UTC)[reply]

    hi!
    this domain is blacklisted at meta (for all wikipedias, not just for enwiki): you may consider asking for removal at m:Talk:Spam_blacklist.
    the domain was blacklisted in 2007.[1] as far as i see, a reason has not been given. -- seth (talk) 22:29, 6 March 2020 (UTC)[reply]
    See  Defer to Whitelist (if it hasn't already been whitelisted for 10bet) - [2] OhNoitsJamie Talk 22:33, 6 March 2020 (UTC)[reply]
    I've made a request at meta. Thank you. Lunar Clock (talk) 22:19, 7 March 2020 (UTC)[reply]

    econlib.org

    I have added a reference to an article from a 1945 economic journal in the pricing signal wiki article. I used the journal template, then added a URL to the site that hosted the article. The website that hosts the article appears to be blacklisted. I just removed the URL, since the reference is valid and points to a historic article that exists outside the web.

    My goal was to find the reason for the deletion. My initial impression was that perhaps econlib had credibility issues, maybe there had been an edit war, or it had been recurrently added with poor regard to Wikipedia's standards.

    However, what I found in the archives is a very weak reason for blacklisting:

    "All appear to have been added by Vipul (talk · contribs · logs · edit filter log · block log), Riceissa (talk · contribs · logs · edit filter log · block log) or other paid surrogates of Vipul, most are selling legal services related to immigration, and the overall conclusion is SEO. All valid content can be drawn from more authoritative sources such as law books, pages in law faculty websites, official government sources etc. Guy (Help!) 23:24, 10 March 2017 (UTC)"

    Another user comments: "I think econlib.org needs to be removed. This is a legitimate, relatively prominent Economics blog where relatively prominent economists(Sumner, Bryan Caplan) discuss current issues in econ. Dark567 (talk) 01:31, 11 March 2017 (UTC) "

    The concern was heard but nothing was done about it.

    This single user resulted in the blacklisting of 8 or 10 sites, and econlib appears to have been caught in the crossfire. I'm not going to bother searching for what the particular edits were.

    In addition to removing the individual site from the blacklist, there must be other false positives, so I propose some systemic improvements that can be made.

    First I'll list the steps I took to try to solve this issue:

    1. Check in both the local and global blacklists.
    2. Navigate through the archives to find the reason for the ban.
    3. Post here about it.

    Between 2 and 3, finding the relevant edits would be useful for determining whether the ban was warranted, but it would be of significant difficulty. So far, I would like to see a search tool that:

    • looks in both local and global banlists
    • finds the reason for blacklisting and links the user towards it.
    • is triggered upon an edit that includes the url, and shows the reason to the user.
    • Remove the restriction from talk pages, since it makes it hard to even discuss these blacklists. (it's /library/Essays/hykKnw.html )
    • log attempted edits that include this url. If 1 user made a low quality or vandal edit to wikipedia with that link, but 50 users attempted to make 50 good edits with that link, the url should automatically be suggested for removal.

    The benefits would be:

    • Decreasing references without free access to the cited content.
    • Decreasing biases in wikipedia.
    • Decreasing the amount of erroneously rejected edits.
    • Better communicating to users why their edit was rejected.

    The costs of this edit would be:

    • Development time, I'm free and have the skills to implement this.
    • Admin time, this might increase the maintenance of the blacklist.

    So if I get backing from those currently responsible for going through this list, I can post this suggestion to the appropriate section and start working on it. — Preceding unsigned comment added by TZubiri (talkcontribs) 23:06, 8 March 2020 (UTC)[reply]

    @TZubiri: again, a) there is a direct connection between the paid editor you are talking about and Bryan Caplan, and b) by far most of the material is replaceable (outside of the (already whitelisted) encyclopedia there will be very, very few exceptions, and most of the cases we have seen until now are replaceable with links to e.g. WikiSource). Repeated requests on the whitelist have shown the latter.
    The material that was the problem has been deleted, and is visible only to admins. It has however been explained over and over. See my first paragraph. Removing this needs a consensus (which you are free to gather) showing that the site is absolutely needed. The benefits you are quoting are hardly true:
    • most references have free access to the cited content, if it is freely hosted on econlib, it is likely freely hosted elsewhere as it is out of copyright protection (up to, often, WikiSource).
    • that argument is totally useless. The use of references is not affecting the bias. And as you have referenced it now, it is totally unbiased and properly referenced. Even better, it is plainly referenced to the official source of the information. Everyone can find the reference if they want to.
    • The edits are not erroneously rejected, someone with a vested interest was editing this, it is rightfully blacklisted.
    • You just have to ask. It took less than 6 hours and you have an answer.
    There is not a lot to develop: we have the searchbox above, and this track that shows you where this was discussed and gives the above explanation several times. And hence, it does not increase admin time.
    no Declined,  Defer to Whitelist for specific links on this domain (but this one is freely accessible elsewhere, even on 'neutral' servers, so I would not bother). --Dirk Beetstra T C 06:07, 9 March 2020 (UTC)[reply]

    Thanks for looking into this. I wasn't aware of the direct link between the infractions and the site owners. Since in this case econlib is functioning as a content host for a widely available primary source, I linked to an alternate host. I still think the user interface can be improved; perhaps a summarized reason for the blacklisting could be provided in the rejection message, along with the search results of the searchbox you reference, in case users want to dig deeper. You'll forgive users for missing this, but there's an overload of information for regular users. I'm interested in your perspective on this idea:

    What would you think would be a good message for users to see when they link to econlib?
    If you had the capacity to do so, would you send different messages to users depending whether econlib is being cited as a primary or secondary source? 
    

    Out of curiosity, is it technically possible to blacklist a website as a secondary source but still allow it to work as a host for primary sources? --TZubiri (talk) 19:19, 9 March 2020 (UTC)[reply]

    Logging / COIBot Instructions

    Blacklist logging

    Full instructions for admins


    Quick reference

    For Spam reports or requests originating from this page, use template {{/request|0#section_name}}

    • {{/request|213416274#Section_name}}
    • Insert the oldid 213416274, a hash "#", and the Section_name (Underscoring_spaces_where_applicable):
    • Use within the entry log here.

    For Spam reports or requests originating from Wikipedia_talk:WikiProject_Spam use template {{WPSPAM|0#section_name}}

    • {{WPSPAM|182725895#Section_name}}
    • Insert the oldid 182725895, a hash "#", and the Section_name (Underscoring_spaces_where_applicable):
    • Use within the entry log here.
    Note: If you do not log your entries, they may be removed if someone appeals the entry and no valid reasons can be found.

    Addition to the COIBot reports

    The lower list in the COIBot reports now has four numbers in brackets after each link (e.g. "www.example.com (0, 0, 0, 0)"):

    1. first number: how many links this user added (the same after each link)
    2. second number: how many times this link was added to Wikipedia (as far back as the linkwatcher database goes)
    3. third number: how many times this user added this link
    4. fourth number: to how many different Wikipedias this user added this link

    If the third or the fourth number is high with respect to the first or the second, that means that the user has at least a preference for using that link. Be careful with other statistics derived from these numbers (e.g. a good user who adds a lot of links). If there are more statistics that would be useful, please notify me, and I will have a look at whether I can get the info out of the database and report it. This data is available in real-time on IRC.
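
    For example (hypothetical numbers), a report line such as "www.example.com (12, 15, 12, 4)" would mean: this user added 12 links in total, the link was added to Wikipedia 15 times overall, 12 of those additions were made by this user, and the user added it on 4 different Wikipedias – a pattern suggesting a strong preference for that particular link.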

    Poking COIBot

    When adding {{LinkSummary}}, {{UserSummary}} and/or {{IPSummary}} templates to WT:WPSPAM, WT:SBL, WT:SWL and User:COIBot/Poke (the latter for privileged editors), COIBot will generate linkreports for the domains, and userreports for users and IPs.
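
    For example (the domain, IP address and account name below are placeholders, not actual reports), the templates are normally given the item to report as their first parameter:

    {{LinkSummary|example.com}}
    {{IPSummary|192.0.2.1}}
    {{UserSummary|Example}}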


    Discussion

    I have some words to say: I tried to add in two sources, but there's an error due to this 'Spam blacklist'. How should I settle this? — Preceding unsigned comment added by Manwë986 (talkcontribs) 14:04, 27 February 2020 (UTC)[reply]

    Logs

    Is there a log of hits for the blacklist? Some of the entries have been on here for years, and it might be worth reviewing and removing anything with no hits in two years, to keep the blacklist from blowing up. Guy (help!) 10:15, 5 February 2020 (UTC)[reply]

    @JzG: there is enough material on there that simply should never be taken off, even if it hasn't been hit in two years. Blowing up however would be a good thing, so maybe 14 year old bugs are finally going to be solved. You know, if something is not broken, develop something else that will break it.</sarcasm> --Dirk Beetstra T C 11:20, 5 February 2020 (UTC)[reply]
    User:Lustiger seth however has been cleaning up sometimes on meta removing things. Some domains can be removed because they have now a new owner, or have cleaned up their act. --Dirk Beetstra T C 11:22, 5 February 2020 (UTC)[reply]
    Beetstra, sure, but there are a lot of sites added after brief spamming sprees - often by meatbots - where the risk is probably over. I wonder if we could at least check whether older sites are still online, using a bot? If they are 404 or domain parked, we could probably remove them. Guy (help!) 11:32, 5 February 2020 (UTC)[reply]
    JzG, Ls should have that script/bot. Situation is somewhat complex but there will for sure be material that can be cleaned up. Dirk Beetstra T C 11:36, 5 February 2020 (UTC)[reply]
    Beetstra, see removal requests above for a possible quick win - I ran a DNS lookup script. A sample of around 100 manual checks has yielded no false positives. Guy (help!) 14:58, 5 February 2020 (UTC)[reply]
    Hi!
    I could create a list of non-hitting entries (for the last ~5? years). Afterwards we should remove url shorteners (and some other exceptions?) from the list. Then we could decide whether the remaining entries should be removed from the black list.
    imho this is almost independent of the content of the webpages. -- seth (talk) 22:59, 6 February 2020 (UTC)[reply]
    Lustiger seth, Fantastic! Yes, please. Guy (help!) 14:49, 9 February 2020 (UTC)[reply]
    I started a bot run to collect all data. This will take a while. Maybe next weekend I can create a page with some results. -- seth (talk) 18:19, 9 February 2020 (UTC)[reply]
    Lustiger seth, heroic work, thanks. Guy (help!) 12:32, 11 February 2020 (UTC)[reply]
    As a start: User:Lustiger_seth/sbl_log_2013--2020. this is not yet finished. it takes about 6–7 minutes per blacklist entry (and there are ~8.2k of them) to search the whole sbl log table (which has about 100M rows). -- seth (talk) 10:36, 15 February 2020 (UTC), 21:19, 15 February 2020 (UTC)[reply]

    Subsection for \b^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b

    The very first entry is quite interesting: \b^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b: this is a contradiction. it will never match any link addition because a url never starts with a digit; it starts with a protocol (e.g. http or https). so ^\d will fail -- always. this entry is just superfluous. the question is: should it be deleted or should it be fixed (by replacing the \b^ with (?<=//))? the latter would require us to look for all ip urls and check them. -- seth (talk) 10:43, 15 February 2020 (UTC)[reply]
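
    A minimal sketch of the point above (illustrative only, not an actual blacklist entry; the URL is a placeholder documentation address, and this assumes the pattern is tested against the full URL including the protocol, as described in the comment):

    #!/usr/bin/perl
    use strict;
    use warnings;

    # a hypothetical link addition, as the blacklist would see it: the full url, starting with the protocol
    my $url = 'http://192.0.2.1/some/page';

    # current entry: anchored to the very start of the string, where the protocol letter sits, so \d can never match there
    print "old entry matches\n"   if $url =~ /\b^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b/;       # never prints

    # suggested fix: require the digits to follow "//" instead of the start of the string
    print "fixed entry matches\n" if $url =~ m{(?<=//)\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b};  # prints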

    @Lustiger seth: can we figure out who added that and what they had in mind at that time? --Dirk Beetstra T C 05:40, 16 February 2020 (UTC)[reply]
    Well, it was added here by User:Reaper Eternal. It appears they only wanted to blacklist IPs. I can see good cause for that (I have seen it being used for blacklist evasion). But I am afraid there are quite some IPs on Wikipedia and (quite) some of them might be genuine-ish. --Dirk Beetstra T C 05:50, 16 February 2020 (UTC)[reply]
    hi!
    right now we have only 4 blacklisted explicit ip addresses in the sbl. so i guess we could just remove the entry and continue blacklisting single ip addresses.
    another solution would be to correct the general blacklist entry and whitelist explicit ip addresses. -- seth (talk) 11:26, 16 February 2020 (UTC)[reply]
    Lustiger seth, it is very difficult to determine to what extent IP addresses are an issue. I do know one issue if we block all IP addresses .. User:COIBot will fail to save reports until I fix it .. Dirk Beetstra T C 11:39, 16 February 2020 (UTC)[reply]
    Spamming of IP addresses obviously isn't much of a problem if it has taken 8 years to notice that this didn't take. - MrOllie (talk) 11:56, 16 February 2020 (UTC)[reply]

    MrOllie, It's not that easy .. there are some IP addresses on the list, which are likely the cases which jumped out. Others are less visible because it likely is limited to just a couple of additions per IP. Some searching on the ones that LiWa3 found suspicious:

    (COIBot is convinced that these do match the rule .. funny; I haven't analyzed whether these are 'a problem'). --Dirk Beetstra T C 12:12, 16 February 2020 (UTC)[reply]

    Next steps

    Lustiger seth, is User:Lustiger seth/sbl log 2013--2020 all entries with no hits, or does it require that the entry was in place before 2013? It looks like the former, the entries are added sequentially and the earliest one I can find is \bmuineresorts\.com\b - this was added in 2008 according to MediaWiki talk:Spam-blacklist/archives/June 2008 § Vietnam Travel Promotion Group. If so, I think we should go ahead and purge any with zero hits in your list - 1017 records, or about 12% of the list. Is there any way of checking whether that has a performance impact? Do we track server cost of blacklists? Guy (help!) 13:11, 19 February 2020 (UTC)[reply]

    hi Guy!
    as i said above, the list was not completed yet. however, the list should be complete now (since 2020-02-23 15:49).
    the first column contains all sbl entries that were on the sbl at the beginning of 2020-02. the second column contains the number of hits in the sbl log. the sbl log was created 2013-09, iirc. that means: 1. there is no log data prior to that date. 2. if an sbl entry is just 1 week old, this might be a reason for a low number of hits.
    performance: i don't know whether this can be measured (easily). -- seth (talk) 15:58, 23 February 2020 (UTC)[reply]

    Meta

    93 items have matching items on the global blacklist.

    Extended content
    • Regex requested to be blacklisted: \bxsl\.pt\b
    • Regex requested to be blacklisted: \badelaide-classifieds\.info\b
    • Regex requested to be blacklisted: \baliexpress\.com\b
    • Regex requested to be blacklisted: \bamericaswomenmagazine\.xyz\b
    • Regex requested to be blacklisted: \bcool-fuel\.co\.uk\b
    • Regex requested to be blacklisted: \bgeolocation\.ws\b
    • Regex requested to be blacklisted: \bgreattibettour\.com\b
    • Regex requested to be blacklisted: \bhappyjanamashtamiwishes\.blogspot\.com\b
    • Regex requested to be blacklisted: \blnk\.pics\b
    • Regex requested to be blacklisted: \bsportstation\.store\b
    • Regex requested to be blacklisted: \bstores\.ebay\.com\b
    • Regex requested to be blacklisted: \bhidemyass\.com\b
    • Regex requested to be blacklisted: \bsplit\.to\b
    • Regex requested to be blacklisted: \bempowernetwork\.com\b
    • Regex requested to be blacklisted: \bgrowtobacco\.net\b
    • Regex requested to be blacklisted: \bmeatspin\.com\b
    • Regex requested to be blacklisted: \bsukmulberryshops\.co\.uk\b
    • Regex requested to be blacklisted: \badfoc\.us\b
    • Regex requested to be blacklisted: \bgetinfo\.co\.in\b
    • Regex requested to be blacklisted: \boptimalstackproduct\.com\b
    • Regex requested to be blacklisted: \bmaletestosteronebooster\.org\b
    • Regex requested to be blacklisted: \bthehealthyadvise\.com\b
    • Regex requested to be blacklisted: \bteespring\.com\b
    • Regex requested to be blacklisted: \bfirstleaks\.com\b
    • Regex requested to be blacklisted: \bmuscleperfect\.com\b
    • Regex requested to be blacklisted: \bpharmshop-online\.com\b
    • Regex requested to be blacklisted: \blovifm\.com\b
    • Regex requested to be blacklisted: \bdankontorstole\.dk\b
    • Regex requested to be blacklisted: \bclonezone\.link\b
    • Regex requested to be blacklisted: \bgoods555\.com\b
    • Regex requested to be blacklisted: \bbikramsinghmajithia\.blog\.com\b
    • Regex requested to be blacklisted: \brebootmymodem\.net\b
    • Regex requested to be blacklisted: \b123malikoki\.info\b
    • Regex requested to be blacklisted: \bpisinaspa\.gr\b
    • Regex requested to be blacklisted: \bkickass\.ink\b
    • Regex requested to be blacklisted: \bpulseoxadvocacy\.com\b
    • Regex requested to be blacklisted: \bsport2018\.org\b
    • Regex requested to be blacklisted: \bmentaldaily\.com\b
    • Regex requested to be blacklisted: \bshort4free\.us\b
    • Regex requested to be blacklisted: \bpetstation\.store\b
    • Regex requested to be blacklisted: \bedubirdie\.com\b
    • Regex requested to be blacklisted: \batheistrepublic\.org\b
    • Regex requested to be blacklisted: \bwelookups\.com\b
    • Regex requested to be blacklisted: \bmywikibiz\.com\b
    • Regex requested to be blacklisted: \belbo\.in\b
    • Regex requested to be blacklisted: \beasy-bator\.com\b
    • Regex requested to be blacklisted: \ballxreport\.com\b
    • Regex requested to be blacklisted: \b1mg\.com\b
    • Regex requested to be blacklisted: \bsci-hub\.
    • Regex requested to be blacklisted: \bwhereisscihub\.now\.sh\b
    • Regex requested to be blacklisted: \bksol\.vn\b
    • Regex requested to be blacklisted: \byoucanplayandhavefun\.blogspot\.com\b
    • Regex requested to be blacklisted: \btournament-player-magazine\.blogspot\.com\b
    • Regex requested to be blacklisted: \blearn-how-to-play-this\.blogspot\.com\b
    • Regex requested to be blacklisted: \bletsmegetme\.blogspot\.com\b
    • Regex requested to be blacklisted: \bmedicines-for-allergies\.blogspot\.com\b
    • Regex requested to be blacklisted: \bnothingmoretodobefore\.blogspot\.com\b
    • Regex requested to be blacklisted: \bstarslots\.pw\b
    • Regex requested to be blacklisted: \btranscription-services-us\.com\b
    • Regex requested to be blacklisted: \bhoanganhmart\.com\b
    • Regex requested to be blacklisted: \bsuadieuhoagiare247\.com\b
    • Regex requested to be blacklisted: \bbladejournal\.com\b
    • Regex requested to be blacklisted: \bopknice\.com\b
    • Regex requested to be blacklisted: \bgame24h\.co\b
    • Regex requested to be blacklisted: \busagoldentour\.com\b
    • Regex requested to be blacklisted: \bozinice\.com\b
    • Regex requested to be blacklisted: \bsubweb\.co\.il\b
    • Regex requested to be blacklisted: \bdaynightcarebd\.com\b
    • Regex requested to be blacklisted: \bcamcavetxegiacao\.com\b
    • Regex requested to be blacklisted: \bgopaintsprayer\.com\b
    • Regex requested to be blacklisted: \bpro-pharmaceuticals\.com\b
    • Regex requested to be blacklisted: \bzom\.vn\b
    • Regex requested to be blacklisted: \buscagsa\.com\b
    • Regex requested to be blacklisted: \bsitusrajabola\.net\b
    • Regex requested to be blacklisted: \bagendominopro\.net\b
    • Regex requested to be blacklisted: \bforkeq\.com\b
    • Regex requested to be blacklisted: \bhempoilxll\.com\b
    • Regex requested to be blacklisted: \bgenericbuddy\.com\b
    • Regex requested to be blacklisted: \bmasterpkr\.com\b
    • Regex requested to be blacklisted: \btaruhanbandarq\.xyz\b
    • Regex requested to be blacklisted: \brevistas\.nics\.unicamp\.br\b
    • Regex requested to be blacklisted: \bmasters-of-fun\.de\b
    • Regex requested to be blacklisted: \bfreemansworld\.de\b
    • Regex requested to be blacklisted: \bonlinecasinounion\.us\.com\b
    • Regex requested to be blacklisted: \bschooltips\.com\.ng\b
    • Regex requested to be blacklisted: \bfreebitco\.in\b
    • Regex requested to be blacklisted: \blocuspharmaceuticals\.com\b
    • Regex requested to be blacklisted: \byoulike222\.com\b
    • Regex requested to be blacklisted: \bpharmacosmed\.com\b
    • Regex requested to be blacklisted: \bthrillophilia\.com\b
    • Regex requested to be blacklisted: \bsafe-steroids\.net\b
    • Regex requested to be blacklisted: \bbitmix\.biz\b

    I guess these also can be cleaned up. Guy (help!) 15:54, 19 February 2020 (UTC)[reply]

    JzG, Glancing through this list, I guess they can all just go. Dirk Beetstra T C 06:01, 1 March 2020 (UTC)[reply]
    @JzG: minus Removed from MediaWiki:Spam-blacklist. --Dirk Beetstra T C 06:05, 1 March 2020 (UTC)[reply]

    Proposal

    I propose to do the following:

    1. Take the blacklist as of 1-Mar-2017 (3 years ago).
    2. Match any entry in seth's list with 0 hits since 2013.
    3. Remove all entries which have been on the SBL since 1 Mar 2017 or earlier with no hits since 2013.
    4. Remove all entries that are on the global blacklist.

    What do people think? Guy (help!) 22:02, 29 February 2020 (UTC)[reply]

    hi!
    i did similar things in former times, and i still think this is useful. so i support this (and i could help getting it done). -- seth (talk) 22:20, 29 February 2020 (UTC)[reply]
    @Lustiger seth and JzG:, I have just removed the ones that do have an item on the global blacklist above (though, if the blacklisting reasons are significantly different, it may be worth keeping them on here as well, but they can always be re-added if they become a new problem).
    Although it needs some care, most items that don't hit can indeed be safely removed. Here it is less of an issue than on meta, where you certainly do not want to remove redirect sites, malware sites etc. Dirk Beetstra T C 11:06, 2 March 2020 (UTC)[reply]
    ok! on friday or saturday i should have time to remove the ones, mentioned in 3. -- seth (talk) 22:28, 4 March 2020 (UTC)[reply]
    done.[3] the first script i used was faulty. this one should be correct:
    #!/usr/bin/perl
    use strict;
    use warnings;
    use File::Slurp qw/slurp write_file/;
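    # input files (assumed, per the proposal above): present.txt = the current blacklist,
    # 2017.txt = the blacklist as of 1 March 2017, zero_hits.txt = entries with zero logged hits (one entry per line)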
    
    my @sbl_present = slurp("present.txt");
    my @sbl_2017    = slurp("2017.txt");
    my @zero_hits   = slurp("zero_hits.txt", {"chomp" => 1});
    my (@sbl_new, @sbl_deleted);
    # reduce to actual entries (without any comments)
    @sbl_2017 = grep {s/^[^\s#]+\K.*//s; !/^#/} @sbl_2017;
    # filter present list
    for my $p(@sbl_present){
    	$p =~ s/[ \t]+$//;  # trim right
    	my ($entry) = $p =~ /^([^\s#]+)/; # get entry (without comments)
    	if(defined $entry 
    		&& grep($entry eq $_, @zero_hits)
    		&& grep($entry eq $_, @sbl_2017)
    	){
    		push @sbl_deleted, $entry . "\n";
    	}else{
    		push @sbl_new, $p;
    	}
    }
    write_file('new.txt', @sbl_new);
    write_file('deleted.txt', @sbl_deleted);
    
    i used the output in new.txt to update the sbl. the output in deleted.txt i'll use now to log the removals.
    -- seth (talk) 10:45, 6 March 2020 (UTC), 11:38, 6 March 2020 (UTC)[reply]
    Lustiger seth, I understand about 70% of that! Thanks :-) Guy (help!) 22:42, 6 March 2020 (UTC)[reply]

    COIBot suspicious local reports

    (crossposted to WT:WPSPAM)

    m:User:LiWa3 is doing basic statistics on domains that it has seen being added, and when those statistics are suspicious it throws those in the general direction of COIBot. COIBot is then reporting those in a local category or a xwiki category, depending on the type of statistics.

    COIBot has been saving reports for years, and most of those are still lingering (COIBot closes some automatically when they have been cleaned up independently, but with so many it does not check all). I evaluated a good handful of the old ones and closed them, and I have been trying to keep up with some of the new ones for a week. I do find that a significant portion of them do need a follow up (most need cleanup, quite some outright blacklisting).

    May I ask you to turn on category changes in your watchlist, watchlist Category:Open Local COIBot Reports, and evaluate all that COIBot is opening in there. Please try to close them with an evaluation remark for further reference. Thanks. --Dirk Beetstra T C 06:08, 8 March 2020 (UTC)[reply]