
Wikipedia:Edit filter noticeboard

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by Firefly (talk | contribs) at 07:10, 1 May 2022 (→‎Bad edit on filter 680?: Reply). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

    Welcome to the edit filter noticeboard

    This is the edit filter noticeboard, for coordination and discussion of edit filter use and management.

    If you wish to request an edit filter, please post at Wikipedia:Edit filter/Requested. If you would like to report a false positive, please post at Wikipedia:Edit filter/False positives.

    Private filters should not be discussed in detail here; please email an edit filter manager if you have specific concerns or questions about the content of hidden filters.



    We need to talk about filter 874

    • 874 (hist · log) ("LTA username / impersonation creations", private)

    I was looking through the log here, and frankly, it looks like the majority of hits are false positives. Now it's not possible to be sure; when you stop an account from being created, you don't get to find out what they were going to do. A few examples:

    • "Fabonz": I'm guessing this is supposed to be an impersonation of Favonian? But I doubt it.
    • "Bl223907": is supposed to be Bbb23, maybe?
    • "Varajeezuz": zzuuzz, maybe?
    • "Enas Ahmed Eisa" Medeis, it would seem.

    And so on. Should we:

    • Go through this giant ball'o'cruft and carefully fix each of these FPs?
    • Drop to tag-only, and rely on people reviewing Special:Log/newusers?
    • Throw some WP:TNT in the whole thing and start over?
    • Something else?

    I've never understood the value of disallowing LTA usernames. They are going to pick another name and disrupt anyway. Except, unless you're a checkuser, you don't know the name they picked. If they stay with "[admin] is a wanker", their edits get reverted on sight. If their second attempt is "BoringUser37823", maybe their edits stick. Suffusion of Yellow (talk) 23:20, 22 March 2022 (UTC)

    Yes it is a problem, and has been for a long time. Unfortunately false positives are not usually very visible with this filter. I'd be inclined to work back through the false positives, as well as do some other common-sense pruning. I do see some value in having a filter which prevents names which are abusive, and helps with denying recognition to some memes. Sometimes the username is a key part of the disruption. -- zzuuzz (talk) 00:44, 23 March 2022 (UTC)
    Thank goodness for https://regex101.com/debugger... so far I've managed to combine some patterns together, and I think we should probably split the filter out into more specific issues ~TNT (talk • she/her) 12:32, 25 March 2022 (UTC)
    Moving a couple of things into Special:AbuseFilter/1196 ~TNT (talk • she/her) 13:21, 25 March 2022 (UTC)
    Thank you! I'll keep an eye on both. Suffusion of Yellow (talk) 19:31, 25 March 2022 (UTC)
    Honestly I'm not entirely sure what's being attempted here. It looks like we're just forking the problem and I predict the same issues will persist, as seen in the first log entry of the new filter. It really needs a big old prune (along with more than a few more word boundaries). I'll take a scalpel to both filters in the near future. -- zzuuzz (talk) 19:45, 25 March 2022 (UTC)
    I've yet to find a time slot to dig through this, but I just wanted to add that I think we're also going to have to talk about 102 (hist · log). -- zzuuzz (talk) 04:51, 31 March 2022 (UTC)
    • Someone just pointed out Special:AbuseLog/32282853 to me, which arises from an issue in 874's 79th alternative. To allow discussion without leaking a private filter, I'll anonymize it as having the following format: abc.d?[e3] . Thus the issue arises from the confluence of the wildcard, the question-mark quantifier, and the lack of a boundary assertion at the end. To me, that's far too much flexibility to be having in a filter that leaves no obvious avenue for appeal. Since we've been talking about this for a bit, I'd like to make a proposal: All patterns in 874, 102, or any other account-creation filter:
      1. Must not, when excluding any characters that are quantified at minimum length 0, consist only of 3 or fewer literal ASCII characters (like the pattern in 102 that could be reduced to ryr)
      2. If they specify only 4 to 5 literal ASCII characters (like the 79th alternative in 874), must not contain any wildcards mid-pattern. At a minimum, they must use \S or \w, but preferably something narrower than those. Depending on what the characters are, this may be advisable even at higher character counts (as in the Medeis example).
      3. Must not contain any mid-pattern quantifiers of wildcards or large character classes with large or infinite maximum lengths (e.g. zz[a-z]*uu[a-z]*zz), unless both ends are very very narrowly tailored (e.g. Suffusion.*Yellow).
      4. If any string they match could plausibly occur in any context other than abuse, must start with a ^ or \b and end with a $ or \b. (So a direct match on a username that isn't a word outside of Wikipedia (e.g. zzuuzz) doesn't need such an assertion, nor does one with some basic substitution or repeating-character quantifiers fit in (e.g. z[sz]+[uv]+z[sz]+), but something that could arise in a normal context (e.g. zuz, although that also breaks rule 1) needs those boundary assertions.)
    • Exceptions could be made in emergencies or by concurrence of two EFMs, to be noted in the filter comments, with the understanding that they will monitor for FPs. Thoughts? -- Tamzin[cetacean needed] (she/they) 21:15, 1 April 2022 (UTC)
    • @Tamzin: Sound like good ideas, but those giant crufty regexes just give me a migraine. So, I boldly switched 874 to /x ("ignore whitespace") mode. We haven't done that with a filter before AFAIK, but maybe we should start. If no one reverts that change I'll start pruning. Suffusion of Yellow (talk) 00:16, 2 April 2022 (UTC)
    • @Zzuuzz, TheresNoTime, and Tamzin: Just did a major tidy of 874 and 1196. In the end, I downloaded the entire filter log (about 22,000 hits) so I could grep locally instead of waiting for abusefiltercheckmatch. There were some patterns in there that had caused thousands of false positives. Oh and (?x) works wonders. Suffusion of Yellow (talk) 22:48, 4 April 2022 (UTC)
      Nice bit of tidying, thanks. -- zzuuzz (talk) 10:09, 5 April 2022 (UTC)
      And did a similar de-cruft of 102. "Only" looked at the last five years of the log. If I removed someone's "favorite" string, please restore, but consider providing some context in the notes <grumble grumble>... Suffusion of Yellow (talk) 02:00, 8 April 2022 (UTC)
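    The boundary rules and the /x tidy-up above can be illustrated with a short sketch. It assumes the downloaded log has been reduced to a plain list of usernames (the export format is an assumption), uses Python's `re` module rather than the PCRE-based engine behind AbuseFilter's `rlike` (the constructs used here behave the same way in both), and uses hypothetical labels and sample names. Counting hits per alternative is one way to find the patterns responsible for most false positives; the sample also compares the anonymized `abc.d?[e3]` format with a tightened version.

```python
import re
from collections import Counter

# Hypothetical alternatives keyed by a short label. "loose" is the anonymized
# placeholder from the discussion (mid-pattern wildcard, optional character,
# no boundary assertions); "tight" narrows it along the lines of rules 2 and 4.
ALTERNATIVES = {
    "loose": r"abc.d?[e3]",
    "tight": r"\b abc [a-z]? d? [e3] \b",  # spaces are ignored in verbose mode
    "zzuuzz": r"z[sz]+[uv]+z[sz]+",         # substitution classes only, no anchors
    "soy": r"suffusion .* yellow",          # both ends narrowly tailored
}


def compile_alt(pattern: str) -> re.Pattern:
    # re.VERBOSE is Python's spelling of the (?x) flag: whitespace outside
    # character classes is ignored and "#" starts a comment.
    return re.compile(pattern, re.VERBOSE | re.IGNORECASE)


def hits_per_alternative(names: list[str]) -> Counter:
    """Count how many logged usernames each alternative matches anywhere."""
    counts: Counter = Counter()
    for label, pattern in ALTERNATIVES.items():
        rx = compile_alt(pattern)
        counts[label] = sum(1 for name in names if rx.search(name))
    return counts


def matched_by(label: str, names: list[str]) -> list[str]:
    """List the usernames a single alternative matches, for manual review."""
    rx = compile_alt(ALTERNATIVES[label])
    return [name for name in names if rx.search(name)]


# Hypothetical sample standing in for a downloaded filter log.
LOG = ["Abcxe", "Grabcode3", "Fabc 3e", "Zzuuzz2", "Zsuvzs", "Suffusion of Yellow fan"]

print(hits_per_alternative(LOG))
print(matched_by("loose", LOG))
print(matched_by("tight", LOG))
```

    On this sample the loose alternative also flags "Grabcode3" and "Fabc 3e", while the tightened one flags only "Abcxe": the word boundaries stop it from matching inside a longer name.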

    New proposal

    There are two problems with the account creation filters (102, 874, 1196): (1) The standard disallow message that we're using doesn't make any sense and is BITEy. (2) No matter what message we use, disallowing automatic account creations is always BITEy. What are they supposed to do, create two accounts? Expose their IP on EF/FP/R? Ping an enwiki admin from meta? So let's:

    (1) Set filters 102, 874, and 1196 to match on manual account creation only, and use a message like this:


    After all, no one should expect to get their first (or second or third) choice on any popular website. All the cool names are taken. Just saying "pick something else" isn't a big deal IMO. But talking about "abuse" and "disruption" and "blocking" and such, let's not do that. Note that I intentionally didn't link to WP:EF/FP/R; if the account doesn't exist yet, they have to expose their IP to make the report.

    (2) Create a new filter (we'll call it "Persistent LTA usernames" or something) matching on manual and automatic creation for use only against LTAs who evade the other filters. Everything added to this filter must:

    • Include the date that it was added
    • Include some context as to why it was added (log id of account creation, SPI page, whatever)

    Anything with no true positives in a year, or anything added without a date or explanation, will be removed.

    I have no idea what kind of message to use for this new filter. Most autocreations are probably from non-native English speakers anyway. But admins should try to monitor the log, and force local creation for anything that looks like a FP. Since it's only a last resort, the log should ideally be pretty sparse. Suffusion of Yellow (talk) 18:38, 8 April 2022 (UTC)
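    The retention rule in (2) is mechanical enough to sketch. This is a minimal illustration under stated assumptions: the `Entry` record and its fields are hypothetical (AbuseFilter has no such structure; in practice the date and context would live in the filter's notes or comments), "a year" is taken as 365 days, and an entry counts as stale only once a full year has passed since its last true positive, or since it was added if it has none yet, so that new entries are not removed immediately.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

STALE_AFTER = timedelta(days=365)  # assumption: "a year" means 365 days


@dataclass
class Entry:
    """Hypothetical record for one alternative in the persistent-LTA filter."""
    label: str                                # placeholder for the pattern itself
    added: Optional[date]                     # date the entry was added
    context: str                              # log id, SPI page, or similar
    last_true_positive: Optional[date] = None


def keep(entry: Entry, today: date) -> bool:
    # Entries added without a date or an explanation are removed outright.
    if entry.added is None or not entry.context.strip():
        return False
    # Otherwise, count the year from the last true positive, or from the
    # date added if there has not been one yet, so new entries survive.
    last_seen = entry.last_true_positive or entry.added
    return today - last_seen < STALE_AFTER


def prune(entries: list[Entry], today: date) -> list[Entry]:
    return [entry for entry in entries if keep(entry, today)]


today = date(2022, 4, 8)
entries = [
    Entry("recent", date(2022, 3, 1), "SPI page"),
    Entry("active", date(2020, 1, 1), "log id", last_true_positive=date(2022, 1, 15)),
    Entry("stale", date(2020, 1, 1), "log id", last_true_positive=date(2021, 1, 1)),
    Entry("undated", None, "SPI page"),
    Entry("unexplained", date(2022, 3, 1), "   "),
]
print([entry.label for entry in prune(entries, today)])
```

    Measuring from the date added when there is no true positive yet is an interpretation; a stricter reading would remove any entry with no hits in its first year regardless.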

    This seems like a reasonable proposal. We seem to have a lot of username filters, many with their own messages. -- zzuuzz (talk) 22:32, 14 April 2022 (UTC)
    zzuuzz: Heh, already got started per WP:SILENCE. I'll put in an edit request for the message above, and add it to 102, 874, and 1196. I'm still open to suggestions for a message for 1198 (hist · log) (once something is added to it). What, again, is someone supposed to do when an autocreation is disallowed? Not that they'll notice if they don't try to log in at enwiki, but if they do, what then? Should we set up a page on meta for FP reports, and remember to watch it? Ugh. Suffusion of Yellow (talk) 22:53, 14 April 2022 (UTC)
    I figured you probably would, but it's sometimes nice to get any feedback. IMO, we should simply be creating these accounts (plus fixing the filter). In other words, we already have the list of reports to watch. How to expedite the reports is another matter. Maybe the (non-ideal) solution would be a page with instructions on who to ping from meta. Another option would be ACC (or VRT). -- zzuuzz (talk) 23:07, 14 April 2022 (UTC)
    And Done. Updated 102, 874, 1196 with the new message. @Zzuuzz: How "dangerous" is the centralauth-createlocal right, anyway? I'm having trouble thinking of a way to abuse it. Maybe it should be given to more user groups. Suffusion of Yellow (talk) 20:47, 17 April 2022 (UTC)
    The biggest issue I can see is not understanding why the account cannot be self-created. This might be due to a username issue (I guess always a filter?), or a local ACB block. If you don't know an IP address, or thus the reason account creation is blocked, then there's a risk in that. Granted it's not always a huge risk. With ACC, overriding an ACB block is often sent to the CU queue. IMO this system slightly leans towards the more paranoid side of things, but it is based on actually knowing the IP. I'm not going to suggest it should be a CU-only right - admins are entrusted to grant IPBE and lift blocks (and indeed create local accounts) based on what users tell them, but it also requires a certain amount of nous. Interested to hear any suggestions... -- zzuuzz (talk) 21:34, 17 April 2022 (UTC)

    Regarding filter 958

    958 (hist · log)

    Hello, edit filter helpers/managers!

    I was reviewing filter 958, but I noticed that you didn't list all the IPs of the U.S. Congress under WP:SIP. To provide better guidance, I'd like you to add 137.18.0.0/16, 12.185.56.0/29, 12.147.170.144/28, 74.119.128.0/22, 2620:0:E20::/46, 2620:0:8A0::/48, and 2600:803:618::/48, in case any of them edit.

    If you cannot add these to the filter, then that is fine. -- 3PPYB6 TALK CONTRIBS 17:07, 28 March 2022 (UTC)

    @Legoktm: Courtesy ping, as you created this filter. -- 3PPYB6 TALK CONTRIBS 17:59, 28 March 2022 (UTC)
    This filter is run for all anonymous edits, and it checks each edit against each IP range. For the sake of efficiency it makes sense for the filter to only use ranges which are actively used. The above are not. BTW I'm not really sure of this filter's general utility, since you could just check the contribs. Does anyone think it's useful? -- zzuuzz (talk) 18:55, 28 March 2022 (UTC)
    This is one of those cases where "conditions" are a really poor measure of performance. An IP range check literally requires converting a few numbers from base 10, then checking against a bitmask. Probably takes a few hundred CPU cycles. But, yes, each check burns through one of our precious 1000 conditions, so I'd rather not do this unless either (A) the filter is converted to use regex, or (B) AbuseFilter is patched so that ip_in_range() takes multiple arguments.
    And no, I don't see the point of the filter either. Suffusion of Yellow (talk) 20:01, 28 March 2022 (UTC)
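    To make the "convert, then compare under a bitmask" point concrete, here is a minimal sketch using Python's standard `ipaddress` module and a few of the ranges listed above as sample data. It is not AbuseFilter's implementation, and `in_any_range` is a hypothetical helper in the spirit of the suggested multi-argument `ip_in_range()`, not an existing AbuseFilter function.

```python
import ipaddress

# A few of the ranges proposed above, used only as sample data.
RANGES = ["137.18.0.0/16", "12.185.56.0/29", "2620:0:E20::/46"]


def in_range(ip: str, cidr: str) -> bool:
    """Mask the address to the network's prefix length and compare."""
    addr = ipaddress.ip_address(ip)
    net = ipaddress.ip_network(cidr, strict=False)
    if addr.version != net.version:
        return False  # an IPv4 address is never inside an IPv6 range, and vice versa
    mask = int(net.netmask)
    # network_address already has the host bits cleared.
    return (int(addr) & mask) == int(net.network_address)


def in_any_range(ip: str, *cidrs: str) -> bool:
    """Hypothetical multi-range check: one call instead of one condition per range."""
    return any(in_range(ip, cidr) for cidr in cidrs)


for ip in ["137.18.4.5", "12.185.56.8", "2620:0:e21::1", "8.8.8.8"]:
    print(ip, in_any_range(ip, *RANGES))
```

    In Python itself, `ipaddress.ip_address(ip) in ipaddress.ip_network(cidr)` performs the same test; the mask is written out only to show how little work each comparison involves.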
    @Suffusion of Yellow: In that case, I'll just say this: on second thought, merely checking WP:SIP should be enough to identify any congressional staffer's edits. If you feel that the filter is no longer required, feel free to delete it. -- 3PPYB6 TALK CONTRIBS 21:36, 28 March 2022 (UTC)
    Don't particularly see the point of the filter either. I remember browsing its hits out of general interest before, to see what Congressional IPs are editing, but not sure of the maintenance purpose here (and even if there were one, what makes US Congress edits distinct from various other countries' parliamentary IPs, which aren't filter-logged?) ProcrastinatingReader (talk) 21:43, 3 April 2022 (UTC)
    Done. Disabled the filter. Suffusion of Yellow (talk) 02:04, 8 April 2022 (UTC)
    @ProcrastinatingReader, @Suffusion of Yellow: the main point of the filter is for tagging functionality, and I hope United States congressional staff edits to Wikipedia and Wikipedia:Congressional staffer edits explain the utility.
    If performance is an issue, I guess I could just write a bot to add the tags instead of relying on AbuseFilter? As for why only US Congress...just my US bias as an American, happy to set it up for other countries/governments if we know their IP ranges. Legoktm (talk) 16:34, 13 April 2022 (UTC)
    It's not that it has no utility; I just don't see much utility beyond Special:Contribs.
    On the subject of performance, every now and then we get hit with the need for an "emergency" filter. That's not the time to fuss over the condition limit. So I like to keep a buffer of a few hundred conditions when things are "slow", and this looks like low-hanging fruit.
    That said, I really doubt this filter actually slows anyone down significantly; as I said, IP range checks are a trivial calculation. Suffusion of Yellow (talk) 18:09, 13 April 2022 (UTC)
    Sorry if I'm a bit late to the party here, but I was just looking through the filters and it seems filter 1025 does something similar (just without the tagging); would it also be worth disabling this? FozzieHey (talk) 16:57, 29 April 2022 (UTC)

    Edit filter #53 - issue with false positive hits (resolved)

    Hi everyone! I just wanted to leave a quick note that an update was made to edit filter #53 at 01:29, 14 April 2022. This update accidentally introduced a stray pipe ("|") OR value in one of the parameters added to the diff and edit summary strings. This resulted in a very large number of false positive hits being logged. After noticing this huge influx of hits and looking through some of the logs, I knew there was a problem and temporarily disabled the filter at 01:38, 14 April 2022 until I could identify and resolve the issue. After some debugging, I identified, located, and resolved the issue shortly afterwards, and the filter is back online and working properly.

    I'm leaving this notice in case any false positive reports are filed as a result of the issue. If edit filter #53 logged a hit between 01:29 and 01:38 on 14 April 2022 (9 minutes), you can safely attribute it to this issue as a false positive, and indicate in the report that it's already been resolved. Thanks everyone, and I apologize in advance for any reports that are filed that involve what happened. If anyone has any questions or concerns, please let me know by messaging me on my user talk page. :-) Cheers - ~Oshwah~(talk) (contribs) 01:59, 14 April 2022 (UTC)
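    The notice does not show the exact change, so the following is only an illustration of one common way a stray "|" produces mass false positives: a trailing pipe adds an empty alternative, and an empty alternative matches every string. The word list is hypothetical and the sketch uses Python's `re`; PCRE treats an empty alternative the same way.

```python
import re

# Hypothetical word list; the real contents of filter 53 are not shown here.
intended = re.compile(r"(?:badword|otherword)", re.IGNORECASE)
# Same list with a stray trailing pipe: the last alternative is empty.
accidental = re.compile(r"(?:badword|otherword|)", re.IGNORECASE)

summaries = ["Fixed a typo", "Added a citation", "Inserted badword here"]

for text in summaries:
    # search() returns a match object (truthy) or None. The empty alternative
    # matches a zero-length string at position 0, so every text "hits".
    print(text, bool(intended.search(text)), bool(accidental.search(text)))
```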

    New administrator activity requirement

    The administrator policy has been updated with new activity requirements following a successful Request for Comment.

    Beginning January 1, 2023, administrators who meet one or both of the following criteria may be desysopped for inactivity if they have:

    1. Made neither edits nor administrative actions for at least a 12-month period OR
    2. Made fewer than 100 edits over a 60-month period

    Administrators at risk for being desysopped under these criteria will continue to be notified ahead of time. Thank you for your continued work.

    22:52, 15 April 2022 (UTC)
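    The two criteria combine with "or", so meeting either one is enough. A minimal sketch of that logic, assuming activity is available as lists of dates and approximating 12 and 60 months as 365 and 5 × 365 days; the function name and inputs are hypothetical, the January 1, 2023 start date is ignored, and the policy text (including what counts as an administrative action) remains authoritative.

```python
from datetime import date, timedelta

TWELVE_MONTHS = timedelta(days=365)     # approximation of "12 months"
SIXTY_MONTHS = timedelta(days=5 * 365)  # approximation of "60 months"


def meets_inactivity_criteria(edits: list[date], admin_actions: list[date], today: date) -> bool:
    """True if either criterion applies (the policy says "one or both")."""
    # Criterion 1: neither edits nor administrative actions in the last 12 months.
    recent_cutoff = today - TWELVE_MONTHS
    no_recent_activity = not any(d >= recent_cutoff for d in edits + admin_actions)
    # Criterion 2: fewer than 100 edits in the last 60 months.
    long_cutoff = today - SIXTY_MONTHS
    few_edits = sum(1 for d in edits if d >= long_cutoff) < 100
    return no_recent_activity or few_edits


today = date(2022, 4, 16)
steady = [today - timedelta(days=i) for i in range(150)]  # 150 recent edits
print(meets_inactivity_criteria(steady, [], today))
print(meets_inactivity_criteria(steady[:40], [today], today))  # active, but under 100 edits
print(meets_inactivity_criteria([], [date(2020, 6, 1)], today))  # arbitrary 2020 action date
```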

    Heh. User talk:Edit filter → here. -- Tamzin[cetacean needed] (she/they) 22:58, 15 April 2022 (UTC)
    The edit filter system account is flunking both sets of activity requirements (no edits, last logged action 2020) and should be desysopped. (Although this comment was written in a jocular tone, I do genuinely believe that a bureaucrat should remove the account's admin rights) * Pppery * it has begun... 00:07, 16 April 2022 (UTC)
    Looks like the account was desysop'd once before, in 2019. Log entry. Old EFN thread. Phab ticket. Novem Linguae (talk) 00:31, 16 April 2022 (UTC)
    Don't think it's possible to desysop(?), since [1] will just resysop it. And the phab ticket, or more specifically the SRP discussions, seem like bikeshedding; it's a piece of code, not a human. ProcrastinatingReader (talk) 01:55, 16 April 2022 (UTC)

    Is this the place to report a false negative? (filter 320)

    So, I recently reverted the following revision/vandalism: 1085380463
    In it, the supposed phrase that should trigger filter 320 appeared three separate times, but the filter didn't trigger at all. Sure, other filters triggered, but 320, which is specifically for this, didn't.
    Have I just misunderstood what the filter is for, or is this really a false negative?

    Also please correct me if this is the wrong place to report possible false negatives. – 2804:F14:C060:8A01:E11F:9193:92A6:441B (talk) 07:34, 30 April 2022 (UTC)

    Well this is awkward... I just realized that the filter only detects changes that altered less than 200 characters (if I'm reading the top right correctly)... so this isn't really a false negative, it's just how the filter works.
    Well I won't delete this in case someone answers my question of "where to report false negatives", but I wouldn't have reported it if I had understood that part of the code. – 2804:F14:C060:8A01:E11F:9193:92A6:441B (talk) 08:25, 30 April 2022 (UTC)
    You can report FNs at the FP board, WP:EF/FP. We don't get a lot of FN reports except from admins who are working on filters, so there isn't a main board for them. In general we are far more accepting of FNs than FPs. -- xaosflux Talk 10:16, 30 April 2022 (UTC)
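    A size gate like the one described explains how a large edit can contain the phrase several times and still not trip the filter. The sketch below is hypothetical: it does not reproduce filter 320's actual conditions, the phrase and threshold are placeholders, and treating the limit as an absolute size change is an assumption based on the "less than 200 characters" reading above. AbuseFilter's own `edit_delta` variable is a size difference in bytes; `len()` here counts code points, which coincide for ASCII text.

```python
def size_gated_hit(old_text: str, new_text: str, phrase: str, max_delta: int = 200) -> bool:
    """Hypothetical filter: flag an edit that adds `phrase`, but only if the edit is small."""
    # Rough analogue of AbuseFilter's edit_delta (new size minus old size).
    edit_delta = len(new_text) - len(old_text)
    if abs(edit_delta) >= max_delta:
        return False  # large changes are never examined, whatever they contain
    p = phrase.lower()
    return new_text.lower().count(p) > old_text.lower().count(p)


old = "Some article text."
small = old + " PHRASE"
large = old + (" PHRASE" + " filler" * 30) * 3  # phrase added three times, 651 characters

print(size_gated_hit(old, small, "phrase"))  # small edit: checked
print(size_gated_hit(old, large, "phrase"))  # large edit: skipped by the size gate
```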

    Bad edit on filter 680?

    Hi there, please revert this change to filter 680 https://en.wikipedia.org/wiki/Special:AbuseFilter/history/680/diff/prev/cur or try changing the "U" to lowercase "u"; it looks like this is causing many false positives; see WP:EFFP. Notified Oshwah at EFFP but got no immediate response. 🐶 EpicPupper (he/him | talk) 06:48, 1 May 2022 (UTC)

    @EpicPupper fixed. Lowercase 'x' needed in that Unicode code point. Pretty sure uppercase \X matches any Unicode code point 😬 firefly ( t · c ) 07:10, 1 May 2022 (UTC)
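    Regex escape letters are case-sensitive, and a wrong-case escape does not always fail loudly. In PCRE, on which AbuseFilter's `rlike` is based, `\x{...}` denotes a single code point while `\X` is itself a valid escape (an extended grapheme cluster), so the typo changes the pattern's meaning instead of raising an error. Python's `re`, used in the sketch below, spells a code-point escape differently (`\u200b`) and rejects `\X` outright. U+200B is an arbitrary example; the code point actually used in filter 680 is not shown here.

```python
import re

ZWSP = "\u200b"  # U+200B ZERO WIDTH SPACE, an arbitrary example code point


def compiles(pattern: str) -> bool:
    """Return True if Python's re accepts the pattern."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


code_point = re.compile(r"\u200b")  # lowercase \u plus 4 hex digits: one code point

print(compiles(r"\u200b"))  # accepted
print(compiles(r"\X"))      # rejected: Python's re has no \X escape
print(bool(code_point.search("a" + ZWSP + "b")), bool(code_point.search("ab")))
```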