
Wikipedia talk:Edit filter

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by Prom3th3an (talk | contribs) at 02:58, 23 April 2009 (*Do not* edit my posts.). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Diff?

Unresolved
 – Perhaps a Bugzilla request could get us diff links? –xeno (talk) 16:46, 6 April 2009 (UTC)

Could "diff" buttons be added to the AbuseLog to make it easy to revert changes? I understand that at least some of the filters should automate reversion at some point, but for now (and for the filters which don't trigger auto-reversion), it would be a great vandal-fighting tool. –Drilnoth (TC) 03:23, 19 March 2009 (UTC)

That would be helpful; the only problem is that not every log entry results in a diff. For those that do, though, it would be great. –xeno (talk) 03:25, 19 March 2009 (UTC)
Which ones don't create a diff? (Sorry; I haven't looked at all of the filters yet.) –Drilnoth (TC) 03:36, 19 March 2009 (UTC)
E.g. a log entry that resulted in a warning where the user didn't save the edit. –xeno (talk) 03:38, 19 March 2009 (UTC)
Ah, okay. Thanks for the clarification. –Drilnoth (TC) 03:41, 19 March 2009 (UTC)

A diff view is visible if you click 'details'. Prodego talk 03:49, 19 March 2009 (UTC)

The problem is that that diff has information about the changes, but it's not a normal diff, so Twinkle can't roll back the edit, and there isn't even an easily accessible "undo" button. –Drilnoth (TC) 13:26, 19 March 2009 (UTC)
Seconded. Please could (diff) be a link to the HTML diff? This would facilitate fixing. Thx. --  Chzz  ►  17:33, 19 March 2009 (UTC)
Thirded. I'd love that. ViperSnake151 17:35, 19 March 2009 (UTC)
The problem is that on most of them, no edit took place, and there is nothing to roll back. Prodego talk 17:36, 19 March 2009 (UTC)
Yep, I mentioned that above, but for the ones that did result in an edit, it would be great to just have the diff. It would probably also reduce load from people clicking on the "details" or "examine" button, as those take a minute or two to come up. –xeno (talk) 17:38, 19 March 2009 (UTC)
  • See the filter 3 discussion below. We need to be able to distinguish between edits that were only attempted and ones that can and/or need to be undone anyway. A diff link would help. - Mgm|(talk) 00:29, 20 March 2009 (UTC)

Where was the community vote on creating an additional privilege above "admin"?

I don't recall anything like that ever showing up in a watchlist notice. And I don't see the RfAFEs for these people either. --Random832 (contribs) 12:54, 23 March 2009 (UTC)

Did you not get the memo?? Happymelon 12:58, 23 March 2009 (UTC)
New software features come about all the time. The developers don't sit around seeking consensus for new features. We develop policy as it is needed. Chillum 13:00, 23 March 2009 (UTC)
A feature is one thing; the implementation of this as a separate user group versus including it within sysop is something on which the devs tend to follow individual project consensus, so where was the consensus for this? And how did these people in particular get this flag? If there is no process, then there will be no objection to me going down this list and giving it to everyone on it, right? --Random832 (contribs) 13:05, 23 March 2009 (UTC)
Sysops can grant themselves the AFE right if they want to muck around with the AF. There's no additional red tape. AFE is not "above sysop"; it's just granted to (or taken by) those who need it to edit or view private filters. Giving it out to every sysop would be pointy imo, and a waste of time. There is a discussion somewhere above about rolling the right into the sysop package, and another about whether it is a good idea to consider granting it to non-sysops. –xeno (talk) 13:09, 23 March 2009 (UTC)
How would it be pointy? Given that every current AFE is also an admin, and that there seems to be no objection to admins granting themselves this flag without any discussion, it seems that the reality is that abusefilter-modify is an admin permission. If any admin is entitled to have the ability to modify filters, then there is no advantage whatsoever in leaving the permissions out of 'sysop'. It's not like an admin can 'accidentally' modify a filter if they don't intend to work in that area. Happymelon 14:05, 23 March 2009 (UTC)
Spamming the userrights log to make a point? (When I said "giving it out to every sysop", I was referring to Random's suggestion that he might go down the list of sysops and grant every one of them AFE.) –xeno (talk) 14:08, 23 March 2009 (UTC)
Oh, sorry, I misread you. Yes, manually granting the 'abusefilter' flag to every current admin would be pointy. I thought you were talking about bundling the same rights into 'sysop'. Do you have an opinion on that? Happymelon 14:11, 23 March 2009 (UTC)
Yea, I don't really see why not, other than it would make it harder to get a list of "sysops-who-muck-about-with-the-AF". –xeno (talk) 14:13, 23 March 2009 (UTC)
Modifications to filters are supposed to be logged in Special:Log; that code is disabled for the time being because one of MediaWiki's standard database tables needs to be changed slightly (a column expanded to allow it to take longer entries) to accommodate the data. Then we'll be able to search all filter changes in the normal way. Happymelon 14:37, 23 March 2009 (UTC)
I still see some value in having an actual list, rather than forcing someone to trawl through a log to find an AFE-active admin. –xeno (talk) 14:39, 23 March 2009 (UTC)
The list might be current now, but there's no reason why it should remain so; people are just as likely to add that bit, do a bit of work, then drift off, as they are in any other job. If you want to know who currently does renames, do you look at Special:ListUsers/bureaucrat?? People shouldn't have to look at logs or lists to find AFE-competent people; they ask in the appropriate forum, and interested admins watch it. Happymelon 16:25, 23 March 2009 (UTC)
It is not about sysop vs not-sysop! Good lord no! It is about having the technical knowledge and sense to use such a powerful tool. We don't give it to every admin because many admins would not have a clue how to use it. If we do grant it to non-admins it should be on the basis of technical skill and trust. It is not above or below admin. This is not a one-dimensional concept and it cannot be reduced to that. Chillum 14:26, 23 March 2009 (UTC)
A similar amount of technical knowledge is required to edit the spam blacklist or complicated templates, to perform rangeblocks, or to modify the site JavaScript. All of these facilities are available to all administrators by default. Yet the site has not collapsed under the weight of admins who "do not have a clue how to use [them]" randomly playing around with features they do not understand, because we have selected the admin community to be, in general, comprised of users who do not do such things. It is not possible to 'accidentally' modify these things, so if you make the good-faith assumption that admins will not mess with things they do not understand until they have gained that necessary understanding, there is no reason why the ability to modify filters should be three clicks away instead of two. Any admin can grant themselves the AFE flag and go fiddling, if they are comfortable that they know what they are doing. Any admin can go edit the spam blacklist, if they are comfortable that they know what they are doing. Where is the additional 'safety' in the former situation? Happymelon 14:43, 23 March 2009 (UTC)
I am not opposed to admins granting the right to themselves. I think that they can decide if they are capable of using a tricky tool. The primary advantage of having it as a separate user group is that we can grant it to capable and trusted users who are not admins. Chillum 15:20, 23 March 2009 (UTC)
Indeed, and that is a separate issue. What's being suggested is that the same permissions that are given by the AFE flag are also included in the 'admin' bundle, in exactly the same fashion as rollback or IPBE. We can discuss the issue of whether or not to give AFE out to non-admins separately, but at the moment every one of the AFE flags is held by a user who is also an admin; there's complete duplication. Admins are indeed capable of deciding if they are capable of using this tool; they don't need to be put through a nuclear warhead-style interlock if they decide they are comfortable using it. Happymelon 16:30, 23 March 2009 (UTC)
I think an admin can decide if they are capable of using it before assigning it to themselves. I would hardly call it a "nuclear warhead-style interlock"; it is the same procedure used to add or remove any user right. Automatically granting it as part of the admin package seems out of place. It should not be granted to all admins because a) not all admins should be using it, and b) it would give others the impression that simply being an admin qualifies you. Think of it as a little plastic lid covering a dangerous button that can be flipped up if it is really needed. Chillum 17:55, 23 March 2009 (UTC)
The "little plastic lid" idea is the analogy I was really going for, except that it implies that there is nothing else to stop admins destroying the wiki. To screw up a filter from an empty browser, the admin needs to log in, type "Special:AbuseFilter" into the search bar, then scroll down, find a likely filter, click on it, change some settings, then click the "save" button. As opposed to logging in, typing "Special:UserRights/Jimbo", checking the AFE box, clicking save, then continuing the process? Do you really think that there is a danger of admins completing the first chain whilst sleepwalking, while the second will defeat them? Indeed admins can decide if they are capable of using it. Having decided, they don't need to go through extra sanity checks; that's the nuclear missile analogy. If an admin thinks they are capable of editing the filters, they will edit the filters. What's the benefit of the extra paper trail?
Your arguments against are, IMO, based on the continued mistaken assumption that no one who has the technical permission is capable of resisting the temptation to use it. "Simply being an admin qualifies you"... to do what? Being an admin qualifies you to have the technical ability to edit the filters; that's as true now as it would be if the permissions were bundled. Being qualified to actually implement and maintain filters is a status restricted neither to admins nor to AFE holders; but that's inevitable. Happymelon 19:10, 23 March 2009 (UTC)

We seem to be going round and round on something that in my personal opinion doesn't matter. The difference between having a right by default and having the right to add the right is very small. Could someone organize a vote or something so we can try to get some closure on this? Dragons flight (talk) 19:00, 23 March 2009 (UTC)

I agree that it's small; I wouldn't say it's irrelevant. I also agree that further cyclic discussion is unlikely to be constructive. I'll get a little straw poll going. Happymelon 19:11, 23 March 2009 (UTC)

Straw poll

This will have the effect of giving all administrators the access to the Abuse Filter settings that is currently restricted to the 'abusefilter' group. This issue is separate from the question of whether the 'abusefilter' group should continue to exist and to whom it should be granted, which should be discussed separately. Please indicate support or opposition below. Happymelon

Pointless poll. Admins can already assign the right to anybody, including themselves, and the right is basically a technical right: people without the technical knowledge have no use for it, and those with it can get it easily and without problems. This right seems to be 100% noncontroversial, and we shouldn't run around demanding exhaustive community polls every time a new feature is enacted. The abusefilter feature has already been approved by the community, so this side trip down Pointless Poll Lane serves little purpose except to derail the implementation of an otherwise useful feature. Let's just let this be and move on. If an admin needs it, let them take it. If a non-admin needs it, let them get it from an admin. The number of people who are needed to maintain the abuse filter is vanishingly small, so there does not seem to be the need to enact this automatically for 1700 people, nor does there seem to be any need to demand a convoluted "approval" process to get this bit. This is a solution in search of a problem, and the entire thing is pointless. --Jayron32.talk.contribs 22:11, 23 March 2009 (UTC)
I'm not sure I understand much of what you're saying here. How on earth do you think that this poll is somehow intended to "derail" the AbuseFilter? If a non-admin needs it, let them indeed get it from an admin; although I suggested that we consider this issue separately, I'm generally in favour of retaining the separate AFE right to give to non-admins if desirable. But if an admin needs it, why the pointless paperwork of granting themselves an extra bell and whistle before getting to work? The situation is analogous to admins still being able to grant and revoke rollback, but needing to grant it to themselves before being able to use it. Happymelon 22:31, 23 March 2009 (UTC)
The status quo seems perfectly fine to me. Admins can give themselves the flag if they want to, and having this as a separate flag opens the possibility to give it to non-admins as well, should there be a consensus to do so in the future. --Conti| 22:14, 23 March 2009 (UTC)
I agree, pointless poll. The setup seems to be working just fine at present. We can revisit this question if there are any serious problems, but for now let's just see how it goes. Tim Vickers (talk) 22:15, 23 March 2009 (UTC)
Well, we both know there aren't going to be any problems that would be somehow magically 'fixed' by this change. Not all changes have to fix problems. Just because it's not broken doesn't mean it can't be improved. Happymelon 22:34, 23 March 2009 (UTC)
  • I support its being included in the sysop package, as Administrators are already trusted with tasks of equivalent sensitivity (e.g. Spam blacklists) and hence should be trusted with this. The inclusion of this right would just save Admins a little bit of hassle, in my opinion; its presence does not oblige Sysops to make use of that right every day. To give a similar example, despite technically being able to edit others' .CSS and .JS pages (editusercssjs), upload a file from a URL address (upload_by_url) and mark rolled-back edits as bot edits (markbotedits), I have yet to actually use any of those aforementioned rights. This does not mean, however, that they should be removed from me, as they might come in useful one day. The fact of the matter is, Sysops are trusted not to use their rights with malicious intent, and, as such, the situation whereby they can make use of these technical features should remain. It Is Me Here t / c 22:18, 23 March 2009 (UTC)
  • It's completely pointless. Admins can give themselves the ability or get it from a friendly fellow admin. There's no need to give someone a right they're unlikely to need. This is totally different from ipexempt and rollback, which many admins use frequently in day-to-day editing; the latter more than the former. This is an ability you only need when you plan on editing this. Keeping it separate allows us to give the right to non-admins (similar to the botflag) without giving them full admin rights they likely won't need. - Mgm|(talk) 22:43, 23 March 2009 (UTC)
    Certainly we should keep the separate AFE flag in the same fashion as IPBE or accountcreator. But how is this permission different to those? How is it different to something like the spam blacklist? If the coin had come down the other way and the devs had added the permission to the sysop bundle to start with, would we be having this discussion in the other direction? I agree it's a minor issue, but I wouldn't say it's pointless. Happymelon 22:52, 23 March 2009 (UTC)
We might well have. IPexempt, rollback and accountcreate don't require particular know-how to use. That's why I don't think this can properly be compared to those permissions. - Mgm|(talk) 00:17, 24 March 2009 (UTC)
  • Oppose It should be a separate flag. Flagging every admin is like saying every admin is able to understand how the filter works. I trust admins to decide if they are capable of using it safely and then give themselves the right. This extra step is not hard. Chillum 23:20, 23 March 2009 (UTC)
  • Meh. Who cares? --Carnildo (talk) 00:46, 24 March 2009 (UTC)
  • Oppose, with a dash of meh. If someone wants to edit the filter, they can op themselves. I would actually prefer that MediaWiki and template namespace editing be disallowed in the same fashion for admins, but that's another issue. This is a simple way to keep track of who can edit the filter and a potential speed bump for editors who attempt to edit the filter but really shouldn't (like me). Protonk (talk) 03:06, 24 March 2009 (UTC)
  • Support. I see no reason not to bundle it into the sysop package. Ruslik (talk) 04:38, 24 March 2009 (UTC)
  • Support Requiring admins add themselves to a group is a complete waste of time. BJTalk 10:37, 24 March 2009 (UTC)
  • "Why bother" - As I stated above, I still see value in having the list of AFE flagged people easily accessible. –xeno (talk) 12:55, 24 March 2009 (UTC)
    If it were certain to be an accurate list, I would agree with you. Unfortunately experience with other user groups shows that it will not remain so. Happymelon 13:34, 24 March 2009 (UTC)
  • What's the point? Any admin does have it; they just need to click an extra switch. rootology (C)(T) 13:10, 24 March 2009 (UTC)
    Equally, why bother having to add an extra bell? What's the "point" in making them jump through an extra hoop? Happymelon 13:34, 24 March 2009 (UTC)
    Exactly. It simply doesn't matter, so why bother even making a fuss over it, and the specific way it was implemented? It's like complaining over a handful more onions than garlic in your meal, or a handful more garlic than onions. A week later, it won't matter since it has zero impact on anything of importance. rootology (C)(T) 13:52, 24 March 2009 (UTC)
    Yes, but in the honeymoon phase of AFE I think it'll be helpful. No need to rush into rolling it up. Jmho. –xeno (talk) 13:58, 24 March 2009 (UTC)
    Well said. We should at the very least wait until the use of the tool stabilizes before giving it out to people who have not even expressed a desire for it. Chillum 14:00, 24 March 2009 (UTC)
  • "Pointless question" - Admins who need it can get it, just an extra couple of clicks. If they don't need it, or don't want to use it, then why bother giving it out? Are you also going to hand out AccountCreator, Huggle, VP and popups with the admin bit? Why even make a point about it? --Dirk Beetstra T C 13:56, 24 March 2009 (UTC)
    Exactly. Everyone knows I'm a Process Fan, since Good Process That Is Enforced Can Give Sunlight To Shenanigans, but in terms of level of crisis, this is like raising a fuss over garbagemen smoking a cigarette while they collect garbage instead of during "scheduled smoke breaks". rootology (C)(T) 14:05, 24 March 2009 (UTC)
    Account creator and Huggle (through rollback) have already been given to all admins. Popups is not an admin tool. Ruslik (talk) 14:53, 24 March 2009 (UTC)
    Similarly for AWB, IPBE, and others. Happymelon 15:08, 24 March 2009 (UTC)
    Funny, last time I checked, I was not an Account creator, however, I can turn it on. And rollback != Huggle. You're right with popups, that is a free choice to use it, similar to AWB, IPBE, others .. and now (for admins), AbuseFilter. --Dirk Beetstra T C 19:14, 24 March 2009 (UTC)
  • It may be of interest that there are already 105 people with the AFE right; all of them are administrators (this is 6% of all administrators, a higher percentage of all active ones); but only 45 have ever modified a filter. Interpret as you will. Happymelon 14:14, 24 March 2009 (UTC)
  • Many are probably like me, that wanted to start out just reviewing filters. I'll probably do more later sometime, as I'm fairly familiar with regex (I should be, after years of use!). rootology (C)(T) 14:18, 24 March 2009 (UTC)
  • I have the flag because I intend to use it and have confidence in my ability to do so. I intend to start slow after I have had time to read the instructions, so I have not yet used it. Chillum 14:27, 24 March 2009 (UTC)
  • Support Current implementation makes little sense. — Jake Wartenberg 18:56, 24 March 2009 (UTC)
  • Oppose bundling it with the admin right. It makes very little real difference, the only advantage with the current situation being that one can see which admins have given themselves the permission = have a rough idea of who uses it. Not a great advantage, I know, but decent enough, I think. ╟─TreasuryTagcontribs─╢ 19:16, 24 March 2009 (UTC)
  • Support Admins can get it anyway, just by clicking a few buttons. The group should be used for users who are not administrators. Techman224Talk 19:14, 28 March 2009 (UTC)
  • Support, userright is cluttered enough as it is. The capability argument is not really relevant, since an incapable admin can give himself that right already. I'd support a separate right if it was given on a case by case basis by bureaucrats. -- lucasbfr talk 09:36, 29 March 2009 (UTC)
  • Support – makes sense to me; see #Naive question. —Anonymous DissidentTalk 11:39, 1 April 2009 (UTC)
  • Oppose – ... multiple reasons:
    1. Abusefilter can cause temporary chaos (reversible, but the proverbial damage is done) if a filter is improperly crafted. This can range from slowdowns, to accidentally preventing legitimate edits from going through, to automatically removing autoconfirmed status from innocent users. Maybe it's just me, but I like the idea of encouraging an admin to sorta "figure out" abusefilter before adding himself to the group.
    2. "Be bold" does not, in my opinion, apply to abusefilter in the same way it does to editing. I think extreme caution needs to be taken on every abusefilter edit, and each of those edits should be treated as if you're walking on eggshells. There's something reassuring about asking an admin to add himself to the group as an informal contract to be careful: that is, "this is dangerous stuff. Add yourself to the group with caution."
    3. It would possibly be easier to track who leaks the contents of private filters to the public if the only admins who could leak those filters had to be in a specific usergroup (smaller pool size). From there it's logical deduction, so if an admin is not going to be editing abuse filters, they can stay out of the group unless otherwise needed.
    4. It also allows us to track bizarre, sudden self-group-adds to the abusefilter group on dormant admin accounts, which would allow us to more closely watch for compromised accounts before they can do any real damage should they then screw with the abusefilters.
      --slakrtalk / 01:48, 20 April 2009 (UTC)

Criteria for a Private Filter

It's come to my attention that whether a filter is private or not is largely down to the discretion of the administrator who makes it, and there are no guidelines as to what should be private and, more importantly, what should not. I'm of the opinion that all filters should be public unless they contain an elaborate regex rule that could easily be circumvented if the regex were public (e.g. a meme pattern).

An example of a filter that should not be private is [1], because it is an "as is" filter that cannot be circumvented. Another questionable regex is [2].   «l| Ψrometheăn ™|l»  (talk) 02:00, 27 March 2009 (UTC)

These filters both contain information that would assist an abusive user in circumventing them. –xeno (talk) 02:06, 27 March 2009 (UTC)
How can you circumvent a move throttle? Seriously? And more to the point, I bet any user could tell you what that filter contains, either in full regex or in layman's terms, including the conditions.   «l| Ψrometheăn ™|l»  (talk) 02:09, 27 March 2009 (UTC)
Erm, it involves some beans? –xeno (talk) 02:11, 27 March 2009 (UTC)
Well, because I don't know what the filter contains, one can only wonder why a simple page-move vandalism filter needs to be private for it to work. And back to the initial statement: what constitutes a private filter?   «l| Ψrometheăn ™|l»  (talk) 02:14, 27 March 2009 (UTC)
He can't move pages until he is autoconfirmed by waiting 4 days and making 10 edits, and yet he routinely waits just long enough and makes just enough edits to do that. If he knew the specific requirements of those rules they would be just as beatable. (At least until we changed them anyway, but no one wants an arms war.) Dragons flight (talk) 02:17, 27 March 2009 (UTC)
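To illustrate why a fixed, publicly known threshold is easy to meet exactly, here is a minimal Python sketch of an age-plus-edit-count check. It is illustrative only: the function and constant names are hypothetical, the 4-day/10-edit values are taken from the comment above, and MediaWiki itself sets these thresholds through configuration (`$wgAutoConfirmAge`, `$wgAutoConfirmCount`) rather than through code like this.

```python
from datetime import datetime, timedelta

# Hypothetical constants mirroring the thresholds described in this thread
# (4 days, 10 edits); not read from any real MediaWiki configuration.
MIN_ACCOUNT_AGE = timedelta(days=4)
MIN_EDIT_COUNT = 10


def is_autoconfirmed(registered: datetime, edit_count: int, now: datetime) -> bool:
    """Return True once an account meets both the age and the edit-count threshold."""
    return now - registered >= MIN_ACCOUNT_AGE and edit_count >= MIN_EDIT_COUNT


now = datetime(2009, 3, 27)
print(is_autoconfirmed(datetime(2009, 3, 23), 10, now))  # exactly 4 days, exactly 10 edits
print(is_autoconfirmed(datetime(2009, 3, 24), 50, now))  # plenty of edits, but only 3 days old
```

Because both conditions are simple published numbers, a sleeper account only has to hit them exactly once; a private filter hides which extra conditions apply.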
This entire feature is an arms war, and apparently one fought by people who don't give a damn about civilian casualties (read: those of us who actually want to improve articles) -- 217.42.77.168 (talk) 17:12, 6 April 2009 (UTC)
We've tried public regex-based filters to prevent pagemove vandalism, with little noticeable effect. Mr.Z-man 03:45, 27 March 2009 (UTC)
That filter is quite different to the epic fail blacklist, but an email has provided me with a sufficient explanation to keep that filter private.   «l| Ψrometheăn ™|l»  (talk) 07:04, 27 March 2009 (UTC)
So does the MediaWiki source code, and last I checked, that wasn't private -- 217.42.77.168 (talk) 17:09, 6 April 2009 (UTC)
He who trades freedom for security deserves neither and loses both, or at least that's what Ben Franklin thought. Maybe we should add an asterisk next to all the proclamations of openness and transparency that litter our statement of principles and other such articles. Burzmali (talk) 17:41, 6 April 2009 (UTC)
Filters at high risk of being circumvented are usually marked private to "cut them off at the pass." One of the bigger problems with titleblacklist, as we found out, was that everything was 100% open to the public to view; as a result, it was regularly circumvented: all that had to be done was for the puppeteer to add the page to his watchlist and adjust in a single step. Private filters, on the other hand, force the puppeteer to keep guessing, using up their throwaway accounts in the process, while being able to do nothing to disrupt the encyclopedia the whole time. It's basically the difference between using a clear lock on your house (where all of the pins are visible to someone trying to pick it) and using an opaque one. With the clear lock, the burglar can see the pin shears and have the lock open in a fraction of the time it would take him to brute-force it. --slakrtalk / 01:55, 20 April 2009 (UTC)
It's my understanding that lock picking proceeds one pin at a time, and pins already picked are held in place, so I'm not sure your analogy is apt. How about the Titanic trying to avoid icebergs without sonar, when all that can be seen is the tip? --NE2 02:47, 20 April 2009 (UTC)
Or to continue the lock analogy, an addition to the titleblacklist is like trying to increase security by adding more doors. It adds an extra step, but getting around it is fairly trivial. A private filter is like putting a lock on the door, connected to an alarm that lets you know when someone is trying to break in. Mr.Z-man 02:56, 20 April 2009 (UTC)
Which works fine until you have a bunch of well-intentioned contributors standing around outside the building unable to get in. Gurch (talk) 10:14, 20 April 2009 (UTC)
Keeping the analogy: Gurch, do you realise that blacklisting, semi-protection, and IP(-range)-blocking leave us with that same bunch of well-intentioned contributors standing there (and probably even waaayyy more than with a reasonable filter)? But as we have conveniently turned all streetlights off around the building (with the well-intentioned, precise and irreversible use of a shotgun), we will never know! They can knock, scream, build a trebuchet, paint their faces purple, wear a superman outfit, drop their pants, whatever .. we don't know. Filters can put those editors in the spotlight, and when they knock on the door (or drop their pants :-p ) and can't come in, we can have a look, and open our doors in such a way that we keep out only those which we really want to keep out?
Does such a filter chase away editors? Possibly, but hasn't the blacklisting, semi-protection or blocking alternative done the same? I have seen on some of our filters that we are now getting those well-intentioned contributors back after they have been standing there in the dark for years. However, others knock, scream, cry ... --Dirk Beetstra T C 10:49, 20 April 2009 (UTC)

My experience of auto-censor false positives

In email groups that have auto-censors, I have run across or heard of these false positives being disallowed:

  • "Penistone" (place in Yorkshire (UK)) as "penis"
  • "Scunthorpe" (steelmaking town) as "cunt"
  • "wristwatch" as "twat"
  • "Dick" as slang for "penis" where it clearly means a man's name
  • "No hard or soft pornography will be allowed" (in an email group's description) refused because of the word "pornography"
  • "CP" as "child pornography" where it meant "Canadian Press" (in an email group's description)

Anthony Appleyard (talk) 06:22, 30 March 2009 (UTC)
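The first three examples are all cases of plain substring matching. As a rough sketch (in Python, not in AbuseFilter's own rule language; the word list is a hypothetical illustration, not any real filter), compare naive substring matching with whole-word matching using regex `\b` boundaries:

```python
import re

# Hypothetical blocklist for illustration only; not taken from any real filter.
BLOCKED_WORDS = ["penis", "cunt", "twat"]


def naive_match(text: str) -> list[str]:
    """Flag a word if it appears anywhere as a substring, as the auto-censors above do."""
    lowered = text.lower()
    return [w for w in BLOCKED_WORDS if w in lowered]


def word_boundary_match(text: str) -> list[str]:
    """Flag a word only when it appears as a whole word (regex \\b boundaries, case-insensitive)."""
    return [w for w in BLOCKED_WORDS
            if re.search(r"\b" + re.escape(w) + r"\b", text, re.IGNORECASE)]


for sample in ["Penistone", "Scunthorpe", "wristwatch"]:
    print(sample, naive_match(sample), word_boundary_match(sample))
```

Whole-word matching clears "Penistone", "Scunthorpe" and "wristwatch", but it cannot help with "Dick" or "CP", where the word itself is legitimate and only the context distinguishes the meaning.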

I suppose it's a good thing we don't use one of those auto-censors, then. --Carnildo (talk) 10:03, 30 March 2009 (UTC)
But we don't use those sorts of filters, for precisely that reason! ╟─TreasuryTagcontribs─╢ 07:11, 31 March 2009 (UTC)

Bambifan101

User:Bambifan101 is a long-time IP-hopping sockpuppeteer devoted to vandalizing articles about Disney-related topics. Would it be possible to craft a filter that works on a combination of their various known IP ranges and topic-dependent words like "Disney" present in either the original article text or the new article text, and then blocks those edits? Autoconfirm isn't any use in this case, because they're known to create sleeper accounts. -- The Anome (talk) 14:12, 1 April 2009 (UTC)

Not exactly. For privacy reasons we can't directly check IPs if someone is using a username, which I assume this person is, but we can target articles with "Disney" in them as well as new-ish editors who only recently got autoconfirmed. Can you identify some recent socks to show what the vandalism looks like? Dragons flight (talk) 14:25, 1 April 2009 (UTC)
See Category:Wikipedia sockpuppets of Bambifan101. There are a few publicly visible /16s which have come up repeatedly in the vandal's IP edits, and they're the ones I suggest we filter: specifically 65.0.0.0/16 and 68.220.0.0/16. 70.146.0.0/16 looks like another good candidate. -- The Anome (talk) 14:32, 1 April 2009 (UTC)
One could write a filter to log whenever someone from one of those ranges anonymously edits a "Disney" topic. I'm not seeing much of a pattern that would allow one to do anything stronger than that. Dragons flight (talk) 14:48, 1 April 2009 (UTC)
Filtering on Disney, Teletubbies, or my page would be very useful, as he seems determined to get me to go back to watching for him. He is now on the 65 range again. *sigh* -- Collectonian (talk · contribs) 19:55, 3 April 2009 (UTC)
Also, he is now back to using his 70.146.X.x IP range. -- Collectonian (talk · contribs) 00:10, 4 April 2009 (UTC)

Just checking to see if there is any movement on this one, as he just struck again with both the IP and a new named sock. -- Collectonian (talk · contribs) 02:04, 7 April 2009 (UTC)[reply]

This request has languished because it isn't entirely clear (at least to me) what you want.
We can log every edit by unregistered editors in 65.0.0.0/16, 68.220.0.0/16, and 70.146.0.0/16 to "Disney" / "teletubbies" pages, which may be helpful. We can't do much about logged in editors though. One could log every new editor to a Disney page, but that doesn't seem practical and would be prone to many false positives. We don't get IP information once someone has logged in. Without something more precise than "he edits Disney pages" the logged in socks would be difficult to target. Dragons flight (talk) 02:31, 7 April 2009 (UTC)[reply]
I don't really understand how these new filters work, so I'm not sure how to fully give the info needed. He frequently hits almost any Disney article (films, series, characters, related books), usually removing tags, reverting to very old versions, restoring long-removed trivia, etc. He also loves to blank article talk pages, randomly remove stuff, revert archiving, and edit other people's comments. Lately he also goes to RPP and tries to get "all the Disney articles" un-semi-protected, even though the protection was put in place because of him. He often edits the talk pages of his confirmed socks and IP socks. The newest thing is randomly hitting an anime/manga article just to get my attention. :( Whenever he comes in with an IP, he will usually make a named sock and "double stack", editing one behind the other, because usually only one gets reverted when people roll back all his edits, thereby keeping the vandalism he wanted in place. You can see this here[3] and by checking the contribs of today's IP, Special:Contributions/68.220.175.82, and of the named sock Special:Contributions/Hahabricks. -- Collectonian (talk · contribs) 02:38, 7 April 2009 (UTC)[reply]
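The log-only rule described in this thread (unregistered editors from the three quoted /16 ranges editing Disney or Teletubbies pages) can be sketched in Python with the standard `ipaddress` module. This is not AbuseFilter syntax; the function name, the keyword list and the assumption that an anonymous editor's user name is the IP address string are illustrative. Registered accounts always return False, mirroring the point that the filter gets no IP information once someone is logged in.

```python
import ipaddress

# Ranges quoted in the discussion above.
WATCHED_RANGES = [
    ipaddress.ip_network("65.0.0.0/16"),
    ipaddress.ip_network("68.220.0.0/16"),
    ipaddress.ip_network("70.146.0.0/16"),
]

# Hypothetical topic keywords, matched case-insensitively.
TOPIC_WORDS = ("disney", "teletubbies")


def should_log(user_name: str, page_text: str) -> bool:
    """True if an anonymous edit comes from a watched range and touches a
    watched topic. Logged-in users never match: no IP data is available."""
    try:
        addr = ipaddress.ip_address(user_name)
    except ValueError:
        return False  # not an IP, so a registered account
    in_range = any(addr in net for net in WATCHED_RANGES)
    text = page_text.lower()
    on_topic = any(word in text for word in TOPIC_WORDS)
    return in_range and on_topic
```

Because the match is only "range plus topic", it is suited to logging rather than disallowing, for the false-positive reasons given above.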

Recent deaths filter

Is it possible to have a filter set up to do something similar to what is done here? Recent example. Detecting death claims is needed, because false claims are a form of abuse it is good to be able to catch (though it does require people checking the log). That page was set up by User:Sam Korn (who pointed it out to me), so maybe ask him if it involves anything more complicated than detecting a change in category from "living people" to "2009 deaths". He might have managed to successfully detect all the other ways people edit an article to indicate someone has died. The problem with that list, and with the changes logged by filter 117 (the removal of category "living people"), is that there doesn't seem to be a way to patrol the log, to avoid people duplicating each other's work and checking the same things. Is there a way to patrol a log of an abuse filter? Carcharoth (talk) 00:19, 3 April 2009 (UTC)[reply]

Adding patrolling is on the to do list. Dragons flight (talk) 00:25, 3 April 2009 (UTC)[reply]
We could add the abusefilter-patrol rights to reviewers in the trial flaggedrevs implementation. Cenarium (talk) 21:02, 3 April 2009 (UTC)[reply]
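A rough Python sketch of the category-based check Carcharoth describes: it reports a death claim only when an edit removes "Living people" and adds a "YYYY deaths" category. The regular expressions and the function name are assumptions. The sketch ignores prose-only claims and does nothing about patrolling, which is the open issue above.

```python
import re
from typing import List, Optional

# [[Category:Living people]], optionally with a sort key after "|".
LIVING = re.compile(
    r"\[\[\s*Category\s*:\s*Living people\s*(?:\|[^\]]*)?\]\]", re.IGNORECASE
)
# [[Category:2009 deaths]] and similar; the year is captured.
DEATHS = re.compile(
    r"\[\[\s*Category\s*:\s*(\d{4}) deaths\s*(?:\|[^\]]*)?\]\]", re.IGNORECASE
)


def death_claim_year(added_lines: List[str], removed_lines: List[str]) -> Optional[str]:
    """Return the year if the edit swaps 'Living people' for 'YYYY deaths'."""
    added = "\n".join(added_lines)
    removed = "\n".join(removed_lines)
    dropped_living = bool(LIVING.search(removed)) and not LIVING.search(added)
    death = DEATHS.search(added)
    return death.group(1) if (dropped_living and death) else None
```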

Filter 97 (Personal attacks by new user)

This filter is triggered quite often when anons/new users either revert a talk page blanking, or manually archive a talk page, which is quite unfortunate. Adding "edit_delta > 10000" to the filter might solve that problem, but I'm not (yet!) an expert with all this, so I figured I better ask here first before making the change. :) --Conti| 16:12, 3 April 2009 (UTC)[reply]

& (edit_delta > 10000) should do that. FunPika 16:57, 3 April 2009 (UTC)[reply]
Alright, thanks! Modified the filter. --Conti| 19:30, 3 April 2009 (UTC)[reply]

Any examples? It might be worth killing off the words that are causing problems. BJTalk 22:15, 3 April 2009 (UTC)[reply]

This is the most recent one, and here's another one. I'm sure there are more, but it's pretty impossible to properly search through the log. --Conti| 22:32, 3 April 2009 (UTC)[reply]
The first hit was a bug in the regex ("shit" needed a word boundary), second hit was fine. BJTalk 01:16, 4 April 2009 (UTC)[reply]
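The two fixes in this thread, skipping very large edits and anchoring words with `\b`, look roughly like this in Python (PCRE, which the filter uses, treats `\b` the same way). The word list, the names and the 10,000-byte constant are illustrative. One caveat, assuming filter 97's conditions are joined with `&`: to skip large edits the guard has to keep the small ones (`edit_delta < 10000`); appending `& (edit_delta > 10000)` would instead make the filter match only large edits.

```python
import re

# Illustrative word list. \b requires a word boundary on each side, so
# "shit" no longer matches inside names such as "Yoshitaka".
ATTACK = re.compile(r"\b(?:shit|idiot)\b", re.IGNORECASE)

# Edits adding this many bytes or more are usually reverted blankings or
# manual archiving, which re-add old comments verbatim.
MAX_DELTA = 10000


def flags_attack(edit_delta: int, added_lines: list[str]) -> bool:
    """Return True if a small edit adds a line containing a listed word."""
    if edit_delta >= MAX_DELTA:
        return False  # skip large additions instead of scanning them
    return any(ATTACK.search(line) for line in added_lines)
```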

Can I has old_html?

It's there when I examine an edit, but when I want to use it in the actual filter I get a syntax error. I'd like to use it at Filter 133 so it only catches newly introduced citation errors. --Conti| 20:24, 4 April 2009 (UTC)[reply]

old_html and old_text are unavailable for performance reasons. new_html and new_text may end up being killed for the same reason. It is processor intensive to parse the entire text of the new page. The use of added_lines, removed_lines, new_wikitext, old_wikitext, etc. are recommended whenever possible. Dragons flight (talk) 20:42, 4 April 2009 (UTC)[reply]
Hmm, dang. Without any of those, it would be impossible to check for the "cite error" error message, right? --Conti| 20:48, 4 April 2009 (UTC)[reply]
Yes, the only way to catch parser generated error messages is by parsing the page, unfortunately. Dragons flight (talk) 20:53, 4 April 2009 (UTC)[reply]

new_html and new_text are fine, the parse operation has to happen anyway. old_html and old_text should in theory be pullable from the parser cache, but I haven't got there yet. — Werdna • talk 01:58, 7 April 2009 (UTC)[reply]

We've had clear examples of new_html timing out the server on very large pages, and it can easily lead to delays that make a difference for user experience. I don't know why one should be able to preview or save a page that one can't filter with new_html, but it is pretty clear that is the case. Dragons flight (talk) 02:03, 7 April 2009 (UTC)[reply]
I'm striking the above because I can't seem to duplicate the problem in testing just now. Very large pages still seem to save even with filters like 133 enabled. Which leaves me a bit befuddled. I've certainly seen pages that appeared unable to save in the past, but being unable to duplicate the issue is surprising. Dragons flight (talk) 03:11, 7 April 2009 (UTC)[reply]
Well, I've always had problems editing very, very large pages, long before the abuse filter existed. This might be a dumb question, but how did you know that it was the filter that caused the editing problems? --Conti| 10:53, 7 April 2009 (UTC)[reply]
I'm not sure what problems DF is talking about, but I'm familiar with the issues MZMcBride was having with saving large pages. On sites with many filters (enwiki and test.wikipedia), edits that added more than ~250,000 bytes via the API timed out after about a minute with a 504 Gateway timeout. On mediawiki.org, which only has 1 filter, and on my normally slow test wiki, which only had a couple of random filters, the edits worked fine. So it's not a lot of data, but it certainly looks like there is some correlation between the number of filters and edit success. Mr.Z-man 23:03, 7 April 2009 (UTC)[reply]
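Werdna's point that the parse "has to happen anyway" amounts to computing an expensive variable at most once per edit and sharing it between every filter that reads it. A minimal Python sketch of that idea using `functools.cached_property`; the class, the `parse_count` counter and the toy parser are hypothetical, and this is not how the extension is actually written. Caching does not make the single parse any cheaper, which is why very large pages can still be slow.

```python
from functools import cached_property
from typing import Callable


class EditVars:
    """Hypothetical per-edit variables; expensive ones are computed lazily."""

    def __init__(self, new_wikitext: str, parser: Callable[[str], str]):
        self.new_wikitext = new_wikitext  # cheap: already in memory
        self._parser = parser             # stand-in for the wikitext parser
        self.parse_count = 0              # how many times we actually parsed

    @cached_property
    def new_html(self) -> str:
        # Runs on first access only; later reads reuse the stored result.
        self.parse_count += 1
        return self._parser(self.new_wikitext)


def toy_parser(text: str) -> str:
    return "<p>" + text + "</p>"


if __name__ == "__main__":
    edit = EditVars("Hello world", toy_parser)
    filters = [
        lambda v: "Cite error" in v.new_html,
        lambda v: "<p>" in v.new_html,
        lambda v: "spam" in v.new_wikitext,  # never triggers a parse
    ]
    print([f(edit) for f in filters], edit.parse_count)
```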

Cleanup

I just removed many (12 or so) old filters with little or no hits for performance reasons. Prodego talk 22:54, 4 April 2009 (UTC)[reply]

You turned off 19 filters (not 12). I have re-enabled 9 of these. 7 were filters with real hits in the last week and performance burdens < 4 ms. The other two were less than 2 days old. In the meantime I've fixed an unrelated filter that was throwing runtime numbers over 50 ms, which means it had more load than the other 9 combined. I'm working on creating better tools for monitoring load, but we shouldn't throw away the highly targeted low-load filters just because they get relatively few hits. Dragons flight (talk) 04:00, 5 April 2009 (UTC)[reply]
Sorry, but I have to call this 'ridiculous'. I have re-enabled another handful. These filters are specifically designed to keep out specific long-term vandalism, and they should all be enabled, as there is no reason to expect that the vandalism has stopped (the 3 hits for the Argentinian IP hopper all matched the vandal's MO, and he returned after 1 year of page protection; hitlerbunker is still active on de, and was only inactive here because the article was protected; etc. etc.). Please don't disable such rules for 'no hits' or 'likely no hits': they do what the abuse filter is for, stopping abuse. If rules are to be shut down, then consider shutting down purely monitor-only rules. Thanks. --Dirk Beetstra T C 11:20, 5 April 2009 (UTC)[reply]
I rest my case. --Dirk Beetstra T C 11:29, 5 April 2009 (UTC) .. well, not completely, it would not have caught what it should have. --Dirk Beetstra T C 11:41, 5 April 2009 (UTC)[reply]
MZMcBride reported he could not make large edits (bot edits, database reports I think) during high load times, because they timed out. This is directly a result of doing too many checks. If you re-enable a filter, you add a small load. If you re-enable 20 low-load filters, they add up to one large load. If they get almost no hits, they are wasting resources that we apparently need, if edits were timing out. I disabled 133 again, since, after testing, 3 out of 4 users who tried to edit a large article (Timeline of United States inventions and discoveries) had their edits time out. Please prioritize the filters; if they are getting almost no hits, they probably aren't worth the processing time. Prodego talk 15:03, 5 April 2009 (UTC)[reply]
The answer to the observation is actually in the section above. new_html and new_text require parsing the entire page, and those operations block on large pages. Timeline of United States inventions and discoveries actually gave me a >50 second render time last night. Aside from the fact that we really shouldn't have any pages that take that long to render, of course the filter will time out if you ask it to do that operation. It is quite possible that new_html and new_text will be removed entirely to prevent these problems, but if they aren't removed, they need to be guarded by conditions requiring new_size to be less than a few kilobytes, because many large articles and large discussion pages will kill performance. All of the other variables should give reasonable performance even on large pages (though it may be possible to create combinations and sequences of operations that are unreasonable). Yes, we need better tools to evaluate filter performance and prioritize implementation, but your approach wasn't very good because you didn't have a good measure of what was causing problems. Some filters average 4 ms but will never ever take more than 10 ms, while others might average 10 ms but once in 10,000 operations take seconds to execute. The first behavior is rarely, if ever, a problem. The second is potentially a big issue. Dragons flight (talk) 17:00, 5 April 2009 (UTC)[reply]
Now that would have been a better explanation (and I disagree with calling the filters 'useless'). But this might be related to the thread two sections below (server sluggishness). --Dirk Beetstra T C 15:08, 5 April 2009 (UTC)[reply]
That should be unrelated; it happened well after I disabled several of them. I reworded the 'useless' above. Prodego talk 15:10, 5 April 2009 (UTC)[reply]
Hmm, regarding filter 133, would it help if it would only check articles that are smaller than, say, 50k? --Conti| 15:13, 5 April 2009 (UTC)[reply]
Yes, although please test it on a 50k article if you do, on a non-autoconfirmed account. Also remember this is a low load time, so if it takes more than 2 or 3 seconds to make the edit, it might time out when the servers are under higher load. Prodego talk 15:51, 5 April 2009 (UTC)[reply]
In my opinion, the number should be more like 5k not 50k. Dragons flight (talk) 17:22, 5 April 2009 (UTC)[reply]
The filter wouldn't be very useful then, sadly. Is there any way to reliably find out with which settings the filter is starting to cause problems? --Conti| 18:43, 5 April 2009 (UTC)[reply]
Everything that generates a cite error is already being placed in Category:Pages with incorrect ref formatting which is almost certainly a better approach. Dragons flight (talk) 19:11, 5 April 2009 (UTC)[reply]
Well, it would have been a nice way to prevent citation errors in the first place. :) --Conti| 19:15, 5 April 2009 (UTC)[reply]
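Conti's 50k idea and Dragons flight's 5k counter-proposal are both a size guard placed before the expensive part of the rule. A Python sketch, assuming the cheap `new_size` comparison runs first and the parse is only requested when it passes; `get_new_html` is a hypothetical callable standing in for the new_html variable, and the assumption that the rendered error contains the text "Cite error" is illustrative.

```python
from typing import Callable

MAX_SIZE = 5_000  # bytes; Dragons flight's suggestion (Conti proposed 50_000)


def cite_error_check(new_size: int, get_new_html: Callable[[], str]) -> bool:
    """Flag an edit whose rendered page shows a citation error message.

    get_new_html is only called for pages below MAX_SIZE, so large pages
    never pay for a full parse.
    """
    if new_size >= MAX_SIZE:
        return False
    return "Cite error" in get_new_html()
```

The trade-off Conti raises is visible here: with a 5k limit, most articles are skipped entirely.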
Again, it would be useful to have the test function for pages, with a timer. Then we could see how long it takes to test 100 edits on a large page, and from that we would know which filters are likely causing problems. My guess is that it is the rules which check page content variables on large pages (they may even look relatively fast on average, but they do time out on the large ones). --Dirk Beetstra T C 16:00, 5 April 2009 (UTC)[reply]
So I would ask those of you who re-enabled filters to reconsider whether the load is worth the filter, and to disable as appropriate. All the filters I disabled had fewer than 10 hits, which seems like a waste of resources to me (10 edits out of how many tens of thousands?). If they aren't hitting anything, they aren't very good filters. Prodego talk 18:15, 5 April 2009 (UTC)[reply]
You're completely missing the point of targeted filters. Abuse filters are designed to do just that, stop abuse. We have some patterns of abuse that are incredibly common, and so naturally the filters that block them get a lot of hits. We also have some patterns of abuse that are practiced by notorious users or are otherwise very specific: we (finally) have the ability to block that vandalism too using targeted filters. If the filters are now not getting many hits, yet the problems were previously widespread or severe enough for filters to be designed against them, that is an indicator of success, not failure. Happymelon 18:28, 5 April 2009 (UTC)[reply]
It is more likely that they are just getting around the filter. As I mentioned on Dragonflight's talk page: "Let's use filter 118 as an example. The filter has 2 hits. It has been enabled for 5 days, and has a 2.6ms average run time. So if we have about 2 edits per second, that means that in 5 days, 37 minutes of server time were used to get those 2 hits, or 18.5 minutes per hit." 18.5 minutes of server time are used to hit each 'bad' edit (one of them was actually a good edit). With that same amount of server time, 56 edits could be hit using filter 30. It is a matter of priorities: if a filter is only stopping a very small number of 'bad edits', then that time would likely be better spent on a filter that can catch many times as many bad edits, without using significantly more time. For example, filter 118 takes 2.6ms to evaluate, on average; filter 30 takes 3.1ms. It is pretty obvious which is a more efficient use of resources. Prodego talk 19:07, 5 April 2009 (UTC)[reply]
So if vandals get around the filters, we should remove them? Whose side are you on? :D Wikimedia has over three hundred servers: number crunching and concluding that, when you gear up the processing time four hundred and thirty thousand times, you suddenly get a noticeable number, is both factually obvious and totally pointless. WP:PERFORMANCE applies here just as much as anywhere else: the devs have said "this is a potentially server-intensive operation, here are the boundaries we need you to stay within". That does not, and never has, equated to "stay as far away from the boundaries as possible". Wikimedia server time is not pay-as-you-go; the servers are there, using electricity, whether we're using them or not. I'm not saying we should ignore benchmarking or efficiency, of course it's important. I'm saying that what the devs have told us time and again to do, is to do whatever is best for the project, and let them pick up the pieces on the technical side. If that means we have to make a choice between a filter that get 100 hits a day and one that gets 2, obviously we'll go for the former. You've made a choice between stopping vandalism and not stopping vandalism by doing what's best for the servers, not the project. Ignore the servers until they come and bite us in the arse; then it'll be time to make the hard choices. Happymelon 19:31, 5 April 2009 (UTC)[reply]
If we are at the point where MZMcBride is reporting he can't make a valid edit because it is timing out due to the abuse filter, then we are at the point where we have to remove some things. I am creating a table showing the efficiency of all the enabled filters at User:Prodego/Sandbox. I suggest everyone take a look. Prodego talk 20:18, 5 April 2009 (UTC)[reply]
Yes, we needed to fix the one rule that was blocking. The rest of what you disabled was largely irrelevant. Keep in mind that in order to time out the collective server operation has to run longer than about 30 seconds. new_html and new_text can do that on large pages, but almost nothing else will ever even approach a 30 second runtime. Dragons flight (talk) 20:23, 5 April 2009 (UTC)[reply]
HM is basically correct. We can't totally ignore performance because it is possible to write rules that have dreadful performance characteristics. (Things like 133 can totally block editing of very large pages, as noted above.) But we also shouldn't freak out about performance. The servers in general have spare capacity. With performance in mind I've personally written multiple patches for the AbuseFilter, and our current execution burden is only about half what it was during the first week with the AbuseFilter despite the increase in rules. The choice between 118 and 30 offered above is a false dichotomy because it is currently reasonable to have both. If and when we start hitting practical performance limits, we would need to start prioritizing more directly. For the moment though, the issue to focus on isn't 2 ms vs 5 ms, but rather well constructed rules taking 10 ms versus poorly considered rules taking 100 ms. I've already been poking at the bad rules fairly aggressively. Dragons flight (talk) 20:19, 5 April 2009 (UTC)[reply]
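Prodego's figures for filter 118 can be reproduced with a few lines of Python. The inputs (5 days, about 2 edits per second, 2.6 ms average, 2 hits) come from his comment; the edit rate is his estimate, not a measured value. The result is about 18.7 minutes per hit, slightly more than the 18.5 quoted. The last computed value expresses the same total as a share of one processor over the period, which is closer to the framing Happymelon and Dragons flight use.

```python
SECONDS_PER_DAY = 86_400


def filter_minutes(days: float, edits_per_second: float, avg_ms: float) -> float:
    """Total time, in minutes, spent evaluating one filter over a period."""
    edits = days * SECONDS_PER_DAY * edits_per_second
    return edits * avg_ms / 1000 / 60


days, rate, avg_ms, hits = 5, 2, 2.6, 2      # filter 118, per the comment above
total = filter_minutes(days, rate, avg_ms)   # about 37.4 minutes
per_hit = total / hits                       # about 18.7 minutes per hit
share = total / (days * 24 * 60)             # about 0.5% of one processor

print(f"{total:.1f} min total, {per_hit:.1f} min/hit, {share:.2%} of one CPU")
```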

This is true; however, if users are reporting edits timing out (and you can test this yourself too), then there is a problem. The best way to deal with it is to ensure we are using our resources in the most efficient way. The table at User:Prodego/Sandbox shows how the filters use resources. This is useful to know, since it lets us judge whether a filter is helping or slowing down edits unnecessarily. Prodego talk 20:32, 5 April 2009 (UTC)[reply]

Indeed users are reporting edits timing out, and indeed this is a problem. The problem can have one of two causes: a filter which runs abnormally slowly in the particular situation, or having one or more filters which are generally slow. Both are symptomatic of poor filter design of the filter(s) in question. Asserting that the best way to "deal with" this problem is to start disabling other filters is, at best, somewhat bizarre. Happymelon 21:33, 5 April 2009 (UTC)[reply]
You forget the third option: having too many filters. Prodego talk 00:04, 6 April 2009 (UTC)[reply]

Prodego, if there are 5 filters with an average of 3 milliseconds each, then together they add 15 milliseconds to the average edit. Indeed, removing 3 of them is a huge improvement (60%!). But suppose one of those 5 actually has a median of 2 milliseconds, yet takes 30 milliseconds on large pages. Then, on a large page, removing the 3 fast ones (which are fast except on the ONE page they actually should hit!) only improves the total from 42 to 33 milliseconds (about a 21% gain). If 40 milliseconds times out, then indeed, disabling the three solved your problem, but the problem would be better solved by reducing the slow filter to 15 milliseconds, as that would keep all filters enabled, especially those which do what the abuse filter was designed for: stop abuse! May I again point to my suggestion elsewhere to split off filters which are NEVER going to be set to warn, block or prevent into an after-edit abuse filter, since they do not have to monitor in real time? (It could even be in the same system: I envisage that it is easy to first evaluate the action filters and, if no action is performed and the edit is saved, then process the non-action filters. That would even be good for testing!) --Dirk Beetstra T C 12:11, 6 April 2009 (UTC)[reply]
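Beetstra's comparison can be checked with a short Python calculation. The per-filter times and the 40 ms timeout are his hypothetical numbers, not measurements.

```python
BUDGET_MS = 40                   # hypothetical timeout from the example

typical = [3, 3, 3, 3, 3]        # five filters, 3 ms each
large_page = [3, 3, 3, 3, 30]    # the last filter is slow on a huge page


def gain(before: float, after: float) -> float:
    """Fractional reduction in per-edit filter time."""
    return (before - after) / before


# Option 1: disable the first three (fast) filters.
typical_cut = sum(typical[3:])           # 6 ms, down from 15
large_cut = sum(large_page[3:])          # 33 ms, down from 42

# Option 2: keep every filter, but make the slow one take 15 ms.
large_fixed = sum(large_page[:-1]) + 15  # 27 ms

print(f"typical: {gain(sum(typical), typical_cut):.0%} saved")
print(f"large page: {gain(sum(large_page), large_cut):.0%} saved")
print(f"fixed slow filter: {large_fixed} ms, under budget: {large_fixed < BUDGET_MS}")
```

Both options stay under the 40 ms budget; only the second keeps every filter running.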

  • Prodego disabled two filters I created that filter out long-term vandalism to two individual articles. As a result, there was an instance of the same vandalism today.[4] My understanding is that the filters only activate in cases where the articles are edited. Can someone indicate exactly how much load is due to single-article filters?   Will Beback  talk  01:35, 7 April 2009 (UTC)[reply]

I'm a bit late, but there's no performance burden for new_html and new_text, the page has to be parsed anyway. — Werdna • talk 01:57, 7 April 2009 (UTC)[reply]

Does that mean that the editing penalty of a filter using new_html or new_text is washed out by the time spent on other operations, and that it doesn't apply at all to pages rejected by the first expression? Is there an example of a good filter using new_html or new_text? More information on optimizing filters would be helpful all around.   Will Beback  talk  10:54, 7 April 2009 (UTC)[reply]
Werdna, do you mean that a filter that uses 'new_html' has the same burden as one that does not? So (user_name = "Beetstra" & new_text contains "Blah") has the same execution time for you as for me? And the rule has the same execution time whether I edit U of A (1,025 bytes) or Timeline of United States inventions and discoveries (417,316 bytes)? --Dirk Beetstra T C 10:58, 7 April 2009 (UTC)[reply]
I believe what he means is that there is no performance burden in creating the text that fills NEW_HTML and NEW_TEXT, because that material has already been parsed. That is, there is almost no performance difference between a rule that reads NEW_TEXT contains "foo" and one that reads "foobar" contains "foo", as the AbuseFilter already has the text that will fill the NEW_TEXT variable if it is used in a filter. The usual performance cost of evaluating the contains still applies, so the rule you suggest will take longer for an edit by you, Beetstra, than for an edit by another user. I could be wrong, but I think this is what is meant. Happymelon 11:44, 7 April 2009 (UTC)[reply]
That is also what I think, with the small caveat that ("foobar" contains "foo") will be somewhat, though not much, quicker than (<1 Gb of text> contains "foo").
The other problem, however, is this: if the first part of the rule (say !"user" in user_groups) lets only 20% of edits through to the second part, and the rule averages 10 milliseconds overall, then .. (performs difficult math in his head) .. err .. the rule might run, e.g., 4 times at 5 milliseconds and 1 time at 30 milliseconds (now assuming that the time to run a two-part statement splits 50/50 between the first and the second statement, which for difficult rules and for 'uneven' procedures may not hold)? So if that second part gets run on a large page (very likely when filtering on users, but not on pages), then a timeout could occur when editing a large page (if the effect is a bit more extreme than what I describe). --Dirk Beetstra T C 11:56, 7 April 2009 (UTC)[reply]
I expect that the balance is indeed very much more uneven; running the !"user" in user_groups test probably takes << 1ms, but running a contains on a large page could take 100ms. But the same principle applies, as you say: taking the average disguises the outliers where certain situations prompt the rule to take much longer than the average. We know that, on average, the load on the AbuseFilter is acceptable. It's the outliers that we need to be looking at. Happymelon 12:39, 7 April 2009 (UTC)[reply]
The key isn't how long it takes to get a variable, that is pretty quick (with the notable exception of added_links and removed_links). But searching large variables takes a lot of time, which is what Beetstra mentioned above. The average is just an average, it doesn't tell the whole story, like Happy-melon mentioned. You guys have got it now. :) Prodego talk 21:31, 8 April 2009 (UTC)[reply]

Global Abuse filters

There is a discussion on meta about enabling global abuse filters, which would affect wikis with the abuse filter extension, including the English Wikipedia. Please give your input. Thanks. Techman224Talk 01:00, 5 April 2009 (UTC)[reply]

I think, like with global bots, we should opt out of them. Ruslik (talk) 15:19, 5 April 2009 (UTC)[reply]
psst, we don't opt out of global bots Mr.Z-man 17:32, 5 April 2009 (UTC)[reply]
G. bots can only be used to update interwiki links, which is almost a complete opt out. Ruslik (talk) 17:38, 5 April 2009 (UTC)[reply]
With global bots, no technical conflicts are possible because they are restricted to interwiki tasks and use a different account. Global filters could conflict because they use an extension installed both on meta and here. Techman224Talk 18:05, 5 April 2009 (UTC)[reply]

Server sluggishness

I don't think it is an abuse filter problem per se, but all of the filters have started reporting large processing numbers (2-5 times normal). This occurred even with filters that haven't been changed. I suspect this particular problem is some unrelated high load affecting WMF in general. Filters are certainly capable of creating high loads, but my monitoring suggests it is not our fault (at least this time). So for the moment, don't panic. Dragons flight (talk) 05:55, 5 April 2009 (UTC)[reply]

Yep, not us, one of the memcached servers died. Dragons flight (talk) 06:03, 5 April 2009 (UTC)[reply]

My first filter

I tried my first filter, #135, to catch people just holding keys down or copying and pasting 50 times. I'm accumulating improvements from false positives, but also have a few weird ones like this one. Where's the repetition there? Any other comments/ideas? —Wknight94 (talk) 04:47, 6 April 2009 (UTC)[reply]

Why is it marked private? --MZMcBride (talk) 07:22, 6 April 2009 (UTC)[reply]
I found neither documented standards nor an obvious pattern in existing filters so I basically flipped a coin. —Wknight94 (talk) 11:07, 6 April 2009 (UTC)[reply]
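For a filter like #135, one common way to catch held-down keys and repeated pastes is a backreference regex. A Python sketch follows; the thresholds (20 identical characters, or a 3 to 50 character chunk repeated 10 times in a row) are guesses, not filter #135's actual settings, which are private. Legitimate wikitext such as table rows or long runs of `=` or `-` can also repeat, so expect to tune for false positives.

```python
import re

# One character typed 20 or more times in a row (a held-down key).
HELD_KEY = re.compile(r"(.)\1{19,}")
# A chunk of 3-50 characters repeated 10 or more times back to back (a paste).
PASTED = re.compile(r"(.{3,50}?)\1{9,}", re.DOTALL)


def looks_repetitive(added_text: str) -> bool:
    """Return True if the added text contains either kind of repetition."""
    return bool(HELD_KEY.search(added_text) or PASTED.search(added_text))
```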

Performance data

Wikipedia:Abuse filter/Performance. This is still subject to experimentation. It's got about 36 hours of data so far, and hence the (7 day) columns aren't very meaningful yet. Also, the 1 hour column can be quite noisy for a variety of reasons. Please don't use this as a reason to start slashing at things, because the general load is okay right now. But if we do ultimately need to have discussions about prioritizing then this can provide a more tangible and long-term basis for judgment. Dragons flight (talk) 05:24, 6 April 2009 (UTC)[reply]

You've got two sets of time columns there. What's the difference between them? --Carnildo (talk) 05:51, 6 April 2009 (UTC)[reply]
If I understand your question, the first set is hits in the last X time interval, and the second set is average execution time of the rule during that time interval. Dragons flight (talk) 05:55, 6 April 2009 (UTC)[reply]
Would it be possible to get information on processing time for different input sets? E.g. for an 'A & B & C' filter it would be nice to see the processing time for 'A = false', 'A = true and B = false', and 'A = true and B = true and C = false'. Rules which hit only one page or a small group of editors are generally very fast (1-2 msec), but they might cause problems when they do hit a really big page. Information on the fastest and slowest processing times measured, or on the percentage of runs that take <1 msec, 1-2 msec, 2-3 msec, 3-5 msec, 5-10 msec, 10-20 msec, and 20+ msec (rules should preferably have a very low count in the last 2-3 of these categories), might already help. IMHO, the averages don't mean much. --Dirk Beetstra T C 12:33, 6 April 2009 (UTC)[reply]
What unit of time is on the right?   Will Beback  talk  01:51, 7 April 2009 (UTC)[reply]
The first table seems to show that all filters have taken up a total of 321.9 milliseconds in one hour, and 347.72 ms in one day. Is that correct?   Will Beback  talk  01:54, 7 April 2009 (UTC)[reply]
That is the mean execution time per edit during the last hour/day/etc. So a number of 333 ms means that the average edit took 1/3 of a second longer to save because of filtering. Dragons flight (talk) 01:58, 7 April 2009 (UTC)[reply]
Can you add the number of users who were warned and did not proceed to save their edits, and the number of users who were warned and hit the report FP link. Dy yol (talk) 17:53, 11 April 2009 (UTC)[reply]
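The bucketed view Beetstra suggests (share of runs under 1 ms, 1-2 ms, and so on up to 20+ ms) is easy to compute from raw per-run timings, assuming such timings were recorded. A Python sketch using the standard `bisect` module; each bucket includes its lower edge and excludes its upper edge.

```python
from bisect import bisect_right
from collections import Counter

EDGES = [1, 2, 3, 5, 10, 20]  # bucket boundaries in ms, from the suggestion above
LABELS = ["<1", "1-2", "2-3", "3-5", "5-10", "10-20", "20+"]


def bucket(ms: float) -> str:
    """Label for one runtime; a value equal to an edge goes in the higher bucket."""
    return LABELS[bisect_right(EDGES, ms)]


def histogram(timings_ms: list[float]) -> dict[str, float]:
    """Fraction of runs falling in each bucket."""
    if not timings_ms:
        return {label: 0.0 for label in LABELS}
    counts = Counter(bucket(t) for t in timings_ms)
    return {label: counts[label] / len(timings_ms) for label in LABELS}
```

A rule whose last two or three buckets stay near zero is the well-behaved kind described above, whatever its mean.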

Filterable actions

Is it possible to have the filter act based on the deletion of a page? Thanks. Someguy1221 (talk) 09:30, 6 April 2009 (UTC)[reply]

Problem with single-quote in regex rlike

I had to undo a change because a single-quote made the whole regex fail. Anyone know how to include a single-quote in a [ ] group? I tried two single-quotes and preceding with a backslash - no go. Thanks. —Wknight94 (talk) 12:18, 6 April 2009 (UTC)[reply]

\' should work. — Werdna • talk 01:04, 7 April 2009 (UTC)[reply]
Figured out my problem was with the dash, not the single-quote. I had two characters around a dash inside square brackets, and that makes it match anything in the ASCII range between those characters. I made sure the dash was the last character in the square brackets and all is well (except I got shut down for having too many false positives). —Wknight94 (talk) 03:17, 7 April 2009 (UTC)[reply]
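The behaviour Wknight94 ran into is standard regex syntax, so Python's `re` shows it the same way PCRE does: a dash between two characters inside `[ ]` forms a range, while a dash placed last (or first, or escaped as `\-`) is literal. The patterns below are illustrative, not the ones from filter #135.

```python
import re

# '!' is 0x21 and "'" is 0x27, so [!-'] matches any of ! " # $ % & '
ranged = re.compile(r"[!-']")
# With the dash moved to the end it is literal: only ! ' and - match.
literal = re.compile(r"[!'-]")

for ch in ["#", "-", "'"]:
    print(repr(ch), bool(ranged.search(ch)), bool(literal.search(ch)))
```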

Penis

Isn't this just the sort of thing the abuse filter is for? I take it this wasn't caught because there were words other than penis in the edit? Rd232 talk 12:36, 6 April 2009 (UTC)[reply]

I'm new at the filter thing but it seems like the risk of false positives is fairly high. Although I suppose catching all-caps would reduce that risk. Otherwise, it is a clinical term. —Wknight94 (talk) 13:57, 6 April 2009 (UTC)[reply]
Surely any edits like this (or other "clinical" terms) by a brand-new anon user (or non-confirmed user) is likely to be vandalism? ~~ [ジャム][t - c] 14:09, 6 April 2009 (UTC)[reply]
Not so surely. IPs write a large percentage of the legitimate content on this site. You'd basically be saying no IP could easily write content for many of these pages. Something more specific would be needed IMHO, like all-caps "PENIS" or "penis" allowed but flagged for revision or something. Maybe "penis" allowed only if "penis" was already in the article (and that officially breaks my personal record for most uses of the word "penis" in one post). —Wknight94 (talk) 14:35, 6 April 2009 (UTC)[reply]
I didn't mean to imply it would be totally straightforward to avoid false positives. But surely a high % of "penis" in an edit by an anon to an article which has no prior mention of it, say, could be at least flagged if not blocked. (and can we use categories to filter too? eg block if it's outside category urology; flag if it's in). Rd232 talk 14:53, 6 April 2009 (UTC)[reply]
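Wknight94's and Rd232's suggestions (flag rather than block, and only when the word is new to the article or written in capitals) could be combined along these lines. A Python sketch: the function name, the choice to exempt autoconfirmed users and the 'flag'/'ok' return values are assumptions, and the category idea (e.g. urology) is not modelled.

```python
import re

TERM = re.compile(r"\bpenis\b", re.IGNORECASE)
SHOUTED = re.compile(r"\bPENIS\b")  # case-sensitive: all capitals only


def classify(user_groups: list[str], old_wikitext: str, added_lines: list[str]) -> str:
    """Return 'flag' for edits worth a second look, otherwise 'ok'."""
    added = "\n".join(added_lines)
    if not TERM.search(added) or "autoconfirmed" in user_groups:
        return "ok"
    if SHOUTED.search(added):
        return "flag"  # all capitals is rarely clinical usage
    if not TERM.search(old_wikitext):
        return "flag"  # word is new to the article
    return "ok"
```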

Why are so many of these "private"?

E.g. "common page move vandalism". It fulfils a purpose previously served quite well by MediaWiki:Titleblacklist without that needing to be private (and is still served by it, just to confuse people). Why is it necessary to become an administrator just to find out what you are and aren't allowed to move pages to? Isn't this project supposed to be open or something? —Preceding unsigned comment added by 217.42.77.168 (talk) 16:39, 6 April 2009 (UTC)

See above at #Criteria for a Private Filter. –xeno (talk) 16:41, 6 April 2009 (UTC)
Which doesn't answer my question. Furthermore, it claims that MediaWiki:Titleblacklist had "little noticeable effect"; whoever wrote that has evidently forgotten the occasions when nobody could create or move anything because some administrator fucked up the regular expressions -- 217.42.77.168 (talk) 17:08, 6 April 2009 (UTC)
See Dragonflight's comments at 02:17, 27 March 2009 (UTC). –xeno (talk) 17:11, 6 April 2009 (UTC)
Yeah, the one that starts with "he", implying that this entire feature is only here because of one person, never mind what is good for the rest of us -- 217.42.77.168 (talk) 17:13, 6 April 2009 (UTC)
It's just an example. My main goal in watching this page is to ensure that AbuseFilters don't disenfranchise anon users such as yourself; if there are specific cases that have prevented you from improving the encyclopedia, please do let me know here or on my talk page, or file a false positive report at WP:FALSEPOS. –xeno (talk) 17:15, 6 April 2009 (UTC)

Moving 'disallow' action to be "restricted"

Just noting here that it's possible to restrict certain actions ('disallow' comes to mind) to a smaller group of users. Is this desirable? It would make it possible to require a more formal assessment of filter performance before a filter is set to disallow. I'm not sure whether we want it or not; I'm just raising it as an option. — Werdna • talk 03:04, 7 April 2009 (UTC)

How is this implemented? Once a filter is set to disallow, does that mean only disallow-able editors would be able to edit the filter? Dragons flight (talk) 03:13, 7 April 2009 (UTC)
The problem now seems to be a lack of process. Who determines whether a filter is appropriate? How do they make that determination? Some people just create a filter and immediately make it disallow with zero hits. I created one and watched. The false positives seemed acceptable to me, but it was turned off within a few hours for too many false positives. With no published standards, how could I know that? —Wknight94 (talk) 03:31, 7 April 2009 (UTC)
Concur, better guidelines would be helpful. Dragons flight (talk) 03:39, 7 April 2009 (UTC)
And would that mean that you need a 'disallower' to implement emergency filters? --Dirk Beetstra T C 10:52, 7 April 2009 (UTC)

The general implementation is that you can't save a filter with the 'disallow' action if you don't have a right called abusefilter-edit-restricted. A side effect would be that the emergency disable mechanism would also disable 'disallow' filters. I do agree with Dragons Flight that some guidelines at least, if not some kind of process (especially for hidden filters), would be helpful. — Werdna • talk 14:56, 7 April 2009 (UTC)

I agree with guidelines as well. Restricting it might mean that most editors can't use it for emergency filters (which are often both set to disallow and hidden per WP:BEANS), but that does not mean emergency filters need not be thought through before they are enabled. --Dirk Beetstra T C 10:30, 8 April 2009 (UTC)

Current Trends

Based on the data as of 4/8/09

Category                                       Private   Public
Share of total filters                         41%       59%
Share of total hits                            2.5%      97.5%
Share of filters that disallow                 83%       17%
% of category's filters set to disallow        80%       12%
Share of total disallows                       28%       72%
% of category's hits resulting in disallow     88%       6%

Not very interesting numbers, but they do show that private filters are being used to disallow edits with very specific editing patterns, whereas public disallow filters have much wider scope: despite making up only 17% of disallow filters, public filters account for 72% of disallow actions. Burzmali (talk) 14:42, 8 April 2009 (UTC)

That is interesting data, thank you! :) Werdna • talk 02:05, 16 April 2009 (UTC)
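The rows of the table are linked: a category's share of disallow filters follows from its share of all filters times the fraction of its filters set to disallow, and its share of disallow actions follows the same way from hits. A small Python sketch (variable names are illustrative, not from any tool used here) recomputes those two rows from the rounded figures; the results land within about one percentage point of the table, which is what rounding of the inputs would produce.

```python
# Rounded figures from the 4/8/09 table, as fractions rather than percentages.
share_of_filters = {"private": 0.41, "public": 0.59}
disallow_rate_of_filters = {"private": 0.80, "public": 0.12}
share_of_hits = {"private": 0.025, "public": 0.975}
disallow_rate_of_hits = {"private": 0.88, "public": 0.06}


def mix(weights, rates):
    """Weight each category's rate by its share, then normalize so the parts sum to 1."""
    parts = {k: weights[k] * rates[k] for k in weights}
    total = sum(parts.values())
    return {k: v / total for k, v in parts.items()}


disallow_filters = mix(share_of_filters, disallow_rate_of_filters)
disallow_actions = mix(share_of_hits, disallow_rate_of_hits)

for name, split in [("disallow filters", disallow_filters), ("disallow actions", disallow_actions)]:
    print(f"{name}: private {split['private']:.0%}, public {split['public']:.0%}")
# disallow filters: private 82%, public 18%
# disallow actions: private 27%, public 73%
```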

AbuseLog appearance

I found the entries in Special:AbuseLog overly descriptive, so on another wiki I changed MediaWiki:Abusefilter-log-detailedentry into something like

$1: $4: [[Special:AbuseFilter/$3|$3: $7]], $2 on $5, action: $6 ($8) ($9)

so the log entries look like

Just a thought... —AlexSm 15:59, 10 April 2009 (UTC)

Interesting; I agree that there's scope for improvement. I turned your example into regular text with live links, so we can see better what it would look like. I think this might be a little too compact. How about:
$1: $2 on $5 ($4), [[Special:AbuseFilter/$3|$7]]. Action taken: $6 ($8 | $9)
Thoughts? Happymelon 20:41, 10 April 2009 (UTC)

That's going to stop working soon: because of some changes I've made for global filters, $3 will be replaced by a link to the filter, rather than the filter number itself. The link text will be in a separate message. I suppose I could pass the filter name to that message, though. — Werdna • talk 02:04, 16 April 2009 (UTC)

I see, so if it hits local filter #3 it would link to Special:AbuseFilter/3, whereas if it hit global filter #3 it would have to link to, e.g., meta:Special:AbuseFilter/3. Why does the link text need to be in a separate message? Happymelon 13:36, 19 April 2009 (UTC)

Stop creating filters to catch one specific instance of vandalism

Things like this. Unless you want to make 1000000 filters, most of which will have 0 hits and slow editing to a crawl. kthx 86.164.203.7 (talk) 20:51, 11 April 2009 (UTC)

Well, Prodego, an administrator here, says that it's recurring vandalism. In general, our admin community has a lot of technical expertise; the filter is run by Werdna, who is paid full-time to work on the system. I'm sure that they know what they're doing. ╟─TreasuryTagcontribs─╢ 20:53, 11 April 2009 (UTC)
And that's a strange example since it's not even enabled. —Wknight94 (talk) 21:33, 11 April 2009 (UTC)
Actually, your admins are picked for their article-writing experience, and they frequently screw up regexes and do things like block article creation, block new user creation and deautoconfirm several hundred people (not mentioning any names) -- 86.164.203.7 (talk) 21:39, 11 April 2009 (UTC)
No, they're not. WP:RFA, WP:ADMIN etc... ╟─TreasuryTagcontribs─╢ 21:46, 11 April 2009 (UTC)
As was already said, the filter in question is not enabled and hasn't been for a week. In any case, the first check it does is for an exact match on the page title; on any article other than Warren G. Harding, the time it adds to the edit would likely be <5 ms. Mr.Z-man 21:44, 11 April 2009 (UTC)
That would be one of the "Colbert Report" filters. It would have been nice to have had in place when the Warren G. Harding siege was going on. I don't know why an IP address would have a problem with such a filter, unless it would thwart his own attempts to vandalize the article. Baseball Bugs What's up, Doc? carrots 02:55, 12 April 2009 (UTC)
I said what? Don't confuse my comment with that of User:Will Beback ([5]), who enabled the filter. I in fact am the one who disabled that filter. In general I agree that filters should be targeted so as to prevent the maximal amount of vandalism for the minimum amount of resources. In some cases this is a very wide filter, in some cases it is a very narrow one. Prodego talk 05:29, 13 April 2009 (UTC)
I agree that every extra filter imposes some cost in time and resources, and that we should seek to gain the greatest benefit. We have the recurring situation of certain specific phrases being inserted into single articles, like San Diego, Elephant, or Warren G. Harding. The "Colbert" vandalism seems to drop off because there are few re-runs, while other shows or media may have a more lasting effect. Perhaps using a mix of filters (to cover the peak periods) followed by bots (once the frequency drops off) would address the problem best? Also, if there are ways of optimizing filters for specific articles, then that might reduce the problem too. Are there other ways of handling this kind of vandalism that are better than filters?   Will Beback  talk  06:13, 13 April 2009 (UTC)
According to "Wikipedia:Don't worry about performance", we should let others concern themselves with system performance. Baseball Bugs What's up, Doc? carrots 08:45, 13 April 2009 (UTC)
That essay was written before administrators were given the ability to potentially add several seconds of processing time to every edit. Gurch (talk) 19:17, 18 April 2009 (UTC)

We are the others, Baseball Bugs. The filters are a substantial load; we do have to worry about performance. @Will, generally rotating in more specific ones works for things like Colbert vandalism (which is periodic). Prodego talk 04:18, 14 April 2009 (UTC)

On one hand, the nature of this feature means it's always easy to use it to drag the system down, so we do need to worry about performance. On the other hand, if we really needed a lot of single-article (or group-of-pages) filters, it should be possible to make them work efficiently if the right functionality were implemented. But a single fast filter that's changed with the flavor of the week shouldn't create a performance worry. -Steve Sanbeg (talk) 19:58, 15 April 2009 (UTC)
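Mr.Z-man's point that a cheap title check runs first can be illustrated with a minimal Python sketch. It assumes the filter engine stops evaluating a condition as soon as a cheap left-hand test fails, the way Python's "and" short-circuits; the title, pattern and function names are hypothetical stand-ins, not the contents of any real filter.

```python
import re

# Hypothetical stand-ins for a single-article filter's two conditions.
TARGET_TITLE = "Warren G. Harding"
EXPENSIVE_PATTERN = re.compile(r"colbert", re.IGNORECASE)

regex_calls = 0  # counts how often the costly content check actually runs


def content_matches(added_lines: str) -> bool:
    global regex_calls
    regex_calls += 1
    return EXPENSIVE_PATTERN.search(added_lines) is not None


def single_article_filter(page_title: str, added_lines: str) -> bool:
    # "and" short-circuits: the regex is skipped unless the exact title test passes.
    return page_title == TARGET_TITLE and content_matches(added_lines)


edits = [
    ("San Diego", "Colbert says..."),
    ("Elephant", "some text"),
    ("Warren G. Harding", "Colbert says..."),
]
hits = [title for title, text in edits if single_article_filter(title, text)]
print(hits, regex_calls)  # ['Warren G. Harding'] 1
```

On every page except the target, the only work done is one string comparison, which is why a narrow filter of this shape costs little even if it stays enabled.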

False positives page

Can I get some reassurance that things listed on the false positives page will be read and acted upon? Even sampling random log entries while trying to test code for interacting with the logs, I keep finding stuff that shouldn't be there; I'm sure if I actually looked deeper, I'd find a lot more wrong. The paranoid desire to unnecessarily keep most of the details of these filters hidden from me is not exactly helping, either; it's like trying to debug software by examining its output when you don't have the source code. Are posts there likely to be read, or am I better off sending everything to the administrators' noticeboard? Gurch (talk) 17:47, 18 April 2009 (UTC)

Both cases you reported are for log-only filters: no action has been taken against the editor, and the edit was saved without the user noticing anything. I would think that sending these to the administrators' noticeboard would be unnecessary. I hope this explains. --Dirk Beetstra T C 18:15, 18 April 2009 (UTC)
Being log-only doesn't mean a filter should be catching things that don't match its description, especially when it's a private one and the description is all we have to go on. In fact, such a filter is worse than no filter at all -- you get the performance cost of having the filter without the benefit of actually having the abuse log, well, logging abuse. Gurch (talk) 19:15, 18 April 2009 (UTC)
No, of course, filters should be as good as possible, with as few false positives as possible (preferably zero). But for some of them, having a filter which catches everything you want to catch, plus a handful of false positives, can also be better than not having a filter at all, as otherwise you will have to devise other ways of finding all occurrences (see filter 129).
Although I agree that we need to keep an eye on performance and make sure the overall performance of the 'pedia does not suffer under the filters, there should also be a drive to improve the system behind the filters to make them as fast as possible. --Dirk Beetstra T C 19:24, 18 April 2009 (UTC)

where should I go?

Hi folks - and no need to point me in the obvious humor direction for my thread title ;). I got a spam filter notice for ezine.com when I tried to use it as a reference. I looked first to the MediaWiki talk:Spam-blacklist/log page, but I can't seem to find the right area to ask. Is ezine.com considered to be an unreliable site for reference? Thanks. — Ched :  ?  19:48, 18 April 2009 (UTC)

  • I also looked at Wikipedia talk:WikiProject Spam and didn't see anything. I should have read the message I guess, instead of assuming it was an "Abuse filter" issue. Instead I just backspaced out of the edit warning window, and used a different reference. Oh well, any info on it would be appreciated. — Ched :  ?  19:57, 18 April 2009 (UTC)
    • You didn't hit any filter, at least. --Conti| 20:12, 18 April 2009 (UTC)
      Yes, there are various different filters and blacklists that an edit has to pass through to be saved; unfortunately, rather a confusing situation for contributors. In this case, my guess is you were blocked by an entry on the spam blacklist added by someone who didn't bother logging it because it was obvious to them why they didn't like the look of the site. Looking at the site myself, I have to agree with them. "Submit Your High Quality Unique Articles To EzineArticles.com In Exchange For Traffic & Exposure Back To Your Website!" doesn't exactly scream "reliable source". Gurch (talk) 20:36, 18 April 2009 (UTC)

Thanks for the input folks ;) — Ched :  ?  04:07, 19 April 2009 (UTC)

My mistake, already fixed

On filter 58, already reverted. Sorry. NawlinWiki (talk) 02:12, 19 April 2009 (UTC)

You mean to say you hadn't made enough mistakes with MediaWiki:Titleblacklist? If you can't test before enabling, don't edit the filter. --NE2 04:43, 19 April 2009 (UTC)
I'm wondering why the abuse-filter code even accepted that change. I wouldn't think '("string1" & "string2")' is even syntactically valid, but since it is, I can see why it matches all edits. --Carnildo (talk) 04:46, 19 April 2009 (UTC)
PHP is weakly typed. MER-C 13:07, 19 April 2009 (UTC)
Looking further into it, the abuse filter appears to be effectively untyped, and the problematic addition didn't match all edits, only those with the character '1' in added_lines. --Carnildo (talk) 01:10, 20 April 2009 (UTC)
Somehow I knew it would be you :/ can't you get someone else to do these things? Gurch (talk) 14:19, 19 April 2009 (UTC)

AfD filter #147

I've created a filter (currently disabled) to check AfD !votes. It should be able to flag bolding problems like '''delete''. I've tested it myself several times, but could someone else make sure it works, since I'm new to the abuse filter and it involves the apostrophe (a tricky character)?

Also, I'd like to extend it to perhaps the following:

  • Making sure people don't just cast a vote in an AfD; require them to give a reason.
  • Making sure people don't make syntax errors such as failing to close bold/italics/other formatting, in other places like articles.

King of 00:20, 20 April 2009 (UTC)

I don't think forcing people to give a reason is a particularly good idea. –xeno talk 01:12, 20 April 2009 (UTC)
This is the abuse filter, not the "enforce my pet guidelines" filter. Gurch (talk) 10:15, 20 April 2009 (UTC)
I agree with Gurch; the abuse filter is not intended for style enforcement. -- The Anome (talk) 11:04, 20 April 2009 (UTC)
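For anyone checking the bolding part of filter 147, the idea could be sketched in Python roughly as below. This is a hypothetical reconstruction, not the filter's actual rule (which isn't quoted here); it assumes !votes sit on lines starting with "*" and flags a vote opened with three apostrophes but closed with two. Anchoring at the start of the line keeps it from mistaking later italics, such as per ''nom'', for a broken vote.

```python
import re

# Hypothetical pattern in the spirit of filter 147: a bulleted !vote whose bold
# is opened with ''' but closed with only ''. With MULTILINE, ^ matches at the
# start of every line, so only the vote at the front of each line is examined.
BROKEN_BOLD_VOTE = re.compile(r"^\*+\s*'''([^'\n]+)''(?!')", re.MULTILINE)


def broken_bold_votes(added_text: str) -> list[str]:
    """Return the vote text of each line whose bold is closed with '' instead of '''."""
    return BROKEN_BOLD_VOTE.findall(added_text)


sample = (
    "* '''Delete'' per nom\n"
    "* '''Keep''' per ''nom''\n"
    "* '''Strong delete''' not notable\n"
)
print(broken_bold_votes(sample))  # ['Delete']
```

The trailing (?!') lookahead is what separates a correct ''' close from a short '' close; the reverse mistake (opening with two and closing with three) is not covered by this sketch.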

IP-ranges

I'm trying to create a filter on svwp which prohibits a certain ip-range from editing certain articles. But when I try to use the ip_in_range thing it either catches every IP or none at all. Presumably I'm doing it wrong! Can you tell me how to use it to catch a certain range? Would be much appreciated. Njaelkies Lea (talk) 17:15, 21 April 2009 (UTC)

The command is 'ip_in_range(user_name,"1.2.3.4/24")', see it for example at work in Special:AbuseFilter/38. Hope this helps. --Dirk Beetstra T C 17:58, 21 April 2009 (UTC)
I can't see the private filters on enwp unfortunately as I'm not an admin here, but I got the information I needed. Thank you! Njaelkies Lea (talk) 18:56, 21 April 2009 (UTC)
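For anyone hitting the same "catches everything or nothing" problem, here is a rough Python analogue of a range check like ip_in_range(user_name,"1.2.3.4/24"), using the standard ipaddress module. It assumes the AbuseFilter function behaves like an ordinary CIDR membership test; the real implementation is PHP and may differ in edge cases. A /24 means the first 24 bits must match, so "1.2.3.4/24" covers 1.2.3.0 through 1.2.3.255. In this sketch, a user name that isn't an IP address (a registered account) simply never matches.

```python
import ipaddress


def ip_in_range(user_name: str, cidr: str) -> bool:
    """Rough analogue of ip_in_range(user_name, "1.2.3.4/24")."""
    try:
        address = ipaddress.ip_address(user_name)
    except ValueError:
        # Registered account names are not IP addresses, so they never match.
        return False
    # strict=False accepts "1.2.3.4/24" (host bits set) and treats it as 1.2.3.0/24.
    network = ipaddress.ip_network(cidr, strict=False)
    # Membership is False when the versions differ (IPv6 address vs IPv4 range).
    return address in network


print(ip_in_range("1.2.3.200", "1.2.3.4/24"))     # True
print(ip_in_range("1.2.4.1", "1.2.3.4/24"))       # False
print(ip_in_range("Example user", "1.2.3.4/24"))  # False
```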

On Wheels

Can we add "Willy on wheels", "on wheels" and various capitalizations? There was a spate of vandalism earlier today in this regard, and there are obvious historical reasons.--Fuhghettaboutit (talk) 12:42, 22 April 2009 (UTC)

Oh, and if not already done, "Haggar", "Hagger" etc. with various spellings, capitalizations and diacritic use would be appropriate.--Fuhghettaboutit (talk) 12:46, 22 April 2009 (UTC)
There are legitimate uses, such as Meals on Wheels. --NE2 13:17, 22 April 2009 (UTC)
So we limit it to avoid false positives. If "on wheels" is too general for content filtering of article additions, use only "willy on wheels" and "willy-on-wheels" with various capitalizations. Page moves, however, should be prevented from any existing title to anything "on wheels", as this was a major modus operandi. Limiting this to just moves may prevent much damage and is very unlikely to result in more than 1 or 2 false positives over many years. In that unlikely event, an admin can be contacted or a move request can be made.--Fuhghettaboutit (talk) 17:23, 22 April 2009 (UTC)
I agree it is a reasonable thing to target if it is a current problem, but it's been a very long time since I've noticed "on wheels" vandalism. Can you point to the recent examples? Dragons flight (talk) 19:42, 22 April 2009 (UTC)
Yeah, this got old in 2005, nobody does it any more. Gurch (talk) 20:06, 22 April 2009 (UTC)

See these 14 pages created today. These are not all of the ones created today, just the ones I protected out of a larger group, and that I can therefore easily find. We still get Willy on Wheels vandalism, even though it's all copycat. Unless the abuse filter has a very finite number of things it can look at, I don't see the point of not doing it. And don't lose sight of the second issue: Haggar is active today.--Fuhghettaboutit (talk) 20:55, 22 April 2009 (UTC)
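The proposal above (narrow matching on added text, broader matching on move targets) might be sketched in Python as below. The patterns and function names are illustrative assumptions, not an existing filter, and the Haggar variants are left out because the exact spellings weren't given.

```python
import re

# Hypothetical patterns following the proposal; not an existing filter.
# Added text: only the specific name, any capitalization, joined by spaces, hyphens or underscores.
WILLY_CONTENT = re.compile(r"willy[\s_-]+on[\s_-]+wheels", re.IGNORECASE)
# Move targets: any title containing "on wheels" as separate words.
ON_WHEELS_TITLE = re.compile(r"\bon[\s_-]+wheels\b", re.IGNORECASE)


def flag_added_text(added_lines: str) -> bool:
    return WILLY_CONTENT.search(added_lines) is not None


def flag_move_target(new_title: str) -> bool:
    return ON_WHEELS_TITLE.search(new_title) is not None


print(flag_added_text("Meals on Wheels delivers food"))  # False
print(flag_added_text("WILLY-ON-WHEELS!!!"))            # True
print(flag_move_target("Main Page on wheels"))          # True
```

Note that the move-target pattern would also catch a legitimate move to a title like Meals on Wheels; that is the rare false positive the proposal accepts, to be handled by an admin or a move request.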

Filter 98

Can this filter exclude autoconfirmed users (and not just sysops)? It's quite clear from looking through the logs that the people tripping it are non-autoconfirmed, and that the autoconfirmed users who trip it are doing so for a reason (i.e. making a good, solid stub rather than incoherent gibberish). I can see no reason why sysops should be excluded from this filter if autoconfirmed users aren't. Prom3th3an (talk) 02:28, 23 April 2009 (UTC)

Filter 131

For starters, I don't think removing an image (no matter how often it happens) is abuse, and therefore it is not within the remit of the abuse filter. Secondly, if I were a vandal, I would spam those images (and/or upload images with similar names and spam those too) onto those pages, because I know no one but a sysop could remove them, which would take longer than if any user could. I think that filter as it stands (as proven above) is flawed and needs to be refined a bit. Prom3th3an (talk) 02:43, 23 April 2009 (UTC)

Filter 118

This filter is a joke and far outside the scope of abuse. Noting that every rule slows the servers down, I must wonder why Raul needs his own rule that does absolutely nothing and has no clear purpose. I would strongly encourage its deletion, as Prodego already tried. Again, this is an abuse filter, not some office clerk. The abuse filter was made to stop serious issues; it was not made to stop things that we personally find annoying. Prom3th3an (talk) 02:53, 23 April 2009 (UTC)