Wikipedia talk:Edit filter


This is an old revision of this page, as edited by He7d3r (talk | contribs) at 13:51, 13 July 2013 (→‎Request for Interpretation: For filters in the 'default' group, the value is indeed 5% (the 'feedback' group has a higher limit of 20%). This is defined by the wmgAbuseFilterEmergencyDisableThreshold variable at InitialiseSettings.php.). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.


Requests for permissions

Discussion

Request for Interpretation

In the uppermost row of Special:AbuseFilter:

What does "reached the condition limit of 1,000" mean?
On zhwiki I have run into a problem: AbuseFilter does not block edits that match the filter rules. At the same time, the number of actions that reached the condition limit of 1,000 is much higher on zhwiki than on enwiki; see zh:Special:防滥用过滤器. I do not know why it can be more than 1,000. I therefore suspect that the blocking failures are caused by this number being too high. Who can help me? 乌拉跨氪 (talk) 07:49, 6 June 2013 (UTC)
The first number should be kept below either 2% or 5%, or filters will start to be disabled. I can't find the precise setting but I've always thought it's 2% on this wiki. You can achieve this by disabling filters, combining them, or improving their logic to reduce the number of conditions. For a start, you can remove action==edit from most filters. Then for each filter make the first condition the one which removes the greatest number of edits from the rest of the filter. -- zzuuzz (talk) 09:07, 6 June 2013 (UTC)[reply]
You can also add conditions that will remove many edits with a minimal consumption of server resources. (For example, !("confirmed" in user_groups) will eliminate the need to run the rest of the filter on any established users.) With regard to edits matching the filter and not being caught, that is likely because those actions reached the condition limit before reaching the filter that they matched. When an action takes 1,000 conditions, it stops being processed by the edit filter. Reaper Eternal (talk) 10:32, 6 June 2013 (UTC)
For filters in the 'default' group, the value is indeed 5% (the 'feedback' group has a higher limit of 20%). This is defined by the wmgAbuseFilterEmergencyDisableThreshold variable at InitialiseSettings.php. Helder 13:51, 13 July 2013 (UTC)[reply]
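As a rough illustration of the per-group thresholds mentioned above, here is a minimal Python sketch. The 5% and 20% values come from this thread; the function name, the choice of counters being compared, and the sample numbers are assumptions made for illustration, and the real extension may apply further criteria that are not modelled here.

```python
# Simplified model of a per-group disable threshold.
# Threshold values are those quoted in the thread; everything else is hypothetical.
THRESHOLDS = {"default": 0.05, "feedback": 0.20}


def exceeds_threshold(flagged_actions: int, total_actions: int, group: str = "default") -> bool:
    """Return True if the flagged share of recent actions is above the group's threshold."""
    if total_actions <= 0:
        return False
    return flagged_actions / total_actions > THRESHOLDS[group]


# 300 flagged actions out of 5,440 is roughly 5.5%.
print(exceeds_threshold(300, 5440))              # compared against the 'default' threshold
print(exceeds_threshold(300, 5440, "feedback"))  # compared against the 'feedback' threshold
```

With these sample numbers the share is above the 'default' threshold but below the 'feedback' one.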
I am still not very clear about how the limit of 1,000 is made up. Each time a filter tries to match an edit and fails, is that counted as one condition? And when that has happened 1,000 times, is this filter (or are all filters?) disabled? Are the number of actions that reached the condition limit and the number that matched one of the filters perhaps unrelated?
As for zh:Special:防滥用过滤器/140, what can I do to improve its efficiency and reduce its conditions?

乌拉跨氪 (talk) 11:59, 6 June 2013 (UTC)[reply]


The suggestions given above are good. Here are my opinions and suggestions on the problem mentioned above. (My assumptions are general observations and may not be accurate; please correct me where I am wrong.)
  • My observation is that if an individual user's actions trip a filter in more than 5% of that user's total actions, the filter also gets deactivated. (A related notice is sometimes visible on individual filters.)
My observation is that this happens especially on filters that serve warning messages. So, in filters where serving a warning on the very first trip is not essential, using the rate-limit throttle to postpone the warning, and optimising the throttle period in seconds (or reducing it if it is too long), might help.
  • I assume that a filter which consumes more conditions will also consume more run time, so the run time of an individual filter may be a good indicator. If, on my routine visits, I observe a high run time for a filter, I usually disable the filter temporarily until I have studied, improved and tested its parameters to optimise run time and condition consumption.
The suggestions above are not specific to filter no. 140 on zhwiki, since the filter is private and I could not see it. Please do correct me where I am wrong in the above assumptions and observations.
I also feel that a little more clarity on the following would help. The queries that come to my mind are:
  • "Of the last 5,440 actions": how do we know the starting point from which these actions are counted?
  • Does the limit of 1,000 conditions apply to the total condition consumption of all the filters?
  • If an exact answer is not possible, what is an approximate and easy way to understand condition counting?
Mahitgar (talk) 15:14, 6 June 2013 (UTC)
Sorry, zh:Special:防滥用过滤器/21 is open. Could you give me some advice? Strangely, the condition counts of Special:AbuseFilter/520 and Special:AbuseFilter/473 are very high, sometimes in the thousands, so why has that not affected all the filters? 乌拉跨氪 (talk) 18:30, 6 June 2013 (UTC)
Sorry, my assumption about a relation between run time and condition consumption seems to be wrong. I did not understand the need for the condition "action==edit" in your filter 21; I am testing the same on our (mr) wiki too, and I will keep you updated.
I am also curious to know more about enwiki filters 473 and 520, which you mentioned.
Mahitgar (talk) 13:49, 7 June 2013 (UTC)

This is zhwiki filter #21:

action == "edit" & 
!("autoconfirmed" in user_groups) & 
!("bot" in user_groups)
& (article_namespace == 6)
& !(user_name in article_recent_contributors)
& (removed_lines rlike "\{\{.*\}\}")
& !(removed_lines in added_lines)

Firstly, I would remove the !("bot" in user_groups) check, since your bot accounts are probably going to be autoconfirmed due to mass editing. I'd also move the namespace check to the front, since that will filter out far more edits than the action == "edit" check. I can't help too much since I can't read Chinese and thus have no clue what this filter is supposed to be doing. This leaves us with the slightly more optimized filter:

(article_namespace == 6) &
!("autoconfirmed" in user_groups) & 
(action == "edit") & 
!(user_name in article_recent_contributors) &
(removed_lines rlike "\{\{.*\}\}") &
!(removed_lines in added_lines)

Reaper Eternal (talk) 14:35, 7 June 2013 (UTC)[reply]
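To make the effect of reordering concrete, here is a minimal Python sketch of why putting the most selective check first saves conditions. It assumes that AND-ed conditions are evaluated left to right, that evaluation stops at the first failure, and that each evaluated check costs one condition; the helper name, the sample edit, and the cost model are all hypothetical and only approximate how the extension counts.

```python
from typing import Callable, Dict, List, Tuple

Edit = Dict[str, object]
Condition = Tuple[str, Callable[[Edit], bool]]


def count_conditions(conditions: List[Condition], edit: Edit) -> Tuple[bool, int]:
    """Evaluate AND-ed conditions left to right, stopping at the first failure.

    Returns (matched, conditions_used). Charging one unit per evaluated
    condition is an assumption of this sketch.
    """
    used = 0
    for _name, check in conditions:
        used += 1
        if not check(edit):
            return False, used
    return True, used


# Three of the checks from zhwiki filter #21, modelled as Python predicates.
action_is_edit = ("action == edit", lambda e: e["action"] == "edit")
in_file_ns = ("article_namespace == 6", lambda e: e["namespace"] == 6)
not_autoconfirmed = ("not autoconfirmed", lambda e: "autoconfirmed" not in e["groups"])

original_order = [action_is_edit, not_autoconfirmed, in_file_ns]
reordered = [in_file_ns, not_autoconfirmed, action_is_edit]

# Hypothetical sample: a new user editing an ordinary article (namespace 0).
sample = {"action": "edit", "namespace": 0, "groups": ["*", "user"]}

print(count_conditions(original_order, sample))  # three checks before the namespace test fails
print(count_conditions(reordered, sample))       # the namespace test fails immediately
```

Under these assumptions, the sample edit costs three conditions with the original ordering and one with the namespace check in front, which is the saving the reordering aims for.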

Thanks to all, this discussion is really helpful and productive. By the way, zhwiki filter #21 seems to be based on, or similar to, the March 2009 version of enwiki filter #59.
Mahitgar (talk) 04:43, 8 June 2013 (UTC)
Thanks for your patient answers. Zhwiki's filter problem is too severe to be fixed by altering one or two filters. If anyone wants to help us alter zhwiki's filters, please tell me. 乌拉跨氪 (talk) 12:49, 9 June 2013 (UTC)

Engagement in Tool

Hello all; As a non-privileged user (who is not likely to become an abusefilter-manager) I was wondering if one of the more experienced managers could help me understand how to get engaged in the process. I'm currently getting my feet wet with the syntax at test.Wikipedia, but want to eventually bring those skills where they matter (i.e., here). Fortunately for the project, the system as it stands is designed not to break things, but unfortunately for a newbie, all the contentious edits that do trigger a filter as vandalism are often marked as private where I cannot interact with them. I'm looking for a way more than "Chat with us on the talk page mate!" to contribute, but am not requesting the flag. Look forward to hearing from you! Cheers! -TIM(Contact)/(Contribs) 11:51, 12 June 2013 (UTC)

I manage filters on another wiki, not on enwiki. I still want to join this discussion; sorry if you were not expecting me to.

1) Based on my experience of filter management, I have filed some bugs seeking more community participation in open filters.
Usually, in a warning, we provide links to the home page of edit filter management. There one can look up open filters, but has to search manually through the list of filters. At bug 47494 I requested an enhancement that would let the wider community see only open filters; bug 45195 has a similar purpose. Probably my bugs are not well understood by the Bugzilla tech community, or maybe I failed to explain my points properly.
2) What you seem to be expecting is something more than point 1. I would like to understand what you mean by >>I'm looking for a way more than "Chat with us on the talk page mate!" to contribute<<. If you have any specific ideas in mind, please do elaborate.
Warm regards
Mahitgar (talk) 03:07, 14 June 2013 (UTC)

New edit filter suggestion

I think it would be useful to create a temporary filter to view the edits made with VisualEditor, particularly those by new accounts. There are still some significant problems with it, and if it is released to 50% of new accounts today, as has been advertised, it could cause a spike in errors introduced to articles and in formatting problems. Kumioko (talk) 16:59, 18 June 2013 (UTC)

So basically you want this? ;) Legoktm (talk) 19:44, 18 June 2013 (UTC)[reply]
Oh yeah thanks that's it. I learned 2 things today.:-) Kumioko (talk) 19:54, 18 June 2013 (UTC)[reply]

Exempting bots from filters

Would it be possible to exempt bots from Filter 167? We are having problems with archiving bots not being able to create new archives. Mdann52 (talk) 08:37, 24 June 2013 (UTC)

 Done. Legoktm (talk) 15:45, 24 June 2013 (UTC)[reply]

Problems with count, rcount and regex

I've been getting weird results with count, rcount and regex. I'll show tests below against this blocked edit - which is an edit containing the letter e multiple times.

Part 1: rcount can't count?

Count and rcount (in many examples that I've tested) evaluate to exactly 1 if there is a match, and to minus infinity (or at least a large negative number) if there is no match.

E.g. I test the simple filter:

rcount("e" in added_lines) == 1

This reports "The filter matched this change", when I expect it to not match. The count should be much higher than 1. ">1" or any other comparison I've tried fails to match.

Now I test for a string which is not in the added lines:

rcount("this is a test string blah blah blah" in added_lines) == 1

That matches, but it shouldn't. "0" or any other comparison I tried fails to match.

Also weird:

rcount("e" in "foo") 

...which matches, but shouldn't.

count gives similar results when I've tried it.

Part 2: regex:

Now to try regex for this string which is not found in added lines, testing against the same edit... at first it works correctly:

added_lines regex "this is a test string blah blah blah" 

"The filter did not match this change." Working correctly, no problem.

Using ! for NOT:

!added_lines regex "this is a test string blah blah blah" 

"The filter matched this change." Again, working correctly.

Then it gets weird

added_lines regex "this is a test string blah blah blah" == 0

This reports "The filter did not match this change.". Problem! I expect the first part of the expression to be false, and therefore the whole expression should match.

added_lines regex "this is a test string blah blah blah" < -10000000000000000000000000000000000000

This reports "The filter matched this change". False is somehow given a large negative number.

Can anyone help explain this to me? I've written a filter based on rcount, and on the idea that false is 0 and true is 1 - it's not working, and I came across these anomalies while trying to debug it. --Chriswaterguy talk 08:39, 1 July 2013 (UTC)[reply]

Well, for starters, the expression rcount("e" in added_lines) isn't giving the appropriate arguments to rcount. Rcount takes two arguments: the regex and the string with which to compare the regex. (Usage: rcount(string regex, string haystack).)
Secondly, you cannot perform a boolean operation in PHP (to the best of my knowledge) between a boolean and an integer as you can in C. (C does not have the boolean type; the boolean is simply an integer.) For example: ("asdasdasdasdasdffff" rlike "asdd") = 0 is not the same as ("asdasdasdasdasdffff" rlike "asdd") = false. The first expression, rlike, evaluates to 'false' in this case, and then a compare is done between 'false' and '0', which evaluates to 'false'.
Finally, "-10000000000000000000000000000000000000" is an integer underflow even on 64-bit systems.
I hope this helps. Reaper Eternal (talk) 12:27, 1 July 2013 (UTC)[reply]
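For readers following along, the two-argument form described above (regex first, haystack second) can be sketched in Python. This is only an analogy: Python's re module stands in for the regex engine used by the extension, the function below is a hypothetical re-implementation rather than the extension's code, and the sample text is made up.

```python
import re


def rcount(regex: str, haystack: str) -> int:
    """Count non-overlapping matches of regex in haystack (Python `re` semantics)."""
    return len(re.findall(regex, haystack))


added_lines = "eleven bees"  # hypothetical sample text

print(rcount("e", added_lines))                      # number of times "e" occurs
print(rcount("this is a test string", added_lines))  # pattern absent from the text
print(rcount("e", "foo"))                            # pattern absent from the text
```

In this model, writing rcount("e" in added_lines) would first reduce the expression in parentheses to a single boolean, so the function would never receive a separate pattern and haystack; passing the two strings separately is what lets it count.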
Thanks - that's a huge help.
Is it reliable to use true and false as 1 and 0 in calculations? It seems to work (E.g. true+false==1 seems to evaluate true for any edit, based on initial testing.) Does using a boolean in a calculation automatically convert it to an integer? --Chriswaterguy talk 01:55, 3 July 2013 (UTC)[reply]
I did these tests:
  • added_lines regex "this is a test string blah blah blah" < -10000000000000000000000000000000000000 (matches - i.e. integer underflow)
  • added_lines regex "this is a test string blah blah blah" + 0 == 0 (matches - which is mathematically correct) and:
  • added_lines regex "this is a test string blah blah blah" * 1 == 0 (matches - which is mathematically correct)
So it looks like we can use a statement's true or false value as 1 or 0, respectively, in a calculation - and then use that in a comparison (<, > or ==). But if we don't do any mathematical operation, then as you pointed out, it's a different data type (boolean rather than integer or floating point) and we don't get a meaningful result.
Is that correct? Thanks. --Chriswaterguy talk 06:45, 4 July 2013 (UTC)[reply]

List of contributions

Hi folks,

How can I create a template that lists contributions by a specific user just on abuse filters?

We use hu:Template:Adminlista-elem to list special admin activities, such as log pages and editing MediaWiki namespace. I want to enhance it with abuse filter modifications. Bináris (talk) 07:43, 6 July 2013 (UTC)[reply]