User:FT2/AFM

From Wikipedia, the free encyclopedia
Filter writers manual for AbuseFilter

Sources:



This page documents how AbuseFilter filters are written, the functions and expressions available, tips and tricks, and optimization and debugging.

Note: The AbuseFilter extension is under ongoing development; this page may sometimes need bringing up to date by technically skilled and experienced users.
Updates to AbuseFilter coding should be noted on /updates.

AbuseFilter

The AbuseFilter extension ("AF") is a powerful anti-abuse tool. It allows any user with the appropriate access to specify a complex rule that is tested against every edit. If the criteria for that rule are met, actions are taken automatically, ranging from a log entry, through a warning, to (potentially) removal of user rights or blocking. In particular, AbuseFilter can intercept an edit before it is saved to the wiki and either warn about it or prevent it: asking users to check carefully, making them aware of some issue, or alerting them that anti-abuse action may be taken.

Care is needed, since false positives and excessive server load are both major considerations. Knowledge of regular expressions is useful in many cases, though by no means all.

Quick reference guide

Numbers and strings
"Hello world!" or 'hello world!' Basic text strings
123, 123.76, -17.3, 0 Basic numbers
\n \t \" \' \\ Escape sequences for line breaks, tab characters, double quotes, single quotes and backslashes respectively
Parser variables
The following tokens will be replaced by their respective actual values
ACCOUNTNAME, ACTION, ADDED_LINES, ADDED_LINKS, ALL_LINKS, ARTICLE_ARTICLEID, ARTICLE_NAMESPACE, ARTICLE_PREFIXEDTEXT, ARTICLE_RECENT_CONTRIBUTORS, ARTICLE_RESTRICTIONS_EDIT, ARTICLE_RESTRICTIONS_MOVE, ARTICLE_TEXT, EDIT_DELTA, EDIT_DIFF, MINOR_EDIT, MOVED_FROM_ARTICLEID, MOVED_FROM_NAMESPACE, MOVED_FROM_PREFIXEDTEXT, MOVED_FROM_TEXT, MOVED_TO_ARTICLEID, MOVED_TO_NAMESPACE, MOVED_TO_PREFIXEDTEXT, MOVED_TO_TEXT, NEW_HTML, NEW_SIZE, NEW_TEXT, NEW_WIKITEXT, OLD_HTML (DISABLED, PERFORMANCE), OLD_LINKS, OLD_SIZE, OLD_TEXT (DISABLED, PERFORMANCE), OLD_WIKITEXT, REMOVED_LINES, REMOVED_LINKS, SUMMARY, TIMESTAMP, USER_AGE, USER_EDITCOUNT, USER_EMAILCONFIRM, USER_GROUPS, USER_NAME, ... (list incomplete)
Expressions: basic arithmetic
+ - * / ** % The usual operators for addition, subtraction, multiplication, division, exponentiation ("to the power of") and modulo (sometimes known as "mod" or "remainder": 20 % 7 = 6)
( ... ) The usual parentheses for grouping expressions
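The modulo example above can be checked by hand. The sketch below uses Python, whose % and ** operators happen to share the same symbols. It only illustrates the arithmetic; that AbuseFilter behaves identically, in particular for negative or non-integer operands, is an assumption.

```python
# Illustrative arithmetic only; this is Python, not AbuseFilter code.
quotient, remainder = divmod(20, 7)  # 20 = 2 * 7 + 6

# The remainder is what is left after removing whole multiples of 7.
assert quotient * 7 + remainder == 20

print(20 % 7)   # modulo: the remainder of 20 divided by 7
print(2 ** 10)  # exponentiation: 2 to the power of 10
```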
Expressions: logic and comparison
X & Y or (&: item; item; ...; ) Logical AND – returns true if all the conditions are true. Note the final ";"
X | Y or (|: item; item; ...; ) Logical OR – returns true if any of the conditions is true. Note the final ";"
X ^ Y Logical XOR – returns true if exactly one of the two conditions is true.
! X Logical NOT – returns true if the condition is not true.
All comparisons are case sensitive for text. To compare without regard to case (case insensitive), use lcase() on both sides: "lcase(X) != lcase(Y)"
X > Y, X gt Y, X < Y, X lt Y Greater than / less than (case sensitive for text)
X >= Y, X gte Y, X <= Y, X lte Y Greater than or equals / less than or equals (case sensitive for text)
X == Y, X = Y, X != Y Equals / doesn't equal (== is the same as =; case sensitive for text)
X === Y, X !== Y Equals / doesn't equal, type sensitive: 1 === 1 but 1 !== "1" (case sensitive for text)
X like Y Returns true if the left-hand operand matches the simple wildcard pattern in the right-hand operand (basic pattern matching; allows * and ?, and possibly a few other basic wildcards)
X rlike Y, X regex Y Returns true if the left-hand operand matches the regular expression pattern in the right-hand operand (full regex matching)
X in Y Returns true if the first string is contained in the second.
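The matching operators above can be modelled roughly in Python to make the differences concrete. This is a sketch, not the extension's implementation: the function names are hypothetical, `like` is assumed to use glob-style wildcards (* and ?), and `rlike` is assumed to match anywhere in the text unless the pattern is anchored.

```python
import fnmatch
import re


def eq_ignore_case(x: str, y: str) -> bool:
    """Model of lcase(X) == lcase(Y): comparison without regard to case."""
    return x.lower() == y.lower()


def like(text: str, pattern: str) -> bool:
    """Model of `X like Y`, assuming glob-style wildcards; case sensitive."""
    return fnmatch.fnmatchcase(text, pattern)


def rlike(text: str, pattern: str) -> bool:
    """Model of `X rlike Y`, assuming an unanchored regular expression search."""
    return re.search(pattern, text) is not None


def contained_in(needle: str, haystack: str) -> bool:
    """Model of `X in Y`: true if X occurs somewhere inside Y."""
    return needle in haystack


print("Foo" == "foo")                  # False: plain comparison is case sensitive
print(eq_ignore_case("Foo", "foo"))    # True
print(like("foobar", "foo*"))          # True
print(rlike("abc123", r"\d{3}"))       # True
print(contained_in("bar", "foobar"))   # True
```

As a rule of thumb under these assumptions, the substring test is the cheapest of the three and the full regular expression the most expensive, which matters given the server-load concerns noted earlier.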
Expressions: functions
length(TEXT) returns the length of the string
lcase(TEXT) converts to lower case.
ccnorm(TEXT) converts similar/unicode characters to "normal characters"
rmdoubles(TEXT) removes repeated characters
rmspecials(TEXT) removes any "special" (generally, non-alphabetic) characters
norm(TEXT) does all three of these - converts similar/unicode to normal characters, and removes all repeated or special characters
specialratio(TEXT) returns the proportion of the text which is made up of non-alphanumeric characters
count(TEXT) returns the number of segments, counting each comma as a separate "segment".
count(TEXT1,TEXT2) returns the number of times the first text appears in the second text.
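A few of the text functions above can be approximated in Python to show what they return. These are hypothetical models built from the descriptions on this page, not the extension's code. In particular, "special" is assumed to mean any non-alphanumeric character (including spaces), and count(TEXT) is assumed to return one more than the number of commas.

```python
import re


def rmdoubles(text: str) -> str:
    """Collapse each run of a repeated character into a single character."""
    return re.sub(r"(.)\1+", r"\1", text, flags=re.DOTALL)


def rmspecials(text: str) -> str:
    """Drop every character that is not a letter or a digit (assumption)."""
    return "".join(ch for ch in text if ch.isalnum())


def specialratio(text: str) -> float:
    """Proportion of the text made up of non-alphanumeric characters."""
    if not text:
        return 0.0
    specials = sum(1 for ch in text if not ch.isalnum())
    return specials / len(text)


def count_segments(text: str) -> int:
    """Model of count(TEXT): number of comma-separated segments (assumption)."""
    return text.count(",") + 1


def count_occurrences(needle: str, haystack: str) -> int:
    """Model of count(TEXT1,TEXT2): times the first text appears in the second."""
    return haystack.count(needle)


print(rmdoubles("heeellooo"))                # helo
print(rmspecials("f.o-o!"))                  # foo
print(specialratio("a!b?"))                  # 0.5
print(count_segments("a,b,c"))               # 3
print(count_occurrences("ab", "abcabcab"))   # 3
```

ccnorm() and norm() are left out because they depend on a table of look-alike characters that is not reproduced on this page.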
Variables and settings
To complete To complete
(!: item; item; item...)
X ? Y : Z
(if X then Y else Z)
rcount()
ip_in_range
substr
strpos
str_replace

rcount(needle,haystack)
rmwhitespace(text)
ip_in_range(ip, range)
contains_any(haystack,needle1,needle2,needle3)
substr(subject, offset, length)
strpos(haystack, needle)
str_replace(subject, search, replace)
set_var(var,value)
:=
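The intent of some of the signatures listed above can be sketched in Python. The argument order follows the list, but the behaviour is inferred from the function names alone, so every detail here is an assumption: that the range is given in CIDR notation, that rcount counts regular expression matches, and that rmwhitespace removes all whitespace.

```python
import ipaddress
import re


def ip_in_range(ip: str, cidr: str) -> bool:
    """True if the address lies inside the CIDR range (assumed notation)."""
    return ipaddress.ip_address(ip) in ipaddress.ip_network(cidr, strict=False)


def contains_any(haystack: str, *needles: str) -> bool:
    """True if at least one of the needles occurs in the haystack."""
    return any(needle in haystack for needle in needles)


def rcount(needle: str, haystack: str) -> int:
    """Number of non-overlapping regex matches of needle in haystack."""
    return sum(1 for _ in re.finditer(needle, haystack))


def rmwhitespace(text: str) -> str:
    """Remove spaces, tabs and line breaks."""
    return re.sub(r"\s+", "", text)


print(ip_in_range("192.0.2.17", "192.0.2.0/24"))            # True
print(contains_any("buy cheap pills", "casino", "pills"))   # True
print(rcount(r"\d+", "a1b22c333"))                          # 3
print(rmwhitespace("a b\tc\n"))                             # abc
```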



contains
article_namespace 
article_prefixedtext
rmwhitespace() 
X in user_groups
contains_any(TEXT,ITEM1, ITEM2, ITEM3, ...)
edit_delta
old_wikitext
added_lines
removed_lines
old_size

accountname: This variable is used only during account creation and contains the username of the newly created account.
movedfrom-id: (Paraphrased) The page ID of the page to be moved.
movedfrom-ns: (Paraphrased) Namespace of the page that is to be moved.
movedfrom-text: (Paraphrased) Name of the page that is to be moved.
movedfrom-prefixedtext: (Paraphrased) Full name of the page that is to be moved.
movedto-id: (Paraphrased) Page ID of the destination of the page that is to be moved.
movedto-ns: (Paraphrased) Namespace of the destination of the page that is to be moved.
movedto-text: (Paraphrased) Name of the destination of the page that is to be moved.
movedto-prefixedtext: (Paraphrased) Full name of the destination of the page that is to be moved.
restrictions-edit: This variable contains the level of protection required to edit the page. ("Edit" here is not a verb but an adjective, as in "edit-related protection level".)
restrictions-move: This variable contains the level of protection required to move the page. ("Move" here is not a verb but an adjective, as in "move-related protection level".)


exception-expectednotfound: Error message from the abuse filter parser.



Group by (for throttling):
(For example, with 'ip,page', X filter matches within Y seconds from the same IP address on the same page are required before the remaining actions are triggered.)
ip — IP address.
user — User account.
range — /16 range.
creationdate — Creation date, server time.
editcount — Edit count — hack so that you can detect distinct users.
site — The whole site.
page — Page
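The grouping described above can be pictured as a counter keyed on the chosen fields. The Python sketch below is a hypothetical model, not the extension's implementation: the `Throttle` class, its parameter names and the event dictionaries are invented for illustration, and it assumes that the X-th match inside a sliding window of Y seconds is the one that trips the throttle.

```python
from collections import defaultdict, deque


class Throttle:
    """Hypothetical model: trip after max_hits matches within period seconds,
    counted separately for each distinct combination of the group_by fields."""

    def __init__(self, max_hits, period, group_by):
        self.max_hits = max_hits
        self.period = period
        self.group_by = tuple(group_by)
        self._hits = defaultdict(deque)  # group key -> timestamps of matches

    def record(self, event, now):
        """Record one filter match; return True if the throttle is tripped."""
        key = tuple(event[field] for field in self.group_by)
        hits = self._hits[key]
        hits.append(now)
        # Forget matches that have fallen out of the time window.
        while now - hits[0] >= self.period:
            hits.popleft()
        return len(hits) >= self.max_hits


# 'ip,page': three matches in 60 seconds from one IP address on one page.
throttle = Throttle(max_hits=3, period=60, group_by=("ip", "page"))
edit = {"ip": "192.0.2.1", "page": "Sandbox"}
print(throttle.record(edit, now=0))    # False
print(throttle.record(edit, now=10))   # False
print(throttle.record(edit, now=20))   # True
```

Grouping by more fields makes the throttle narrower: in this model the same IP address editing a different page starts a fresh count.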

Actions:
range-blocking - blocks a /16
tags appear on all RC, feeds, edits, contribs etc.
warnings - see [[Template:Abuse filter warning/doc]]
  • [check: any function for "integer" or "integer division"?]
  • [ is (^: ...) or (**: ...) okay?]
  • [ How does ";" work in logical and/or/etc? Is a final ";" before the closing ")" needed? What if an expression includes a ";" itself?]
  • Does if..then have an "end" in the statement?
  • How do variables and expressions work?
  • === == and = (=== is type sensitive, is this same as case sensitive? == and = are same. := is assignment)
  • funcSetVar
  • Variable typing? "list" variable type? "Null"?

Overview of writing filters
