User:FT2/AFM
- Filter writers' manual for AbuseFilter
Sources:
- Wikipedia:Abuse filter
- Wikipedia talk:Abuse filter
- Wikipedia:Abuse filter/Instructions
- mw:Extension:AbuseFilter/RulesFormat
- google:Abusefilter
- https://bugzilla.wikimedia.org/show_bug.cgi?id=15684 - introduction on bugzilla
- Template:Abuse filter warning/doc - warning template info
- Wikipedia:Abuse filter/Sample2 - sample code
- Wikipedia:Abuse filter/Performance - evaluation and auditing
This page documents how AbuseFilter filters are written, the functions and expressions available, tips and tricks, and optimization and debugging.
- Note: The AbuseFilter extension is under ongoing development, so technically experienced users may need to bring this page up to date from time to time.
- Updates to AbuseFilter coding should be noted on /updates.
AbuseFilter
The AbuseFilter extension ("AF") is a powerful anti-abuse tool. It allows any user with access to specify a complex rule that is tested on every edit. If the rule's criteria are met, actions are taken automatically, ranging from a log entry to a warning and (potentially) removal of user rights or a block. In particular, AbuseFilter can intercept an edit before it is saved to the wiki and either warn the user or prevent the edit: asking users to check carefully, making them aware of some issue, or alerting them that anti-abuse action may be taken.
Care is needed, since false positives and excessive server load are both major concerns. Knowledge of regular expressions is useful in many cases, though by no means all.
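The paragraph above describes a filter as a rule tested against every edit, with actions taken when the rule matches. A minimal Python sketch of that model follows. It is illustrative only: the `Filter` class, the `check_edit` function and the action names are hypothetical, and the real extension evaluates rules written in its own rules language (shown in the comment), not Python.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical model: the variables a filter can see for one edit,
# keyed by the lower-case variable names used in filter rules.
EditVars = Dict[str, object]


@dataclass
class Filter:
    description: str
    rule: Callable[[EditVars], bool]  # the filter's condition
    actions: List[str] = field(default_factory=list)  # hypothetical action names


def check_edit(edit: EditVars, filters: List[Filter]) -> List[str]:
    """Test every filter against one edit and collect the actions of those that match."""
    triggered: List[str] = []
    for f in filters:
        if f.rule(edit):
            triggered.extend(f.actions)
    return triggered


# Roughly the rule: user_editcount < 10 & article_namespace == 0 & edit_delta < -5000
large_removal = Filter(
    description="New user removing a large amount of article text",
    rule=lambda e: (
        e["user_editcount"] < 10
        and e["article_namespace"] == 0
        and e["edit_delta"] < -5000
    ),
    actions=["log", "warn"],
)

edit = {"user_editcount": 3, "article_namespace": 0, "edit_delta": -8000}
print(check_edit(edit, [large_removal]))  # ['log', 'warn']
```

Keeping the rule as a pure predicate over the edit's variables mirrors the point above about false positives: the same edit data always gives the same answer, so a rule can be tested against past edits before any action is attached to it.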
Quick reference guide
Numbers and strings | |
"Hello world!" or 'hello world!' | basic text strings |
123 123.76 -17.3 0 | basic numbers |
\n \t \" \' \\ | to include line breaks, tab characters, quote marks and backslashes respectively |
Parser variables | The following tokens are replaced by their respective actual values |
ACCOUNTNAME, ACTION, ADDED_LINES, ADDED_LINKS, ALL_LINKS, ARTICLE_ARTICLEID, ARTICLE_NAMESPACE, ARTICLE_PREFIXEDTEXT, ARTICLE_RECENT_CONTRIBUTORS, ARTICLE_RESTRICTIONS_EDIT, ARTICLE_RESTRICTIONS_MOVE, ARTICLE_TEXT, EDIT_DELTA, EDIT_DIFF, MINOR_EDIT, MOVED_FROM_ARTICLEID, MOVED_FROM_NAMESPACE, MOVED_FROM_PREFIXEDTEXT, MOVED_FROM_TEXT, MOVED_TO_ARTICLEID, MOVED_TO_NAMESPACE, MOVED_TO_PREFIXEDTEXT, MOVED_TO_TEXT, NEW_HTML, NEW_SIZE, NEW_TEXT, NEW_WIKITEXT, OLD_HTML (DISABLED, PERFORMANCE), OLD_LINKS, OLD_SIZE, OLD_TEXT (DISABLED, PERFORMANCE), OLD_WIKITEXT, REMOVED_LINES, REMOVED_LINKS, SUMMARY, TIMESTAMP, USER_AGE, USER_EDITCOUNT, USER_EMAILCONFIRM, USER_GROUPS, USER_NAME, ... (list incomplete) | |
Expressions: basic arithmetic | |
+ - * / ** % | The usual operators for addition, subtraction, multiplication, division, exponentiation ("to the power of") and modulo (sometimes known as "mod" or "remainder": 20 % 7 = 6) |
( ... ) | The usual parentheses for grouping expressions |
Expressions: logic and comparison | |
X & Y (&: item; item; ...; ) | Logical AND: returns true if all the conditions are true. Note the final ";". |
X | Y (|: item; item; ...; ) | Logical OR: returns true if any of the conditions is true. Note the final ";". |
X ^ Y | Logical XOR: returns true if one, but not the other, is true. |
! X | Logical NOT: returns true if the condition is not true. |
All comparisons are case sensitive for text. To compare without regard to case (case insensitive), use lcase() on both sides, e.g. "lcase(X) != lcase(Y)" | |
X > Y, X gt Y, X < Y, X lt Y | greater than / less than (case sensitive for text) |
X >= Y, X gte Y, X <= Y, X lte Y | greater than or equal to / less than or equal to (case sensitive for text) |
X == Y, X = Y, X != Y | equals / doesn't equal (== is the same as =) (case sensitive for text) |
X === Y, X !== Y | equals / doesn't equal, type sensitive: 1 == "1" but 1 !== "1" (case sensitive for text) |
X like Y | returns true if the left-hand operand matches the simple wildcard (glob) pattern in the right-hand operand: * matches any sequence of characters and ? matches a single character |
X rlike Y, X regex Y | returns true if the left-hand operand matches the regular expression in the right-hand operand (full regex matching) |
X in Y | returns true if the string X occurs within the string Y |
Expressions: functions | |
length(TEXT) | returns the length of the string |
lcase(TEXT) | converts to lower case. |
ccnorm(TEXT) | converts similar/unicode characters to "normal characters" |
rmdoubles(TEXT) | removes repeated characters |
rmspecials(TEXT) | removes any "special" (generally, non-alphabetic) characters |
norm(TEXT) | does all three of these - converts similar/unicode to normal characters, and removes all repeated or special characters |
specialratio(TEXT) | returns the proportion of the text which is made up of non-alphanumeric characters |
count(TEXT) | returns the number of segments, counting each comma as a separate "segment". |
count(TEXT1,TEXT2) | returns the number of times the first text appears in the second text. |
Variables and settings | |
To complete | To complete |
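The comparison operators and text functions in the reference above can be modelled in Python to check intuitions before writing a rule. This is a sketch under stated assumptions, not the extension's code: the `af_*` names and `_is_special` helper are hypothetical; loose `==` is assumed to compare numeric-looking values by value; `like` is modelled with Python's `fnmatch.fnmatchcase` and `rlike` with `re.search` (Python's regex dialect differs from the extension's in details); and `rmspecials`/`specialratio` assume letters, digits and whitespace are not "special". `ccnorm` and `norm` are omitted because they depend on a confusable-character table.

```python
import fnmatch
import re


def _as_number(value):
    """Return value as a float if it looks numeric, else None (assumption)."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def af_equals(x, y):
    """Model of == / =: loose equality; numeric-looking values compare by value."""
    nx, ny = _as_number(x), _as_number(y)
    if nx is not None and ny is not None:
        return nx == ny
    return str(x) == str(y)  # text comparison stays case sensitive


def af_strict_equals(x, y):
    """Model of ===: type and value must both match."""
    return type(x) is type(y) and x == y


def af_like(text, pattern):
    """Model of like: case-sensitive glob match against the whole string."""
    return fnmatch.fnmatchcase(text, pattern)


def af_rlike(text, pattern):
    """Model of rlike / regex: the pattern may match anywhere in the text."""
    return re.search(pattern, text) is not None


def af_in(needle, haystack):
    """Model of in: substring test."""
    return needle in haystack


def lcase(text):
    return text.lower()


def rmdoubles(text):
    """Collapse runs of the same character, e.g. 'Heeelllooo' -> 'Helo'."""
    return re.sub(r"(.)\1+", r"\1", text, flags=re.DOTALL)


def _is_special(ch):
    # Assumption: letters, digits and whitespace are not "special".
    return not (ch.isalnum() or ch.isspace())


def rmspecials(text):
    return "".join(ch for ch in text if not _is_special(ch))


def specialratio(text):
    """Fraction of characters that are special (0.0 for empty text)."""
    if not text:
        return 0.0
    return sum(_is_special(ch) for ch in text) / len(text)


def af_count(needle, haystack):
    """Two-argument count(): non-overlapping occurrences of needle in haystack."""
    return haystack.count(needle)


print(af_equals(1, "1"), af_strict_equals(1, "1"))  # True False
print(af_like("Vandal123", "Vandal*"), af_like("vandal123", "Vandal*"))  # True False
print(lcase("SPAM") == lcase("spam"))  # True
```

The last two lines show why the reference recommends lcase() on both sides: `like` (like every text comparison) is case sensitive, so "vandal123" fails the "Vandal*" pattern unless both sides are lower-cased first.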
Further operators, functions and variables still to be added to the reference above:
- (!: item; item; item...)
- X ? Y : Z (if X then Y else Z)
- rcount(needle, haystack)
- rmwhitespace(text)
- ip_in_range(ip, range)
- contains_any(haystack, needle1, needle2, needle3, ...)
- substr(subject, offset, length)
- strpos(haystack, needle)
- str_replace(subject, search, replace)
- set_var(var, value), and := (assignment)
- contains
- X in user_groups
- Variables: article_namespace, article_prefixedtext, edit_delta, old_wikitext, added_lines, removed_lines, old_size
Variable descriptions:
- accountname: used only during account creation; contains the username of the newly created account.
- movedfrom-id: the page ID of the page to be moved.
- movedfrom-ns: the namespace of the page to be moved.
- movedfrom-text: the name of the page to be moved.
- movedfrom-prefixedtext: the full name of the page to be moved.
- movedto-id: the page ID of the destination of the page to be moved.
- movedto-ns: the namespace of the destination of the page to be moved.
- movedto-text: the name of the destination of the page to be moved.
- movedto-prefixedtext: the full name of the destination of the page to be moved.
- restrictions-edit: the protection level required to edit the page ("edit" here is not a verb but an adjective, as in "edit-related protection level").
- restrictions-move: the protection level required to move the page ("move" here is not a verb but an adjective, as in "move-related protection level").
Group by (for throttling): for example, with "ip,page", X filter matches in Y seconds from the same IP address to the same page are required to trip the remaining actions. Possible groupings:
- ip: IP address
- user: user account
- range: /16 range
- creationdate: account creation date (server time)
- editcount: edit count (a hack so that you can detect distinct users)
- site: the whole site
- page: the page
Actions:
- range-blocking: blocks a /16 range
- tags: appear on all recent changes, feeds, edits, contributions, etc.
- warnings: see [[Template:Abuse filter warning/doc]]
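The group-by throttling rule ("X filter matches in Y seconds" from the same group) can be sketched as a sliding time window kept separately for each group key. This is an illustrative model, not the extension's implementation: the `Throttle` class and `record_match` method are hypothetical, and the caller is assumed to supply a value for every grouping field (for example an IP address and a page title).

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Iterable, Tuple


class Throttle:
    """Sketch of group-by throttling: the remaining actions fire only once
    `count` matches fall within `period` seconds for the same group key."""

    def __init__(self, count: int, period: float, group_by: Iterable[str]):
        self.count = count
        self.period = period
        self.group_by = tuple(group_by)  # e.g. ("ip", "page")
        self._hits: Dict[Tuple[str, ...], Deque[float]] = defaultdict(deque)

    def record_match(self, edit: Dict[str, str], now: float) -> bool:
        """Record one filter match at time `now` (seconds); return True if throttle is tripped."""
        key = tuple(edit[name] for name in self.group_by)
        hits = self._hits[key]
        hits.append(now)
        # Drop matches that have fallen out of the time window.
        while hits and now - hits[0] > self.period:
            hits.popleft()
        return len(hits) >= self.count


# 3 matches in 60 seconds from the same IP address to the same page
throttle = Throttle(count=3, period=60, group_by=("ip", "page"))
edit = {"ip": "192.0.2.1", "page": "Example"}
print([throttle.record_match(edit, t) for t in (0, 10, 20)])  # [False, False, True]
```

Because the key is built from the chosen fields, matches from the same IP address on a different page land in a separate window and do not count towards this one.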
- [check: any function for "integer" or "integer division"?]
- [ is (^: ...) or (**: ...) okay?]
- [ How does ";" work in logical and/or/etc? Is a final ";" before the closing ")" needed? What if an expression includes a ";" itself?]
- Does if..then have an "end" in the statement?
- How do variables and expressions work?
- === == and = (=== is type sensitive, is this same as case sensitive? == and = are same. := is assignment)
- funcSetVar
- Variable typing? "list" variable type? "Null"?