
User:SMcCandlish/Editfilters

From Wikipedia, the free encyclopedia

Sandbox for edit-filter drafting.

Draft edit filter to catch addition of questionable sources

This is written to be easily extensible, to detect the addition of links to (or mentions of) sites that are generally neither reliable sources nor subjects of encyclopedic coverage themselves, but which could occasionally be validly used as WP:PRIMARY sources for certain things.

I don't have access to Special:AbuseFilter/tools ("For security reasons, only users with the right to modify edit filters may use this interface"), so it's difficult to be certain the syntax is perfect. E.g., I wasn't certain that | works as alternation on strings inside "...". It is used that way at Special:AbuseFilter/657 to match various templates by detecting {{ followed by one of various template names separated by |, but that would only work with a regex operator such as rlike or irlike; contains does plain substring matching, so any |-joined string handed to contains is matched literally, pipes and all.
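To make the operator question concrete, here is a minimal sketch of how the two operators treat the same |-joined string. The sample text and the variable names literal and regex are made up for illustration, and this has not been run through the filter tools.

```
/* contains: plain substring test, so "|" is just another character to find */
literal := "see messybeast.com" contains "messybeast.com|MessyBeast.com";   /* false */

/* rlike: regular expression, so "|" separates alternatives (dots escaped) */
regex := "see messybeast.com" rlike "messybeast\.com|MessyBeast\.com";      /* true */

!literal & regex
```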

Code

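/* contains matches literal substrings only, so each capitalization is tested separately */
article_namespace == 0 & (
	(
	(added_lines contains "quickanddirtytips.com" | added_lines contains "QuickAndDirtyTips.com" | added_lines contains "QUICKANDDIRTYTIPS.COM") &
	!(removed_lines contains "quickanddirtytips.com" | removed_lines contains "QuickAndDirtyTips.com" | removed_lines contains "QUICKANDDIRTYTIPS.COM")
	)
	|
	(
	(added_lines contains "messybeast.com" | added_lines contains "MessyBeast.com" | added_lines contains "MESSYBEAST.COM") &
	!(removed_lines contains "messybeast.com" | removed_lines contains "MessyBeast.com" | removed_lines contains "MESSYBEAST.COM")
	)
)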

Notes

It would likely be more efficient to put the strings into an array, growing it with each new site, and then run a single set of added_lines and !(removed_lines ...) tests at the end, especially if this grew into an extensive list of usually-not-reliable sources (e.g. various political blogs and tabloid news sites). This isn't being done at Special:AbuseFilter/126, but it only has two tests.
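A rough sketch of the "one list, one pair of tests" idea follows. It swaps the array for a single regex pattern string that grows by concatenation, because the filter language has no loop for walking an array of needles. The variable name sites is hypothetical, and this is untested.

```
/* One alternation pattern, grown one site at a time; dots are escaped */
sites := "quickanddirtytips\.com";
sites := sites + "|messybeast\.com";

/* irlike is a case-insensitive regex match, so one entry per site is enough */
article_namespace == 0 &
added_lines irlike sites &
!(removed_lines irlike sites)
```

This is not quite equivalent to the per-site version: with one combined pattern, an edit that adds one listed site while removing a different listed site would not be flagged. It also uses a regex, so it trades the cost concern noted below for brevity.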

We could do a contains_any( added_lines, "quickanddirtytips.com", "QuickAndDirtyTips.com", "QUICKANDDIRTYTIPS.COM" ), etc. (the first argument is the text to search; the rest are the strings to look for), and then a true/false test. Would reduce string redundancy at the cost of operation complexity. This would probably be necessary if the above efficiency measure were implemented.
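For comparison, the whole filter written with contains_any might look roughly like this. It is a sketch, not checked in the filter tools.

```
/* contains_any( haystack, needle1, needle2, ... ) is true if any needle occurs in the haystack */
article_namespace == 0 & (
	(
	contains_any(added_lines, "quickanddirtytips.com", "QuickAndDirtyTips.com", "QUICKANDDIRTYTIPS.COM") &
	!contains_any(removed_lines, "quickanddirtytips.com", "QuickAndDirtyTips.com", "QUICKANDDIRTYTIPS.COM")
	)
	|
	(
	contains_any(added_lines, "messybeast.com", "MessyBeast.com", "MESSYBEAST.COM") &
	!contains_any(removed_lines, "messybeast.com", "MessyBeast.com", "MESSYBEAST.COM")
	)
)
```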

Will not catch every possible case (e.g. "mEsSyBeAsT.CoM"), but regex searches are expensive, and just giving the three most likely formats (copy-paste, intentional effort to make it more readable with camelcase, and cluebag who doesn't know how CAPSLOCK works) is surely sufficient, and also keeps the entries very simple. We're not looking for vandals here but good-faith attempts at sourcing that is actually poor.
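If catching odd capitalizations ever became worthwhile, one regex-free option would be to lowercase the changed text once with lcase and test a single lowercase literal per site. A sketch, with added and removed as hypothetical variable names:

```
/* Lowercase the changed text once, then test one literal per site */
added := lcase(added_lines);
removed := lcase(removed_lines);

article_namespace == 0 & (
	(added contains "quickanddirtytips.com" & !(removed contains "quickanddirtytips.com"))
	|
	(added contains "messybeast.com" & !(removed contains "messybeast.com"))
)
```

This would match "mEsSyBeAsT.CoM" as well, and keeps each entry down to a single string.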

The namespace test could also check for the template and portal namespaces, to catch other instances of inserting links to questionable sites in reader-facing content.
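A widened namespace test could be sketched as below, assuming the usual English Wikipedia numbering (0 for articles, 10 for Template, 100 for Portal). The variable name reader_facing is hypothetical, and only one site and one capitalization is shown to keep it short.

```
/* 0 = article, 10 = Template, 100 = Portal (assumed English Wikipedia numbering) */
reader_facing := (article_namespace == 0 | article_namespace == 10 | article_namespace == 100);

reader_facing &
added_lines contains "messybeast.com" &
!(removed_lines contains "messybeast.com")
```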

If we really wanted to, could also test that the inclusion is within ref tags or ref templates, but there's no need to do so here; we generally shouldn't be adding links to these sites in "External links" or "Further reading", either, and the purpose of this filter is just to flag the edit for review, not to prevent it being saved or to deliver any kind of warning.

This does not do an edit count or user-group check because experienced editors are just about as likely to insert links to such sources as noobs; it's a judgment matter not an experience one.