User talk:Beetstra

From Wikipedia, the free encyclopedia

Welcome to my talk page.

Please leave me a note by starting a new subject here
and please don't forget to sign your post

You may want to have a look at the subjects
in the header of this talkpage before starting a new subject.
The question you may have may already have been answered there

Dirk Beetstra        
I am the main operator of User:COIBot. If you feel that your name is wrongly on the COI reports list because of an unfortunate overlap between your username and a certain link or text, please ask for whitelisting by starting a new subject on my talkpage. For a better answer please include some specific 'diffs' of your edits (you can copy the link from the report page). If you want a quicker response, make your case at WT:WPSPAM or WP:COIN.
COIBot - Talk to COIBot - listings - Link reports - User reports - Page reports

I will respond to talk messages where they started, trying to keep discussions in one place (you may want to watch this page for some time after adding a question). Otherwise I will clearly state where the discussion will be moved/copied to. Though, with the large number of pages I am watching, it may be wise to contact me here as well if you need a swift response. If I forget to answer, poke me.

I reserve the right not to answer non-civil remarks, or subjects which are covered in this talk-header.


There are several discussions about my link removal here, and in my archives. If you want to contact me about my view of this policy, please read and understand WP:NOT, WP:EL, WP:SPAM and WP:A, and read the discussions on my talkpage or in my archives first.

My view in a nutshell:
External links are not meant to tunnel people away from the wikipedia.

Hence, I will remove external links on pages where I think they do not add to the page (per WP:NOT#REPOSITORY and WP:EL), or when they are added in a way that wikipedia defines as spam (understand that wikipedia defines spam as: '... wide-scale external link spamming ...', even if the link is appropriate; also read this). This may mean that I remove links while similar links are already there, or links which have been there for a long time. Still, the question is not whether your link should be there; the question may be whether those other links should be there (again, see the wording of the policies and guidelines).

Please consider the alternatives before re-adding the link:

  • If the link contains information, use the information to add content to the article, and use the link as a reference (content is not 'see here for more information').
  • Add an appropriate linkfarm like {{dmoz}} (you can consider removing other links covered in the dmoz).
  • Incorporate the information into one of the sister projects.
  • Add the link to other mediawiki projects aimed at advertising (see e.g. this)

If the linkspam of a certain link perseveres, I will not hesitate to report it to the wikiproject spam for blacklisting (even if the link would be appropriate for wikipedia). It may be wise to consider the alternatives before things get to that point.

The answer in a nutshell
Please consider if the link you want to add complies with the policies and guidelines.

If you have other questions, or still have questions on my view of the external link policy, disagree with me, or think I made a mistake in removing a link you added, please poke me by starting a new subject on my talk-page. If you absolutely want an answer, you can try to poke the people at WT:EL or WT:WPSPAM on your specific case. Also, regarding links, I can be contacted on IRC, channel [1].

Reliable sources

I convert inline URLs into references and convert referencing styles to a consistent format. My preferred style is the style provided by cite.php (<ref> and <references/>). When other mechanisms are mainly (but not consistently) used (e.g. {{ref}}/{{note}}/{{cite}}-templates) I will assess whether referencing would benefit from the cite.php-style. Feel free to revert these edits when I am wrong.
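
As a rough illustration of the kind of conversion described above, here is a minimal Python sketch that wraps bracketed inline external links in cite.php <ref> tags. It is not the tooling actually used for these edits (this page does not describe it). The function name, the regular expressions, and the choice to skip links that already sit inside a <ref> are all assumptions made for the example.

```python
import re

# Existing <ref>...</ref> blocks (but not self-closing <ref .../> or
# <references/>); these are passed through untouched.
EXISTING_REF = re.compile(r"(<ref\b[^>]*(?<!/)>.*?</ref>)", re.DOTALL)

# A bracketed external link, e.g. [http://example.org] or
# [http://example.org Some label]. Bare URLs are ignored in this sketch.
INLINE_LINK = re.compile(r"\[https?://[^\s\]]+[^\]]*\]")


def inline_links_to_refs(wikitext: str) -> str:
    """Wrap bracketed external links that are not yet references in <ref> tags."""
    parts = EXISTING_REF.split(wikitext)
    for i in range(0, len(parts), 2):  # even indices lie outside existing refs
        parts[i] = INLINE_LINK.sub(lambda m: f"<ref>{m.group(0)}</ref>", parts[i])
    return "".join(parts)
```

A page converted this way still needs a <references/> tag where the footnotes should appear, and each converted source still has to be checked by a human against wp:rs.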

Converting inline URLs in references may result in data being retrieved from unreliable sources. In these cases, the link may have been removed, and replaced by a {{cn}}. If you feel that the page should be used as a reference (complying with wp:rs!!), please discuss that on the talkpage of the page, or poke me by starting a new subject on my talk-page.

Note: I am working with some other developers on mediawiki to expand the possibilities of cite.php, our attempts can be followed here and here. If you like these features and want them enabled, please vote for these bugs.


I am in general against deletion, except when the page really gives misinformation, or is clear spam or copyvio. Otherwise, these pages may need to be expanded or rewritten. For very short articles there are the different {{stub}} marks, which clearly state that the article is to be expanded. For articles that do not state why they are notable, I will add either {{importance}} or {{notability}}. In my view there is a distinct difference between these two templates. While articles carrying either of these templates may not be notable, the first template says that the article is probably notable enough, but that the content does not state that (yet). The latter expresses a clear concern that the article is not notable, and should probably be {{prod}}ed or {{AfD}}ed. Removing importance-tags does not take away the backlog, it only hides it from attention, and deleting pages does not make the database smaller. If you contest the notability/importance of an article, please consider adding an {{expert-subject}} tag, or raise the subject on an appropriate wikiproject. Remember, there are many, many pages on the wikipedia, and many need attention, so maybe we have to live with a backlog.

Having said this, I generally delete the {{expand}}-template on sight. The template is in most cases superfluous, expansion is intrinsic to the wikipedia (for stubs, expansion is already mentioned in that template).

Warning to Vandals: This user is armed with VandalProof.
Warning to Spammers: This user is armed with Spamda.
This user knows where IRC hides the cookies, and knows how to feed them to AntiSpamBot.


This talk page is automatically archived by MiszaBot III. Timestamped threads older than 7 days are automatically archived to the current archive.

Talk started 20/3/2006
1 - 7/9/2006
2 - 29/11/2006
3 - 05/02/2007
4 - 05/03/2007
5 - 15/03/2007
6 - 29/07/2007
7 - 06/11/2007
8 - 31/03/2008
9 - 22/09/2008
10 - 03/02/2009
11 - 17/05/2009
12 - 13/11/2009
13 - 27/5/2010
14 - 13/12/2010
15 - 5/7/2011
16 - current
17 -
18 -
19 -
20 -
This user is one of the 400 most active English Wikipedians of all time.
This user is non-permanently away from Wikipedia as of December 19, 2011.

ChEMBL Multiple IDs

Hi Dirk, could you please program the DrugBox to accept 2 ChEMBL IDs so that I can add parents and salts? The ChemBox works like this already and it's excellent. Thanks in advance, Louisa Louisajb (talk) 11:57, 9 December 2011 (UTC)

I am sorry, I really do not have time at the moment. Would you mind posting on Template talk:Drugbox? Maybe one of the other regulars of the drugbox is willing to help. Have a good time, and I expect that it will be next year before we see each other again (I am really busy). Thanks! --Dirk Beetstra T C 11:52, 19 December 2011 (UTC)

Nomination for deletion of Template:Chembox subst explosive

Template:Chembox subst explosive has been nominated for deletion. You are invited to comment on the discussion at the template's entry on the Templates for discussion page. Bulwersator (talk) 14:46, 9 December 2011 (UTC)

I see this has been solved. All those templates are for some reason miscategorised (and I must confess, I have no clue if and how many people use this template ..). Thanks! --Dirk Beetstra T C 11:53, 19 December 2011 (UTC)

WP:UWTEST update

Hi Beetstra,

We're currently busy designing some new tests, and we need your feedback/input!

  1. ImageTaggingBot - a bot that warns users who upload images but don't provide adequate source or license information (drafts here)
  2. CorenSearchBot - a bot that warns users who copy-paste text from external websites or other Wikipedia articles (drafts here)

We also have a proposal to test new "accepted," "declined," and "on-hold" templates at Articles for Creation (drafts here). The discussion isn't closed yet, so please weigh in if you're interested.

Thanks for your help! Maryana (WMF) (talk) 02:07, 15 December 2011 (UTC)

Sorry, I have no time, though I would like to interact about this. I hope I have more time again next year. Thanks for the notice, see you around! --Dirk Beetstra T C 11:54, 19 December 2011 (UTC)

Ending XLinkBot test

Hey Beetstra, just a heads up that Maryana and I planned on ending the XLinkBot template test in a day or two, since it was started on the 17th of last month. I just wanted to double check: is this a revision you wanted to keep regardless of the template contents? Hope you're well, Steven Walling (WMF) • talk 19:55, 15 December 2011 (UTC)

Yep, that needs to stay (or be re-added, which is maybe easier). Those parameters tell m:User:LiWa3 and User:XLinkBot where the revertlists are. The bots accept multiple lists, and some non-admin users (who hence do not have access to the main list) whom I trust enough to use the bots have access to private lists via these settings.
I am away for a lot of time, and may not have time to go onto Wikipedia until somewhere in the beginning of next year (and even then). I will have to leave bot-operation to Versageek (who has access to all my bots), maybe further questions can be redirected to Versageek (but please keep me posted here, as I'd like to know)? Thanks! --Dirk Beetstra T C 11:58, 19 December 2011 (UTC)
Okay, rollback of the setting (plus your recent edit to the revertlist) is done. We'll keep you and Versageek updated on analysis work, since we're starting a pretty intensive round of it. Thanks for everything Beetstra, and happy holidays. Steven Walling (WMF) • talk 18:42, 19 December 2011 (UTC)

ChemBox data enrichment

Hi again Beetstra, I posted previously about a 670 drug database with human PK data. Probably I inserted it wrong, sorry about that.

You replied that it was a problem to read in the CAS identifiers, since you use InChI. I managed to convert the list to InChI. I can forward it to you... BUT how? Wouldn't wanna litter your talk page.

Just for completeness sake the list is found here

Warm greetings ./Claus — Preceding unsigned comment added by (talk) 15:47, 17 December 2011 (UTC)

I may not have access to do this work again until somewhere in the beginning of next year. I will try to remember and have a look then. Thanks! --Dirk Beetstra T C 11:59, 19 December 2011 (UTC)

Psilocybin CAS number

Hi Dirk. I hope you are enjoying the holidays. When you get the chance, would you mind taking a look at psilocybin to see why CheMoBot is not verifying the CAS number even though I listed it at User:Edgar181/non-commonchemistry-sourced CASNo? User:Sasata is trying to get the article to FAC and is taking a close look at everything there. Thank you. -- Ed (Edgar181) 02:37, 28 December 2011 (UTC)

I did not take it over yet, and CheMoBot does not read that list. Everything CheMoBot does goes through the index (Wikipedia:WikiProject Chemicals/Index / Wikipedia:WikiProject Pharmacology/Index). I'd suggest that all the relevant identifiers are properly checked, and that a revid of the page where all those are either correct or blank (blank for those which do not exist or which are not verifiable to one correct value) is added to the correct index. I hope this explains. --Dirk Beetstra T C 17:35, 8 January 2012 (UTC)
I think I understand. I'll see what I can do. Thanks. -- Ed (Edgar181) 14:05, 9 January 2012 (UTC)

Hi there. We were blacklisted. We tried to add an external link to A History of Surfing in Piha to the wiki page about Piha. We live in Piha and we are surfers. Can you please allow our link? It is not a commercial page. Regards, Phil. — Preceding unsigned comment added by (talk) 10:06, 3 January 2012 (UTC)

Beetstra is on a wikibreak so I will provide some suggestions.
This relates to Piha. Please review WP:EL and consider whether the external link you propose would actually assist readers of the article. If you have good reasons for why the link would be helpful, please post a request at WT:WHITELIST.
Are you sure the link is blacklisted? Or was it simply reverted by other editors? Johnuniq (talk) 11:34, 3 January 2012 (UTC)
As far as I can tell, the link has not (yet) been blacklisted. I suggest you discuss placing the link before adding it to any article on any language. I think this link does not comply with the guidelines for external links and should not be in wikipedia. EdBever (talk) 19:31, 3 January 2012 (UTC)
I've started a discussion of the link at Talk:Piha. Please contribute there.-gadfium 20:03, 3 January 2012 (UTC)

I answered there, IMHO, the link does not belong there per WP:NOT#REPOSITORY and WP:NOT#DIRECTORY and the intro of WP:EL. --Dirk Beetstra T C 17:33, 8 January 2012 (UTC)

Publications Office of the European Union / Publications Office pages

Dear Mr Beetstra,

I am the social media editor for the Publications Office. We understand that Facebook has generated pages from Wikipedia called 'Publications Office of the European Union' and 'Publications Office'.

Our first concern is that we have created a page on Facebook, 'EU Law and Publications', into which we feed our latest news. But we think that people are being confused by the automatically generated pages. I would therefore like to redirect people to our bona fide Facebook page. This is why I have tried to change the pages.

Can you help me please? — Preceding unsigned comment added by (talk) 11:30, 12 January 2012 (UTC)

Thank you for your question. That facebook page fails our inclusion standards, and that is why I have removed them all (and also XLinkBot is trying to tell you that). We do not have to include all official pages or everything that is related to the subject. Please read the external links guideline and 'What Wikipedia is not'. Thank you. --Dirk Beetstra T C 14:45, 12 January 2012 (UTC)

Chembox property count updated

See User:Itub/Chembox property count. Sorry it took so long to reply! --Itub (talk) 21:58, 13 January 2012 (UTC)

Thanks Itub. Good to see that the number of parameters is quickly increasing. I also see many 'broken' parameters. I'll try to work on that one of these days. --Dirk Beetstra T C 04:06, 17 January 2012 (UTC)

A barnstar for you!

The Technical Barnstar
You did the migration of COIBot, and as you must be beer free, please have this very expensive award and the beer will have to be taken on notice. — billinghurst sDrewth 03:40, 17 January 2012 (UTC)
Thanks! I'll wait for better beer times instead of having one of the alcohol free beers they sell here in the malls. I don't want to share the fate of the Buckler drinkers in the Netherlands. --Dirk Beetstra T C 04:05, 17 January 2012 (UTC)
Oh, I was thinking more a tasty drink. — billinghurst sDrewth 06:29, 19 January 2012 (UTC)
Oh, that for sure, billinghurst. But the main beer I saw here in the supermarkets (only visited one huge one so far) is, lo and behold, alcohol free Budweiser (imagine being in a dry country, and seeing from the other side of the shop a stand with bottles on which you can only make out the word 'Budweiser' from that distance). They do have a couple of other brands of beer (many are of the fruity type, like the Belgian Kriek), but it is quite minimal (and of course all is alcohol free). I never tried James Squire, I am looking forward to it! --Dirk Beetstra T C 07:23, 19 January 2012 (UTC)

CoiBot question

Hi Dirk. I've left a comment at Wikipedia:WikiProject Spam/Local/ I was asked to look into it by new user User:Msruzicka, who was concerned that he was in trouble. I had actually told him to add BCGNIS info to some of his many BC geo stubs to flesh them out, so I'm partially responsible. I haven't worked with one of these reports before, so maybe my comment is improperly placed. If you have time, maybe you could give it a look. Thanks, The Interior (Talk) 06:00, 19 January 2012 (UTC)

All is sweet. It is monitoring the link, not reporting it as spam. The text in the lead box is relevant to set the scene, and that the relevant site is not entered in any of the Links list is the relevant concern. As it has been reviewed (by me), I have closed it as such. If another batch of links are added, then it will probably open again, and in which case we can close it again. No issue, it is just doing its job of monitoring. — billinghurst sDrewth 06:26, 19 January 2012 (UTC)
Thanks, billinghurst! The Interior (Talk) 07:05, 19 January 2012 (UTC)

(ec) Thanks, The Interior and billinghurst. I will expand a bit on this.

Indeed, the bot is doing its job monitoring. The location of these reports is with the WikiProject Spam, as they are the main ones monitoring these reports. The name of the project (and the WikiMedia feature 'MediaWiki:Spam-blacklist') is a bit misleading. Quite a bit of the material handled by the project and quite a number of the entries on that list are not 'spam' even under a wide definition of that term.

We run a number of programs which monitor all link additions by all editors to Wikipedia (actually, 772 wikimedia projects), with the main goal of catching spam or other inappropriate stuff. All those link additions are stored in a big database. On Wikipedia, spam is widely defined as 'links added for promotion'. That includes what the general public thinks of as spam (porn, viagra etc.), but it also includes regular companies optimizing their search results (Search Engine Optimization, SEO), public organisations who want to make their (generally good) cause known to the public to raise more money, and politicians who want more people to find their pages so they gain more votes. Real spam generally has one common feature: it consists of 'new links added by a new user' (those links were never used before, so they do not appear in the database, and those editors often did not edit before). So if we find a new user focusing on one link, that is reason for concern. The bot notifies the Wikipedia editors by automatically creating a linkreport. (The catching system is stronger than that, but I will of course not disclose all the exact features, per WP:BEANS.)
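
To make the 'new links added by a new user' idea concrete, here is a toy Python sketch. It is only an illustration of the heuristic as described above, not the bots' actual code, whose real rules are stronger and deliberately not disclosed. The class name, the in-memory dictionaries standing in for the database, and both thresholds are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical thresholds, chosen only for this example.
MAX_NEW_USER_ADDITIONS = 10  # at most this many recorded additions = "new user"
MIN_ADDITIONS = 2            # need at least this many additions to call it a focus


@dataclass
class LinkMonitor:
    """Toy in-memory stand-in for the link-addition database."""
    # domain -> set of users who have added it
    users_by_domain: dict = field(default_factory=lambda: defaultdict(set))
    # user -> list of domains that user has added, in order
    domains_by_user: dict = field(default_factory=lambda: defaultdict(list))

    def record(self, user: str, domain: str) -> bool:
        """Store one link addition; return True if a link report seems warranted."""
        other_users = self.users_by_domain[domain] - {user}
        self.users_by_domain[domain].add(user)
        self.domains_by_user[user].append(domain)

        history = self.domains_by_user[user]
        new_link = not other_users  # nobody else has ever added this domain
        new_user = len(history) <= MAX_NEW_USER_ADDITIONS
        focused = len(history) >= MIN_ADDITIONS and len(set(history)) == 1
        return new_link and new_user and focused
```

In this sketch a first addition is only stored. A report is suggested once the same new user adds the same previously unseen domain again. A domain that other editors have already added never triggers a report here, which matches the point that many additions are made in good faith and can be ignored.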

A good percentage of those links are added in good faith (although that does not make it right all the time) and can be ignored (or in some cases, just reverted and then ignored); the rest is real spam and further action can be taken then. As billinghurst already noticed, these links are good, and in a way, we just ignore the link additions. If reporting persists, I would suggest to whitelist it hard (which needs to be done off wiki) or at least set the status of the report to 'ignore'. Note that the reports are not indexed by search engines (those that follow that directive, which the major ones do), so they cannot be found on the internet; you have to specifically go to Wikipedia to find them.

I hope this explains. --Dirk Beetstra T C 07:17, 19 January 2012 (UTC)

It certainly does, appreciate it Dirk. As the majority of appropriate pages have the link already, it probably won't come up again. Now that I've read the primer for fighting spam, let me know if you guys need help with any backlogs, etc. Best, The Interior (Talk) 00:38, 20 January 2012 (UTC)

Update: new user warning test results available

Hi WP:UWTEST member, we wanted to share a quick update on the status of the project. Here's the skinny:

  1. We're happy to say we have a new round of testing results available! Since there are tests on several Wikipedias, we're collecting all results at the project page on Meta. We've also now got some help from Wikimedia Foundation data analyst Ryan Faulkner, and should have more test results in the coming weeks.
  2. Last but not least, check out the four tests currently running at the documentation page.

Thanks for your interest, and don't hesitate to drop by the talk page if you have a suggestion or question. Maryana (WMF) (talk) 19:23, 20 January 2012 (UTC)

Thanks, that is interesting. I'll have a look. In case I forget, one thing I have been pondering now and then over the last couple of weeks about this test (not sure how to put this into words): I had the feeling that the reasoning for complaining to XLinkBot or reverting was different during the test - the type of complaints felt different. I know that the current warnings are long, but I have the feeling that the complainers show more understanding now than during the test (you have to separate the complaints about the mode of operation of the bot from the complaints about the links themselves). Note that there are (outside of the testing) editors who come to XLinkBot with 'I've read WP:EL, and I think you are right, the link was not suitable, I have adapted the edit'. I don't know if there is enough data recorded to really get hard data about that. It may be reflected in the number of re-additions after the bot removes - I am afraid that the test-templates did not let the editors understand why the link was removed, and that they therefore blindly revert the bot (and if they complain, they give me the other feeling about the complaints, showing that they did not understand why they were reverted by the bot in the first place). --Dirk Beetstra T C 19:56, 20 January 2012 (UTC)
I think your experience with that is correct. This is likely a case where language that works really well with one kind of new editor (such as those making test edits, like adding [[File:Example.jpg]]) does not work well with a different kind of editor. It sounds entirely reasonable to me that people adding really wrong external links need more education.
I do want to say though, that in combing through the test data and in my personal editing, I sometimes see someone who made other valuable text additions get reverted because one part of their edit was to add an inappropriate link. Those editors are usually confused and think it means all their contributions were bad, not just the link. Perhaps one way forward is to keep the current non-test versions of the templates, but make XLinkBot much more cautious about reverting someone who added several bytes of text along with a bad link -- maybe it could just log those edits and warn the editor? Steven Walling (WMF) • talk 21:13, 20 January 2012 (UTC)
I don't think that would be desirable. Wikipedia is a sitting duck for external link promotions (covered by XLinkBot) and vandalism (covered by ClueBot), and IMHO the most important remedy is speed of response. People adding external links count any exposure as a win—if an external link is visible for a few hours before a human removes it, the person promoting the external link will often think their time was well spent and they got a good result. What discourages external link spammers and vandals is when a bot reverts their edits within minutes. Of course the situation is not clear, and XLinkBot reverts good-faith (although possibly naive) editors as well as others of less good faith. There is no good solution, and the current system may discourage a small number of editors who start by adding several external links. However, many people whose primary aim is to add external links and who manage to maintain their links for an extended time, then develop a sense of entitlement and require extensive volunteer effort when experienced editors try to remove external links per WP:EL. There would need to be evidence of a real problem (the loss of potentially good editors) to warrant changing the behavior of the bot, and one important issue is that someone who cannot read and understand a reasonable message is not showing promise. Johnuniq (talk) 23:19, 20 January 2012 (UTC)
Johnuniq: I should say that I did a quick bit of qualitative coding on the users who were warned (you can see it for yourself in the XLinkBot data spreadsheet here), and the majority (79 out of 100) were not spammers – they were simply linking to YouTube, Facebook, Twitter, or Wordpress. I'd be happy to do some more coding on this sample, and you're welcome to, as well (just ask and I'll give you editing privileges), but I'm guessing that it's pretty representative of who XLinkBot is hitting. So while serial spammers who crave exposure do exist, they're actually pretty rare. Maryana (WMF) (talk) 00:07, 21 January 2012 (UTC)

I took the first 12 cases that were not shown as "spam" from the above mentioned spreadsheet, and checked the edits, with these results:

There is nothing in the above results to suggest that changing XLinkBot would be helpful. It is good that the WMF wants to encourage editing, but there appears to be a belief that the community has an inexhaustible supply of good editors who are available and willing to dedicate an hour to explaining the purpose of Wikipedia to new arrivals. Judging from the above, the bot has saved a great deal of time and trouble. If there is a problem, one approach may be to encourage a group of editors to follow the bot's work, and to revert the bot (and remove its warnings) where appropriate, and to manually engage with the new editor. Johnuniq (talk) 03:19, 21 January 2012 (UTC)

I see you make the crude separation between youtube/facebook/myspace/twitter etc. and 'spam'. There are a couple of major concerns there which one needs to take into account.

  • A part of youtube/facebook/myspace/twitter etc. are spam nonetheless. They are added for a promotional purpose. Above, those who create/change articles to bios of musicians with facebook links are not naively adding facebook links, quite some of them are here to promote.
  • I did a quick scan of 10 reverts a couple of months ago. Two of those (20%!) were links to video clips of songs of artists which were still within copyright, and not obviously uploaded by the owner of those rights - in other words, very likely they were copyright violations. See the blackout of a couple of days ago, I think they are an even bigger concern than the spam part. Do note that YouTube is not the only site that hosts copyvios, that is also true for blogspot e.g. (a long time ago I had an admin complaining to me that I reverted a blogspot reference, he did not see anything wrong with it - it was however a straight copy of a newspaper article where the article was behind a (albeit free) login, the article belongs to the newspaper, the blogspot was a copyvio of it).
  • Also note that wordpress/blogspot and other blog-like sites are also often added promotionally by the writer of the piece. It may sometimes be helpful, nonetheless it qualifies as something close to spam.

I think that the bot explains now properly that if the edit contained more than only an offending link, that the editor should undo with the consideration to remove the external link before saving the page again. Also that was something that was not said in the test-warnings, and editors may therefore have been more inclined to revert the whole edit without consideration ...

I'll answer later more, when I have more time. --Dirk Beetstra T C 04:17, 21 January 2012 (UTC)

Following the conversation a little from the outside, and focusing my comment more on the external view of xwiki abuse rather than internal abuse, I have not had the time to do any analysis on the data or the heuristics of Xlinkbot; though I will qualify that statement with I have done a lot of cleanup work recently in m:Category:Open Local reports for Help:Citing sources pretty clearly considers blogs and many of the mentioned as less authoritative as references. So that gives scope to the sort of message that we can deliver, reverted or warned. There is the issue of external links, versus <ref> links, and the placement of the link. To my anecdotal feedback, the higher it is placed in an existing list, the more problematic. Similarly, I see numbers of issues where people add the section External links and add a link, or the only pre-existing link is a template and the following link is promotional. Clearly the less text and more link that is added is an indicator of an issue, similarly lack of addition of refs, though both can also reflect newbieness and nervousness/tentativeness in an edit. It is the balance between adding quality references, and exceptional external links to build a quality encyclopaedia, and finding the nice way to do it when it needs to be done in circumstances of good faith. — billinghurst sDrewth 05:33, 21 January 2012 (UTC)
Addendum. From the results available it is not evident to me that I can easily see where the focus is on the message provided or the type of reverted edit. Plus from the xwiki work, I am less likely to be seeing xlinkbot in action, but more what it didn't catch, which may skew things in a positive or a negative direction. — billinghurst sDrewth 05:39, 21 January 2012 (UTC)

Just to expand on the point by Johnuniq regarding separating 'spam' from 'youtube/blogs/etc', and in reply to Maryana (WMF):

Do take care. You say 'the majority (79 out of 100) were not spammers – they were simply linking to YouTube, Facebook, Twitter, or Wordpress' .. two of these sites here now were clearly spammed (both reported by XLinkBot to be blocked for spamming, both blocked as a result of that). --Dirk Beetstra T C 19:22, 22 January 2012 (UTC)

Johnuniq's original point, if I understand it correctly, was that people who insert inappropriate external links into articles are mostly doing so to gain revenue from directing clicks and eyeballs to their own sites. My response was that YouTube, Flickr, and Facebook/Twitter linkers, who make up a significant chunk of people warned by XLinkBot, are qualitatively different from spammers who insert links to obscure commercial or promotional websites. They're not linking to copyvio videos or fan pages because they want ad revenue – they're linking to them because they don't understand the rules of Wikipedia and think their contribution is acceptable. If they don't get the message after repeated warnings and blocks, it's not because they're hopelessly stupid; it's because we're not doing a good enough job of explaining the rules to them in a clear, non-hostile way. I understand that it gets frustrating having to explain the same thing to people day in and day out, but you have to remember that what feels so strikingly obvious to you, an experienced Wikipedian, is still incredibly strange and incomprehensible to new users.
As for blogs, for every case of self-promotion, there are cases like this, where someone tries in good faith to add a reference he thinks is appropriate, and the only help or feedback he receives comes in the form of a template warning from a bot. That's why I'm saying we should pay more attention to these messages, and why we should continue to tweak them until they're at least as helpful as a human mentor. I do understand that Wikipedia doesn't have an army of people to patiently guide newbies through stuff like this. That's exactly why I think it's so important to get template messages right. But it's a chicken-and-egg dilemma. Part of the reason we don't have that army is that our editing numbers are dwindling by the day, and part of the reason for that is that we don't do enough to help and guide new users. Maryana (WMF) (talk) 19:04, 23 January 2012 (UTC)

I agree, Maryana, Wikipedia has a problem there: retaining editors and gaining new editors. Notifying people appropriately and properly is necessary there. That does not necessarily lie in a short message without flooding them with policies and guidelines, it lies in bringing a nice and friendly message notifying the people appropriately of the point you want to get through. Too short a message, however friendly, does not get the message through, and people will react more with 'go away bot, I think my link is good', they revert, and don't get to know why some parts are a problem.

I understand what you meant, but the claim that editors who add Twitter, YouTube and similar links do so mainly out of ignorance of the rules rather than to gain revenue is too simple. The obscure website owners are indeed the ones who do it for revenue (and not necessarily obscure websites: we've had large organisations of continental or global importance spamming Wikipedia for revenue), but a significant part of the Twitter, YouTube and MySpace links are not added out of ignorance; they are added solely for promotion.

And that is where we hit the problem, I think. Many people regard spam as a form of vandalism, and the editor who adds inappropriate links for any form of promotion as just an editor who is ignorant of Wikipedia's rules. Sure, some of the editors who add a large number of inappropriate links do it with the idea that they are helping Wikipedia, but the true spammers are not here for that reason. And that separates spammers from the good faith editor who is ignorant of our rules. Most editors in the second group receive only one, very occasionally two, warnings from XLinkBot. That initial message is aimed at being friendly and at gaining understanding. Only very rarely do I encounter good faith editors who get to a {{uw-spam4}}. I recall two cases from the last year: one genuine editor whose editing was a bit clumsy and who managed to end up at WP:AIV (I think they first tried Facebook, which was rejected, so they left out Facebook and tried MySpace, which was rejected as well, then moved on for some edits and added the Facebook link again, which was also rejected), and another editor who ran into a block for WP:POINT violations (the user logged out to show that IP editors can do good stuff, got blocked after XLinkBot reported them to AIV, was unblocked, and was then investigated, resulting in a block for intentional disruption of the system).

The small group of good faith editors who add a large number of links with which they think they improve Wikipedia generally get the message after being talked to. Those links do not end up on XLinkBot in the beginning; such editors are talked to by regulars first. Most of them understand; only every now and then someone panics and needs a more significant push in the right direction (sometimes they do end up on XLinkBot, especially if they have a wider range of IPs, or if they, in their panic, start socking. It makes clean-up and tracking easier, and quick remarks may at a certain point get them to see that we are trying to communicate with them).

But that first group, the true spammers, are, as I said, not here to improve Wikipedia. They are here to improve their own financial situation, or that of a cause or company they represent (and there are even 'far-fetched' scenarios which nonetheless happen as well: webmasters who add links to Wikipedia to bring in more visitors, so that they can show the resulting web traffic to their supervisors and get a raise or better web servers). Do not mistake them for misled regular editors: they are here to make money, and they are here to use the free webspace of Wikipedia to improve their situation. They will not stop unless hard, enforced measures are put in place (that starts with XLinkBot, which a lot of spammers simply disregard, but it does have an effect: some do enter into discussion and/or stop). The majority of real spammers do everything to stay under the radar. They create numerous socks and they use redirect sites. The only way to stop them is the hard measures of the spam blacklists and blocking editors. They do not care about your friendly messages (only a strong warning that their site(s) may end up blacklisted may make them consider backing off), and they will not be converted. When they physically cannot spam Wikipedia anymore because their links are blacklisted, the only edits they may make are to complain or to try to circumvent the blacklisting. They will not turn into valuable editors; most will simply disappear (but they keep an eye on it: I have seen cases where spammers got their links blacklisted, the links were de-blacklisted months or years later, and the spammer returned very soon after. It pays to have your links here. It is the same as with schoolkid vandalism, where you see the vandalism restart hours or days after the block expires).

Now, the point is that we need to keep the valuable editors, be friendly enough to them to keep them, and educate them about why certain links are a problem. I strongly believe that XLinkBot has a task there in explaining to them what the problem may be and asking them to take care next time. Just saying 'I reverted your link, bye' is not enough; education is a necessity there, as is telling them what they can do next. That message may be long (risking tl;dr), but not trying to make them understand the rules is certainly less effective. If they revert the bot not knowing what the problem is, they may (and probably will) be re-reverted by a human editor who will, at the least, ask them why they did not consider the first message, and the answer may be 'the first message did not tell me what I did wrong'. It is like a traffic cop who sees a first-time offender driving 120 where 80 is allowed, stops the driver, says 'I warn you, bye', gets back on his motorbike and drives on, leaving the driver to think 'what did I do wrong?' Instead, most traffic cops will get off their motorbike, not even intending to write a fine but intending to give a strong warning, immediately pointing out to the driver what they did wrong and explaining the consequences if it happens again. And I think only very few drivers will get out, throw the keys into the grass and walk home, never to drive again (and they don't even do that when the cop is in a bad mood and does write the fine the first time).

Currently, our message is aimed at being friendly and giving education, but also at being firm. It is very difficult to guess how many editors walk away after the first remark, but I think that shifting the message (any message) is just going to shift the group. If you give a full explanation, editors will leave because they get reverted and don't want to read it all; if you give a very short message, editors will leave because they get reverted and don't get told why. In the end, it is the same percentage that walks away.

Maybe we should have a look at the genuine logged-in new editors, see how many keep editing (IP editors may return as another IP, so it is difficult to get a proper number out of that). Of the editors that are still active (also those behind static IPs), you could ask how they perceived the message that the bot left them - whether they understood it, how they found the tone, etc. etc.

Hope to hear more. --Dirk Beetstra T C 11:59, 24 January 2012 (UTC)

Thank you for your thoughtful response, Dirk :) I just want to make it clear that when Steven and I approached you (or any other bot op) for a template test, it's not because we thought your templates were bad or wrong. We know you didn't just slap something together in two seconds – you actually put a lot of time and thought into making the message appropriate and helpful (I know this because I had to spend a whole day teasing out all the different parameters and regexes from the config page!). But as you say, it's an incredibly complicated situation: how do you encourage the good editors without overwhelming them with rules and policy, while simultaneously keeping spam out of the encyclopedia? That's just not a problem any one of us could possibly figure out through educated guessing or anecdotal evidence. A/B testing different messages might sound really crude, but over a long enough timeline and with a large enough sample, we actually can say, with statistical confidence and significance, that one template did better than another at retaining users (or getting them to talk to other editors, or getting them to use the help desk...). We have a team of PhD students (User:EpochFail and User:Staeiou) and a data analyst (who helped us raise 20 million dollars for the fundraiser, all through A/B testing!) who can run all the fancy regressions and Chi-squared tests you see in peer-reviewed journals. You can certainly doubt our ability to discern the secret motives that lie deep within the hearts of editors, but I should hope you don't have the same doubts about science and statistics :)
I guess what I'm saying is, we don't have to rely on educated guessing anymore – we have the resources and the staff (thanks in large part to the aforementioned fundraiser, heh) to get scientific evidence about template effectiveness. If the templates Steven and I wrote totally bombed, then it'll be clear that you were on the right track with more explicit explanations for reverts, and we should try retaining that element and testing other variables. Because the other component to all this is that Wikipedia, along with the rest of the Internet, is constantly evolving. What worked six years ago to draw in new people doesn't work now, so the newbies of today and tomorrow will need very different kinds of help and guiding going forward.
Anyway, that's my A/B testing manifesto, hope you enjoyed it :D And thanks again for letting us test with your bot and giving us some meaty food for thought to chew on. We may disagree on a few minor points, but obviously we agree on the basic principles of getting more good editors involved in the project and making Wikipedia even better, and that's all I really care about in the end! Maryana (WMF) (talk) 19:17, 24 January 2012 (UTC)
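As a rough illustration of the kind of Chi-squared test mentioned above, the sketch below compares editor retention between two template variants using a Pearson chi-squared test on a 2x2 table. The function name and all counts are hypothetical and are not data from the actual template tests; the p-value shortcut via `math.erfc` is valid only for one degree of freedom, which is the case for a 2x2 table.

```python
import math


def chi_squared_2x2(retained_a, total_a, retained_b, total_b):
    """Pearson chi-squared test (no continuity correction) on a 2x2 table.

    Rows are the two template variants; columns are retained / not retained.
    Assumes every row and column total is non-zero.
    """
    observed = [
        [retained_a, total_a - retained_a],
        [retained_b, total_b - retained_b],
    ]
    grand_total = total_a + total_b
    row_totals = [total_a, total_b]
    col_totals = [
        retained_a + retained_b,
        grand_total - retained_a - retained_b,
    ]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if retention were independent of the template.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed[i][j] - expected) ** 2 / expected

    # For one degree of freedom the chi-squared survival function
    # reduces to erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: 30 of 100 editors retained with template A,
# 50 of 100 retained with template B.
chi2, p_value = chi_squared_2x2(30, 100, 50, 100)
print(f"chi2 = {chi2:.3f}, p = {p_value:.4f}")
```

A small p-value only indicates that the retention rates differ between the two variants; it says nothing about quantities that were not counted, such as how many inappropriate links were re-added.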

Oh, I believe the statistics and the result. The problem is that I also see the other side of the coin. I do not think that our goal should be only to retain more editors, or only to win more new souls. Sure, every gained editor counts, but not at the cost of things on the other side. Spammers and inappropriate link additions are a real frustration to many, and I'm not sure that we should be happy with retaining one or even 10 more editors if more long-term editors then start to burn out because of all the work. And that is something that your A/B testing does not measure here. I see my fellow spam-fighters around me, and some take extensive wikibreaks after a fight with the community, or with individuals. And manpower is already thin in this field. The current situation is not good, but changing something here because the statistics say it is an improvement here may result in chaos somewhere else. I am very aware of the limits of statistics: they only tell you about what you measure. And if we have a 5% increase in retained editors and also a 10% increase in re-inserted inappropriate external links, more downtime for the editors cleaning them up because of more work, frustration and burn-out, and more editors not staying because of all the spam and other rubbish everywhere, then that is not worth it.

I am also not saying that the current message is optimal. I do realize it is long, and people don't read long messages. I would be all for shortening it, or doing something else smart to it. Putting the long message there results in people not reading it (e.g. 'I removed the information you added in your edit to the page 'blahdiblah' because in your edit you included a URL which points to a document on a site outside of Wikipedia. The movie you are pointing to is an illegal copy of a video clip of the artist blahdiblah, and linking to material which is in violation of the copyright of the original creator of the work is a criminal offense in many countries. Moreover, the URL to the movie is inappropriate because you are adding it to page 'blahdiblah', while the videoclip is actually of the clip 'halbidhalb', a performance of 'blahdiblah'. It does not lie in Wikipedia's goal to link to all the videoclips that 'blahdiblah' have produced ...' - tl;dr, after 'I removed the information you added' you stop reading and you re-add the link). Putting a short message ('reverted addition of inappropriate external link to copyvio') means that people may not understand why ('reverted', what do you mean? 'inappropriate', who are you to say that? what is an 'external link'? what do you mean by 'copyvio'? - and asking is even more effort than reading a long message, and you have to wait for an answer, so just revert, I mean, copyvio, so what!). And with a short message with a link to the full story ('reverted addition of inappropriate external link to copyvio'), well, editors simply don't follow the link and hence still don't understand, because WP:EL and WP:COPYVIO are too long and it is too difficult to find which part we actually mean.

Two points of the three are measurable through the A/B test results you have: the percentage of retained editors and the percentage of re-added external links (and you could also measure how many of the re-added external links are actually inappropriate). The third one will be a collateral effect of the latter of the two measured values (with a negligibly small compensating effect from the former) and is not measurable through these statistics, but if the percentage of re-added external links is significant, that effect will also be significant. And across all wikis (of which en.wikipedia is by far the biggest) we are operating at an addition rate of 1 external link every second. Sure, a lot of those are good, but a lot of them still need to be checked.

Take care measuring retained editors. My experience is that spammers tend to be 'retained editors' until they either get blocked (upon which many sock so they are still a retained editor), or their links get blacklisted. --Dirk Beetstra T C 20:06, 24 January 2012 (UTC)

About editor retention, see the diffs in time for the page m:User:COIBot/case/case7 .. some editors you don't want to stay. --05:51, 26 January 2012 (UTC)