Yobot July 2016

full-stop, comma, ref

Re: Special:Diff/730324343, please stop the bot, fix the ruleset and clean up. —Sladen (talk) 14:04, 18 July 2016 (UTC)

Sladen Thanks for the heads up. I fixed it manually. What would be the correct approach there? There was mixed punctuation already. In the case of full-stop, full-stop, ref I just remove one full-stop. What could I do in the case of mixed punctuation? -- Magioladitis (talk) 14:10, 18 July 2016 (UTC)
"First do no harm". Earlier this year we heard a lot about "skip conditions" (basic validity assertion), and skipping (not saving) when assertion fails. The intent of reporting the bug is to get the problematic rule either fixed, or disabled, or modified to flag for manual fixing. —Sladen (talk) 04:01, 19 July 2016 (UTC)

The thing is that in this edit no error was actually introduced. The error was already there; it just became more obvious after the edit. In fact, searching for duplicated punctuation becomes easier, and this is how I spot these things and fix them manually afterwards. And, to be honest, this situation is fairly uncommon. I'll perform a database scan to see what I am missing here. Thanks, Magioladitis (talk) 06:39, 19 July 2016 (UTC)

Perhaps I can offer the suggestion that the rules check the preceding character(s); which would have caught the above, plus the preceding newline in Special:Diff/730326551. —Sladen (talk) 19:11, 19 July 2016 (UTC)
Sladen thanks for the suggestion. I'll contact @NicoV: and @Rjwilmsi: for that. -- Magioladitis (talk) 19:33, 19 July 2016 (UTC)
In the first diff, by the looks of it, there was a full stop and a comma in the middle of a sentence, as well as the comma being after the ref. So two existing issues. Now most of the time in English in the middle of a sentence the next word will start with a lower case letter, but only most of the time, and I expect that "most" could be less in languages such as German using more capitalized letters. So I see no completely reliable way to determine whether the full stop or comma needed to be removed in that example (when the two bits of punctuation are the same the existing AWB code cleans them up). So I'd suggest that when Yobot is running to fix ref punctuation, pages are skipped if the resulting page has double punctuation before the ref, as a manual review will be needed. Rjwilmsi 01:25, 20 July 2016 (UTC)
Recognising ambiguous situations and skipping+flagging this for manual processing sounds like a good plan. Has this been implemented? —Sladen (talk) 11:11, 21 July 2016 (UTC)

Sladen it is a minor issue. It is low priority. It will be implemented at some point. -- Magioladitis (talk) 09:21, 26 July 2016 (UTC)

Excellent, thank you. Please can you list the bug number when it has been filed. —Sladen (talk) 10:04, 26 July 2016 (UTC)
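
A minimal sketch of the skip condition Rjwilmsi suggests above, written in Python rather than AWB's own code. The function name and the set of punctuation marks are assumptions for illustration only. It flags a page when two punctuation marks sit directly before a <ref> tag in the resulting text, so that the edit can be skipped and the page listed for manual review.

```python
import re

# Hypothetical check, not AWB's implementation: two punctuation marks
# (for example ".," or ";.") directly before an opening <ref> tag.
# The character set below is an assumption and may need extending.
DOUBLE_PUNCT_BEFORE_REF = re.compile(r"[.,;:!?]\s*[.,;:!?]\s*<ref[\s>]")


def needs_manual_review(resulting_wikitext: str) -> bool:
    """Return True if the page should be skipped and flagged, not saved."""
    return bool(DOUBLE_PUNCT_BEFORE_REF.search(resulting_wikitext))
```

Identical doubled punctuation is, per the discussion, already cleaned up before this point, so anything this check still finds in the result is an ambiguous case.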

Removal of quotes from title


Yobot removed quotation marks from a citation |title= in Special:Diff/730306422. From checking we can see that the original citation[1] does contain quotes (because it is a quote from the interviewee). So per the usual, please halt Yobot, fix the relevant rule, and rollback/clean up any edits that incorrectly removed quotes in the same situation, and provide links to the relevant diffs and revision control commits showing that you have done so. —Sladen (talk) 19:11, 19 July 2016 (UTC)

Sladen check the visual outcome. It contains quotes. -- Magioladitis (talk) 19:16, 19 July 2016 (UTC)
Lots of citations contain quotes; eg. a {{cite}} for [2] would be {{cite|title=Boris Johnson grilled over past 'outright lies' at uneasy press conference}}, which would be rendered as "Boris Johnson grilled over past 'outright lies' at uneasy press conference". The quotes are an essential part of the title. In the bug reported here, the whole title is itself a quotation, and the "quotation marks" denote this just as accurately. The presentation rendering is unrelated to the bug report. —Sladen (talk) 19:28, 19 July 2016 (UTC)

Sladen so wrong quotes were there before the bot arrived, right? -- Magioladitis (talk) 19:32, 19 July 2016 (UTC)

The citation |title="We are nothing without customers" formatting corresponds with the citation title. It ceased to match the citation title following Yobot's actions in editing the page. This is why a bug report was brought to the attention of Yobot's operator, in the reasonable expectation that it will be acted upon. The willingness to act and be responsive to bug reports is required of bot operators per WP:BOTISSUE and WP:BOTCOMM. I would be grateful if you could correct the relevant rule so that quotes are not removed in the case that they match the citation title. If this is beyond the capabilities of the bot ruleset or its operator, please flag these for manual processing. —Sladen (talk) 19:47, 19 July 2016 (UTC)

Sladen of course I am willing to fix the error as soon as I understand it. -- Magioladitis (talk) 21:11, 19 July 2016 (UTC)

Thank you. The situation of titles beginning or ending with a quote mark is so frequent that the citation code has special handling to ensure that adequate padding is provided between the quotation marks in the |title= and the surrounding presentational quote marks. This is in the kern_quotes() function in Module:Citation/CS1. —Sladen (talk) 22:22, 19 July 2016 (UTC)

@Rjwilmsi: can you please review this bug report? -- Magioladitis (talk) 21:34, 19 July 2016 (UTC)

Quote marks entirely enclosing a cite |title= are removed. Normally these quotes are incorrect as the rendering of CS1/CS2 adds the required quotes. In this case the title is itself a quote, so the quote marks are valid. We won't be able to differentiate between those scenarios in code, so we would have to remove the logic, and not correct the majority of cases where the quotes do need to be removed as the title is not itself a quote. Unless anybody can see a different solution? Rjwilmsi 01:19, 20 July 2016 (UTC)
Rjwilmsi, I concur. Yes, unless the bot code is able to replicate what a human would need to do (retrieve the citation and grep the <title> for on-line works) then by definition the bot ruleset will end up doing the wrong thing. The particularly unfortunate thing here is that once the quotes have been removed the operation is non-reversible. So in the world of "First, do no harm", I concur with disabling, and only producing manual candidate lists. Regarding the clean-up: how long has this rule been active, and how many edits do we now have to review? (Can you help Magioladitis to produce a candidate list?). —Sladen (talk) 08:55, 20 July 2016 (UTC)
Rjwilmsi, Magioladitis, how are we doing with compiling a candidate list for the retrospective clean-up? The edit summaries do not contain sufficient information to identify when this rule activated and altered the page contents. —Sladen (talk) 11:09, 21 July 2016 (UTC)

Sladen Bug reported on phabricator T140979. This will cause some attention. -- Magioladitis (talk) 11:26, 21 July 2016 (UTC)

Bug fixed. rev 12055: cite title field in quotes: do not removed double quote marks (title may itself be a quote). -- Magioladitis (talk) 12:26, 21 July 2016 (UTC)
Magioladitis. Really pleased to see a bug report filed + fixed. Now what are we going to do about the potential clean-up to identify how many other articles' citations may have been affected? —Sladen (talk) 23:20, 21 July 2016 (UTC)
Sladen it's impossible to know that. Since the bot was not running specifically to fix these, but was fixing them as a secondary task, I expect that not many pages were affected. But that's a guess. -- Magioladitis (talk) 23:24, 21 July 2016 (UTC)
Hunting back with git svn/git blame, it would appear that [3] (SVN r7057) was the origin of the bug. Does that look right to you? —Sladen (talk) 00:49, 22 July 2016 (UTC)
Nice. So, it's been 6 years and nobody noticed. I guess it's because it's rare for a title to actually require double quotes. -- Magioladitis (talk) 06:11, 22 July 2016 (UTC)
Until the bot's operator has performed an audit about what their bot did, we won't know the total. I'm hopeful that as a responsible bot operator you will find the time to assist with this. —Sladen (talk) 18:10, 25 July 2016 (UTC)

Sladen I am really sorry but I don't have the technical knowledge to create these statistics. AWB accounts are very popular and Yobot is not the only account editing pages. Most references are handled by other bots. I suggest that you ask at WP:VILLAGEPUMP or WP:BOTREQ. I contacted some fellow programmers but none had a clue how to do this. -- Magioladitis (talk) 18:27, 25 July 2016 (UTC)

Thank you for having the courage to say this. —Sladen (talk) 08:58, 26 July 2016 (UTC)

Sladen I trust that editors monitoring the pages will fix or would have fixed the title where needed. I also underline the fact that people may be running an older version of AWB and still removing quotes. The bug at the moment is fixed only in Yobot's version of AWB. A new release has not been published yet. -- Magioladitis (talk) 18:32, 25 July 2016 (UTC)

OK I had an idea. I'll provide some statistics very soon. I contacted some WMF guys. -- Magioladitis (talk) 18:44, 25 July 2016 (UTC)

Sladen A wiki search shows that there are only 78 pages with quotes in title in citation templates. I'll now fix those with actual problem manually. -- Magioladitis (talk) 06:00, 26 July 2016 (UTC)

Slight adjustment[4] of the Regex indicates that a claim of "78" would appear to be low (by a few thousand percent). One would expect to find near-zero title="quote" instances remaining in the database owing to removal by AWB. To get useful metrics, one needs to evaluate actual AWB diffs, looking for those where the before state contains title="quote" and the after state does not. —Sladen (talk) 09:12, 26 July 2016 (UTC)
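
The before/after comparison described above could be sketched roughly as follows in Python. This is an illustration only: the function name and the regular expression are assumptions, it covers only straight double quotes that enclose the whole |title= value, and fetching the two revision texts for each AWB diff is left to the caller.

```python
import re
from typing import List

# Assumed pattern: a |title= value wholly enclosed in straight double quotes,
# followed by the next parameter ("|") or the end of the template ("}").
QUOTED_TITLE = re.compile(r'\|\s*title\s*=\s*"([^"|}]+)"\s*(?=[|}])')


def stripped_quoted_titles(before: str, after: str) -> List[str]:
    """Titles that were fully quoted in `before` and appear unquoted in `after`."""
    still_quoted = set(QUOTED_TITLE.findall(after))
    hits = []
    for title in QUOTED_TITLE.findall(before):
        # Same title text, but without the enclosing quotes.
        unquoted = re.compile(
            r'\|\s*title\s*=\s*' + re.escape(title) + r'\s*(?=[|}])'
        )
        if title not in still_quoted and unquoted.search(after):
            hits.append(title)
    return hits
```

Running this over each candidate diff would give a list of affected titles per page, which is the kind of candidate list asked for earlier in the thread.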

Sladen I do not plan to investigate more on this. -- Magioladitis (talk) 09:15, 26 July 2016 (UTC)

I tried an earlier version of AWB (v. It turns out AWB was not removing quotes unless the quotes were at the beginning and/or at the end of the |title= of one of the supported citation templates. This is more restrictive than what I described at the beginning, i.e. fewer pages affected. -- Magioladitis (talk) 06:12, 26 July 2016 (UTC)

Yes, it is only full-quotations where the implementation was causing an issue; thus the bug report that was filed here, and migrated to T140979, and subsequently corrected in AWB SVN r12055. —Sladen (talk) 09:20, 26 July 2016 (UTC)

Sladen the number should be even smaller, since we do not deal with all citation templates but with a selection of them. -- Magioladitis (talk) 09:24, 26 July 2016 (UTC)


Re: Special:Diff/730799065, please stop the bot, and fix the rulesets to meet the requirements of WP:COSMETICBOT. —Sladen (talk) 10:32, 21 July 2016 (UTC)

Sladen thanks. My bot was editing at the same time as Bgwhite. Rare issue. I'll see what I can do. -- Magioladitis (talk) 10:35, 21 July 2016 (UTC)
Such an argument does not hold for Special:Diff/730798995 or Special:Diff/730798998. And for these two, the wording of MOS:HEADING was brought to your attention previously in May 2016. In Special:Diff/722980249 it was stated "I can report it myself in less than an hour. Thanks for the heads up!". —Sladen (talk) 11:06, 21 July 2016 (UTC)

Sladen on the first two diffs: Yobot tried to fix a header error while it should not because current script can't fix them. The script that fixes them is now handled by Dexbot: [5], [6]. In the future this error will be handled by Dexbot. -- Magioladitis (talk) 14:47, 21 July 2016 (UTC)

Sladen bug reported on phabricator as far as I recall. Thanks! Magioladitis (talk) 11:23, 21 July 2016 (UTC)

Sladen Please check List of known bugs. It would be very handy for us if you reported the bugs there too. Please also check whether the bug above was actually reported; I think it was, but I can't find it. -- Magioladitis (talk) 11:48, 21 July 2016 (UTC)

I am not aware of such a bug report appearing in, but I'm open to the possibility that it could have been filed by an alternative account. —Sladen (talk) 18:14, 25 July 2016 (UTC)

Sladen I recall there was a discussion about it but I can't find it. Maybe in Bot Owner's Noticeboard or something. -- Magioladitis (talk) 18:29, 25 July 2016 (UTC)

Please could you file the bug report anyway. Then there is a central place to track the issue, and no confusion about lost conversations. —Sladen (talk) 08:56, 26 July 2016 (UTC)

Sladen T141346 reported. -- Magioladitis (talk) 09:00, 26 July 2016 (UTC)

Editing speed

Hi Magioladitis, I can see from Special:Contributions/Yobot that AWB was being operated at extremely high speeds. Speeds that make manual oversight or checking nearly impossible. Here are several instances of Yobot operating at >60 edits per minute:

     $ awk 'match($0,/(..:..), 25 July 2016/){print substr($0,RSTART,5)}' < huge.txt | uniq -c | sort -rn | head -n 11

     76 06:45
     74 06:59
     71 07:02
     70 06:58
     70 06:43
     68 06:46
     67 07:04
     66 07:03
     61 07:08
     61 07:07
     61 06:44

Please try to moderate edit rates to a manageable rate: that means keeping to a speed where manual oversight, review, and clean-up by human editors (including the bot's operator) becomes feasible. —Sladen (talk) 18:06, 25 July 2016 (UTC)
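
For anyone who wants to repeat the count above without awk, here is a rough Python equivalent. It assumes, as the awk pattern does, that each line of the saved contributions listing contains a timestamp of the form "HH:MM, 25 July 2016"; the function names and the `limit` parameter are made up for this sketch. Unlike `uniq -c`, it totals per minute even when the lines are not adjacent.

```python
import re
from collections import Counter


def edits_per_minute(lines, date="25 July 2016"):
    """Count edits per HH:MM for the given date in a contributions listing."""
    stamp = re.compile(r"(\d\d:\d\d), " + re.escape(date))
    counts = Counter()
    for line in lines:
        match = stamp.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def minutes_over(lines, limit=60, date="25 July 2016"):
    """Return (count, minute) pairs above `limit`, busiest minute first."""
    counts = edits_per_minute(lines, date)
    return sorted(
        ((n, minute) for minute, n in counts.items() if n > limit),
        reverse=True,
    )
```

Usage would be along the lines of `minutes_over(open("huge.txt", encoding="utf-8"))`, where huge.txt is the same saved listing used in the awk command.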

This is interesting. I thought the software was limiting the edit rate to 20-25 edits per minute. -- Magioladitis (talk) 18:36, 25 July 2016 (UTC)

Sladen I had to chuckle on this. I complained to multiple people about Ser Amantio di Nicolao's high rate of edits. He was constantly doing 60 edits/min *manually* via AWB. Nobody cared. Looking at today, he's doing ~25-30 edits/min, but only adding a cat to articles. Brought up either he has hacked AWB to remove the bot flag or is running multiple AWBs manually. Still nobody cared. Brought up he was doing worthless edits at that speed, (more recent example [7]). Nobody cared. Your argument has fallen flat when brought up to other bot owners, some doing 100 edits/min. Bgwhite (talk) 19:04, 25 July 2016 (UTC)
As has been pointed out multiple times before the policy is that a bot performing non-urgent tasks can do so once every ten seconds, i.e. six times a minute. Even urgent tasks should run no faster than twice this rate, but Checkwiki fixes are far from urgent. In which case why is it set to edit at 20-25 times a minute, 4x as fast as it should be editing?--JohnBlackburnewordsdeeds 09:36, 26 July 2016 (UTC)
JohnBlackburne I am editing at the maximum speed allowed by AWB. The only urgent thing here is that we do not want lists to get too outdated. And moreover, after the bot is done I need 1-2 hours to check whether the pages were fixed using WPCleaner. Then, I use AWB to fix manually the pages that were not fixed by the bot and then WPCleaner again to fix the pages I could not spot using AWB's alert system. I do not want to spend my entire day only running the bot to perform CHECKWIKI fixes. I nowadays wake up at 6 a.m. daily to have the lists as soon as possible and minimise the chance that the bot makes null edits. -- Magioladitis (talk) 09:41, 26 July 2016 (UTC)
Fixing minor punctuation errors and the like is not urgent by any standard. Often the problems are not noticeable, or are noticeable only to experienced editors, normal readers would not even notice. As for vandalism, yes some problems are the result of vandalism, but the correct thing to do then is to undo the vandalism. Having a bot blindly fix minor errors when the page has been vandalised can make it harder to fix vandalism, both as it’s no longer visible as the last edit in watchlists, and as it’s no longer possible to just revert or undo the edit.--JohnBlackburnewordsdeeds 12:45, 26 July 2016 (UTC)

Fun fact: I try to run the bot as early as possible. I even changed my waking-up habits. This is because I got complaints that when the bot arrived the error was not there anymore. -- Magioladitis (talk) 19:12, 25 July 2016 (UTC)

Today's stats: After 6 hours we have fixed 1525 pages and still 45 pages need to be fixed. Moreover, we have 2,000 pages from the monthly scan left. -- Magioladitis (talk) 09:42, 26 July 2016 (UTC)

Many CHECKWIKI errors occur as a result of vandalism. In these cases we certainly need to fix everything as early as possible. -- Magioladitis (talk) 09:48, 26 July 2016 (UTC)

Why did you delete the Ekspress-AM 6 redirect?


This is an industry standard. Even the most used site on this, Gunter Space Page, uses it. Ekspress is not a typo; it is a transliteration of Russian. Why do you delete without doing the most basic research on the subject? Baldusi (talk) 13:30, 23 July 2016 (UTC)

Baldusi please tell me exactly which page you claim I deleted? -- Magioladitis (talk) 16:33, 23 July 2016 (UTC)
The redirect Ekspress-AM 6 Baldusi (talk) 20:11, 23 July 2016 (UTC)
Baldusi this was a redirect to itself. Perhaps a mistake? -- Magioladitis (talk) 20:19, 23 July 2016 (UTC)
(tps) Baldusi, I fixed it for you. You had it redirecting to itself! Thanks! Plastikspork ―Œ(talk) 20:25, 23 July 2016 (UTC)
Plastikspork thanks from me too. -- Magioladitis (talk) 20:47, 23 July 2016 (UTC)

CX cleanup


So, as I mentioned at the Hackathon and at Wikimania, my team is hard at work fixing issues with messy wiki syntax that Content Translation creates.

One issue that we resolved is the adding of elements with attributes such as this: <span class="cx-segment" data-segmentid="24"></span>.


  • class="cx-segment"
  • class="cx-highlight"
  • class="cx-linter"
  • data-segmentid="XX" (where XX is a number)

These most often appear on <span> tags, but can also appear on other elements and in tables and File insertions.

As far as I can see, new articles that have such unnecessary syntax don't appear any longer, but there are some existing articles that do have them. I set up an AbuseFilter that catches edits that still have such tags, and I fix a few every week, but it makes sense to eliminate them completely.

If I'm not mistaken you run various cleanup bots with various rules. So you can add these:

  • If you run into an HTML element that has attributes as above, and has no content, it can be safely removed. Example.
  • If you run into a table or a File inclusion that has such attributes, the attributes can be removed, but the table or the File inclusion must be kept. (example with file, example with table).
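
The two rules above might be sketched like this in Python. This is not the AbuseFilter or any existing bot rule; the function names and regular expressions are assumptions, and they only handle attributes written exactly as in the list above (for instance, a class value that combines cx-segment with other classes is left alone).

```python
import re

# One Content Translation attribute, written exactly as listed above.
CX_ATTR = r'(?:class="cx-(?:segment|highlight|linter)"|data-segmentid="\d+")'

# Rule 1: an element with only CX attributes and no content.
EMPTY_CX_ELEMENT = re.compile(r'<(\w+)(?:\s+' + CX_ATTR + r')+\s*>\s*</\1>')

# Rule 2: a CX attribute together with the whitespace in front of it.
CX_ATTR_WITH_SPACE = re.compile(r'\s+' + CX_ATTR)


def remove_empty_cx_elements(wikitext: str) -> str:
    """Drop empty elements such as <span class="cx-segment" ...></span>."""
    return EMPTY_CX_ELEMENT.sub("", wikitext)


def strip_cx_attributes(fragment: str) -> str:
    """Remove CX attributes from a fragment, keeping everything else.

    The caller decides which fragment to pass, for example the opening
    line of a table or a File inclusion.
    """
    return CX_ATTR_WITH_SPACE.sub("", fragment)
```

Elements that carry these attributes but do have content are deliberately left untouched by the first function, since the rules above only call for removing empty ones.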

If you find any interesting exceptions, I'd love to hear about it.

Thanks --Amir E. Aharoni (talk) 06:32, 24 July 2016 (UTC)

Thanks Amir! I'll keep an eye on it! -- Magioladitis (talk) 18:37, 25 July 2016 (UTC)

Yobot's category removal


Hi, can you explain why Yobot has just removed a category please? I can't figure it out. - Sitush (talk) 06:40, 26 July 2016 (UTC)

Sitush It was there twice. Check and you'll see the page is still in the category. -- Magioladitis (talk) 06:42, 26 July 2016 (UTC)

D'oh! Thanks. - Sitush (talk) 06:47, 26 July 2016 (UTC)


My understanding of stub tags is that they aren't supposed to be used as substitutes for categories. I.e., if someone is an "American economist stub" then that person should also be in the category for American economists or one of its children.

As for the other point - how would I go about modifying the code? I don't know the first thing about it, sorry. It's not mine; I picked it up from someone else. --Ser Amantio di NicolaoChe dicono a Signa?Lo dicono a Signa. 07:54, 26 July 2016 (UTC)