User talk:Citation bot

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by Colin (talk | contribs) at 08:23, 17 July 2011 (→‎Page range and author lists: new section). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.


Please click here to report an error.

For urgent matters, please leave notes directly on the operator's talk page so the bot can be paused as soon as possible.

URL tidying up (arxiv)

URLs like "http://arxiv.org/abs/0710.4523" should be tidied up to |id={{arxiv|0710.4523}}

AKA

  • Lodders, Katharina (2008). "The solar argon abundance". The Astrophysical Journal. 674: 607. doi:10.1086/524725.

Should become

Also {{cite web}} with a URL that matches an arxiv preprint should be converted to {{cite arxiv}}

AKA

Should become

  • Mashnik, Stepan G. (2000). "On Solar System and Cosmic Rays Nucleosynthesis and Spallation Processes". arXiv:astro-ph/0008382.

Headbomb {talk / contribs / physics / books} 21:07, 6 March 2011 (UTC)[reply]

Can you give me a full list of the possible forms of URL that should be converted to arXiv parameters, noting any similar-looking URLs that should NOT be converted? Thanks, Martin (Smith609 – Talk) 15:08, 7 March 2011 (UTC)[reply]
URLs of the form http://arxiv.org/abs/SOMETHING or http://arxiv.org/pdf/SOMETHING should be converted to |arxiv=SOMETHING. Due to recent changes in the {{cite journal}} and {{citation}} templates, |arxiv= is now a separate parameter, so we don't need to go through the |id= parameter. Also www.arxiv.org and xxx.lanl.gov are the same as arxiv.org without the www. The SOMETHING part may have a couple of different formats: either yymm.nnnn or archive/identifier, but I think it's easier just to treat it as an atomic unit. There are some rarer access paths (two of them in here) and some other mirror sites, but that should at least get most of them safely. URLs that do not have the "/abs/" or "/pdf/" parts in them should be avoided. —David Eppstein (talk) 23:47, 10 March 2011 (UTC)[reply]
 Done in r284. Martin (Smith609 – Talk) 03:27, 17 March 2011 (UTC)[reply]
Oh, I didn't spot xxx.lanl.gov. I'll do that anon. Martin (Smith609 – Talk) 05:20, 17 March 2011 (UTC)[reply]

Here's the full list of mirrors:

4

The regex I used to match them is \|(\s*)?url(\s*)?=(\s*)?http://(www\.)?(|au\.|br\.|cn\.|de\.|es\.|fr\.|il\.|in\.|jp\.|ru\.|tw\.|ul\.|aps\.|lanl\.|xxx\.)?(arxiv|lanl)\.(org|gov)/(abs|eprint|pdf)/. Headbomb {talk / contribs / physics / books} 18:20, 19 March 2011 (UTC)[reply]

Likewise a bare <ref>http://arxiv.org/abs/0710.4523</ref> or <ref>[http://arxiv.org/abs/0710.4523]</ref> should be converted to <ref>{{cite arxiv|eprint=0710.4523}}</ref> Headbomb {talk / contribs / physics / books} 10:46, 23 March 2011 (UTC)[reply]
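
The conversion rules discussed in this thread can be sketched roughly as follows. This is only an illustration in Python, not the bot's actual code: the function names are made up for the example, the mirror prefixes are copied from the regex quoted above, and the identifier is treated as an atomic unit as suggested.

```python
import re

# Mirror prefixes copied from the regex quoted in the thread above.
MIRRORS = "au|br|cn|de|es|fr|il|in|jp|ru|tw|ul|aps|lanl|xxx"

# Only URLs containing /abs/, /eprint/ or /pdf/ are accepted.
ARXIV_URL = re.compile(
    r"https?://(?:www\.)?(?:(?:" + MIRRORS + r")\.)?"
    r"(?:arxiv|lanl)\.(?:org|gov)/(?:abs|eprint|pdf)/"
    r"(?P<eprint>[^\s|\]}<]+)"
)

# A bare reference: <ref>URL</ref> or <ref>[URL]</ref>.
BARE_REF = re.compile(r"<ref>\s*\[?\s*(?P<url>[^\s\]<]+)\s*\]?\s*</ref>")


def arxiv_id_from_url(url):
    """Return the arXiv identifier in url, or None if the URL does not qualify."""
    match = ARXIV_URL.fullmatch(url.strip())
    return match.group("eprint") if match else None


def convert_bare_refs(wikitext):
    """Rewrite bare arXiv references as {{cite arxiv}}; leave everything else alone."""
    def replace(match):
        eprint = arxiv_id_from_url(match.group("url"))
        if eprint is None:
            return match.group(0)
        return "<ref>{{cite arxiv|eprint=%s}}</ref>" % eprint
    return BARE_REF.sub(replace, wikitext)


print(arxiv_id_from_url("http://xxx.lanl.gov/abs/astro-ph/0008382"))
print(convert_bare_refs("<ref>[http://arxiv.org/abs/0710.4523]</ref>"))
```

URLs without one of the accepted path segments return None, so they are left untouched, which matches the advice above to avoid them.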

This functionality requested at Wikipedia:Bots/Requests_for_approval/Citation_bot_8. Martin (Smith609 – Talk) 12:22, 1 April 2011 (UTC)[reply]
Resolved

URL tidying up (ASIN)

Resolved

URL tidying up (bibcode)

  •  Done As above, URLs like "http://articles.adsabs.harvard.edu/full/1998MNRAS.301..787L" should be tidied up to |bibcode=1998MNRAS.301..787L.
  • A {{cite web}} with such a link should be converted to {{cite journal}} or {{citation}}
  • Bare references (<ref>http://articles.adsabs.harvard.edu/full/1998MNRAS.301..787L</ref> or <ref>[http://articles.adsabs.harvard.edu/full/1998MNRAS.301..787L]</ref>) should be converted to <ref>{{cite journal|bibcode=1998MNRAS.301..787L}}</ref>

I think most of this was covered at User talk:Citation bot/Archive 1#Bibcodes 2, but I don't know how refined the logic was so I'm reposting it. Also, the "articles.adsabs.harvard.edu" url might have been missed in the midst of the discussion. Headbomb {talk / contribs / physics / books} 11:47, 20 March 2011 (UTC)[reply]
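
For illustration, the bibcode case might look something like the sketch below. It is an assumption-laden example rather than the bot's logic: the function names are invented, only the articles.adsabs.harvard.edu/full/ form mentioned above is handled, and the sanity check relies on bibcodes being 19 characters long and starting with a four-digit year.

```python
import re
from urllib.parse import unquote

# Only the URL form quoted in this section is recognised here.
ADS_FULL_URL = re.compile(
    r"https?://articles\.adsabs\.harvard\.edu/full/(?P<bibcode>[^\s/?#|\]<]+)"
)


def bibcode_from_url(url):
    """Return the bibcode embedded in an ADS full-text URL, or None."""
    match = ADS_FULL_URL.fullmatch(url.strip())
    if match is None:
        return None
    # Journals such as A&A appear percent-encoded in URLs (A%26A).
    bibcode = unquote(match.group("bibcode"))
    if len(bibcode) != 19 or not bibcode[:4].isdigit():
        return None
    return bibcode


def bare_ref_to_cite_journal(url):
    """Build the replacement template for a bare ADS reference, or None."""
    bibcode = bibcode_from_url(url)
    if bibcode is None:
        return None
    return "{{cite journal|bibcode=%s}}" % bibcode


print(bare_ref_to_cite_journal(
    "http://articles.adsabs.harvard.edu/full/1998MNRAS.301..787L"))
```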

Resolved

URL tidying up (JSTOR)

Headbomb {talk / contribs / physics / books} 12:38, 20 March 2011 (UTC)
Not sure whether converting bare URLs to references would be covered by the existing bot request? If not, please feel free to make a request for this function. Martin (Smith609 – Talk) 23:45, 27 March 2011 (UTC)[reply]

Resolved

mojibake on dashes/accented characters from citation bot button

Status
 Fixed in GitHub Pull 346
Reported by
Rjwilmsi 20:07, 9 March 2011 (UTC)[reply]
Type of bug
Inconvenience: Humans must occasionally make immediate edits to clean up after the bot
What happens
The Citation Bot button seems to have an encoding issue whereby page dashes and accented characters are changed to some form of mojibake
Relevant diffs/links
sandbox
We can't proceed until
A user confirms that the fix worked


Please let me know precisely how to replicate this bug. Martin (Smith609 – Talk) 03:28, 17 March 2011 (UTC)[reply]
Also see for example [1] (towards the end) and this diff --- seems to mangle first non-Latin character in "journal" field, but not other fields like "title". cab (call) 08:20, 29 March 2011 (UTC)[reply]
Never mind, this seems not entirely related, I'll file a separate report below. cab (call) 12:40, 31 March 2011 (UTC)[reply]
Here's an example: {{ref doi|10.1007/BF00394819}} Martin (Smith609 – Talk) 19:02, 22 April 2011 (UTC)[reply]
This example is of a bug in the crossRef database. As far as I know, everything within my remit was  Fixed in GitHub Pull 346. Martin (Smith609 – Talk) 04:35, 19 June 2011 (UTC)[reply]

case

Status
new bug
Reported by
LeadSongDog come howl! 07:25, 13 March 2011 (UTC)[reply]
Type of bug
Cosmetic
What happens
Title and journal parameters get CrossRef's title case if a DOI is input, or PubMed's sentence case if a PMID is input
What should happen
prefer one style consistently, per article
Relevant diffs/links
http://en.wikipedia.org/w/index.php?title=User:Chris_Capoccia/Sandbox&diff=prev&oldid=418471207


Not sure that it's possible to convert from Title Case to Sentence Case, or desirable to convert in the opposite direction. Thoughts? Martin (Smith609 – Talk) 23:44, 27 March 2011 (UTC)[reply]

Bot does not upload at the correct location... yet again

Status
 Fixed in GitHub Pull 378
Reported by
Headbomb {talk / contribs / physics / books} 00:44, 14 March 2011 (UTC)[reply]
Type of bug
Improvement/Inconvenience
What should happen
When people use {{cite doi|doi:10.xxxx}} or {{cite doi|http://dx.doi.org/10.xxxx}}, the bot uploads the template at Template:Cite doi/doi:10.xxxx or Template:Cite doi/http:.2F.2Fdx.doi.org.2F10.xxxx. The bot should first clean {{cite doi|doi:10.xxxx}}/{{cite doi|http://dx.doi.org/10.xxxx}} to {{cite doi|10.xxxx}}, then upload at Template:Cite doi/10.xxxx
Relevant diffs/links
http://en.wikipedia.org/w/index.php?title=Category:Cite_doi_templates&from=A
We can't proceed until
A user confirms that the fix worked


Are any of these edits associated with a version of the bot greater than 265? Martin (Smith609 – Talk) 04:32, 18 March 2011 (UTC)[reply]
I haven't found the version in the edit summaries, but these are recent creations and the list seems to keep growing. For example [2] was made on March 14. Headbomb {talk / contribs / physics / books} 18:12, 19 March 2011 (UTC)[reply]
Handling improved in r343. Let me know if any more appear. Martin (Smith609 – Talk) 02:16, 12 April 2011 (UTC)[reply]

Yup. See [3] again. Headbomb {talk / contribs / physics / books} 18:43, 9 June 2011 (UTC)[reply]

Think I've got it this time. Martin (Smith609 – Talk) 15:31, 19 June 2011 (UTC)[reply]
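
A minimal sketch of the clean-up requested in this report, assuming the only prefixes to strip are "doi:" and a doi.org resolver URL. The helper names are hypothetical, and the ".2F" encoding of slashes simply mirrors the page titles quoted in the report; nothing here is taken from the bot's source.

```python
import re

# Prefixes that people paste in front of the DOI proper.
DOI_PREFIX = re.compile(r"^\s*(?:doi:\s*|https?://(?:dx\.)?doi\.org/)", re.IGNORECASE)


def normalize_doi(raw):
    """Strip a leading 'doi:' or resolver URL so that only '10.xxxx/...' remains."""
    return DOI_PREFIX.sub("", raw).strip()


def cite_doi_page(raw):
    """Page title for the cleaned DOI, with '/' written as '.2F' as in the report."""
    return "Template:Cite doi/" + normalize_doi(raw).replace("/", ".2F")


for raw in ("doi:10.1007/BF00394819", "http://dx.doi.org/10.1007/BF00394819"):
    print(normalize_doi(raw), "->", cite_doi_page(raw))
```

Both spellings end up at the same page title, which is the point of cleaning the parameter before uploading.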

ADSABS database API

Examples

This should be useful for the bot. I chose the "endnote", since it returns the full name of the journal, but there are other formats. Headbomb {talk / contribs / physics / books} 11:21, 15 March 2011 (UTC)[reply]

Any plans on making use of this? Headbomb {talk / contribs / physics / books} 16:54, 16 March 2011 (UTC)[reply]
Sounds great! Obviously useful for bibcodes; presumably its DOI database is not as extensive as CrossRef's. What other uses do you foresee it being put to? Martin (Smith609 – Talk) 05:00, 17 March 2011 (UTC)[reply]
CrossRef probably has a bigger DOI database, true. However, many (older) publications do not have DOIs, but they do have bibcodes. The ADSABS database also returns many identifiers (arxiv, bibcode, and doi are returned when available), and you can query it via any of them, meaning that you can further cross-check a {{cite arxiv}} to see if it should be updated to a {{cite journal}}. The journal "issue" also tends to be better documented in the ADSABS database, at least in my experience. Headbomb {talk / contribs / physics / books} 07:15, 17 March 2011 (UTC)[reply]
Working on it: bibcodes are supported in r290. Do you have an example of an instance with an issue number? Martin (Smith609 – Talk) 04:07, 18 March 2011 (UTC)[reply]
Bibcode:2011PhRvL.106k8501H has issue number 11. The ENDNOTE format does not return it, but BIBTEX does (other formats probably return it as well). Headbomb {talk / contribs / physics / books} 06:07, 18 March 2011 (UTC)[reply]
I tried using {{cite journal|bibcode=1998ApJ...502..538B}} (for example) on Gravitational microlensing and the bot failed to expand the citation. Headbomb {talk / contribs / physics / books} 12:24, 18 March 2011 (UTC)[reply]
I guess I just wasn't using the latest bot revision. It works fine. Headbomb {talk / contribs / physics / books} 19:52, 19 March 2011 (UTC)[reply]

Although the journal field gets cluttered with weird stuff and it missed a few (like Bibcode:2001A&A...375..701D). Headbomb {talk / contribs / physics / books} 19:53, 19 March 2011 (UTC)[reply]

BTW, does the bot try to minimize the number of queries via things like http://adsabs.harvard.edu/cgi-bin/abs_connect?bibcode=1970ApJ...161L..77K&bibcode=2004ApJ...600L..93G&data_type=ENDNOTE ? Headbomb {talk / contribs / physics / books} 21:30, 22 March 2011 (UTC)[reply]
Unresolved
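
As a rough illustration of the batching idea in the last question, several bibcodes can be packed into a single query string of the form quoted above. The endpoint and parameter names are taken from that URL; the function name is invented, and the sketch only builds the URL without sending any request.

```python
from urllib.parse import urlencode

# Endpoint as it appears in the URL quoted in this thread.
ADS_ENDPOINT = "http://adsabs.harvard.edu/cgi-bin/abs_connect"


def batch_query_url(bibcodes, data_type="ENDNOTE"):
    """Build one query URL that repeats the bibcode parameter for each entry."""
    pairs = [("bibcode", bibcode) for bibcode in bibcodes]
    pairs.append(("data_type", data_type))
    # urlencode percent-encodes characters such as '&' (e.g. in A&A bibcodes).
    return ADS_ENDPOINT + "?" + urlencode(pairs)


print(batch_query_url(["1970ApJ...161L..77K", "2004ApJ...600L..93G"]))
```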

Bot does not upconvert {{cite arxiv}} to {{cite journal}}

Status
 Fixed
Reported by
Headbomb {talk / contribs / physics / books} 01:23, 18 March 2011 (UTC)[reply]
Type of bug
Improvement
What happens
Bot does not upconvert {{cite arxiv}} to {{cite journal}}
What should happen
If you check arXiv:1010.3003, you can see it has a DOI (doi:10.1016/j.jocs.2010.12.007) and has been published in Journal of Computational Science. I seem to recall that citation bot converted {{cite arxiv}} to {{cite journal}} upon publication?
Relevant diffs/links
[4]


Because arXiv uses colons in its XML, the bot couldn't extract the journal or DOI. I've worked around this in r289. Martin (Smith609 – Talk) 03:49, 18 March 2011 (UTC)[reply]

Tried it again and it still doesn't work. Headbomb {talk / contribs / physics / books} 15:38, 19 March 2011 (UTC)[reply]
Were you using r289 or greater? You'll need to replace "citation-bot" with "DOI_bot" in the URL to use the latest version of the bot. The current stable version is r273. Martin (Smith609 – Talk) 18:16, 19 March 2011 (UTC)[reply]
I'm using the one in the toolbox via your gadget (User:Headbomb/monobook.js). Is there a way to have that link to the most recent version? Headbomb {talk / contribs / physics / books} 19:36, 19 March 2011 (UTC)[reply]
Alright, it works if I use that URL, although it doesn't work 100% correctly. For example, here the bot correctly retrieved the DOI and various parameters, but did not convert to a cite journal, or retrieve the journal parameter. Headbomb {talk / contribs / physics / books} 20:04, 19 March 2011 (UTC)[reply]
 Done: The CrossRef database is now consulted to recover the journal name. Martin (Smith609 – Talk) 16:19, 27 March 2011 (UTC)[reply]
Likewise here [5] it converts, but it's a bit half-assed. I notice that there's no DOI in the arxiv, so it can't use that and there's not much it could do. But the main problem when converting to a cite journal is that the bot keeps the year from the cite arxiv, which it shouldn't do. It's very common for the publication year to differ between a preprint and the official publication, so whenever it upconverts, the year/date should be discarded. Headbomb {talk / contribs / physics / books} 20:11, 19 March 2011 (UTC)[reply]
This point improved in r313. Martin (Smith609 – Talk) 16:05, 27 March 2011 (UTC)[reply]

Awesome. I didn't test it yet, but would it be possible to assign priorities to databases? Like CrossRef > ADSABS > arXiv.org? As in, if you find a DOI in the arXiv.org database, query CrossRef with the DOI first, if that fails, query ADSABS with the DOI, and only then use the other information from arXiv.org? I'm mentioning this because data from arXiv.org tends to be the crappiest, and CrossRef seems to give the best. Headbomb {talk / contribs / physics / books} 13:04, 28 March 2011 (UTC)[reply]

That'd be quite a bit of work; can you show me a few examples where this would be beneficial, so that I can get my head in? Martin (Smith609 – Talk) 04:29, 19 June 2011 (UTC)[reply]
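
To make the proposal a little more concrete, the priority order could be expressed as a simple fallback chain. This is a sketch under assumptions: the lookup functions are stand-in stubs supplied by the caller (no real database client is used), and a lookup is taken to signal failure by returning None or raising LookupError.

```python
def lookup_with_priority(doi, sources):
    """Try each (name, lookup) pair in order; return the first non-empty result."""
    for name, lookup in sources:
        try:
            record = lookup(doi)
        except LookupError:
            record = None
        if record:
            return name, record
    return None, None


# Hypothetical stand-ins for the three databases, in the suggested order.
def crossref_stub(doi):
    return None  # pretend CrossRef has no entry for this DOI


def adsabs_stub(doi):
    return {"journal": "Journal of Computational Science"}


def arxiv_stub(doi):
    return {"journal": ""}


sources = [("CrossRef", crossref_stub), ("ADSABS", adsabs_stub), ("arXiv", arxiv_stub)]
print(lookup_with_priority("10.1016/j.jocs.2010.12.007", sources))
```

With this shape, changing the priority is just a matter of reordering the list, and the lower-quality source is only consulted when the better ones come back empty.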

JSTOR problem

Status
Resolved
Reported by
Jheald (talk) 11:03, 20 March 2011 (UTC)[reply]
Type of bug
Improvement
What happens
JSTOR links replaced by new JSTOR reference with a spurious parenthesis; even without it the new reference does not point to the right article.

Examples:

Old link was: http://links.jstor.org/sici?sici=0021-9002(197008)7%3A2%3C508%3AOAR%3E2.0.CO%3B2-Y
Bot changed to: jstor=197008) -- note trailing parenthesis
Actual correct short url: http://www.jstor.org/pss/3211992 -- No, I don't know how you're meant to work that out either.

Old link was: http://links.jstor.org/sici?sici=0003-4851(197212)43%3A6%3C%3AAR1%3E2.0.CO%3B2-1
Bot changed to: jstor=197212) -- again, note trailing parenthesis
Actual correct short url: http://www.jstor.org/pss/2240189 -- again, seems to be no way to determine this.

Note that http://www.jstor.org/pss/197008, which the new link resolves to if you take the bracket away, goes to a completely different article in a completely different journal.

Relevant diffs/links
[6]
We can't proceed until
Agreement on the best solution


Bot now (r306) won't interpret the SICI as a JSTOR ID. It should be able to determine the JSTOR ID from the sici, though. I'll implement that anon. Martin (Smith609 – Talk) 18:00, 22 March 2011 (UTC)[reply]
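
For reference, decoding the two example links shows why "197008" and "197212" are not JSTOR IDs: they are the date part of the SICI. The sketch below only pulls the SICI apart; it is an illustration with invented names and a simplified pattern, and it does not attempt the SICI-to-JSTOR-ID lookup mentioned above.

```python
import re
from urllib.parse import urlparse, parse_qs

# Simplified SICI layout: ISSN(date)volume:issue<page:title-code>...
SICI = re.compile(
    r"(?P<issn>\d{4}-\d{3}[\dX])"
    r"\((?P<date>\d{4,8})\)"
    r"(?P<volume>[^:<]*):(?P<issue>[^<]*)"
    r"<(?P<page>[^:>]*):"
)


def parse_jstor_sici_link(url):
    """Return the SICI fields of a links.jstor.org/sici URL, or None."""
    parts = urlparse(url)
    if parts.netloc != "links.jstor.org" or parts.path != "/sici":
        return None
    values = parse_qs(parts.query).get("sici")  # parse_qs undoes the %XX escapes
    if not values:
        return None
    match = SICI.match(values[0])
    return match.groupdict() if match else None


print(parse_jstor_sici_link(
    "http://links.jstor.org/sici?sici=0021-9002(197008)7%3A2%3C508%3AOAR%3E2.0.CO%3B2-Y"))
```

Note that the second example in the report has an empty page field, so a parser along these lines has to tolerate missing components.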

Remove accessdates when there is no URL

Status
 Done:  Fixed in GitHub Pull 369
Reported by
Headbomb {talk / contribs / physics / books} 11:56, 20 March 2011 (UTC)[reply]
What should happen
When a {{cite journal}}/{{citation}}/any {{cite xxx}} (except {{cite web}}) does not have a URL (post cleanup), the accessdate should be removed. It's not displayed, and is pretty useless.
We can't proceed until
Agreement on the best solution


Is this true even if a webpage is cited with Template:Citation? Martin (Smith609 – Talk) 18:08, 22 March 2011 (UTC)[reply]

Not sure what you're getting at... If there's a {{citation}} without a url, it's pretty safe to assume that people aren't citing a web page. No? Headbomb {talk / contribs / physics / books} 18:17, 22 March 2011 (UTC)[reply]
Could there ever be an archiveurl but no url, for example? Just making sure that we've thought of everything before I code. Martin (Smith609 – Talk) 02:12, 23 March 2011 (UTC)[reply]
The only scenario I can think of where that would happen is when someone (bot, script, human) archived a url such as |url=http://articles.adsabs.harvard.edu/full/1998MNRAS.301..787L (which I've seen a few times), and then someone (bot, script, human) later cleaned up the url into |bibcode=1998MNRAS.301..787L.
Not sure what should be done in this situation. When there's an archive url and no url, the template gives an error, so it would be cleaned up pretty fast anyway. Maybe citation bot can check if the archived url is the same as the url it just cleaned up, and remove everything (url, accessdate, archiveurl, and archivedate). Or maybe it's safe to assume that if the url resolves to an identifier, so does the archive url and again everything can be removed (url, accessdate, archiveurl, archivedate). Headbomb {talk / contribs / physics / books} 10:45, 23 March 2011 (UTC)[reply]
I always wondered what the point was of retaining dead urls when archiveurls were in place. There should be a better way. LeadSongDog come howl! 14:10, 23 March 2011 (UTC)[reply]
Is there any guarantee that a citation with a URL and an archiveurl will also have fields such as title, author, publisher, journal, etc. filled in? If not, the dead URL should be retained in case the archive goes away, because the publisher can often be identified from the URL. Jc3s5h (talk) 15:53, 23 March 2011 (UTC)[reply]
It's not like the article history is being deleted, the dead url is still there anyhow. Also, I'm not aware of us having a history of problems with archives that vanish. After all, that's what they're for. If they don't make provisions for ongoing availability they won't get the fonds in the first place. Still, if there's a real, valid concern for this, we can simply keep the dead url in hidden text. LeadSongDog come howl! 18:13, 23 March 2011 (UTC)[reply]
The problem with doing this sort of change, according to Wikipedia bot policies currently in place and according to several very vocal folks, is that it's not allowed because it doesn't render any changes to the page. So since it doesn't render any changes to the page (just like nested, attention=no, invalid parameters on templates, and a wide array of other things) it cannot be deleted via a bot. If someone wants to try and change this policy I would be glad to support it, but till then we can't do this type of change unless something else is being changed on the article at the same time. --Kumioko (talk) 18:40, 23 March 2011 (UTC)[reply]

We've gone a bit off topic; the citation bot should NOT remove URLs just because they are dead, nobody should be doing this (need to look for correct URL if moved / mirror / archive link but not just drop URL). Removal of |accessdate= when a URL is converted to an identifier is fine and a sensible housekeeping step. Rjwilmsi 14:08, 24 March 2011 (UTC)[reply]

Yeah it should definitely not remove dead urls, although this never was about that. Just cleaning up citations with no urls but with accessdates (for whatever reason). Headbomb {talk / contribs / physics / books} 20:54, 25 March 2011 (UTC)[reply]
digression
I agree that it's sensible; the problem is that, according to several users including CBM and Xeno in multiple venues (including AWB, recently causing the revision of one of the AWB rules of use), if the change does not render a change to the page or a category it cannot be done alone and must only be done if another more major change is being made at the same time. I don't agree with it and have repeatedly fought against it, but that's the way it is, so if someone wants to do this change they either have to limit it to only articles where another more significant change is being made at the same time or change the rule. --Kumioko (talk) 14:58, 24 March 2011 (UTC)[reply]
Who here suggested that this would be done as the only change in an article? Making these edits is perfectly within the bot policy of uncontroversial maintenance. Headbomb {talk / contribs / physics / books} 21:29, 24 March 2011 (UTC)[reply]
It's been said many times over the last few months by several users, most notably CBM, Xeno and a couple of others, that changes that do not render changes to the article or that do not affect a category cannot be done alone, and that is one of the prime reasons why they modified the rules of use recently for AWB. You can see these discussions on the talk pages for the Wikiproject Council, AWB, the ANI that CBM opened up against me when he revoked my AWB access and a variety of other places. Even my talk page archives, Rich Farmbrough's and others'. All because they said that the "consensus" was that these changes are disruptive, fill watchlists, waste resources, etc. I fought hard against it and lost. I even lost my access to AWB because I told CBM that 1 editor not liking a change didn't constitute a lack of consensus. Rather than open a discussion he revoked my AWB and opened an ANI against me. Ok fine, it's water under the bridge and Wikipedia is poorer for it (obviously I'm still a little pissed about that). In the end I can do nothing to stop this, nor do I want to, and aside from the fact that it apparently breaches a policy I never agreed with, it is perfectly fine. But I wanted to present the warning so that if you do go ahead with it you are aware that someone may stop the bot in protest or block it completely from editing. Because a stupid policy exists that prevents us from removing garbage fields and deprecated parameters from articles because they don't render changes. --Kumioko (talk) 20:01, 25 March 2011 (UTC)[reply]
The problem with your edits was that you were doing edits of very little value, which was effectively spamming watchlists with bot-like edits for no real reason (and unlike other low-value edits with bot flags, you cannot filter them out of your watchlist). Removing something like "|nested=yes" did not prevent damage, nor did it clean up anything other than the edit window. This was raised many times, and you kept doing it against consensus. Removing |accessdate=2005-06-28 from an article without a url, however, is not such an edit. If there's no url, but there's an accessdate, when another editor comes along and adds a URL without noticing or updating the existing accessdate field, you have a false accessdate that is displayed. That is why accessdate should be cleaned up, unlike the |nested=yes which got your AWB access revoked. Were it up to me, I'd have sent a bot (and not a human with AWB, per the watchlist thing) to nuke the |nested=yes from talk page templates; however, regardless of my personal position on the issue, the community doesn't like these low-value edits when done on their own.
I understand you're pissed about that, but ranting about it on other bots' page isn't going to make anything better. If you have a genuine opposition to removing accessdate garbage from articles, voice it here. But if you're playing the "concerned bureaucrat, who's just mentioning the rules and enforcing the procedures" because you're pissed about them and want to break everyone else's balls as a way to get back at the universe, please don't. Headbomb {talk / contribs / physics / books} 20:49, 25 March 2011 (UTC)[reply]
My point in all that statement really boils down to them being the same type of edit. Regardless of whether a future change will cause it to render, removing it at this time does not render any changes and therefore, according to policy as has been quoted to me many times, it can't be done. For what it's worth though, nested was not why my access was revoked. It was for doing things like this, and it was this edit with no nested parameter at all where I removed about a thousand characters of crap that could cause future rendering problems just as you point out above, slow page load times and confuse editors, not to mention making it a whole lot harder to deal with for the bots having to mine through it to make whatever change they're there for, potentially causing them to break something. So I go back to my original comment: if someone wants to do this type of change then the unnecessarily restrictive rule that is being used to browbeat editors trying to make the 'pedia better needs to be changed. Otherwise you are more than welcome to make the changes to articles where other more significant changes are being made. With all that said I am not an admin, I have absolutely no other power except for that of the pen, or keyboard as it were, so if you go ahead with it I am not going to stop you, but it's setting a bad example when we use a policy as a mallet on one editor and then push it aside for another because we don't want to follow it. --Kumioko (talk) 01:41, 26 March 2011 (UTC)[reply]

It's not a digression when a concern is raised about policy. Just because you don't like the policy doesn't mean you don't have to follow it. But I won't post here again because it's obvious that no one cares. So it seems policy is only policy when we want it to be and perhaps this is a WP:IAR scenario. --Kumioko (talk) 04:12, 26 March 2011 (UTC)[reply]

Is there consensus on a solution that can be implemented, or shall I close this thread? Martin (Smith609 – Talk) 02:26, 12 April 2011 (UTC)[reply]
Yes, this can be implemented. Headbomb {talk / contribs / physics / books} 02:58, 12 April 2011 (UTC)[reply]
Resolved
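
The agreed housekeeping step could be sketched as below. It assumes a citation has already been parsed into a template name and a dict of parameters (the parsing itself is out of scope), and the function name is made up for the example. URLs and archive parameters are never removed, in line with the discussion above.

```python
def drop_orphan_accessdate(template_name, params):
    """Return a copy of params without |accessdate= when there is no |url=.

    {{cite web}} is exempt, as requested in the report.
    """
    if template_name.strip().lower() == "cite web":
        return dict(params)
    if params.get("url", "").strip():
        return dict(params)
    return {key: value for key, value in params.items() if key != "accessdate"}


cleaned = drop_orphan_accessdate(
    "cite journal",
    {"bibcode": "1998MNRAS.301..787L", "accessdate": "2005-06-28"},
)
print(cleaned)
```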

Erroneous insertion of PNAS DOIs

Status
 Fixed in r307
Reported by
Rjwilmsi 09:19, 26 March 2011 (UTC)[reply]
Type of bug
Inconvenience: Humans must occasionally make immediate edits to clean up after the bot
What happens
PNAS DOIs and Bibcodes are inserted though they don't correspond to the paper cited. This seems to be a recently introduced error.
Relevant diffs/links
[7], [8]
We can't proceed until
Bot operator's feedback on what is feasible


Also [9] for Nature DOI & Bibcode. Rjwilmsi 09:39, 26 March 2011 (UTC)[reply]

Investigated further: broken between 289 and 294. All edits with 294 or later need to be checked, I've already had to revert over 50. Rjwilmsi 11:58, 26 March 2011 (UTC)[reply]
I've requested a temporary bot block on WP:ANI. Rjwilmsi 12:09, 26 March 2011 (UTC)[reply]
The bot searched the AdsAbs database using the specified title. However AdsAbs often returned very fuzzy matches. As of r307 the bot now checks that the returned title is a good match for the input title. Sorry about the inconvenience caused in the interim; I'll be doing more thorough testing after each edit from the nonce. Martin (Smith609 – Talk) 13:18, 27 March 2011 (UTC)[reply]
I'm not convinced that a match on title alone will be sufficiently reliable. Are you cross validating other criteria such as volume, year, journal and/or pages etc.? Rjwilmsi 20:46, 27 March 2011 (UTC)[reply]
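
One way the check described above might look, with the extra cross-validation asked about in the last comment. This is a sketch, not the r307 implementation: the similarity measure (difflib's ratio), the 0.9 threshold, the field names and the function names are all assumptions chosen for illustration.

```python
import re
from difflib import SequenceMatcher


def _normalise(title):
    """Lower-case the title and collapse punctuation and whitespace."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def titles_match(wanted, returned, threshold=0.9):
    """True when the returned title is close enough to the one we searched for."""
    a, b = _normalise(wanted), _normalise(returned)
    if not a or not b:
        return False
    return SequenceMatcher(None, a, b).ratio() >= threshold


def is_plausible_match(citation, record, threshold=0.9):
    """Require a good title match and no disagreement on year or volume."""
    if not titles_match(citation.get("title", ""), record.get("title", ""), threshold):
        return False
    for field in ("year", "volume"):
        ours, theirs = citation.get(field), record.get(field)
        if ours and theirs and str(ours).strip() != str(theirs).strip():
            return False
    return True


print(titles_match("The solar argon abundance", "The Solar Argon Abundance."))
print(titles_match("The solar argon abundance", "Gravitational microlensing"))
```

Fields that are missing on either side are simply skipped, so the extra checks can only reject a candidate, never accept one the title test has already refused.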

Bot adding URLs of articles reviewing books to the books themselves

Status
new bug
Reported by
— Cheers, JackLee talk 08:18, 31 March 2011 (UTC)[reply]
Type of bug
Catastrophical: The bot should stop editing immediately
What happens
The bot is adding the URLs of book reviews in journals to citations of the books themselves.
Relevant diffs/links
http://en.wikipedia.org/w/index.php?title=Robert_Hues&action=historysubmit&diff=421517624&oldid=415277958
We can't proceed until
Agreement on the best solution


Oh, I've just noticed that this issue has already been reported earlier. But it doesn't appear to have been resolved. — Cheers, JackLee talk 08:20, 31 March 2011 (UTC)[reply]
That edit was tagged "([Pu316]Misc citation tidying.)" by the bot. Does it still make that error after the recent code revisions? LeadSongDog come howl! 15:41, 31 March 2011 (UTC)[reply]
I haven't encountered any other problems with the articles on my watchlist so far. — Cheers, JackLee talk 08:40, 7 April 2011 (UTC)[reply]
Reviews share much semantic information with books. Any ideas on how to avoid false positives? Martin (Smith609 – Talk) 02:27, 12 April 2011 (UTC)[reply]
I could be flip and say "Don't use {{citation}} where {{cite book}} is intended", but I won't. If a citation has an ISBN the bot should not add data pertaining to a publication in a serial, with a possible exception where that citation appears in an article that is in Category:Books. If the input data provides ambiguous key data (such as title and author only) but searches find both a target with an ISBN and a target with a serial title or ISSN then it would not hurt to explicitly flag the bot-added wikitext for human consideration. Another option is to show {{cite book}} (reviewed at {{cite journal}}), though there could be objections to that on wp:SAYWHEREYOUGOTIT grounds. LeadSongDog come howl! 17:12, 12 April 2011 (UTC)[reply]
In an article about the book, or about the author of the book, it's useful to include reviews, but as separate sources, not as additions to the book's publication data. As for {{citation}} vs {{cite book}}, they have different formats: {{cite book}} goes with {{cite journal}} and is incompatible with {{citation}}. In any case, some books really do have dois or urls for the whole book that would be helpful for citation bot to add, so it is not as simple as "don't add urls to books". In this case, though, the presence of a "review author" in the ADS data is a giveaway that this is a book review, and the fact that the review author is not the same as the author of the {{citation}} is an indication that the original citation refers to the book itself and not to its review. —David Eppstein (talk) 17:51, 12 April 2011 (UTC)[reply]
Format consistency (a MOS issue) should never be allowed to trump accuracy of attribution (a V issue). That way lies madness. If converting to all-{{citation}} or all-{{cite xxx}} breaks the correctness of the attribution then the conversions should stop. Readers can tolerate a misplaced comma or period far better than they can a citation of the wrong source. When the source metadata is fully identified the conversion might then be conducted more safely. LeadSongDog come howl! 19:08, 12 April 2011 (UTC)[reply]
There's nothing inaccurate or unverifiable about books formatted using {{citation}}. —David Eppstein (talk) 02:45, 13 April 2011 (UTC)[reply]
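
The two heuristics proposed in this exchange (skip serial data when the citation has an ISBN; treat an ADS entry with a different "review author" as a review of the cited work) could be combined roughly like this. Everything here is hypothetical: the record layout, the "review_author" key and the function names are assumptions for illustration and are not taken from the ADS interface or the bot.

```python
def looks_like_review_of(citation, record):
    """True when the record names a reviewer who is not the cited author."""
    reviewer = record.get("review_author", "").strip().lower()
    if not reviewer:
        return False
    cited = citation.get("last", "").strip().lower()
    # With no cited author to compare against, stay on the cautious side.
    return not cited or cited not in reviewer


def may_add_serial_data(citation, record):
    """Decide whether journal/volume/pages from record may be merged in."""
    if citation.get("isbn", "").strip():
        return False
    return not looks_like_review_of(citation, record)


book = {"last": "Shirley", "title": "Thomas Harriot: A Biography",
        "isbn": "978-0-19-822901-8"}
# Placeholder reviewer name; the real ADS entry is not reproduced here.
review_record = {"review_author": "A. Reviewer", "bibcode": "1986JHA....17...71D"}
print(may_add_serial_data(book, review_record))
```

As noted in the thread, the ISBN rule is stricter than some would like, since it only blocks serial data and says nothing about DOIs or URLs that genuinely belong to the whole book.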

The example given for this bug was

{{citation|last=Shirley|first=John W[illiam]|title=Thomas Harriot: A Biography|location=Oxford|publisher=[[Oxford University Press|Clarendon Press]]|year=1983|isbn=978-0-19-822901-8}}

which the bot changed to {{citation|last=Shirley|first=John W[illiam]|title=Thomas Harriot: A Biography|location=Oxford|journal=Journ. History of Astronomy V.17|volume=17|pages=71|publisher=[[Oxford University Press|Clarendon Press]]|year=1983|isbn=978-0-19-822901-8|bibcode=1986JHA....17...71D}}

If instead the entry had been {{cite book |last=Shirley|first=John W[illiam]|title=Thomas Harriot: A Biography|location=Oxford|publisher=[[Oxford University Press|Clarendon Press]]|year=1983|isbn=978-0-19-822901-8}}

the ambiguity would not have been there in the first place. LeadSongDog come howl! 04:22, 13 April 2011 (UTC)[reply]

It might be worth mentioning that {{citation}} and {{cite book}} take almost the same parameters as each other, and display similarly-named parameters in almost the same manner (punctuation being the main difference). As regards specifics, {{cite book}} allows both |trans_title= and |trans_chapter= whereas {{citation}} doesn't; conversely, citation has a parameter for the journal name (|journal= and its four synonyms) which cite book doesn't. It is the presence of any one of |journal=, |periodical=, |newspaper=, |magazine= or |work= which causes {{citation}} to use the "journal" format; if all five are absent, it uses the "book" format. A smaller matter is that if you want harvard referencing to link, and you're not intending to use a custom link (ie |ref={{harvid|...}} ), you must provide an explicit |ref=harv to cite book, whereas that is the default action for citation. Apart from those two areas, {{citation}} and {{cite book}} primarily differ in some very minor ways such as the separator between the various items of information. --Redrose64 (talk) 11:30, 13 April 2011 (UTC) amended Redrose64 (talk) 13:15, 13 April 2011 (UTC)[reply]
That's correct as far as it goes, but it misses a key factor. A non-entry in {{citation}} of |journal= or its synonyms is not the same as an active entry saying "this is a book I'm citing". Use of {{cite book}} provides that additional bit of information. The problem can also be seen as the ambiguity of |title= which may be used to mean an entire book or a single letter to the editor of a newspaper in the same {{citation}}. LeadSongDog come howl! 20:22, 13 April 2011 (UTC)[reply]
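
The format-selection rule just described can be written down in a few lines, which also shows how the bot's edit flipped the Shirley example from "book" to "journal" formatting. This is a sketch of the rule as stated in the comment above, not the template's actual code; treating an empty value as absent is an assumption.

```python
# The five parameters named above that switch {{citation}} to its journal format.
PERIODICAL_PARAMS = ("journal", "periodical", "newspaper", "magazine", "work")


def citation_format(params):
    """Return 'journal' if any periodical parameter is filled in, else 'book'."""
    for name in PERIODICAL_PARAMS:
        if params.get(name, "").strip():
            return "journal"
    return "book"


before = {"last": "Shirley", "title": "Thomas Harriot: A Biography",
          "year": "1983", "isbn": "978-0-19-822901-8"}
after = dict(before, journal="Journ. History of Astronomy V.17",
             volume="17", pages="71", bibcode="1986JHA....17...71D")
print(citation_format(before), citation_format(after))
```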
This ambiguity is a good thing, and one of the big reasons why I prefer {{citation}} to {{cite book}} etc. The reason is that when we use {{citation}} we don't need to decide whether we should count a technical report or a senior thesis or a doctoral dissertation or a booklet or a set of course lecture notes or a 100-page online preprint or a single-article issue of a journal is really a "book", and we don't need to set up and maintain separate {{cite report}} and {{cite thesis}} and {{cite booklet}} and {{cite course}} and {{cite preprint}} templates; we just tell citation what types of information are available for this citation and it does the formatting. For similar reasons, when we're using {{citation}}, we don't need to decide whether a periodical that we're citing something in is a journal or a magazine or a newsletter or a newspaper or a catalog, and we don't need separate {{cite journal}} and {{cite newspaper}} and {{cite magazine}} and {{cite newsletter}} and {{cite catalog}} templates. Especially in the context of Citation bot, a piece of software that is not necessarily good at making these sorts of subtle distinctions between different types of publication, it's helpful to avoid having to make these distinctions at all. —David Eppstein (talk) 20:59, 13 April 2011 (UTC)[reply]
I can't agree that it is a good thing to leave to the bot a determination that humans have difficulty making. Why should we then turn around and complain that the bot gets it wrong? LeadSongDog come howl! 21:58, 13 April 2011 (UTC)[reply]
The error in question had little or nothing to do with the fact that the citation in question was to a book. It was an error of not understanding that the ADS "review author" field implies that the ADS entry is to a review of a source and is not an entry for the source itself. The same error could equally easily have been made for a published review of a journal article. So the fact that the broken citation was a {{citation}} template rather than a {{cite book}} template is a complete red herring. As for why {{citation}} is a good thing, in relation to what citation bot understands or doesn't understand: it's because using {{citation}} allows its users (including citation bot) to omit irrelevant information (like whether it really is a book vs a booklet or whatever), freeing them from having to understand that irrelevant information. —David Eppstein (talk) 23:08, 13 April 2011 (UTC)[reply]
Are you perhaps confusing the needs of readers with those of editors? Many of both groups consider peer review, journal rankings, and even finding physical access to the dead trees to be relevant factors when assessing how much credence to give the source. In any case, the "review author" field is not universally supported by all serial indices, so while using that might be a solution for ADS, it is not a general fix for all the databases. If available we would do better to implement a "publicationtype" parameter (as on PubMed) to explicitly distinguish reviews, systematic reviews, original articles, letters, consensus reports, etc. This could also enormously facilitate article reviews by highlighting the use of primary sources. LeadSongDog come howl! 00:50, 14 April 2011 (UTC)[reply]
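
The format-selection rule Redrose64 describes for {{citation}} can be sketched in PHP as follows. This is only an illustration of the rule as stated in the thread, not the template's own logic; the function name citation_format is hypothetical, and treating a blank value as "absent" is an assumption.

```php
<?php
// Hypothetical helper illustrating the rule described in the thread:
// any periodical-type parameter selects the "journal" format, otherwise "book".
function citation_format(array $params): string
{
    $periodical_keys = ['journal', 'periodical', 'newspaper', 'magazine', 'work'];
    foreach ($periodical_keys as $key) {
        // Assumption: a parameter with a blank value counts as absent.
        if (isset($params[$key]) && trim($params[$key]) !== '') {
            return 'journal';
        }
    }
    return 'book';
}
```

For example, citation_format(['title' => 'A Paper', 'work' => 'A Journal']) selects the journal format, while a call with only a title selects the book format.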
Resolved

21:53, 16 July 2011 (UTC)

Bot hangs @ nytimes

This link: http://www.nytimes.com/2009/09/06/magazine/06Economic-t.html causes the bot to hang in Mainstream economics. Martin (Smith609 – Talk) 14:59, 6 April 2011 (UTC)[reply]

 Fixed in GitHub Pull 334

Resolved

Capitalization error in name with diacritic

Status
 Fixed in GitHub Pull 368
Reported by
Ucucha 02:23, 13 April 2011 (UTC)[reply]
Type of bug
Inconvenience
What happens
[10] "GéRaldine". The CrossRef API gives the correct capitalization ("Géraldine").
We can't proceed until
Bot operator's feedback on what is feasible


That edit was many versions ago, see the edit comment: "11:56, 27 March 2011 Citation bot 2 (talk | contribs) m (618 bytes) ([cw310]Misc citation tidying.)" It looks to have been done during the creation of the cite doi subpage, though the edit comment doesn't reflect that. Pubmed also has the correct spelling for the forename. LeadSongDog come howl! 20:59, 13 April 2011 (UTC)[reply]
Another example (this one using version 305). In this case, CrossRef has no data. Ucucha 23:25, 18 April 2011 (UTC)[reply]
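
The thread does not say what caused "GéRaldine". One way such a result can arise in PHP is capitalising at word boundaries with a byte-oriented regex, because the bytes of "é" are not treated as word characters. The sketch below contrasts that with a UTF-8 aware call; both function names are hypothetical, the diagnosis is a guess, and the second function assumes the mbstring extension is available.

```php
<?php
// Byte-oriented title-casing. Without the /u modifier the two bytes of "é"
// are not word characters under the default C locale, so the following "r"
// looks like the start of a new word and is typically capitalised too.
function title_case_bytes(string $name): string
{
    return preg_replace_callback('/\b[a-z]/', function ($m) {
        return strtoupper($m[0]);
    }, $name);
}

// UTF-8 aware alternative: mbstring treats "é" as a letter inside the word.
function title_case_utf8(string $name): string
{
    return mb_convert_case($name, MB_CASE_TITLE, 'UTF-8');
}
```

The file must be saved as UTF-8 for literals such as "géraldine" to behave as described.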

End of page range not added

Status
 Fixed in GitHub Pull 367
Reported by
Ucucha 18:07, 17 April 2011 (UTC)[reply]
Type of bug
Inconvenience
What happens
Bot does not enter end of page range. The full page range is given by the CrossRef API.
Relevant diffs/links
Correction of the bot (previous edit was creation of cite doi subpage).
We can't proceed until
Agreement on the best solution


Is it this fix that now causes existing condensed page ranges to be expanded? This is I believe against the wikiproject medicine preference to display the page ranges in the condensed format. Rjwilmsi 10:30, 20 June 2011 (UTC)[reply]

Enacted small enhancements

 Fixed in GitHub Pull 374

  • In-press page ranges may be denoted | pages = n/a–n/a . See doi:10.1002/bies.201100035. 192.75.204.31 (talk) 15:50, 11 July 2011 (UTC)


  • "| Volume: 12 " and "| Issue: 12" are replaced by "Volume = : | unused_data=12". Martin (Smith609 – Talk) 14:06, 12 July 2011 (UTC)[reply]


Suggested small enhancements

Curly quotes

  • Get rid of curly quotes (WP:MOS#Quotation marks). You can put $title = preg_replace(array('/[`‘’]/u', '/[“”]/u'), array("'", '"'), $title); or something somewhere in your code. Ucucha 11:42, 16 June 2011 (UTC)
    That doesn't seem to work, and neither does $str = preg_replace('~&#821[679];|[\x{2039}\x{203A}\x{2018}-\x{201B}]|&[rl]s?[ab]?quo;~u', "'", $str);. I'm stumped. Suggestions welcome! Martin (Smith609 – Talk) 23:43, 18 June 2011 (UTC)[reply]
    Where exactly is the relevant code? Perhaps it has to do with different character encodings somewhere. Does preg_last_error() return anything? Ucucha 23:48, 18 June 2011 (UTC)[reply]
    The regex in r358 fails to compile: there's a closing parenthesis missing in '~&([rlb][ad]?quo;~' (line 2010 of DOItools.php). Not sure exactly what you want there, but you should either add or remove some parentheses there. Ucucha 00:03, 19 June 2011 (UTC)[reply]
    Ugh. Well spotted. Thank you! This brought a couple of other bugs to the surface too, so performance and edit summaries are both now improved.  Fixed in GitHub Pull 362
Resolved
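
A minimal sketch of the curly-quote replacement discussed above, with the failure check Ucucha hints at. The function name straighten_quotes is hypothetical, and the character classes simply follow the original suggestion; this is not the code that went into the bot.

```php
<?php
// Hypothetical helper: replace curly single and double quotes with straight ones.
function straighten_quotes(string $title): string
{
    $result = preg_replace(
        ['/[`‘’]/u', '/[“”]/u'],   // the /u modifier follows the closing delimiter
        ["'", '"'],
        $title
    );
    if ($result === null) {
        // preg_replace returns null on failure (for instance invalid UTF-8 input
        // with /u); preg_last_error() reports why. Keep the text unchanged.
        return $title;
    }
    return $result;
}
```

Returning the input unchanged on failure is a design choice for this sketch: a bot that cannot tidy a title should at least not blank it.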

Bot breaks on deleted (and restored) pages

Status
Resolved
Reported by
Piotr Konieczny aka Prokonsul Piotrus| talk 21:38, 23 April 2011 (UTC)[reply]
Type of bug
Inconvenience
What happens
I run the bot through Wikipedia:Citation expander on Seppuku
What should happen
instead of executing the citation fixing code
Relevant diffs/links
I get an empty page at http://en.wikipedia.org/w/index.php?title=Seppuku&action=submit with the "Warning: An administrator deleted this page since you started editing it. Please check the deletion log to see the reasoning. " reasoning. Show changes indicates that the bot would save an empty page.
Replication instructions
Try running it on the same page?
We can't proceed until
User
Requested action from maintainer
fix the bug? Medium priority, I guess - the bot simply does not work on some pages


I can't reproduce this. Does it work now? Martin (Smith609 – Talk) 22:11, 11 June 2011 (UTC)[reply]

Adding additional authors messes up citations using "et al".

Status
 Fixed in GitHub Pull 366
Resolved
Reported by
Bluap (talk) 03:05, 3 May 2011 (UTC)[reply]
Type of bug
Inconvenience:
What happens
If one of the authors in the citation has "et al" after his name (which is standard practice, to indicate many additional authors), citation bot will add all of the secondary authors, but keep "et al" in the original name.
Relevant diffs/links
http://en.wikipedia.org/w/index.php?title=Arcminute_Microkelvin_Imager&curid=1514376&diff=427131812&oldid=422901090
We can't proceed until
Agreement on the best solution
Requested action from maintainer
I suggest that Citation bot does not add additional authors when there is the string "et al" in an existing author field. (Note: I do not believe that it is within Citation bot's remit to change the citation format with multiple authors from an "et al" convention to a full list of the authors' names)
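
The check Bluap suggests could look roughly like this. It is a sketch only: the function name has_et_al and the list of author-type field names it inspects are assumptions, not taken from the bot.

```php
<?php
// Hypothetical check: true if an existing author-type field already contains "et al".
function has_et_al(array $params): bool
{
    foreach ($params as $name => $value) {
        // Assumed field names: author, authors, author2, last, last1, coauthors, ...
        $is_author_field = preg_match('/^(author|last|coauthor)s?\d*$/i', (string) $name);
        if ($is_author_field && preg_match('/\bet\.?\s+al\b/iu', (string) $value)) {
            return true;
        }
    }
    return false;
}
```

A caller would skip adding further authors whenever has_et_al($params) is true, leaving the editor's "et al" convention untouched.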


bizarre deletion and addition of content

Status
Cannot reliably reproduce
Reported by
Sailsbystars (talk) 03:22, 10 May 2011 (UTC)[reply]
Type of bug
Deleterious
What happens
Removes and adds content, including talk page comments for no apparent reason
What should happen
"Touching page to update categories"
Relevant diffs/links
[11][12][13]
We can't proceed until
Input from editors
Requested action from maintainer
Fix severe bug. Something has gone seriously wrong here.


Those diffs seem to be the result of near-simultaneous edits by the bot and a human (note the timestamps), which MediaWiki doesn't always handle well. I don't think much can be done about this problem. Ucucha 03:27, 10 May 2011 (UTC)[reply]

Yet every time the bot uses that edit summary it makes the same mistake [14][15], so it's more than just a coincidental edit conflict. I have yet to find an example of that particular edit summary that didn't wind up making a content change.... Sailsbystars (talk) 03:39, 10 May 2011 (UTC)[reply]
The reason for that is that when the bot makes that edit, it normally does not change the article text, and therefore the edit is not recorded. So the only edits that are recorded are those that go wrong. Ucucha 03:48, 10 May 2011 (UTC)[reply]
Thanks for the explanation, that makes sense. Sailsbystars (talk) 03:59, 10 May 2011 (UTC)[reply]
I've updated the edit summary to highlight the likelihood that it needs reverting.  Fixed in GitHub Pull 375 Martin (Smith609 – Talk) 21:54, 11 June 2011 (UTC)[reply]

Speaking of that, this edit does look like a preventable error. Presumably the page size exceeds the bot's buffer. Ucucha 13:46, 10 May 2011 (UTC)[reply]

Any idea how I would fix this? (in PHP)
Thanks, Martin (Smith609 – Talk) 21:54, 11 June 2011 (UTC)[reply]
Expand the memory allotted to the bot? I'm not sure. It did this on two of those list pages; one was 143,889 bytes and got truncated to 1,078; the other went from 79,654 to 64,791. I can't find any other instances where the bot truncated an article, although it did this same thing on larger articles (e.g., Hockey stick controversy, 104,001 bytes). One possibility I can think of is that the number of citation templates in these pages somehow caused the problem. User:Ucucha/List of mammals/Primates currently has an incomplete cite doi; perhaps you can run the bot on that page to check whether it is still doing this. Ucucha 14:25, 14 June 2011 (UTC)[reply]
The bot worked fine on that page. Martin (Smith609 – Talk) 02:55, 19 June 2011 (UTC)[reply]
Perhaps the phase of the moon was better this time. I guess that means you fixed the bug in one way or another. Ucucha 04:28, 19 June 2011 (UTC)[reply]
  • Could you point me to the approval for this task? I certainly do not see any kind of approval for the bot to make edits that "SHOULD PROBABLY BE REVERTED". It seems the bot is simply making null edits, and causing problems when running into edit conflicts? Please disable this task unless you obtain approval for it. –xenotalk 15:58, 7 July 2011 (UTC)[reply]
    • It appears to be a very rare use-case, where the bot detects an edit conflict and so cannot determine if in fact it is making a non-null edit. I'm not sure why the bot doesn't handle edit conflicts by simply re-starting with the newer version of the page (after a delay). That would seem to be more robust. LeadSongDog come howl! 16:28, 7 July 2011 (UTC)[reply]
      • The bot should probably not be making null edits in the first place - and certainly not without approval to do so. The job queue is there for a reason. –xenotalk 16:33, 7 July 2011 (UTC)[reply]
  • No comment on the appropriateness, but if a null-edit ever must be performed, do so with an empty appendtext parameter instead of the page content in text. That way, you have no problems with edit conflicts. Amalthea 16:54, 7 July 2011 (UTC)[reply]
    • ... and I note that it's not MediaWiki at fault here, it seems that edit conflict detection is disabled in the bot. Amalthea 16:57, 7 July 2011 (UTC)[reply]
      • Thanks for pointing this out Amalthea; I've enabled this, which should resolve the issue. Martin (Smith609 – Talk) 22:20, 16 July 2011 (UTC)[reply]
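
The two points Amalthea raises can be sketched as request fields for the MediaWiki API's action=edit. The function names are hypothetical and the code only builds the POST fields; sending them, logging in and fetching the token are left out. That an empty appendtext gives a null edit is taken from Amalthea's comment above rather than verified here.

```php
<?php
// Fields for a normal edit. Passing the timestamp of the revision that was read
// as basetimestamp lets the server detect an edit conflict.
function edit_request_fields(string $title, string $new_text, string $base_timestamp,
                             string $summary, string $token): array
{
    return [
        'action'        => 'edit',
        'format'        => 'json',
        'title'         => $title,
        'text'          => $new_text,
        'basetimestamp' => $base_timestamp,
        'summary'       => $summary,
        'token'         => $token,
    ];
}

// Fields for a null edit as suggested in the thread: append nothing instead of
// re-sending the page text, so there is no page content to conflict with.
function null_edit_fields(string $title, string $token): array
{
    return [
        'action'     => 'edit',
        'format'     => 'json',
        'title'      => $title,
        'appendtext' => '',
        'nocreate'   => '1', // do not create the page if it has been deleted
        'token'      => $token,
    ];
}
```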

Volume zero

Status
 Fixed in GitHub Pull 348
Resolved
Reported by
David Eppstein (talk) 21:16, 19 May 2011 (UTC)[reply]
Type of bug
Inconvenience: Humans must occasionally make immediate edits to clean up after the bot
What happens
"Springer Monographs in Mathematics" do not have volume numbers, but Citation bot incorrectly adds zero as the volume number anyway
Relevant diffs/links
http://en.wikipedia.org/w/index.php?title=Max-plus_algebra&action=historysubmit&diff=429942907&oldid=422191314
We can't proceed until
Bot operator's feedback on what is feasible


Mangled diacritics

Status
Resolved
 Fixed in GitHub Pull 365{{fixedin|347}}
Reported by
 pablo 06:43, 21 May 2011 (UTC)[reply]
Type of bug
Deleterious
What should happen
Citations should be fixed without altering characters such as á é í etc
Relevant diffs/links
link showing what happens
Replication instructions
click on the 'Citations' button
We can't proceed until
None
Requested action from maintainer
Have a look, and fix it!


You probably need to add the u pattern modifier to some preg_replace call. Ucucha 10:59, 21 May 2011 (UTC)[reply]

Who me? pablo 08:03, 8 June 2011 (UTC)[reply]
No, the bot operator—sorry for the confusion. Ucucha 08:14, 8 June 2011 (UTC)[reply]
Done; I needed an mb_convert_encoding. Martin (Smith609 – Talk) 21:39, 11 June 2011 (UTC)[reply]

This is not fixed; see this recent edit. nb - seems to work fine using IE on a windows box, but not using Firefox 3.6.18 on Mac OS 10.6.7 pablo 13:46, 16 June 2011 (UTC)[reply]

Thanks for the pointer; I hope that it works now! Martin (Smith609 – Talk) 02:48, 19 June 2011 (UTC)[reply]
Seems fine now, thanks! pablo 13:54, 19 June 2011 (UTC)[reply]
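
For readers wondering what an mb_convert_encoding fix might look like, here is a minimal sketch. The function name ensure_utf8 is hypothetical, and the assumption that any non-UTF-8 input is ISO-8859-1 is made purely for illustration; the thread does not say which encodings the bot actually meets. It requires the mbstring extension.

```php
<?php
// Hypothetical helper: return $text as UTF-8.
// Assumption: input that is not valid UTF-8 is ISO-8859-1.
function ensure_utf8(string $text): string
{
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text; // already valid UTF-8, leave untouched
    }
    return mb_convert_encoding($text, 'UTF-8', 'ISO-8859-1');
}
```

Normalising once on input, and then using the u modifier on every later regex as Ucucha suggests, keeps characters such as á, é and í intact.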

Cochrane Database Syst Rev converts to title, chapter?

Status
 Fixed in GitHub Pull 361
Resolved
Reported by
RDBrown (talk) 23:53, 14 June 2011 (UTC)[reply]
Type of bug
Inconvenience
What happens
Move existing title field to chapter, adds title of "Cochrane Database of Systematic Reviews"
What should happen
Don't move title if journal is populated
Relevant diffs/links
Migraine history
Replication instructions
Rabbie R, Derry S, Moore RA, McQuay HJ (2010). Moore, Maura (ed.). "Ibuprofen with or without an antiemetic for acute migraine headaches in adults". Cochrane Database Syst Rev. 10 (10): CD008039. doi:10.1002/14651858.CD008039.pub2. PMID 20927770.{{cite journal}}: CS1 maint: multiple names: authors list (link)

Kirthi V, Derry S, Moore RA, McQuay HJ (2010). Moore, Maura (ed.). "Aspirin with or without an antiemetic for acute migraine headaches in adults". Cochrane Database Syst Rev. 4 (4): CD008041. doi:10.1002/14651858.CD008041.pub2. PMID 20393963.{{cite journal}}: CS1 maint: multiple names: authors list (link) Derry S, Moore RA, McQuay HJ (2010). Moore, Maura (ed.). "Paracetamol (acetaminophen) with or without an antiemetic for acute migraine headaches in adults". Cochrane Database Syst Rev. 11 (11): CD008040. doi:10.1002/14651858.CD008040.pub2. PMID 21069700.{{cite journal}}: CS1 maint: multiple names: authors list (link)

We can't proceed until
Agreement on the best solution


Actually, doing a conversion to {{cite cochrane}} might be better.

Headbomb {talk / contribs / physics / books} 16:30, 16 June 2011 (UTC)[reply]

I'm trying to capitalize on the "series" information in CrossRef, but this doesn't always mean the same thing. I've implemented your suggestion of not using the data when |journal= is set. Martin (Smith609 – Talk) 01:04, 19 June 2011 (UTC)[reply]

Reference combination errors

Status
Resolved
Reported by
User:CharlesGillingham
Type of bug
Deleterious
What happens
See this edit, and look at the footnotes before and after the edit. The bot removed the body of several named footnotes that it believed were defined elsewhere. They were not, apparently, because errors now appear. Still researching what happened.
Relevant diffs/links
http://en.wikipedia.org/w/index.php?title=Artificial_intelligence&oldid=433822933
We can't proceed until
?
Requested action from maintainer
This is a notification


Apparently this is the same bug as affected Ilium_(novel); bug reports combined. Martin (Smith609 – Talk) 00:13, 19 June 2011 (UTC)[reply]

Initial problem

http://en.wikipedia.org/w/index.php?title=Template%3ACite_doi%2F10.1098.2Frstb.1985.0005&diff=prev&oldid=434588043 - Martin (Smith609 – Talk) 13:49, 16 June 2011 (UTC)[reply]

Resolved

Ilium (novel)

Status
 Fixed in GitHub Pull 360
Resolved
Reported by
Salamurai (talk) 18:26, 16 June 2011 (UTC)[reply]
Type of bug
Inconvenience
What happens
Bot finds 2 identical refs and combines them. second ref, however, is in the reflist section (i.e.: {{reflist|refs= ... }} and this causes an error.
What should happen
Bot could check that it's in the reflist section before combining.
Relevant diffs/links
http://en.wikipedia.org/w/index.php?title=Ilium_%28novel%29&action=historysubmit&diff=434609781&oldid=419146386 (see bright red error in the refs)
We can't proceed until
Agreement on the best solution


I've found that this is the same error as reported above with the article Artificial intelligence. - Salamurai (talk) 06:35, 17 June 2011 (UTC)

Thanks;  Fixed in GitHub Pull 360; see http://en.wikipedia.org/w/index.php?title=User%3ADOI_bot%2FZandbox&action=historysubmit&diff=435016611&oldid=435016579. Martin (Smith609 – Talk) 00:56, 19 June 2011 (UTC)[reply]

Yiddish phonology

Status
 Fixed in GitHub Pull 359
Reported by
Salamurai (talk) 19:48, 16 June 2011 (UTC)[reply]
Type of bug
Inconvenience
What happens
Bot combines 2 refs, but earlier instance is in commented-out area, thus breaking refs
What should happen
Bot needs to BOLO for commented-out refs
Relevant diffs/links
http://en.wikipedia.org/w/index.php?title=Yiddish_phonology&action=historysubmit&diff=434638131&oldid=428268149
We can't proceed until
Agreement on the best solution


Resolved
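
One way to keep commented-out references out of the picture is to drop HTML comments before looking for named refs. The sketch below assumes exactly that approach; the function name visible_ref_names is hypothetical, it only recognises double-quoted names on opening tags, and it is not the fix that was applied to the bot.

```php
<?php
// Hypothetical helper: names of <ref name="..."> definitions that are not
// inside an HTML comment.
function visible_ref_names(string $wikitext): array
{
    // Remove <!-- ... --> blocks; the s modifier lets "." span line breaks.
    $visible = preg_replace('/<!--.*?-->/su', '', $wikitext);
    if ($visible === null) {
        return []; // regex failure (e.g. invalid UTF-8): report nothing
    }
    if (!preg_match_all('~<ref\s+name\s*=\s*"([^"]+)"\s*>~iu', $visible, $matches)) {
        return []; // no matches, or an error
    }
    return array_values(array_unique($matches[1]));
}
```

A ref would then be merged into an earlier one only if the earlier name appears in this list.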

ref name combining error

Status
 Fixed in GitHub Pull 358
Reported by
Salamurai (talk) 06:22, 17 June 2011 (UTC)[reply]
Type of bug
Inconvenience
What happens
The bot combines refs, but renames the ref name to have extra quotemarks -- example: <ref name= "Playfair90"> to be <ref name=" "Playfair90""/>. And not always, see Playfair84 at the diff link. It appears to happen more frequently when there is a space after the name= text. Ref code then reads the " " as the ref name, kicking out an error. AnomieBOT then sweeps up behind and removes the original ref name, leaving <ref name=" "/>, compounding the issue.
What should happen
Can the bot be made to not add the extra quotes? such as, checking for existing quotemarks on the name first?
Relevant diffs/links
http://en.wikipedia.org/w/index.php?title=Second_Battle_of_El_Alamein&diff=prev&oldid=433960730
We can't proceed until
Agreement on the best solution


I didn't realise that placing spaces between "name = 'Playfair'" was legit. The bot now allows for this behaviour. Martin (Smith609 – Talk) 23:54, 18 June 2011 (UTC)[reply]
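
A sketch of a name parser that tolerates spaces around the equals sign and either style of quotation mark, so that a rewritten tag is never double-quoted. The function names are hypothetical, other attributes such as group= are deliberately not handled, and this is not the bot's actual pattern.

```php
<?php
// Hypothetical parser: extract the name from a <ref name=...> tag, allowing
// spaces around "=" and optional single or double quotes.
function ref_name(string $tag): ?string
{
    $pattern = '~<ref\s+name\s*=\s*(?:"([^"]*)"|\'([^\']*)\'|([^\s/>"\']+))\s*/?>~iu';
    if (!preg_match($pattern, $tag, $m)) {
        return null;
    }
    // Exactly one of the three alternatives holds the name.
    foreach ([1, 2, 3] as $i) {
        if (isset($m[$i]) && $m[$i] !== '') {
            return $m[$i];
        }
    }
    return null;
}

// Re-emit a self-closing named ref with exactly one pair of quotes.
function format_named_ref(string $name): string
{
    return '<ref name="' . $name . '" />';
}
```

Parsing the bare name first and quoting it only on output avoids results like <ref name=" "Playfair90""/>.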

Stray periods

Status
 Fixed in GitHub Pull 359
Reported by
Ucucha 23:43, 18 June 2011 (UTC)[reply]
Type of bug
Inconvenience
What happens
Clean-up diff. The bot added a stray period after some authors that had only one first name. Also, per Template:Cite doi#Formatting, there should be no space between the initials.
We can't proceed until
Bot operator's feedback on what is feasible


Thanks for fixing. Ucucha 00:03, 19 June 2011 (UTC)[reply]

Resolved

"Broken doi" applied incorrectly

Status
new bug
Reported by
Philcha (talk) 21:52, 19 June 2011 (UTC)[reply]
We can't proceed until
Agreement on the best solution


The bot says DOI 10.1111/j.1096-0031.2009.00255.x at Mollusca is "broken", but it works perfectly. Please fix the bot and rmv the linked "edit" note. --Philcha (talk) 21:52, 19 June 2011 (UTC)[reply]

Might have been due to the line break between the DOI and the }} (changed here). If so, that's a bug in the bot. Ucucha 22:05, 19 June 2011 (UTC)[reply]
I cannot replicate this. Perhaps it was a temporary glitch with Crossref? Unless I hear otherwise I'll mark this as resolved. Martin (Smith609 – Talk) 21:50, 16 July 2011 (UTC)[reply]

Single page ranges

Status
 Fixed in GitHub Pull 382 - and duplicated below
Reported by
Headbomb {talk / contribs / physics / books} 02:29, 20 June 2011 (UTC)[reply]
Type of bug
Inconvenience: Humans must occasionally make immediate edits to clean up after the bot
What happens
Puts the page range as |pages=S08007–S08007
What should happen
Instead, it should be short-and-sweet for ranges spanning 1 page i.e. |pages=S08007
Also it messes with authors, and it probably shouldn't.
Relevant diffs/links
http://en.wikipedia.org/w/index.php?title=TOTEM&curid=2601903&diff=435168119&oldid=426988967
We can't proceed until
Agreement on the best solution


Will look at page ranges. I've tried to ensure that the visual output is unchanged as far as authors go. Adding the extra makes it easier to export the citation to an external program. Let me know if I have missed something with the formatting. Martin (Smith609 – Talk) 03:27, 20 June 2011 (UTC)[reply]
Wikiproject medicine contributors may go mad if you switch from the use of |author= even if the displayed result is unchanged and meta data becomes available. Rjwilmsi 10:38, 20 June 2011 (UTC)[reply]
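
The single-page case from the report can be handled with a small check like the one below. This is a sketch under the assumption that a range is two values separated by a hyphen or en dash; the function name collapse_single_page is hypothetical.

```php
<?php
// Hypothetical helper: turn "S08007–S08007" into "S08007"; leave real ranges alone.
function collapse_single_page(string $pages): string
{
    // Split on a hyphen or an en dash, ignoring surrounding spaces.
    $ends = preg_split('/\s*[-–]\s*/u', trim($pages));
    if (is_array($ends) && count($ends) === 2 && $ends[0] === $ends[1]) {
        return $ends[0];
    }
    return $pages;
}
```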

Messes with author... again

Status
new bug
Reported by
Headbomb {talk / contribs / physics / books} 09:41, 20 June 2011 (UTC)[reply]
What happens
Many changes to the author fields
What should happen
No changes to the author fields
Relevant diffs/links
http://en.wikipedia.org/w/index.php?title=Three_jet_event&curid=2574002&diff=435219353&oldid=426985904
We can't proceed until
Input from editors


I think it's pretty much established by now that the bot should not be editing the author fields for any reason. It's removing et al., cluttering citations with additional parameters, etc... This has been raised several times in the past, and I'm re-raising this again because for some reason these changes were re-enabled. The bot does not have consensus to perform changes related to authors fields, so please stop the bot from making them. Headbomb {talk / contribs / physics / books} 09:41, 20 June 2011 (UTC)[reply]

Can you please clarify how the displayed output is changed? Martin (Smith609 – Talk) 21:03, 20 June 2011 (UTC)[reply]

The et al. is removed, indicating that the named author did it all by himself, when in fact he has unnamed coauthors. The name of a collaboration is given after the author in the case of this particular work, but it is unclear if that is an indication of coauthors, or just affiliation information. Since citation templates have no manual of style, there is no way to determine the meaning of the phrase in parenthesis after the author. Readers must not be expected to understand how Citation bot works, or even that it exists, so I preemptively reject any explanation about how the bot looks it up in some database. The rules for presenting authorship in any such database are not incorporated in the description of the citation templates, and thus are not available to readers. Jc3s5h (talk) 22:12, 20 June 2011 (UTC)[reply]

I have looked at the diff link above, and whilst there are several changes, they mostly follow the same pattern. Ignoring irrelevances such as issue numbers and page ranges, the main problem seems to be centred around a misapprehension of how the |display-authors= parameter works. For example, in the first change, the |author=R. Brandelik et al. (TASSO collaboration) is changed to |author=R. Brandelik (TASSO collaboration) and two parameters are added: |author-separator=,|display-authors=1. This would work as intended if either |author2= or |last2=|first2= were provided, but they aren't. |display-authors= only operates if there are more authors than the figure specified. As in:
{{cite journal|author=Doe, John|author2=Public, Joe|journal=A Journal|title=A Paper|year=2011}}
Doe, John; Public, Joe (2011). "A Paper". A Journal.
{{cite journal|author=Doe, John|author2=Public, Joe|journal=A Journal|title=A Paper|year=2011|display-authors=1}}
Doe, John; et al. (2011). "A Paper". A Journal.
{{cite journal|author=Doe, John|journal=A Journal|title=A Paper|year=2011|display-authors=1}}
Doe, John (2011). "A Paper". A Journal. {{cite journal}}: Invalid |display-authors=1 (help)
I seem to recall that in the past the bot would fill in those missing author parameters. Why is it not doing so here? If it were, the "et al." would be triggered. --Redrose64 (talk) 11:26, 21 June 2011 (UTC)[reply]
I've made a start with a fix here. I am not convinced that "J.D. Smith et al (JADE collaboration)" is a valid value for the author parameter, so am not sure how to handle it. If it is the norm in particle physics, then I can add a special case. If it is confined to a couple of articles then it will be quicker to fix by hand. Please advise. Martin (Smith609 – Talk) 21:44, 16 July 2011 (UTC)[reply]
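
The |display-authors= behaviour Redrose64 demonstrates can be modelled in a few lines. This is a hypothetical model of the rendered output in the three examples above, not the template's code; the function name render_authors and the "; " separator are taken from those examples only.

```php
<?php
// Hypothetical model: "et al." appears only when there are more authors
// than the display limit.
function render_authors(array $authors, ?int $display_authors = null): string
{
    if ($display_authors !== null && count($authors) > $display_authors) {
        $shown = array_slice($authors, 0, $display_authors);
        return implode('; ', $shown) . '; et al.';
    }
    return implode('; ', $authors);
}
```

With a single author and a limit of 1 nothing is truncated, which is why removing "et al." from |author= without supplying |author2= loses the indication of coauthors.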

Reference consolidation error with list-defined references

Status
 Fixed in GitHub Pull 387
Reported by
Rjwilmsi 10:35, 20 June 2011 (UTC)
Type of bug
Inconvenience: Humans must occasionally make immediate edits to clean up after the bot
What happens
List-defined references ref condensed incorrectly
We can't proceed until
Input from editors


Please describe what happens and what should happen, so that I can understand the problem. Martin (Smith609 – Talk) 21:27, 16 July 2011 (UTC)[reply]

If you follow the diff given, and then go to the references section, there is a red error message at ref no. 25, which wasn't there before the bot edit. Somehow it's broken how <ref name="Holland1909" /> links. --Redrose64 (talk) 22:06, 16 July 2011 (UTC)[reply]
Gotcha. Thanks.  Fixed in GitHub Pull 387 Martin (Smith609 – Talk) 23:29, 16 July 2011 (UTC)[reply]
Status
Duplicate of fixed bug; verified fixed on 21:23, 16 July 2011 (UTC).
Resolved
Reported by
Yoninah (talk) 16:15, 20 June 2011 (UTC)[reply]
We can't proceed until
Agreement on the best solution


The bot made a bit of a mess of my citations on 2 April. It erased about 50 citations and credited them all to one source. Could you revert what it did, and I'll redo the edits I made today? Thanks, Yoninah (talk) 16:15, 20 June 2011 (UTC)[reply]

OK, I figured out how to undo what the bot did. Yoninah (talk) 16:20, 20 June 2011 (UTC)[reply]

Bot deletes value of |language=

Status
no Invalid
Reported by
LeadSongDog come howl! 15:58, 27 June 2011 (UTC)[reply]
Type of bug
Deleterious: Human-input data is deleted or articles are otherwise significantly affected. Many bot edits require undoing.
Relevant diffs/links
http://en.wikipedia.org/w/index.php?title=Agostino_Bassi&diff=prev&oldid=436517444
We can't proceed until
Agreement on the best solution


The full template prior to the bot edit was as follows:

{{cite journal
|last=Dossena
|first=G
|authorlink=
|year=1954
|month=January
|title=Quello che la medicina deve ad Agostino Bassi
|language=Italian
|trans_title=Debt of medicine to Agostino Bassi
|journal=Rivista d'ostetricia e ginecologia pratica
|volume=36
|issue=1
|pages=43–53
| publisher = | location = | issn =
| pmid = 13168166
| bibcode = | oclc =| id = | url = | language = | format = | accessdate = | laysummary = | laysource = | laydate = | quote =
 }}

The |language= parameter occurs twice, but only the first instance has a value. It is a feature of MediaWiki template expansion that should any named parameter occur more than once, all are ignored except the last one. So, |language=Italian|language= is exactly equivalent to |language= and therefore the rendered appearance before and after the bot edit would be the same. --Redrose64 (talk) 19:17, 27 June 2011 (UTC)[reply]

True, though surely the correct fix would be to remove the blank parameter only. Rjwilmsi 20:41, 27 June 2011 (UTC)[reply]
The bot is programmed not to affect the visible output of the citation unless necessary. This way articles are not damaged for readers. Editors are responsible for previewing their content before saving. Martin (Smith609 – Talk) 21:16, 16 July 2011 (UTC)[reply]
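
The "last duplicate wins" behaviour Redrose64 describes can be illustrated with a deliberately simplified parser. The function name template_params is hypothetical, and the parser splits naively on "|" and "=", so it does not cope with nested templates or piped links; it is a sketch of the rule, not of MediaWiki's parser or the bot's.

```php
<?php
// Hypothetical, simplified parser for a flat template call.
function template_params(string $template): array
{
    $inner = preg_replace('/^\{\{|\}\}$/', '', trim($template));
    $parts = explode('|', $inner);
    array_shift($parts); // drop the template name
    $params = [];
    foreach ($parts as $part) {
        if (strpos($part, '=') === false) {
            continue; // positional parameters are ignored in this sketch
        }
        list($name, $value) = explode('=', $part, 2);
        // A later duplicate overwrites an earlier one, as in template expansion.
        $params[trim($name)] = trim($value);
    }
    return $params;
}
```

Applied to |language=Italian ... | language = , the resulting language value is empty, matching the rendered output before the bot's edit.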

Bad edit

Status
 Fixed in GitHub Pull 383
Reported by
MZMcBride (talk) 05:36, 11 July 2011 (UTC)[reply]
What happens
http://en.wikipedia.org/w/index.php?title=Asperger_syndrome&diff=438861601&oldid=438861291
We can't proceed until
Agreement on the best solution


Messes with pages

Status
new bug
Reported by
Headbomb {talk / contribs / physics / books} 15:34, 11 July 2011 (UTC)[reply]
What happens
The bot trims the page range when it should not.
What should happen
Leave page ranges alone, except for dashes.
Relevant diffs/links
http://en.wikipedia.org/w/index.php?title=Kazimierz_Fajans&diff=prev&oldid=438923847
We can't proceed until
Agreement on the best solution


The documentation for {{Cite book}} contains the example "pages=100–110" while the documentation for {{Citation}} contains the example "pages=153–61". There does not appear to be any standard about how to format such page ranges, and it is wrong for the bot to create a standard where none exists. Jc3s5h (talk) 15:56, 11 July 2011 (UTC)[reply]

It seems all that edit did was reflect the existing page range as shown on Pubmed. Is there some reason to think something more systematic is at work?LeadSongDog come howl! 20:51, 11 July 2011 (UTC)[reply]
I don't know why it did what it did, but there is no standard expressed in the template documentation as to whether the page range is the page range occupied by the article in the journal, or the page range that supports the claim in the article. Therefore it is improper for the bot owner to unilaterally and without broad consensus decide that it means the page range occupied by the article in the journal. Jc3s5h (talk) 21:11, 11 July 2011 (UTC)[reply]
Since the purpose of referencing is to aid verification, it is better to be precise rather than vague when giving a source, so that a reviewer can be directed to the exact position in the source of the claimed fact. We cannot expect such people to read the whole of the source; many academic papers may run into dozens of pages in one journal. Therefore, to my mind, it's the second of these "the page range that supports the claim in the article". --Redrose64 (talk) 22:11, 11 July 2011 (UTC)[reply]
Redrose64, I agree that's what the parameters ought to mean, and I encourage you to edit the documentation of the various and sundry templates to reflect that interpretation. Jc3s5h (talk) 22:15, 11 July 2011 (UTC)[reply]
I think you are both missing the issue here (quite unrelated to the issue you are discussing), which is that the bot changed a full range of the form "100–110" to a partial range of the form "100–10". Those are both valid styles to refer to exactly the same thing. I prefer the former, but agree that the bot shouldn't be changing one to the other. Ucucha 22:17, 11 July 2011 (UTC)[reply]
If you look closely at Template:Citation/doc you will see that the way page ranges are shown is different for journals than for books. In this case, the journal citation used the same page range style as used on Pubmed, which was entirely appropriate. That said, I can think of few details that strike me as less significant. How about getting the substantive fixes done before worrying about this? LeadSongDog come howl! 14:49, 12 July 2011 (UTC)
The problem is not whether it's trivial or not; the problem is that the bot should not be changing this because it overrides human editorial decisions. Headbomb {talk / contribs / physics / books} 14:56, 12 July 2011 (UTC)
Is the bot ignoring exclusion? Or are you expecting it to not edit at all? I note in the contribs that other edits by the bot using the same notation 381 just before and just after the one it made to Kazimierz Fajans inserted page ranges that were fully elaborated, not abbreviated. It's pretty clearly just following the source. LeadSongDog come howl! 15:28, 12 July 2011 (UTC)[reply]
I don't understand what LeadSongDog's comment is supposed to mean (before the edit conflict).
In the case of journal articles, there are two different ways to use the pages parameter. One is in articles with just a <references/> section, in which case the specific pages supporting the claim should be given. The other case is an article that uses shortened citations, in which case the specific pages that support the claim should be given in the shortened citation and the full citation (which might be a cite xxx or citation template) would give the full page range that the journal article occupies.
Since the bot is not designed to detect whether an article uses shortened citations, and is incapable of detecting whether the shortened citations have been fully implemented or are a "work in progress", the bot should not alter page ranges. Jc3s5h (talk) 15:33, 12 July 2011 (UTC)[reply]
Whether articles use just a "<references/>" or "shortened citations" is completely irrelevant here. The only relevant thing is that Citation bot should not change the citation style used by an article. This applies to things like changing "Smith, J." to "Smith J" just as much as to changing "pp. 100–110" to "pp. 100–10". Headbomb {talk / contribs / physics / books} 16:22, 12 July 2011 (UTC)[reply]
I was referring to abbreviated page ranges, as Headbomb indicated, and as the original bug report showed. The "shortened citations" bit seems to be a red herring. Again, the bot did not alter the page ranges, it only altered the style of page range representation (in this case from 402–404 to 402–4). Frankly, I'm much more concerned about a biography with a single reference than I am with how many digits are shown in the page range representation of that reference, but that's just me. LeadSongDog come howl! 16:31, 12 July 2011 (UTC)[reply]

LeadSongDog, you stated "It's pretty clearly just following the source." If by that you mean it is using the page range as stated in Pubmed, or some other database, then it's doing the wrong thing, and should leave any page or pages parameter entered by an editor alone. That's both a matter of style, and a matter of substance, because it may not be appropriate to list all the pages that the article occupies. Jc3s5h (talk) 17:37, 12 July 2011 (UTC)[reply]

Jack, it is solely a matter of style, not substance, because the substantial value of the parameter was not changed. The edit did add the cited article's title, which had previously been omitted. Both representations denote the same three pages, as would "402, 403, 404" if someone wanted to use that odd form. It was originally the bot which added that page range as "402–4" over two years ago, and it used the form found on Pubmed. Headbomb revised it here. It is not entirely clear if this citationbot edit was derived from the version Headbomb committed just one minute earlier or based on the one Yobot committed an hour before, but it appears to be based on Headbomb's version. (It might be helpful in cases like this if the bot's edit comment explicitly stated the oldid.) Do you really think it would be a reasonable use of resources to have the bot trawl through edit histories for every parameter change in order to see whether a bot or a human editor inserted the parameter value? Or did you have some other way in mind for it to distinguish? LeadSongDog come howl! 18:53, 12 July 2011 (UTC)[reply]
Maybe the particular edit you pointed out was harmless, but it could be a warning of harmful edits that have already happened or will happen in the future. If the logic that the bot is following is to take a page range from a database and overwrite a page (or range) provided by an editor, then sooner or later it will make harmful edits. So the logic should be simple: if the page or pages parameter is present and non-empty, the bot should not edit it. Jc3s5h (talk) 19:42, 12 July 2011 (UTC)[reply]
If we stop every bot from working on the premise that it might get something wrong, there would not be any point in having them, would there? If we have examples of actual problems, they can be addressed. Of course the bot code is available if someone wants to do a code review. LeadSongDog come howl! 00:13, 13 July 2011 (UTC)[reply]
This is not a matter of "right" and "wrong"; this is a matter of WP:BOTPOL. Bots should not annoy editors and override perfectly legitimate editorial decisions. Period. Headbomb {talk / contribs / physics / books} 04:13, 13 July 2011 (UTC)[reply]
I see nothing to that effect in the current revision of BOTPOL, perhaps it has changed? It does seem an excellent aspirational goal for all editors, though not always practicable. LeadSongDog come howl! 05:45, 13 July 2011 (UTC)[reply]
See Wikipedia:BOTPOL#Bot requirements, bullets 2/3/4. Headbomb {talk / contribs / physics / books} 06:03, 13 July 2011 (UTC)[reply]

If by that you mean:


  • does not consume resources unnecessarily
  • performs only tasks for which there is consensus
  • carefully adheres to relevant policies and guidelines

That's something of a reach to "not annoy editors". Or am I looking at the wrong set of bullets?

Anyhow, I just quickly reread all the approvals, and the discussion of what to do with the basic citation parameters doesn't seem to have been very explicit. You seem to have paid more attention to the basic questions, such as whether bare URLs should be replaced. Still, you certainly seem to feel strongly about this, for whatever reason. I'll just drop the stick and move away. LeadSongDog come howl! 13:39, 13 July 2011 (UTC)[reply]
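
For reference, the two points made in this thread (that "402–404" and "402–4" denote the same pages, and that a non-empty page or pages value entered by an editor should be left alone) can be sketched roughly as follows. This is only an illustration in Python, not the bot's actual code; the function names `expand_page_range`, `same_pages` and `choose_pages` are hypothetical.

```python
import re

# Matches "402", "402-4", "402–404" (hyphen, en dash or em dash as separator).
_RANGE = re.compile(r"\s*(\d+)\s*(?:[-\u2013\u2014]\s*(\d+))?\s*")


def expand_page_range(value):
    """Return (start, end) as integers, expanding abbreviated ends
    such as "402–4" to (402, 404). Return None if the value is not a
    simple numeric page or page range."""
    match = _RANGE.fullmatch(value)
    if not match:
        return None
    start_text, end_text = match.group(1), match.group(2)
    if end_text is None:
        return (int(start_text), int(start_text))
    if len(end_text) < len(start_text):
        # Borrow the missing leading digits from the start page.
        end_text = start_text[: len(start_text) - len(end_text)] + end_text
    start, end = int(start_text), int(end_text)
    if end < start:
        return None
    return (start, end)


def same_pages(a, b):
    """True when both strings denote exactly the same pages."""
    range_a, range_b = expand_page_range(a), expand_page_range(b)
    return range_a is not None and range_a == range_b


def choose_pages(existing, from_database):
    """Keep a non-empty editor-supplied value untouched; only fill in
    the database value when the parameter is empty or missing."""
    if existing and existing.strip():
        return existing
    return from_database
```

Under this sketch, `choose_pages("402–404", "402–4")` keeps the editor's "402–404", so the style question never arises; the database value is used only when the parameter is empty.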

Incomplete removal of second DOI from id breaks formatting of citation

Status
new bug
Reported by
David Eppstein (talk) 00:47, 13 July 2011 (UTC)[reply]
Type of bug
Deleterious: Human-input data is deleted or articles are otherwise significantly affected. Many bot edits require undoing.
What happens
The citation by Lopez and Law in path decomposition has two DOIs (yes really), one of which was formatted as a DOI: wikilink in the id section. Citation bot removed the second DOI but retained the two brackets at the start of the wikilink, causing the entire citation template's formatting to break. (Note, this was back in May, but I only just noticed it recently. So it's possible it's a dup of an already-fixed bug.)
Relevant diffs/links
http://en.wikipedia.org/w/index.php?title=Path_decomposition&action=historysubmit&diff=429839779&oldid=424822377
We can't proceed until
Agreement on the best solution


Comment That's a really, really bad use of |id=. Headbomb {talk / contribs / physics / books} 18:15, 14 July 2011 (UTC)[reply]

Agreed: in such a case I would add another reference to cover the alternate source. --Redrose64 (talk) 18:59, 14 July 2011 (UTC)[reply]
Regardless of whether you like or dislike the existing formatting, Citation bot should never break the syntax of a citation template to the point that it shows up in the article as garbled plain text. Re-opening because two different problems (the nonstandard id and the garbled template) do not cancel out to make zero problems. —David Eppstein (talk) 21:33, 16 July 2011 (UTC)[reply]
What solution do you propose? Martin (Smith609 – Talk) 21:51, 16 July 2011 (UTC)[reply]
That the bot check that the [ ] and { } in the parameters of the modified citation are all properly balanced, and that it abort making a change to that citation if they are not. —David Eppstein (talk) 01:36, 17 July 2011 (UTC)[reply]
What about citations that are meant to have unbalanced braces? Martin (Smith609 – Talk) 02:03, 17 July 2011 (UTC)[reply]
You mean like ones in <nowiki> tags? If you know how to parse them and ignore them, fine. If not, and the change causes a very small number of valid citation tags with intentional unbalanced braces to be uneditable by the bot and require manual editing instead, is that such a great price to pay for not breaking things? —David Eppstein (talk) 02:10, 17 July 2011 (UTC)[reply]
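
The check proposed above could look something like the following. It is a minimal sketch in Python rather than the bot's own code: `brackets_balanced` and `safe_edit` are hypothetical names, the DOI in the usage note is a made-up placeholder, and the sketch deliberately does not try to parse <nowiki> sections, so citations with intentionally unbalanced braces would simply be left for manual editing.

```python
# Map each closing bracket to the opening bracket it must match.
_CLOSERS = {"]": "[", "}": "{"}


def brackets_balanced(text):
    """True when every [ ] and { } in text is properly nested and closed."""
    stack = []
    for ch in text:
        if ch in "[{":
            stack.append(ch)
        elif ch in _CLOSERS:
            if not stack or stack.pop() != _CLOSERS[ch]:
                return False
    return not stack


def safe_edit(original, modified):
    """Return the modified citation only if its brackets balance;
    otherwise abort the change and return the original unchanged."""
    if brackets_balanced(modified):
        return modified
    return original
```

For example, `safe_edit("{{cite journal |id=[[doi:10.1000/example]]}}", "{{cite journal |id=[[}}")` returns the original text, because the modified version leaves two opening brackets unmatched.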

Unsuccessful attempt to fix page number

Status
Fixed in GitHub Pull 382
Reported by
Jc3s5h (talk) 01:46, 13 July 2011 (UTC)[reply]
Type of bug
Inconvenience
What happens
A single page number is converted to a malformed range (in the diff below, "pages = 642" becomes "pages = 642–642")
What should happen
"pages = 642" should become "page = 642"
Relevant diffs/links
http://en.wikipedia.org/w/index.php?title=Vanadium_nitrogenase&diff=prev&oldid=439164935
We can't proceed until
Agreement on the best solution
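
The requested behaviour ("pages = 642" becoming "page = 642") might be sketched as below. This is an illustration in Python only; it assumes the template parameters are already available as a simple name-to-value mapping, and `normalise_single_page` is a hypothetical helper, not a function from the bot.

```python
import re


def normalise_single_page(params):
    """Return a copy of params in which a purely numeric single-page
    value in "pages" is moved to "page". Real ranges, non-numeric
    values, and citations that already have a "page" are left alone."""
    result = dict(params)
    pages = result.get("pages", "").strip()
    already_has_page = bool(result.get("page", "").strip())
    if re.fullmatch(r"\d+", pages) and not already_has_page:
        del result["pages"]
        result["page"] = pages
    return result
```

A genuine range such as "642–650" does not match the single-number pattern, so it stays in |pages= untouched.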


Erroneous 5th author

Status
Invalid
Reported by
Crowsnest (talk) 07:00, 15 July 2011 (UTC)[reply]
What happens
Erroneously adds a copy of the 4th author as the 5th author
Relevant diffs/links
http://en.wikipedia.org/w/index.php?title=Group_velocity&diff=439576566&oldid=439576451
We can't proceed until
Agreement on the best solution


  • The fourth author was missing from the ref. Instead the 5th author was given as 4th author. Citation bot added the 5th author but did not check it was identical to the 4th. -- Crowsnest (talk) 07:10, 15 July 2011 (UTC)[reply]
    • The fourth author was specified using |first4=S. |last4=Jarabo. The bot doesn't modify human-entered authors. It correctly added a missing |author5=. Martin (Smith609 – Talk) 21:12, 16 July 2011 (UTC)[reply]

Page range and author lists

Status
new bug
Reported by
Colin°Talk 08:23, 17 July 2011 (UTC)[reply]
Type of bug
Deleterious
What happens
Page ranges are altered, e.g. 2434–6 to 2434–2436. Author lists are altered from the short "et al." style to the multi-author, multi-parameter style.
Relevant diffs/links
http://en.wikipedia.org/w/index.php?title=Asperger_syndrome&diff=439733193&oldid=438898702
We can't proceed until
A specific edit to the bot's code is requested below.
Requested action from maintainer
The bot needs to respect existing format styles. The above examples are extremely common on medical articles that use Diberri's tool for generating cite journal refs from PMIDs and the abbreviated forms typical in medical journals. WP:CITEVAR does not allow such style changes without gaining consensus per-article. An article that is altered by this bot will be much harder to maintain by editors using Diberri's tools because, in order to be consistent, they would have to manually fix the page ranges and to tediously enter all the author parameters.
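
One possible shape for the author-list part of this request is sketched below in Python. It is an assumption about how such a guard could work, not the bot's current logic: `uses_et_al_style` and `may_expand_authors` are hypothetical names, and the author string in the usage note is invented for illustration.

```python
import re

# "et al" or "et al.", in any letter case, as a separate phrase.
_ET_AL = re.compile(r"\bet al\b\.?", re.IGNORECASE)


def uses_et_al_style(params):
    """True when any author-type parameter already uses an abbreviated
    "et al." list, which suggests a deliberate style choice."""
    return any(
        _ET_AL.search(value)
        for name, value in params.items()
        if name.startswith("author")
    )


def may_expand_authors(params):
    """The bot should add author2, author3, ... only when the existing
    citation does not already use the short "et al." style."""
    return not uses_et_al_style(params)
```

With a citation such as `{"author": "Smith J, Jones K, et al."}`, `may_expand_authors` returns False and the author list would be left exactly as the editor wrote it.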