User talk:Citation bot

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by Krenair (talk | contribs) at 02:00, 2 January 2020 (Bot down: think I've got it working properly now). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

You may want to increment {{Archive basics}} to |counter= 20 as User talk:Citation bot/Archive 19 is larger than the recommended 150Kb.

Note that the bot's maintainer and assistants (Thing 1 and Thing 2) can go weeks without logging in to Wikipedia. The code is open source and interested parties are invited to assist with the operation and extension of the bot.

Before reporting a bug, please note: Addition of DUPLICATE_xxx= to citation templates by this bot is a feature. When there are two identical parameters in a citation template, the bot renames one to DUPLICATE_xxx=. The bot is pointing out the problem with the template. The solution is to choose one of the two parameters and remove the other one, or to convert it to an appropriate parameter.
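
As a rough illustration of the renaming described above, here is a minimal sketch in Python. It is not the bot's own code: the function name, the list-of-pairs representation, and the choice to keep the first occurrence and rename later ones are all assumptions made for the example.

```python
def rename_duplicates(params):
    """Rename repeated parameter names to DUPLICATE_<name>.

    `params` is a list of (name, value) pairs in template order.
    Assumption: the first occurrence is kept and later ones are renamed.
    """
    seen = set()
    result = []
    for name, value in params:
        if name in seen:
            # A parameter with this name already appeared: flag this one.
            result.append(("DUPLICATE_" + name, value))
        else:
            seen.add(name)
            result.append((name, value))
    return result
```

An editor would then pick one of the two values and delete or re-purpose the flagged parameter by hand.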

Or, for a faster response from the maintainers, submit a pull request with an appropriate code fix on GitHub, if you can write the needed code.

Handles list expansion

Headbomb will provide a list of Handle providers that we will add to our constants files AManWithNoPlan (talk) 19:03, 16 October 2019 (UTC)🤔[reply]

Time to call in Leeroy Jenkins to extract the handles. AManWithNoPlan (talk) 22:18, 31 October 2019 (UTC)[reply]
Feel free to work on User:Headbomb/Sandbox and see which prefix resolves or not. Headbomb {t · c · p · b} 23:45, 31 October 2019 (UTC)[reply]

Fails to decapitalize

Status
feature request
Reported by
Headbomb {t · c · p · b} 01:40, 7 November 2019 (UTC)[reply]
What should happen
[1]
We can't proceed until
Feedback from maintainers


I had to whack on the bot to make this happen. It should have decapitalized FRONTIERS IN IMMUNOLOGY and BIOGERONTOLOGY on its own (adding the '(journal)' pipe was me, I don't expect the bot to do that). Headbomb {t · c · p · b} 01:40, 7 November 2019 (UTC)[reply]

I think you posted the wrong edit link. But it sounds like you want us to fix fully capitalized journal names like we do titles that are all caps. Is that correct? AManWithNoPlan (talk) 15:43, 7 November 2019 (UTC)[reply]
Yes that's the wrong link. However, we already decapitalize all caps journals usually, see e.g. [2]. Headbomb {t · c · p · b} 18:45, 7 November 2019 (UTC)[reply]
fixing links currently does not work via the gadget since the bot is not logged in to query the database. It should be possible to use curl to get the same information. AManWithNoPlan (talk) 00:04, 8 November 2019 (UTC)[reply]
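
The decapitalization requested in this thread could look something like the following Python sketch. It is only an illustration, not the bot's logic: `SMALL_WORDS` and `decapitalize_journal` are hypothetical names, and the small-word list is an assumption.

```python
# Words left in lower case when they are not the first word (assumed list).
SMALL_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to"}

def decapitalize_journal(name):
    """Turn an ALL-CAPS journal name into title case; leave others alone."""
    if not name.isupper():
        return name
    words = name.lower().split()
    out = []
    for index, word in enumerate(words):
        if index > 0 and word in SMALL_WORDS:
            out.append(word)
        else:
            out.append(word.capitalize())
    return " ".join(out)
```

A real implementation would also need a list of acronyms and stylized names to keep in capitals, since this sketch would lower-case them as well.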
Status
new bug
Reported by
🌿 SashiRolls t · c 20:09, 24 November 2019 (UTC)[reply]
What happens
access to /viewport is zapped.
What should happen
access to /viewport should not be zapped.
Relevant diffs/links
https://en.wikipedia.org/w/index.php?title=Tiffany_Midge&oldid=926539338
We can't proceed until
Feedback from maintainers


I suspect this is probably a feature rather than a bug, but I don't understand why this should be a feature... seems very counter-intuitive. The difference appears to be that the deleted url led directly to the full-text whereas the OCLC field does not lead to /viewport. (not sure where to click to get there either) 🌿 SashiRolls t · c 20:09, 24 November 2019 (UTC)[reply]

"Preview this book" right below the image. AManWithNoPlan (talk) 21:22, 24 November 2019 (UTC)[reply]
"deleted url led directly to the full-text " that is simply untrue. It leads to a limited google books preview. AManWithNoPlan (talk) 21:23, 24 November 2019 (UTC)[reply]
OK, I understand a bit better now. Clicking on "Preview this book" and then clicking on the Google preview is what I missed... because I thought it was a worldcat digitization. I only scrolled through the first fifteen to twenty pages, so did not realize it was partial. I have to say it's not very user friendly to have a link to (partial) full-text labelled 1113896227 instead of just directly linked from the reference title, but then I suppose we are expecting wiki-readers to be sufficiently geeky to know that 1113896227 will lead them to more info whereas the secret code 978-1-496-21803-2 leads nowhere useful (like the bluelinks to ISBN and OCLC). Thanks for looking into it and explaining the odd logic. :) 🌿 SashiRolls t · c 22:09, 24 November 2019 (UTC)[reply]
One of the objections to including links to Google Books is that what different readers will see varies unpredictably, and may change. These WorldCat digitized previews are stable, which is a major plus. They don't allow linking to the specific page, which we've come to do, but I think the stability can only be a plus. ISBNs and OCLC numbers lead to full bibliographic info, but the reader is still stymied if they can't get access to the book, which is quite common. (Interlibrary loan is very limited for readers in most places, and we can hardly expect readers to always buy a book, or even to be able to do so in whatever country they live in.) So where's the downside of also adding a link that guarantees they can scroll to the relevant page? In particular, it's hardly a duplication at all, especially since this OCLC link is largely unknown; I had no idea it existed. Yngvadottir (talk) 22:44, 24 November 2019 (UTC)[reply]
These WorldCat digitized previews are stable, which is a major plus. Not true. The OCLC viewport link is just a link to a Google Book preview. Google books did the scanning. Worldcat simply builds a little box and links to the google scan in that box. This is the same mechanism that other websites (unrelated to google maps) use to display a little box with google maps content. The problems with google books preview that you describe above are still there. My vote is to always remove worldcat links from |url= when there is a matching |oclc= identifier.
Trappist the monk (talk) 23:16, 24 November 2019 (UTC)[reply]
This is not a vote. Wikipedia already says to not link to google books, unless it is a complete and free preview. Can someone find that policy and link it here? These are worse than google book links. They point to some random page instead of a front page or a specifically chosen page. AManWithNoPlan (talk) 17:12, 25 November 2019 (UTC)[reply]
I agree that it would be good for someone (perhaps you?) to dig up this policy that you say you've seen, as it would directly contradict the Citing Sources guideline, which I'm more familiar with. (NB: it says quite clearly that the OCLC, ISBN, etc. can coexist with the link in the citation as of this writing). 🌿 SashiRolls t · c 19:59, 25 November 2019 (UTC)[reply]
It seems part of a wider problem: what exactly should the algorithm be for handling these identifiers and links, which may not be as perfect as they are sometimes made out to be? Should there be a policy of using only the restrictive and perfect identifiers available, as here, at the cost of denying access to those who cannot reach the source that way when it is available elsewhere? This seems in line with the URL blue-linking approach at Wikipedia:Bots/Noticeboard#IABot blue linking to Internet archive books and Wikipedia:Bots/Noticeboard#User:GreenC bot and edit filters, where the GreenC approach is not to use the ol= identifier and to use the URL instead. I can see there may be reasons for the approach, but I would like to see evidence of clear guidelines rather than people's opinions. It is not unknown for me to go to a library or purchase a resource, so oclc has its uses. Thank you. Djm-leighpark (talk) 20:17, 25 November 2019 (UTC)[reply]
This is unfortunate because archive.org is 100% viewable for free (with 1-time registration) - which is not the case for Google where you only get a partial view. With archive.org you can link to any page within a book for a free 2-page preview (no registration), which is not the case with Google which can only preview certain pages. However, understood archive.org does not have every book that Google might. In my experience Google Book scans come and go, they are not a library and take books (or previews) offline for commercial reasons so no guarantee those scans will be accessible in the future. Also Wikipedia and archive.org are non-profits with close overlap of goals, while Google is a commercial book seller with different goals, we will favor non-profits over commercial given the choice. -- GreenC 20:45, 25 November 2019 (UTC)[reply]
In some ways I'd prefer to use "open library" rather than archive.org, as archive.org is at least dual purpose: one is for storing/OCR'ing and provisioning, either unrestricted free or by limited library lending; the other is for archival of web pages. There may perhaps be no clash between these bots, but having had two articles where the syntax was broken on me, I'm not confident everyone is on the same page, and perhaps guidelines should be updated so the old algorithms can be written and checked against them? (I may have strayed from the original bug) Djm-leighpark (talk) 20:59, 25 November 2019 (UTC)[reply]
I think you're a bit confused: openlibrary.org is a collection of catalog records to aid in the discovery of books; archive.org is the actual digital library. The archival of web pages is at web.archive.org. Nemo 21:34, 25 November 2019 (UTC)[reply]
I am sorry I am somewhat confused and ask stupid questions. It is in my nature and training.Djm-leighpark (talk) 21:43, 25 November 2019 (UTC)[reply]
Don't be sorry! It's essential to surface such misunderstandings, otherwise we're just going to talk past each other. A lot of people are confused by archive.org vs. web.archive.org etc., almost as many as wikimedia.org vs. mediawiki.org. ;-) Nemo 22:06, 25 November 2019 (UTC)[reply]

For me personally, the links to worldcat.org are completely useless because they don't load any preview at all unless I allow a series of cookies and third-party resources. Links to the splash page leading to a full text (for instance on biodiversitylibrary.org) are often useful, but I've yet to encounter a case where worldcat.org is the best link available for a given content. Nemo 21:34, 25 November 2019 (UTC)[reply]

  • Ahrons, E. L. (1954). L. L. Asher (ed.). Locomotive and train working in the latter part of the nineteenth century". Vol. six. W Heffer & Sons Ltd. OCLC 606019549. OL 21457769M. {{cite book}}: Invalid |ref=harv (help) ? Djm-leighpark (talk) 22:26, 25 November 2019 (UTC)[reply]

look at code coverage

AManWithNoPlan (talk) 22:31, 15 December 2019 (UTC)[reply]

go looking for bugs

https://en.wikipedia.org/wiki/User:AnomieBOT/Nobots_Hall_of_Shame/0 AManWithNoPlan (talk) 12:29, 24 December 2019 (UTC)[reply]

ZooKeys issues, still not fixed

Status
new bug
Reported by
Headbomb {t · c · p · b} 11:53, 28 November 2019 (UTC)[reply]
What happens
[3]
What should happen
[4]
We can't proceed until
Feedback from maintainers


We only change it if it’s set to one. The problem is that the existing data looks reasonable with 12. AManWithNoPlan (talk) 11:59, 28 November 2019 (UTC)[reply]

Zookeys will always have issues that match the bold part of 10.3897/zookeys.772.24410. Headbomb {t · c · p · b} 12:47, 28 November 2019 (UTC)[reply]
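
A small Python sketch of reading the issue number out of a ZooKeys DOI follows. The bold formatting of the example DOI is not visible in this copy, so the sketch assumes the bold part is the middle numeric segment (772 in 10.3897/zookeys.772.24410); `ZOOKEYS_DOI` and `zookeys_issue` are hypothetical names, not part of the bot.

```python
import re

# Assumed DOI shape: 10.3897/zookeys.<issue>.<article id>
ZOOKEYS_DOI = re.compile(r"^10\.3897/zookeys\.(\d+)\.\d+$", re.IGNORECASE)

def zookeys_issue(doi):
    """Return the issue segment of a ZooKeys DOI, or None if it does not match."""
    match = ZOOKEYS_DOI.match(doi.strip())
    return match.group(1) if match else None
```

With such a helper, an existing |issue= value that disagrees with the DOI could be flagged or replaced rather than trusted because it "looks reasonable".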

Mass DOI finder by CrossRef

Converting unstructured references is much more fun using https://doi.crossref.org/SimpleTextQuery ! I don't know about you, but I get tired of copy-and-pasting from articles to a search engine and back. For days I failed to get anything out of it, until I realised that I must paste my list of references into LibreOffice, click the "numbered list" button, and paste the numbered list into the tool. If you have no numbers, or if you add them manually like a human would do, it's not going to do anything.

Although there is no shortage of citation farms and messy citation sections, I wondered if there's a faster way to find the low hanging fruit. So I made a file with 25k lines from the latest English Wikipedia dump, which look like they might be titles of some work by some very simplistic grepping. If you copy up to 1000 lines into https://doi.crossref.org/SimpleTextQuery , you get a decent amount of DOIs and then you can go look for those titles in articles. I did the biggest chunks in the first 2k lines so far. Nemo 21:32, 2 December 2019 (UTC)[reply]

I pasted some examples at User:Nemo bis/Missing cite journal. Nemo 13:00, 3 December 2019 (UTC)[reply]
Status
new bug
Reported by
Trappist the monk (talk) 14:23, 4 December 2019 (UTC)[reply]
What happens
|author1=[[Robert Jay Charlson|Charlson]] |first1=R. J.|last1=[[Robert Jay Charlson|Charlson]] |first1=R. J. ... |author1-link=Robert Jay Charlson |author1=Charlson
What should happen
first:
|author1=[[Robert Jay Charlson|Charlson]]|last1=[[Robert Jay Charlson|Charlson]]
then:
|last1=[[Robert Jay Charlson|Charlson]] |first1=R. J.|last1=Charlson |first1=R. J. |author-link1=Robert Jay Charlson
or, do nothing because |last1= and |author1= are equal aliases
We can't proceed until
Feedback from maintainers
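
The second step requested in this report (moving a wikilink out of |last1= into |author-link1=) can be sketched in Python as below. This is an illustration under assumptions, not the bot's code: `WIKILINK` and `split_linked_name` are hypothetical names, and only simple `[[Target|Label]]` or `[[Target]]` values are handled.

```python
import re

# Matches a value that is exactly one wikilink, with an optional piped label.
WIKILINK = re.compile(r"^\[\[([^\[\]|]+)(?:\|([^\[\]|]+))?\]\]$")

def split_linked_name(value):
    """Split '[[Target|Label]]' into (plain name, link target).

    Returns (value, None) when the value is not a single wikilink.
    """
    match = WIKILINK.match(value.strip())
    if not match:
        return value.strip(), None
    target = match.group(1).strip()
    label = match.group(2)
    return (label.strip() if label else target), target
```

For the reported citation, the first element would go to |last1= and the second to |author-link1=, leaving |first1= untouched.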


Status
new bug
Reported by
Headbomb {t · c · p · b} 21:43, 4 December 2019 (UTC)[reply]
What should happen
[5]
We can't proceed until
Feedback from maintainers


Mobile web

Is it possible for the bot to replace links to mobile sites such as https://m.washingtontimes.com/news/2017/may/2/peter-newsham-confirmed-as-chief-of-dc-police/ with https://www.washingtontimes.com/news/2017/may/2/peter-newsham-confirmed-as-chief-of-dc-police/ (see Special:Diff/930509786&oldid=930509645)? Jonatan Svensson Glad (talk) 00:19, 13 December 2019 (UTC)[reply]

I think so, but that might be a better task for a different bot. AManWithNoPlan (talk) 11:54, 13 December 2019 (UTC)[reply]
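
For whichever bot takes this on, the rewrite itself is simple to sketch in Python. The function name `desktop_url` is hypothetical, and the rule "replace a leading m. with www." is an assumption that holds for the example above but not for every site, so a real task would want a per-domain allow list.

```python
from urllib.parse import urlsplit, urlunsplit

def desktop_url(url):
    """Swap a leading 'm.' host label for 'www.'; leave other URLs unchanged."""
    parts = urlsplit(url)
    host = parts.netloc
    if host.lower().startswith("m."):
        host = "www." + host[2:]
    return urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))
```

Hosts that put the mobile label elsewhere (for example after a language code) are deliberately not touched by this sketch.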

Bbc.com

Status
new bug
Reported by
Jonatan Svensson Glad (talk) 01:10, 15 December 2019 (UTC)[reply]
What should happen
Remove |publisher=Bbc.com when adding |work=BBC News
Relevant diffs/links
https://en.wikipedia.org/w/index.php?title=National_Congress_%28Sudan%29&diff=prev&oldid=930800719
We can't proceed until
Feedback from maintainers
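
A minimal Python sketch of the requested cleanup, assuming the citation parameters are available as a plain dictionary; `drop_redundant_publisher` is a hypothetical name and the comparison rule is an assumption for illustration only.

```python
def drop_redundant_publisher(params):
    """Return a copy of `params` without publisher=Bbc.com when work=BBC News."""
    cleaned = dict(params)
    publisher = cleaned.get("publisher", "").strip().lower()
    if cleaned.get("work") == "BBC News" and publisher == "bbc.com":
        del cleaned["publisher"]
    return cleaned
```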


bbc titles

Also, is it possible to see if the bot can fetch a "clean" title if it ends with - BBC News as in |title=Omar al-Bashir: How Sudan's military strongmen stayed in power - BBC News Jonatan Svensson Glad (talk) 01:11, 15 December 2019 (UTC)[reply]
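
Stripping the site suffix from a fetched title could be sketched in Python like this. `clean_title` is a hypothetical helper, and the exact suffix string is taken from the example above; other sites would need their own suffixes.

```python
def clean_title(title, suffix=" - BBC News"):
    """Remove a trailing site-name suffix from a page title, if present."""
    if title.endswith(suffix):
        return title[: -len(suffix)].rstrip()
    return title
```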

More DOIs for IEEE citations

According to a query, IEEE URLs remain among the most intractable for Citation bot: there are some 2,000-3,000 which resist metadata fixes, largely because they don't have a DOI and the usual technical limitations make it hard to find one. Matching over the document/AR number in the CrossRef dump, I believe I can make a list of URLs linked in our articles and their corresponding DOI. Then, it would need to be added by a bot, probably with a regex replacement: is there some bot or AWB operator here interested in doing it? Nemo 10:12, 16 December 2019 (UTC)[reply]
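
The matching step could be sketched in Python roughly as follows. Everything here is an assumption for illustration: the two URL shapes (a `/document/<number>` path and an `arnumber=` query parameter), the function names, and the `doi_by_number` mapping, which stands in for whatever list is built from the CrossRef dump.

```python
import re
from urllib.parse import parse_qs, urlsplit

def ieee_document_number(url):
    """Extract the document/AR number from an IEEE URL, or return None."""
    parts = urlsplit(url)
    if not parts.netloc.lower().endswith("ieee.org"):
        return None
    match = re.search(r"/document/(\d+)", parts.path)
    if match:
        return match.group(1)
    values = parse_qs(parts.query).get("arnumber")
    if values and values[0].isdigit():
        return values[0]
    return None

def doi_for_url(url, doi_by_number):
    """Look up a DOI for an IEEE URL in a prepared number-to-DOI mapping."""
    number = ieee_document_number(url)
    return doi_by_number.get(number) if number else None
```

A bot or AWB run would then add |doi= only where the lookup returns a value and the citation has no DOI yet.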

online books

Status
new bug
Reported by
Topo122 (talk) 19:44, 19 December 2019 (UTC)[reply]
What happens
Changes template type
We can't proceed until
Feedback from maintainers


In Fabian S. Woodley Citation bot changed this:

To this:

It obscures the fact that the reference is to the on-line version of the Oxford Dictionary of National Biography. And the Oxford Dictionary of National Biography is not a journal. Topo122 (talk) 19:44, 19 December 2019 (UTC)[reply]

Should be a cite dictionary / cite book. Headbomb {t · c · p · b} 20:22, 19 December 2019 (UTC)[reply]
I think whether it's online or offline is immaterial. At the end of the day, it's still a dictionary/book (and happens to be available online which was the source accessed). I do agree that it shouldn't be cite journal, but similarly it shouldn't be cite web. --Izno (talk) 22:11, 19 December 2019 (UTC)[reply]

{{cite web}} is the wrong template; better is {{cite ODNB}} (which itself uses {{cite encyclopedia}}):

{{cite ODNB |last1=Courtney |first1=W. P. |last2=Hinings |first2=Jessica |title=Woodley, George |doi=10.1093/ref:odnb/29929}}
Courtney, W. P.; Hinings, Jessica. "Woodley, George". Oxford Dictionary of National Biography (online ed.). Oxford University Press. doi:10.1093/ref:odnb/29929. (Subscription or UK public library membership required.)

Trappist the monk (talk) 22:33, 19 December 2019 (UTC)[reply]

Given https://api.crossref.org/works/10.1093/ref:odnb/29929, can all items with a DOI of type "reference-entry" use {{cite encyclopedia}}, with whatever is the "container-title" going into |encyclopedia=? The manual says it shouldn't be used for just any book with multiple authors; on the other hand, not all reference works are books. Nemo 05:15, 20 December 2019 (UTC)[reply]
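
The rule proposed here can be sketched in Python, assuming the "message" object of a CrossRef works response has already been parsed into a dictionary. `choose_template` is a hypothetical name, and the sketch deliberately decides nothing for other record types.

```python
def choose_template(work):
    """Map a CrossRef work record to a citation template and parameters.

    Returns (None, {}) for anything that is not a "reference-entry".
    """
    if work.get("type") != "reference-entry":
        return None, {}
    # CrossRef gives "container-title" as a list; use the first entry if any.
    containers = work.get("container-title") or []
    params = {"encyclopedia": containers[0]} if containers else {}
    return "cite encyclopedia", params
```

Whether every "reference-entry" really belongs in {{cite encyclopedia}} is the open question in this thread; the sketch only shows how mechanical the rule would be once agreed.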

Didn't know about {{cite ODNB}} - very useful - I'll use it in future! Topo122 (talk) 11:42, 21 December 2019 (UTC)[reply]

Incorrect change from "url=" to "chapter-url="

Status
new bug
Reported by
Graham87 15:23, 21 December 2019 (UTC)[reply]
What happens
In the Busselton article, the bot changes "url=" to "chapter-url=", which is incorrect in this case because the PDF link goes to the entire book, not just one chapter of it.
Relevant diffs/links
this edit
We can't proceed until
Feedback from maintainers


Manual bypass seems the solution here. Headbomb {t · c · p · b} 15:27, 22 December 2019 (UTC)[reply]
Not making bot edits beyond the capacity of the bot to understand the actual meaning of the content at the link seems to be the answer to me. If we're going to have two different url parameters with different meanings and one of them is chosen as the correct one by a human editor, why should the bot be second-guessing that? —David Eppstein (talk) 18:52, 22 December 2019 (UTC)[reply]
Because in 99%+ of cases, humans are wrong and use url instead of chapter url. Headbomb {t · c · p · b} 20:45, 22 December 2019 (UTC)[reply]
This is directly counter to the philosophy according to which, several years ago, the |url= parameter was changed from being a catch-all parameter that would by default bind to the tightest title in the template, and instead became split into several parameters that each had a specific meaning. If I want to use a parameter with its correct meaning, and the bot refuses to let me, that seems like the very definition of a bug to me. —David Eppstein (talk) 01:11, 31 December 2019 (UTC)[reply]

Bot down

It fails on every page. Both gadget and bot itself. Hiccup? Bigger issue? Headbomb {t · c · p · b} 23:36, 26 December 2019 (UTC)[reply]

Same here. When I click on the Citations button it gives me "Error: Citations request failed". Trying to use the bot directly redirects to a 503 page. --Ihaveacatonmydesk (talk) 16:51, 27 December 2019 (UTC)[reply]
it appears that over Christmas those with power are on vacation. AManWithNoPlan (talk) 20:34, 27 December 2019 (UTC)[reply]
The development version is still live (https://tools.wmflabs.org/citations-dev/), but it doesn't seem to do everything that https://tools.wmflabs.org/citations does, as I was trying to use it to expand abbreviated journal titles, which it didn't. Seppi333 (Insert ) 01:39, 28 December 2019 (UTC)[reply]
I wouldn’t use that version. AManWithNoPlan (talk) 01:44, 28 December 2019 (UTC)[reply]
@Seppi333: also the bot never expanded abbreviated journals. Not on its own at least. Headbomb {t · c · p · b} 04:14, 28 December 2019 (UTC)[reply]
Then how were you doing it? Seppi333 (Insert ) 04:14, 28 December 2019 (UTC)[reply]
Deleting abbreviations manually and letting the bot fill them. Then taking care of what the bot didn't do. Headbomb {t · c · p · b} 04:38, 28 December 2019 (UTC)[reply]

That seems to be it. Sad. BernardoSulzbach (talk) 19:04, 28 December 2019 (UTC)[reply]

@DBarratt (WMF), Kaldari, Mattsenate, Maximilianklein, and Smith609: anything that can be done here? You're listed as contact people on the error message/toolabs page. Headbomb {t · c · p · b} 12:48, 30 December 2019 (UTC)[reply]
I think that maintane_files.php corrupted the files. I have removed that tool from the source tree so it cannot happen again. AManWithNoPlan (talk) 13:12, 30 December 2019 (UTC)[reply]
@AManWithNoPlan: I'm getting a 503 message whenever I try to run the bot; I don't think that fixed it, at least on my end. --Nathan2055talk - contribs 21:38, 30 December 2019 (UTC)[reply]
@AManWithNoPlan: Yup, still not fixed when I tried it today. Tgeorgescu (talk) 10:06, 31 December 2019 (UTC)[reply]

Please don't ping me, I am not an operator. I can’t reboot it. AManWithNoPlan (talk) 11:52, 31 December 2019 (UTC)[reply]

Seems like this incident shows it's time to extend reboot privileges to a few other people. --Ihaveacatonmydesk (talk) 17:38, 31 December 2019 (UTC)[reply]
It's rather important to have this running. --Ozzie10aaaa (talk) 17:56, 1 January 2020 (UTC)[reply]

I asked Krenair to restart the service, so it should be working now, however, there is a syntax error somewhere causing the tool to kill itself. Jonatan Svensson Glad (talk)

I've tweaked it a bit, try now. If it doesn't work, or it breaks itself again, we should probably wait for a real maintainer of the tool to sort things out. --Krenair (talkcontribs) 01:03, 2 January 2020 (UTC)[reply]
After some more fiddling around I believe it's working without any more local hacks from me. It seems the tool on Toolforge had a broken file from an automatic update mechanism that is being removed. FYI maintainers: I've reset the repository in public_html from ef1ea17a4d1d2bc0adbcce6032a768f91b53ec40 to 8d755d36a9e5e023c690c47be7bf10bd5422f00 to drop the automatic local commits to constants/capitalization.php. --Krenair (talkcontribs)

presentation of handles loses useful information

Someone has been running this bot over Queensland content, e.g. [6], and it is stripping out the name of the website (which denies the reader the knowledge that it comes from a reliable source, the State Library of Queensland) in favour of making the rather ugly handle visible to the reader. I don't have a problem with the URL being replaced with a handle, but could we make the visible text of the handle the name of the handle naming authority (if available) or, alternatively, the website/publisher (State Library of Queensland), or simply retain the name of the website/publisher where provided? Thanks Kerry (talk) 07:49, 30 December 2019 (UTC)[reply]

Actually, it looks like there are too many primary sources and not enough secondary sources and these primary sources are missing more important info like author, work & publisher. Is the library that holds the records even important?  — Chris Capoccia 💬 11:40, 30 December 2019 (UTC)[reply]

Incorrect PMC added

Here the bot added PMC 3435945 to the existing citation for PMID 19741352, but the PMC is for a different paper. The PMC may be for a reprint of the cited paper, but it is in a different journal (also different year, volume, pages), so it should not be added. What validation is the bot doing to determine that a PMC (that presumably has been found from a keyword search of the PMC database) is for the correct paper? Thanks Rjwilmsi 15:53, 31 December 2019 (UTC)[reply]

I have reported the error to the database. AManWithNoPlan (talk) 17:11, 31 December 2019 (UTC)[reply]
Interesting, I can't see any data issue on the pubmed side (PMID 19741352 and PMC 3435945) - what am I missing? Thanks Rjwilmsi 18:12, 31 December 2019 (UTC)[reply]
it’s in the DOI to open source resolver. We do have lots of checks, but when the title and other things match we get fooled. AManWithNoPlan (talk) 19:50, 31 December 2019 (UTC)[reply]
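
One extra check of the kind asked about in this thread might look like the Python sketch below. It does not describe the bot's existing checks: `same_paper`, the dictionary representation, and the field list are assumptions. The idea is to reject a candidate identifier when the journal, year, volume, or pages are present on both sides and disagree, even if the title matches.

```python
def same_paper(cited, candidate, fields=("journal", "year", "volume", "pages")):
    """Return False if any field present in both records differs."""
    for field in fields:
        left = cited.get(field)
        right = candidate.get(field)
        if left and right:
            if str(left).strip().lower() != str(right).strip().lower():
                return False
    # Fields missing on either side are not treated as a mismatch.
    return True
```

Such a check would trade some missed matches (abbreviated journal names, differently formatted page ranges) for fewer wrong identifiers.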