User talk:Citation bot: Difference between revisions

From Wikipedia, the free encyclopedia
Revision as of 22:53, 23 March 2020

You may want to increment {{Archive basics}} to |counter= 21 as User talk:Citation bot/Archive 20 is larger than the recommended 150Kb.

Note that the bot's maintainer and assistants (Thing 1 and Thing 2) can go weeks without logging in to Wikipedia. The code is open source and interested parties are invited to assist with the operation and extension of the bot.

Before reporting a bug, please note: Addition of DUPLICATE_xxx= to citation templates by this bot is a feature. When there are two identical parameters in a citation template, the bot renames one to DUPLICATE_xxx=. The bot is pointing out the problem with the template. The solution is to choose one of the two parameters and remove the other one, or to convert it to an appropriate parameter.

Or, for a faster response from the maintainers, submit a pull request with an appropriate code fix on GitHub, if you can write the needed code.

Incorrect change from "url=" to "chapter-url="

Status
new bug
Reported by
Graham87 15:23, 21 December 2019 (UTC)[reply]
What happens
In the Busselton article, the bot changes "url=" to "chapter-url=", which is incorrect in this case because the PDF link goes to the entire book, not just one chapter of it.
Relevant diffs/links
this edit
We can't proceed until
Feedback from maintainers


Manual bypass seems the solution here. Headbomb {t · c · p · b} 15:27, 22 December 2019 (UTC)[reply]
Not making bot edits beyond the capacity of the bot to understand the actual meaning of the content at the link seems to be the answer to me. If we're going to have two different url parameters with different meanings and one of them is chosen as the correct one by a human editor, why should the bot be second-guessing that? —David Eppstein (talk) 18:52, 22 December 2019 (UTC)[reply]
Because in 99%+ of cases, humans are wrong and use url instead of chapter url. Headbomb {t · c · p · b} 20:45, 22 December 2019 (UTC)[reply]
This is directly counter to the philosophy according to which, several years ago, the |url= parameter was changed from being a catch-all parameter that would by default bind to the tightest title in the template, and instead became split into several parameters that each had a specific meaning. If I want to use a parameter with its correct meaning, and the bot refuses to let me, that seems like the very definition of a bug to me. —David Eppstein (talk) 01:11, 31 December 2019 (UTC)[reply]
Just encountered this again at Modern Jazz Quartet. The bot should have some code that helps it figure out that this, added by InternetArchiveBot, is most definitely not a chapter URL. Graham87 04:24, 12 March 2020 (UTC)[reply]
Here's another one. If Citation bot is too stupid to recognize that an archive.org url like this, without any extra page-number complications, is going to be a link to the whole book, it is too stupid to be making these changes at all. url= without chapter-url= is a perfectly valid combination of parameters and should not need special bot-exclusion code to prevent it from being broken by marauding bots. —David Eppstein (talk) 06:36, 20 March 2020 (UTC)[reply]
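
For illustration, the check requested in this thread (an archive.org link with no page-number component is a link to the whole book) could be sketched roughly as below. This is only a sketch under stated assumptions: the function name looks_like_whole_book_url is hypothetical and is not taken from Citation bot's source, and the rule only looks at the shape of the path (/details/<identifier> with nothing after it). A caller might use such a check to leave url= alone instead of converting it to chapter-url=.

```php
<?php
// Hypothetical helper, NOT part of Citation bot's actual code.
// Returns true when an archive.org link appears to point at a whole book
// (a bare /details/<identifier> path) rather than at a specific page.
function looks_like_whole_book_url(string $url): bool {
    $host = parse_url($url, PHP_URL_HOST);
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($host) || !is_string($path)) {
        return false; // unparseable or incomplete URL: make no claim
    }
    // Only archive.org (with or without www.) is considered in this sketch.
    if (preg_match('~^(www\.)?archive\.org$~i', $host) !== 1) {
        return false;
    }
    // A bare item path has no /page/... or other suffix after the identifier.
    return preg_match('~^/details/[^/]+/?$~', $path) === 1;
}
```

The identifiers used to try it out would be made up; anything with a suffix after the item identifier (for example /page/n5) is treated as not a whole-book link, and other hosts are simply not judged.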

Suggest modifying Zotero timeout

Status
Annoying from time to time
Reported by
Martin (Smith609 – Talk) 08:56, 3 March 2020 (UTC)[reply]
What happens
Zotero allows 15s before reporting a timeout.

This is maybe fine when running the bot from a URL, but when using the "citations" button it led me to give up and abort the run. It seems to me that 15000ms is a very long time to wait, particularly if there are multiple Zotero calls on a page: would 150ms still be sufficient?

> Using Zotero translation server to retrieve details from URLs.
  ! Operation timed out after 15001 milliseconds with 0 bytes received   For URL: http://sp.sepmonline.org/content/sepsp088/1/SEC6.abstract
  ! Operation timed out after 15000 milliseconds with 0 bytes received   For URL: http://www.paleoportal.org/kiosk/sample_site/fossil_gallery_109_images.html
  ! Operation timed out after 15001 milliseconds with 0 bytes received   For URL: http://ichnology.ku.edu/invertebrate_traces/tfimages/zoophycos.html
We can't proceed until
Feedback from maintainers


In normal circumstances I'd say that anything above 1000 ms is crazy slow. However I have no idea what's the median response time from our Zotero server. Do you know? Nemo 20:07, 3 March 2020 (UTC)[reply]
This is the total time from initiating the connection until data is received and the connection is closed. There is a separate timeout for just connecting. The more urls on the page, the shorter the timeout. AManWithNoPlan (talk) 20:25, 3 March 2020 (UTC)[reply]
   // CURLOPT_TIMEOUT is the total transfer time allowed, in seconds.
   if ($url_count < 5) {
     curl_setopt($ch_zotero, CURLOPT_TIMEOUT, 15);
   } elseif ($url_count < 25) {
     curl_setopt($ch_zotero, CURLOPT_TIMEOUT, 10);
   } else {
     curl_setopt($ch_zotero, CURLOPT_TIMEOUT, 5);
   }
If we reduced that to, say, 3, 2 and 1 respectively, would we be able to tell from the logs or something whether the success rate (however defined) increases? Nemo 20:56, 3 March 2020 (UTC)[reply]
I had something similar for User:Bibcode Bot, but I had increasing timeouts (5/10/15 seconds) for the ADSABS database before failure. But this was a bot doing its own thing, without anyone waiting after it. For what's essentially a communal tool, I'd say 10 seconds total wait time for a single url should be more than enough. And if multiple distinct Zotero calls fail in succession, maybe skip Zotero for the next 5 minutes so we're not constantly querying a dead connection during a server hiccup or something. Headbomb {t · c · p · b} 22:23, 3 March 2020 (UTC)[reply]
We do skip after enough fails, but that is per run and not global. AManWithNoPlan (talk) 22:53, 3 March 2020 (UTC)[reply]
I don't see this warning right now. AManWithNoPlan (talk) 22:54, 3 March 2020 (UTC)[reply]
 if (!$is_a_man_with_no_plan) $this->expand_templates_from_identifier('url', $our_templates);
long-term it would be good to take advantage of the bulk API and submit all urls at once AManWithNoPlan (talk) 00:57, 4 March 2020 (UTC)[reply]
True for all APIs. Headbomb {t · c · p · b} 19:17, 16 March 2020 (UTC)[reply]
Already true for the slow ones that allow it (other than zotero). AManWithNoPlan (talk) 12:20, 17 March 2020 (UTC)[reply]
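
As a rough illustration of the two ideas in this thread (a timeout that shrinks as the number of URLs grows, and skipping Zotero for a while after repeated failures), here is a minimal sketch. The class name ZoteroBreaker, the failure threshold, and the method names are hypothetical and are not taken from the bot's code; the timeout tiers repeat the snippet quoted above, and the 300-second cooldown is the "5 minutes" suggested above. State is held in memory only, so this is still per run; making the skip global would need shared storage, which is not shown.

```php
<?php
// Illustrative sketch only; names and thresholds are assumptions.
final class ZoteroBreaker {
    private int $max_failures;
    private int $cooldown;              // seconds to skip after tripping
    private int $consecutive_failures = 0;
    private int $skip_until = 0;        // Unix time before which Zotero is skipped

    public function __construct(int $max_failures = 3, int $cooldown = 300) {
        $this->max_failures = $max_failures;
        $this->cooldown = $cooldown;
    }

    // Same tiers as the snippet quoted in this thread (seconds).
    public static function timeout_for(int $url_count): int {
        if ($url_count < 5) {
            return 15;
        }
        if ($url_count < 25) {
            return 10;
        }
        return 5;
    }

    // True while the cooldown is still running.
    public function should_skip(int $now): bool {
        return $now < $this->skip_until;
    }

    // Record the outcome of one Zotero call made at time $now.
    public function record(bool $success, int $now): void {
        if ($success) {
            $this->consecutive_failures = 0;
            return;
        }
        $this->consecutive_failures++;
        if ($this->consecutive_failures >= $this->max_failures) {
            $this->skip_until = $now + $this->cooldown;
            $this->consecutive_failures = 0;
        }
    }
}
```

A caller would check should_skip(time()) before each request, pass ZoteroBreaker::timeout_for($url_count) to CURLOPT_TIMEOUT, and call record() with the result.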

Discussion at Village Pump

Of possible interest: Wikipedia:Village_pump_(technical)#=url_and_=archiveurl_do_not_match -- GreenC 14:18, 18 March 2020 (UTC)[reply]

Status
new bug
Reported by
David Eppstein (talk) 16:52, 18 March 2020 (UTC)[reply]
What happens
Special:Diff/946139951
What should happen
Moving the "centennial edition" part to an edition field is probably too much intelligence to expect of the bot, but the title link for Alan Turing: The Enigma should be either moved to a title-link field or left in place, not just dropped on the floor.
We can't proceed until
Feedback from maintainers


Partial wikilinks should not be used (according to the styles), and are 99% of the time invalid (i.e. they link to IBM in the title instead of the actual thing, for example). AManWithNoPlan (talk) 12:41, 19 March 2020 (UTC)[reply]
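
To make the "move to a title-link field" option concrete, here is a minimal sketch that handles only the unambiguous case, where the whole title is a single wikilink. The function name split_title_link is hypothetical and this is not the bot's actual behaviour; partial wikilinks, which are the disputed case here, are deliberately left untouched by returning null.

```php
<?php
// Hypothetical helper, NOT Citation bot's actual logic.
// If $title is exactly one wikilink, return the plain title plus a
// title-link target; otherwise return null so the caller changes nothing.
function split_title_link(string $title): ?array {
    $pattern = '~^\[\[([^\[\]|]+)(?:\|([^\[\]|]+))?\]\]$~';
    if (preg_match($pattern, trim($title), $m) !== 1) {
        return null; // partial or absent wikilink: leave the title as-is
    }
    $target = trim($m[1]);
    // Use the piped label when present, else the link target itself.
    $label = (isset($m[2]) && $m[2] !== '') ? trim($m[2]) : $target;
    return ['title' => $label, 'title-link' => $target];
}
```

For a title such as [[Alan Turing: The Enigma]] this yields the same text for both fields; a title with text outside the brackets returns null, so nothing is dropped.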

Under heavy load

Status
 Fixed for now
Reported by
Joseywales1961 (talk) 23:09, 22 March 2020 (UTC)[reply]
What happens
bot hangs while trying to fix refs (2 bare refs that I then fixed manually) on page Pickaninny and four or five other pages I attempted to use it on today
We can't proceed until
Feedback from maintainers


remove website and synonyms from cite arxiv

Status
feature request
Reported by
Headbomb {t · c · p · b} 03:33, 23 March 2020 (UTC)[reply]
What happens
[1] (after changing cite web to cite arxiv manually)
What should happen
[2]
We can't proceed until
Feedback from maintainers


WP:CITEVAR violation using citation bot

 Fixed - copy from my talk page

When using citation bot: please be more careful about not changing instances of {citation} to {cite book} (especially where the source is not a book) where the former is the established usage, as done here at Puget Sound faults, and other places. (Haven't I mentioned this before?) Nor should the first author's first/last be concatenated with preceding line, as it makes it harder to scan the citation for accuracy. Your attention to this would be appreciated. ♦ J. Johnson (JJ) (talk) 20:46, 12 March 2020 (UTC)[reply]

I see the problem. It has a journal set which is invalid for citation, so it has to be changed to cite book. BUT, the journal is set to a comment which is a strange edge case. AManWithNoPlan (talk) 21:00, 12 March 2020 (UTC)[reply]
https://github.com/ms609/citation-bot/pull/2727 AManWithNoPlan (talk) 21:41, 12 March 2020 (UTC)[reply]

URL removal that duplicates DOI

User:SandyGeorgia has a complaint about this in medical articles. The idea is that URLs that duplicate DOIs should be left in place if the full text is free. AManWithNoPlan (talk) 20:05, 23 March 2020 (UTC)[reply]

Thoughts on it staying if the url is free parameter is set? AManWithNoPlan (talk) 20:13, 23 March 2020 (UTC)[reply]
Perhaps some way to detect medical articles? AManWithNoPlan (talk) 20:28, 23 March 2020 (UTC)[reply]
I don't personally see the point of keeping duplicate URL/DOIs in hard code, but ideally those would be marked as |doi-access=free and the template should use that to automatically create links like it does for PMCs. And since this isn't currently possible, there might be a case to keep them for now. Headbomb {t · c · p · b} 21:20, 23 March 2020 (UTC)[reply]

Samples

From REM Sleep Behavior Disorder Single-Question Screen, for all the citations in the article, here is what our readers see:

  1. when the URL for non-PMC free full text is included and
  2. when the URL is deleted, because it duplicates what is in the DOI link.

In the second case, the impression that could be given to readers is that they can't read the full text of the second source. We cannot assume that our readers know that they can click on the DOI link in that one case to get to the free full text. We can't even assume they know what a DOI link is. We can't expect them, in a larger article, to click through to every DOI to see if, by chance, free full text is available. We WANT our readers to be able to access text as often as possible, and to be able to verify text, so in effect "hiding" free full text from the uninitiated reader is a disservice.

Our readers may know that in ALL Wikipedia articles, a blue link in the title means they can read the article. The citation style I use is consistent with the Wikipedia-wide convention: that is, to provide a blue link in the title whenever free full text is available. We do this automatically when a PMC is available, but we have to do it manually when a PMC is not available, but free full text is otherwise available. I have been asking for a long time to stop changing the citation style in articles I edit; I hope not to have to install a deny bot template, because that would eliminate valuable bot edits. Please stop removing URLs to free full text when a PMC is not available, so citations will be consistent. SandyGeorgia (Talk) 22:53, 23 March 2020 (UTC)[reply]
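
For reference, the rule being asked for in this section could be sketched as follows: remove a URL that merely duplicates the DOI, except when the DOI is marked free and there is no PMC to supply a title link. This is a sketch under assumptions, not the bot's implementation: the function name should_remove_duplicate_url is hypothetical, the parameter map is a simplified stand-in for a parsed citation template, and only a plain doi.org (or dx.doi.org) resolver link is treated as a duplicate. Publisher URLs that a DOI resolves to, and percent-encoded DOIs, are not handled.

```php
<?php
// Hypothetical decision helper; NOT Citation bot's actual logic.
// $params maps citation template parameter names to their values.
function should_remove_duplicate_url(array $params): bool {
    $url = trim($params['url'] ?? '');
    $doi = trim($params['doi'] ?? '');
    if ($url === '' || $doi === '') {
        return false; // nothing to deduplicate
    }
    // Duplicate only if the URL is a resolver link for this exact DOI.
    $pattern = '~^https?://(dx\.)?doi\.org/' . preg_quote($doi, '~') . '$~i';
    if (preg_match($pattern, $url) !== 1) {
        return false;
    }
    $free_doi = strtolower(trim($params['doi-access'] ?? '')) === 'free';
    $has_pmc  = trim($params['pmc'] ?? '') !== '';
    // Keep the URL when it is the only thing linking the title to free full text.
    if ($free_doi && !$has_pmc) {
        return false;
    }
    return true;
}
```

With this rule, a citation that has doi-access=free and no pmc keeps its URL, so the title stays linked; the same citation with a pmc set, or without doi-access=free, has the duplicate URL removed.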