User talk:Citation bot: Difference between revisions

Revision as of 22:53, 23 March 2020

You may want to increment {{Archive basics}} to |counter= 21 as User talk:Citation bot/Archive 20 is larger than the recommended 150 KB.

Archives

Archive 0 (Early bug reports)
Archive 1 (May 2008 – Jun 2011)
Archive 2 (Jun 2011 – Nov 2015)
Archive 3 (Nov 2015 – Jul 2016)
Archive 4 (Jul 2016 – Oct 2016)
Archive 5 (Oct 2016 – Sep 2017)
Archive 6 (Sep 2017 – Oct 2017)
Archive 7 (Oct 2017 – Jul 2018)
Archive 8 (Jul 2018 – Aug 2018)
Archive 9 (Aug 2018 – Aug 2018)
Archive 10 (Sep 2018 – Oct 2018)
Archive 11 (Oct 2018 – Nov 2018)
Archive 12 (Nov 2018 – Jan 2019)
Archive 13 (Jan 2019 – Feb 2019)
Archive 14 (Feb 2019 – Mar 2019)
Archive 15 (Mar 2019 – Jun 2019)
Archive 16 (Jun 2019 – Jul 2019)
Archive 17 (Jul 2019 – Aug 2019)
Archive 18 (Aug 2019 – Oct 2019)
Archive 19 (Oct 2019 – present)

This page has archives. Sections older than 40000 days may be automatically archived.

Note that the bot's maintainer and assistants (Thing 1 and Thing 2) can go weeks without logging in to Wikipedia. The code is open source, and interested parties are invited to assist with the operation and extension of the bot.

Before reporting a bug, please note: Addition of DUPLICATE_xxx= to citation templates by this bot is a feature. When there are two identical parameters in a citation template, the bot renames one to DUPLICATE_xxx=. The bot is pointing out the problem with the template. The solution is to choose one of the two parameters and remove the other one, or to convert it to an appropriate parameter.
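The renaming can be illustrated with a minimal PHP sketch (the bot itself is written in PHP). It assumes parameters are held as an ordered list of name/value pairs and that the later copy is the one renamed. The notice only says that "one" copy is renamed, and `mark_duplicate_params` is a hypothetical name, not the bot's function.

```php
<?php
// Hypothetical sketch, not the bot's code: rename every repeat of a
// parameter name to DUPLICATE_<name>, keeping the first copy unchanged.
// Parameters are modelled as an ordered list of [name, value] pairs.

/**
 * @param array<int, array{0: string, 1: string}> $params
 * @return array<int, array{0: string, 1: string}>
 */
function mark_duplicate_params(array $params): array
{
    $seen = [];
    $out = [];
    foreach ($params as [$name, $value]) {
        $key = trim($name);
        if (isset($seen[$key])) {
            // Already present once: flag this copy for a human to resolve.
            $name = 'DUPLICATE_' . $key;
        }
        $seen[$key] = true;
        $out[] = [$name, $value];
    }
    return $out;
}
```

Given `[['title', 'A'], ['title', 'B']]`, the second pair comes back as `['DUPLICATE_title', 'B']`. An editor then keeps one value and removes or re-labels the other.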

Please click here to report an error.

Or, for a faster response from the maintainers, submit a pull request with appropriate code fix on GitHub, if you can write the needed code.

Incorrect change from "url=" to "chapter-url="

Status: new bug
Reported by: Graham87 15:23, 21 December 2019 (UTC)[reply]

What happens: In the Busselton article, the bot changes "url=" to "chapter-url=", which is incorrect in this case because the PDF link goes to the entire book, not just one chapter of it.
Relevant diffs/links: this edit
We can't proceed until: Feedback from maintainers

Manual bypass seems the solution here. Headbomb {t · c · p · b} 15:27, 22 December 2019 (UTC)[reply]

Not making bot edits beyond the capacity of the bot to understand the actual meaning of the content at the link seems to be the answer to me. If we're going to have two different url parameters with different meanings and one of them is chosen as the correct one by a human editor, why should the bot be second-guessing that? —David Eppstein (talk) 18:52, 22 December 2019 (UTC)[reply]

Because in 99%+ of cases, humans are wrong and use url instead of chapter-url. Headbomb {t · c · p · b} 20:45, 22 December 2019 (UTC)[reply]

This is directly counter to the philosophy according to which, several years ago, the |url= parameter was changed from being a catch-all parameter that would by default bind to the tightest title in the template, and instead became split into several parameters that each had a specific meaning. If I want to use a parameter with its correct meaning, and the bot refuses to let me, that seems like the very definition of a bug to me. —David Eppstein (talk) 01:11, 31 December 2019 (UTC)[reply]

Just encountered this again at Modern Jazz Quartet. The bot should have some code that helps it figure out that this, added by InternetArchiveBot, is most definitely not a chapter URL. Graham87 04:24, 12 March 2020 (UTC)[reply]

Here's another one. If Citation bot is too stupid to recognize that an archive.org url like this, without any extra page-number complications, is going to be a link to the whole book, it is too stupid to be making these changes at all. url= without chapter-url= is a perfectly valid combination of parameters and should not need special bot-exclusion code to prevent it from being broken by marauding bots. —David Eppstein (talk) 06:36, 20 March 2020 (UTC)[reply]
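One way to act on this is a hypothetical check, sketched in PHP. It assumes, as described above, that an archive.org item link with no page reference points at the whole book, so the bot could leave such a |url= alone. The URL patterns (`/details/<id>`, a `/page/` path segment, a `page=` query) are assumptions about archive.org links. They are not taken from the bot.

```php
<?php
// Hypothetical heuristic (not the bot's actual code): an archive.org
// "details" link with no page reference is taken to mean the whole book.

function is_archive_org_whole_book(string $url): bool
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['host'], $parts['path'])) {
        return false;
    }
    // Accept archive.org and its subdomains (e.g. www.archive.org).
    if (!preg_match('/(^|\.)archive\.org$/i', $parts['host'])) {
        return false;
    }
    // Only item pages of the form /details/<identifier>... are considered.
    if (!preg_match('~^/details/[^/]+~', $parts['path'])) {
        return false;
    }
    // A page reference in the path points at a specific location.
    if (stripos($parts['path'], '/page/') !== false) {
        return false;
    }
    // Likewise a page=... query parameter.
    parse_str($parts['query'] ?? '', $query);
    return !isset($query['page']);
}
```

A bot using this would skip the url-to-chapter-url change whenever it returns true. It says nothing about non-archive.org links.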

Suggest modifying Zotero timeout

Status: Annoying from time to time
Reported by: Martin (Smith609 – Talk) 08:56, 3 March 2020 (UTC)[reply]

What happens: Zotero allows 15s before reporting a timeout.

This is maybe fine when running the bot from a URL, but when using the "citations" button it led me to give up and abort the run. It seems to me that 15000ms is a very long time to wait, particularly if there are multiple Zotero calls on a page: would 150ms still be sufficient?

> Using Zotero translation server to retrieve details from URLs.
  ! Operation timed out after 15001 milliseconds with 0 bytes received   For URL: http://sp.sepmonline.org/content/sepsp088/1/SEC6.abstract
  ! Operation timed out after 15000 milliseconds with 0 bytes received   For URL: http://www.paleoportal.org/kiosk/sample_site/fossil_gallery_109_images.html
  ! Operation timed out after 15001 milliseconds with 0 bytes received   For URL: http://ichnology.ku.edu/invertebrate_traces/tfimages/zoophycos.html

We can't proceed until: Feedback from maintainers

In normal circumstances I'd say that anything above 1000 ms is crazy slow. However I have no idea what's the median response time from our Zotero server. Do you know? Nemo 20:07, 3 March 2020 (UTC)[reply]

This is the total time from initiating the connection until data is received and the connection is closed. There is a separate timeout for just connecting. The more urls on the page, the shorter the timeout. AManWithNoPlan (talk) 20:25, 3 March 2020 (UTC)[reply]

   if ($url_count < 5) {
     curl_setopt($ch_zotero, CURLOPT_TIMEOUT, 15);
   } elseif ($url_count < 25) {
     curl_setopt($ch_zotero, CURLOPT_TIMEOUT, 10);
   } else {
     curl_setopt($ch_zotero, CURLOPT_TIMEOUT, 5);
   }
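The same tiers can be factored into a helper, so that alternative values such as 3/2/1 seconds become a one-argument change. This is a sketch, not the bot's code. The function names and the 2-second connect timeout are assumptions; only the 15/10/5 tiers and the existence of a separate connect timeout come from the discussion.

```php
<?php
// Sketch only: the tiers from the snippet above as a helper, so other
// values can be tried by passing a different $tiers array.

/**
 * Total timeout in seconds for a page with $url_count URLs.
 * @param int[] $tiers Timeouts for fewer than 5, fewer than 25, and 25+ URLs.
 */
function zotero_timeout_for(int $url_count, array $tiers = [15, 10, 5]): int
{
    if ($url_count < 5) {
        return $tiers[0];
    }
    if ($url_count < 25) {
        return $tiers[1];
    }
    return $tiers[2];
}

/**
 * Applies both limits to a cURL handle. The 2-second connect limit is an
 * assumed value; the bot's real connect timeout is not shown here.
 * @param resource|\CurlHandle $ch
 */
function set_zotero_timeouts($ch, int $url_count, array $tiers = [15, 10, 5]): void
{
    $total = zotero_timeout_for($url_count, $tiers);
    // Time allowed just to establish the connection.
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, min(2, $total));
    // Time allowed for the whole request, connection included.
    curl_setopt($ch, CURLOPT_TIMEOUT, $total);
}
```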

If we reduced that to, say, 3, 2 and 1 respectively, would we be able to tell from the logs or something whether the success rate (however defined) increases? Nemo 20:56, 3 March 2020 (UTC)[reply]
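Answering this from logs would need the timeouts counted per run. The following is a rough PHP sketch that pulls them out of output in the format quoted in the report. The number of URLs attempted has to come from somewhere else (an assumption), and both function names are hypothetical.

```php
<?php
// Sketch: extract Zotero timeouts from bot output lines such as
// "! Operation timed out after 15001 milliseconds ... For URL: http://..."

/** @return array<int, array{ms: int, url: string}> */
function parse_zotero_timeouts(string $log): array
{
    // Without the /s flag, .*? stays on one line, so each match is one entry.
    $pattern = '/Operation timed out after (\d+) milliseconds.*?For URL:\s*(\S+)/';
    preg_match_all($pattern, $log, $m, PREG_SET_ORDER);
    $out = [];
    foreach ($m as $match) {
        $out[] = ['ms' => (int) $match[1], 'url' => $match[2]];
    }
    return $out;
}

// Fraction of attempted URLs that timed out in one run.
function timeout_rate(string $log, int $urls_attempted): float
{
    if ($urls_attempted <= 0) {
        return 0.0;
    }
    return count(parse_zotero_timeouts($log)) / $urls_attempted;
}
```

Comparing `timeout_rate` across runs made with 15/10/5 and with 3/2/1 would give a rough signal. It ignores lookups that finish in time but return nothing useful.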

I had something similar for User:Bibcode Bot, but I had increasing timeouts (5/10/15 seconds) for the ADSABS database before failure. But this was a bot doing its own thing, without anyone waiting on it. For what's essentially a communal tool, I'd say 10 seconds total wait time for a single URL should be more than enough. And if multiple distinct Zotero calls fail in succession, maybe skip Zotero for the next 5 minutes so we're not constantly querying a dead connection during a server hiccup or something. Headbomb {t · c · p · b} 22:23, 3 March 2020 (UTC)[reply]

We do skip after enough fails, but that is per run and not global. AManWithNoPlan (talk) 22:53, 3 March 2020 (UTC)[reply]
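To make the skip global rather than per run, the failure count has to outlive a single run. The following is a minimal PHP sketch with the five-minute cool-down from the suggestion. It assumes a shared JSON state file and a threshold of three consecutive failures; both are assumptions, since the suggestion only says "multiple". `ZoteroBreaker` is a hypothetical class.

```php
<?php
// Sketch of a breaker shared between runs: after $max_failures consecutive
// Zotero failures, skip Zotero for $cooldown seconds. State lives in a
// JSON file (assumed); the bot currently only counts failures per run.

final class ZoteroBreaker
{
    private string $state_file;
    private int $max_failures;
    private int $cooldown;

    public function __construct(string $state_file, int $max_failures = 3, int $cooldown = 300)
    {
        $this->state_file = $state_file;
        $this->max_failures = $max_failures;
        $this->cooldown = $cooldown; // seconds; 300 = five minutes
    }

    /** @return array{failures: int, open_until: int} */
    private function load(): array
    {
        $raw = @file_get_contents($this->state_file);
        $state = $raw === false ? null : json_decode($raw, true);
        if (!is_array($state)) {
            return ['failures' => 0, 'open_until' => 0];
        }
        return [
            'failures' => (int) ($state['failures'] ?? 0),
            'open_until' => (int) ($state['open_until'] ?? 0),
        ];
    }

    private function save(array $state): void
    {
        file_put_contents($this->state_file, json_encode($state), LOCK_EX);
    }

    // True while the cool-down is running, i.e. Zotero should be skipped.
    public function should_skip(int $now): bool
    {
        return $now < $this->load()['open_until'];
    }

    // Any success resets the run of consecutive failures.
    public function record_success(): void
    {
        $this->save(['failures' => 0, 'open_until' => 0]);
    }

    public function record_failure(int $now): void
    {
        $state = $this->load();
        $state['failures']++;
        if ($state['failures'] >= $this->max_failures) {
            // Start the cool-down and reset the counter for afterwards.
            $state = ['failures' => 0, 'open_until' => $now + $this->cooldown];
        }
        $this->save($state);
    }
}
```

The read-then-write in `record_failure` is not atomic, so concurrent runs could occasionally lose a count. For a best-effort skip that seems tolerable, but a real version might want a proper lock.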

I don't see this warning right now. AManWithNoPlan (talk) 22:54, 3 March 2020 (UTC)[reply]

 if (!$is_a_man_with_no_plan) $this->expand_templates_from_identifier('url', $our_templates);

Long-term, it would be good to take advantage of the bulk API and submit all URLs at once. AManWithNoPlan (talk) 00:57, 4 March 2020 (UTC)[reply]

True for all APIs. Headbomb {t · c · p · b} 19:17, 16 March 2020 (UTC)[reply]

Already true for the slow ones that allow it (other than zotero). AManWithNoPlan (talk) 12:20, 17 March 2020 (UTC)[reply]
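The batching idea can be sketched in PHP as follows: gather every URL on the page once, drop duplicates, and send fixed-size batches instead of one request per URL. `$send_batch` stands in for whatever bulk endpoint is available. It is a hypothetical callback, not a real Zotero translation-server call, and the batch size is up to the caller.

```php
<?php
// Sketch of batched lookups. $send_batch is a hypothetical callback that
// takes a list of URLs and returns results keyed by URL.

/**
 * @param string[] $urls
 * @param callable(string[]): array<string, mixed> $send_batch
 * @return array<string, mixed> Results keyed by URL.
 */
function lookup_in_batches(array $urls, int $batch_size, callable $send_batch): array
{
    // Each distinct URL is looked up only once.
    $unique = array_values(array_unique($urls));
    $results = [];
    foreach (array_chunk($unique, max(1, $batch_size)) as $batch) {
        // One round trip per batch; results are merged by URL.
        $results += $send_batch($batch);
    }
    return $results;
}
```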

Discussion at Village Pump

Of possible interest: Wikipedia:Village_pump_(technical)#=url_and_=archiveurl_do_not_match -- GreenC 14:18, 18 March 2020 (UTC)[reply]

Removes valid partial title link

Status: new bug
Reported by: David Eppstein (talk) 16:52, 18 March 2020 (UTC)[reply]

What happens: Special:Diff/946139951
What should happen: Moving the "centennial edition" part to an edition field is probably too much intelligence to expect of the bot, but the title link for Alan Turing: The Enigma should be either moved to a title-link field or left in place, not just dropped on the floor.
We can't proceed until: Feedback from maintainers

Partial wikilinks should not be used (according to the style guidelines), and 99% of the time they are invalid (for example, they link to IBM in the title instead of to the actual work). AManWithNoPlan (talk) 12:41, 19 March 2020 (UTC)[reply]
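The alternative raised in the report (move the link to |title-link= instead of discarding it) could be sketched in PHP like this. It is a hypothetical helper, not the bot's code. It only handles a title containing exactly one wikilink and leaves anything else untouched.

```php
<?php
// Hypothetical helper: if a title contains exactly one wikilink, unlink
// the text and return the link target separately for |title-link=.

/** @return array{title: string, title-link: ?string} */
function split_title_link(string $title): array
{
    // Matches [[Target]] or [[Target|Label]].
    $n = preg_match_all('/\[\[([^\[\]|]+)(?:\|([^\[\]]*))?\]\]/', $title, $m, PREG_SET_ORDER);
    if ($n !== 1) {
        // No link, or several: leave the title as it is.
        return ['title' => $title, 'title-link' => null];
    }
    [$whole, $target] = $m[0];
    // An unmatched trailing group is absent from the match array.
    $label = $m[0][2] ?? '';
    $text = $label !== '' ? $label : $target;
    return [
        'title' => str_replace($whole, $text, $title),
        'title-link' => trim($target),
    ];
}
```

This cannot tell a link to the work itself from a link to a subject mentioned in the title (the IBM case), so it would still need a human check. It only avoids silently dropping the link.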

Under heavy load

Status: Fixed for now
Reported by: Joseywales1961 (talk) 23:09, 22 March 2020 (UTC)[reply]

What happens: The bot hangs while trying to fix refs (two bare refs, which I then fixed manually) on the page Pickaninny and on four or five other pages I tried it on today.
We can't proceed until: Feedback from maintainers

remove website and synonyms from cite arxiv

Status: feature request
Reported by: Headbomb {t · c · p · b} 03:33, 23 March 2020 (UTC)[reply]

What happens: [1] (after changing cite web to cite arxiv manually)
What should happen: [2]
We can't proceed until: Feedback from maintainers
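The requested cleanup might look like the following PHP sketch. The alias list (`website`, `work`) is an assumption; the CS1 documentation is the authority on which parameters are synonyms. The name => value representation and `clean_cite_arxiv` are also hypothetical.

```php
<?php
// Sketch: drop |website= and (assumed) synonyms from {{cite arXiv}}.

const ARXIV_DROPPED_PARAMS = ['website', 'work'];

/**
 * @param array<string, string> $params
 * @return array<string, string>
 */
function clean_cite_arxiv(string $template, array $params): array
{
    // Template names are compared case-insensitively ("cite arXiv", "Cite arxiv").
    if (strtolower(trim($template)) !== 'cite arxiv') {
        return $params; // other templates are left alone
    }
    foreach (ARXIV_DROPPED_PARAMS as $name) {
        unset($params[$name]);
    }
    return $params;
}
```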

WP:CITEVAR violation using citation bot

Fixed - copy from my talk page

When using citation bot: please be more careful about not changing instances of {citation} to {cite book} (especially where the source is not a book) where the former is the established usage, as done here at Puget Sound faults, and other places. (Haven't I mentioned this before?) Nor should the first author's first/last be concatenated with the preceding line, as it makes it harder to scan the citation for accuracy. Your attention to this would be appreciated. ♦ J. Johnson (JJ) (talk) 20:46, 12 March 2020 (UTC)[reply]

I see the problem. It has a journal set, which is invalid for citation, so it has to be changed to cite book. BUT, the journal is set to a comment, which is a strange edge case. AManWithNoPlan (talk) 21:00, 12 March 2020 (UTC)[reply]

https://github.com/ms609/citation-bot/pull/2727 AManWithNoPlan (talk) 21:41, 12 March 2020 (UTC)[reply]
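The edge case (a |journal= whose only content is an HTML comment) suggests treating comment-only values as blank before deciding to switch templates. The following is a hypothetical PHP helper. It is not taken from the linked pull request.

```php
<?php
// Hypothetical helper: a parameter value counts as blank if nothing but
// whitespace remains after removing HTML comments (<!-- ... -->).

function is_effectively_blank(?string $value): bool
{
    if ($value === null) {
        return true;
    }
    // Non-greedy, and /s so a comment may span several lines.
    $stripped = preg_replace('/<!--.*?-->/s', '', $value);
    return trim((string) $stripped) === '';
}
```

A template switch keyed on |journal= would then test `!is_effectively_blank($journal)` rather than the mere presence of the parameter.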

URL removal that duplicates DOI

User:SandyGeorgia has a complaint about this in medical articles. The idea is that URLs that duplicate DOIs should be left in place if they are free. AManWithNoPlan (talk) 20:05, 23 March 2020 (UTC)[reply]

Thoughts on keeping it if the parameter marking the URL as free is set? AManWithNoPlan (talk) 20:13, 23 March 2020 (UTC)[reply]

Perhaps some way to detect medical articles? AManWithNoPlan (talk) 20:28, 23 March 2020 (UTC)[reply]

I don't personally see the point of keeping duplicate URL/DOIs in hard code, but ideally those would be marked as |doi-access=free and the template should use that to automatically create links like it does for PMCs. And since this isn't currently possible, there might be a case to keep them for now. Headbomb {t · c · p · b} 21:20, 23 March 2020 (UTC)[reply]
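A conservative version of the rule could be sketched in PHP. It drops |url= only when the URL is a doi.org (or dx.doi.org) link to the same DOI and the DOI is not marked |doi-access=free. Treating only doi.org links as duplicates is an assumption. Publisher URLs that a DOI happens to redirect to would be kept, because detecting them would need the DOI to be resolved. `may_drop_duplicate_url` is a hypothetical name.

```php
<?php
// Sketch of a conservative test for dropping a |url= that repeats |doi=.

/** @param array<string, string> $params */
function may_drop_duplicate_url(array $params): bool
{
    $url = trim($params['url'] ?? '');
    $doi = trim($params['doi'] ?? '');
    if ($url === '' || $doi === '') {
        return false;
    }
    // Keep the link when the DOI is flagged as free full text.
    if (strtolower(trim($params['doi-access'] ?? '')) === 'free') {
        return false;
    }
    // Only doi.org / dx.doi.org links count as duplicates here.
    if (!preg_match('~^https?://(dx\.)?doi\.org/(.+)$~i', $url, $m)) {
        return false;
    }
    // DOIs are case-insensitive; percent-encoded characters are decoded first.
    return strcasecmp(rawurldecode($m[2]), $doi) === 0;
}
```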

Samples

From REM Sleep Behavior Disorder Single-Question Screen, for all the citations in the article, here is what our readers see:

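when the URL for non-PMC free full text is included (https://en.wikipedia.org/w/index.php?title=REM_Sleep_Behavior_Disorder_Single-Question_Screen&oldid=946844530#References) and
when the URL is deleted, because it duplicates what is in the DOI link (https://en.wikipedia.org/w/index.php?title=REM_Sleep_Behavior_Disorder_Single-Question_Screen&diff=prev&oldid=946843806#References).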

In the second case, the impression that could be given to readers is that they can't read the full text of the second source. We cannot assume that our readers know that they can click on the DOI link in that one case to get to the free full text. We can't even assume they know what a DOI link is. We can't expect them, in a larger article, to click through to every DOI to see if, by chance, free full text is available. We WANT our readers to be able to access text as often as possible, and to be able to verify text, so in effect "hiding" free full text from the uninitiated reader is a disservice.

Our readers may know that in ALL Wikipedia articles, a blue link in the title means they can read the article. The citation style I use is consistent with the Wikipedia-wide convention: that is, to provide a blue link in the title whenever free full text is available. We do this automatically when a PMC is available, but we have to do it manually when a PMC is not available, but free full text is otherwise available. I have been asking for a long time to stop changing the citation style in articles I edit; I hope not to have to install a deny bot template, because that would eliminate valuable bot edits. Please stop removing URLs to free full text when a PMC is not available, so citations will be consistent. SandyGeorgia (Talk) 22:53, 23 March 2020 (UTC)[reply]
