User talk:Citation bot

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by Ereunetes (talk | contribs) at 23:37, 25 September 2023 (Moving Jstor and Worldcat URLs to parameters: Reply). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

You may want to increment {{Archive basics}} to |counter= 37 as User talk:Citation bot/Archive 36 is larger than the recommended 150Kb.

Note that the bot's maintainer and assistants (Thing 1 and Thing 2) can go weeks without logging in to Wikipedia. The code is open source and interested parties are invited to assist with the operation and extension of the bot. Before reporting a bug, please note: Addition of DUPLICATE_xxx= to citation templates by this bot is a feature. When there are two identical parameters in a citation template, the bot renames one to DUPLICATE_xxx=. The bot is pointing out the problem with the template. The solution is to choose one of the two parameters and remove the other one, or to convert it to an appropriate parameter. A 503 error means that the bot is overloaded and you should try again later – wait at least 15 minutes and then complain here.

Submit a Bug Report

Or, for a faster response from the maintainers, submit a pull request with appropriate code fix on GitHub, if you can write the needed code.


Expand non-templated refs

Would it be possible to expand a non-templated reference such as <ref>[https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5553785/ Bar]</ref>, provided that the |title= the bot would produce when expanding the bare URL is exactly the same as the link text that already exists (Bar), and provided that there is no other content in the ref? Jonatan Svensson Glad (talk) 17:16, 24 July 2023 (UTC)[reply]

Example here: I had to remove the brackets and the already-provided title before running the bot. The bot then supplied exactly the same title that had been present before I removed it, so it took a lot of manual labor just to get the bot to attempt to expand the citation. Jonatan Svensson Glad (talk) 17:19, 24 July 2023 (UTC)[reply]
How close should the titles have to be? Also, in my experience, the link text is often some mix of the title, journal, and authors. AManWithNoPlan (talk) 20:08, 14 August 2023 (UTC)[reply]
Well, a first step could be an exact "only-title" match inside square brackets (with only a preceding period/dot inside or outside the brackets being the difference). That could later be built upon with more possibilities... Jonatan Svensson Glad (talk) 21:01, 14 August 2023 (UTC)[reply]
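
For illustration only, the "exact title match" check proposed above might be sketched as follows. This is not Citation bot's code; the function names (`parse_bracketed_ref`, `titles_match`) are hypothetical, and reading the "period/dot" allowance as a trailing full stop inside or outside the brackets is an assumption.

```python
import re

# Hypothetical sketch, not Citation bot's implementation.
# Matches a ref whose ONLY content is one bracketed external link,
# optionally followed by a full stop outside the brackets.
REF_RE = re.compile(
    r"<ref>\s*\[(https?://\S+)\s+([^\]]+)\]\s*\.?\s*</ref>",
    re.IGNORECASE,
)


def parse_bracketed_ref(ref_text):
    """Return (url, link_text), or None if the ref holds anything else."""
    match = REF_RE.fullmatch(ref_text.strip())
    if match is None:
        return None
    return match.group(1), match.group(2).strip()


def normalize_title(title):
    """Strip surrounding whitespace and any trailing full stops (assumption)."""
    return title.strip().rstrip(".").strip()


def titles_match(existing, fetched):
    """Exact comparison after normalization; no fuzzy matching."""
    return normalize_title(existing) == normalize_title(fetched)
```

Under this sketch a ref would only be a candidate for expansion when `parse_bracketed_ref` returns a pair and `titles_match` agrees with the title fetched for the URL; anything looser (title mixed with journal or authors) is left alone.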

Support for Who's Who

Status
new bug
Reported by
Jonatan Svensson Glad (talk) 01:06, 29 July 2023 (UTC)[reply]
What should happen
Implement support to expand from https://doi.org/10.1093/ww/9780199540884.013.U192476 to {{Who's Who}}
Example: https://en.wikipedia.org/w/index.php?title=Friern_Hospital&diff=prev&oldid=1167644213
We can't proceed until
Feedback from maintainers


Alternatively, deny all edits on 10.1093/ww/... DOIs. Jonatan Svensson Glad (talk) 01:06, 29 July 2023 (UTC)[reply]

Or perhaps on the entire 10.1093 DOI prefix, since we don't have support for {{cite ODNB}} either (example). Jonatan Svensson Glad (talk) 21:11, 29 July 2023 (UTC)[reply]

Journal-to-conference conversion fails to add title and editors of book, leaving paper title formatted as book title

Status
 Fixed - might have to run bot more than once, and editors are not in CrossRef meta-data, so we won't get those
Reported by
David Eppstein (talk) 01:57, 2 September 2023 (UTC)[reply]
What happens
Special:Diff/1173367139
What should happen
Special:Diff/1173378925
We can't proceed until
Feedback from maintainers


Cosmetic edits

I frequently see this bot making cosmetic edits like this one. Could it be programmed to alert editors when they're about to make a cosmetic edit to discourage it? {{u|Sdkb}}talk 17:03, 18 September 2023 (UTC)[reply]

I know from experience with large, complex kitchen-sink bots that it can be difficult to prevent cosmetic edits; it should be seen as a best effort. "Frequently" would have to be quantified; maybe there are some specific types of edits that could be optimized. -- GreenC 17:27, 18 September 2023 (UTC)[reply]
I'll always take incremental improvements. {{u|Sdkb}}talk 17:30, 18 September 2023 (UTC)[reply]

Moving Jstor and Worldcat URLs to parameters

From discussions (1, 2, 3) on stopping useless cruft – for example this useless blank archive of a Jstor article – from semi-automated mass archiving, a number of editors have noted their support for a bot to parse Jstor and Worldcat URLs (eg https://www.jstor.org/stable/24432812) for their respective |jstor=24432812 and |oclc= parameters where relevant and purge URLs, archive URLs, and archive metadata for CS1 templates.

Is this something that can be done with citation bot? I will note that I'm not saying to purge all URLs – they can be useful if the full text is separately hosted elsewhere – just URLs and archives thereof (almost always useless blank pages) that are duplicative of the generated parameter URLs. Tagging GreenC. Ifly6 (talk) 06:19, 22 September 2023 (UTC)[reply]
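
As a rough illustration of the parsing step being asked for, a sketch might look like the one below. It is not Citation bot's code; `identifier_from_url` is a hypothetical name, and the WorldCat path pattern (`/oclc/<digits>`) is an assumption, since only the Jstor URL form appears in this thread.

```python
import re
from urllib.parse import urlparse

# Hypothetical sketch, not Citation bot's implementation.
JSTOR_PATH_RE = re.compile(r"^/stable/([^/]+)/?$")
# Assumed WorldCat form: a path ending in /oclc/<digits>.
WORLDCAT_PATH_RE = re.compile(r"/oclc/(\d+)/?$")


def identifier_from_url(url):
    """Return ("jstor", id) or ("oclc", id) for a recognized URL, else None."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host in ("jstor.org", "www.jstor.org"):
        match = JSTOR_PATH_RE.match(parsed.path)
        if match:
            return ("jstor", match.group(1))
    if host in ("worldcat.org", "www.worldcat.org"):
        match = WORLDCAT_PATH_RE.search(parsed.path)
        if match:
            return ("oclc", match.group(1))
    # Proxy hosts and unrecognized paths are deliberately left alone.
    return None
```

The sketch is conservative on purpose: proxied hosts and any path it does not recognize return `None`, so a separately hosted full-text URL would never be touched.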

The bot got blocked for doing this (although the person who led the charge on this eventually got banned themselves). The main argument was that the users of wikipedia are only capable of clicking on title-links, and that numbers after the reference are above their IQ level. Although I would argue that having these as title links is misleading since they almost never lead to the source, but just to a page listing the source. AManWithNoPlan (talk) 13:09, 22 September 2023 (UTC)[reply]
That policy feels like insanity. Is it possible to determine whether the Jstor link leads to a full source, and to remove the URL (along with its metadata, archive URL, and archive metadata) only if it does not lead to a full source? Worldcat is easier because it never(?) leads thereto. Ifly6 (talk) 14:11, 22 September 2023 (UTC)[reply]
I feel there's a case to remove links that will never host the full text, like PMID, OCLC, etc... because they mislead the reader into thinking there's a full text available at the end. But that would require an RFC. Headbomb {t · c · p · b} 03:34, 23 September 2023 (UTC)[reply]
Is it really the case that we cannot do anything to change this (to me at least) absurdist combination where the following series of events keeps occurring:
  • People use Citoid which places Jstor links into {{cite journal}} |url=
  • Citation bot comes around and extracts the Jstor ID etc but doesn't remove the URL
  • Some NPC hits ARCHIVE EVERYTHING with the IA Bot check box (eg IA Bot) and now we have a massive pile of archive URL cruft (nb the check box does not actually archive anything)
  • After this rigmarole an editor can now see the result, which is:
    • A main URL that doesn't give you full text
    • A duplicated parameter which renders an identical URL link (|jstor=24432812)
    • An archive URL which is a literally blank page
    • A mark up reference which is now 70 per cent longer than it needs to be to do the exact same thing
Ifly6 (talk) 15:17, 25 September 2023 (UTC)[reply]
The bot used to do this until the argument was made that: our users were too stupid to figure out non-title links, and yet so smart that they needed links to scientific journals, since wikipedia was too simple for them. AManWithNoPlan (talk) 16:26, 25 September 2023 (UTC)[reply]

Is there really nothing we can do on this without an RFC? Ifly6 (talk) 17:13, 25 September 2023 (UTC)[reply]

Getting blocked twice for the same thing is probably an existential risk.
I think Headbomb makes a good point about removing title-links that don't contain full content and that can be replaced with non-title-links. Sometimes JSTOR has the full content, sometimes not; sometimes it is freely accessible (pre-1923), sometimes not. As for archive URLs, this will depend on what is cited and on whether the content is available in the archive URL. It's context sensitive. I would be careful with an RfC; they can be counter-productive with complex matters. An RfC might codify a minority opinion that bots should not be used at all due to "context sensitive" and the "community" will take care of it, which dooms the whole thing to fantasy land due to the reality of the scale.
It's possible a bot (this one or another) could start on JSTOR, determine content availability and url-status, and edit accordingly. It might also check archive URLs for possible problems. This is going to be a slow process, and it might run into bot blockers and rate limiting at JSTOR, which further complicates things. If true, that would leave the "blind" edit option of simply removing all JSTOR links from the title-link as the only viable method, unless someone has another idea for how to determine content availability. -- GreenC 20:01, 25 September 2023 (UTC)[reply]
Some people have deeper access to JSTOR resources than others, depending on where they are. Surely when a JSTOR resource is cited, no-one is seriously suggesting that only open-access ones may be given? Is anyone suggesting that we deprecate ISBNs because <shudder> some readers might have to buy the actual book? Or have I completely missed the point? --𝕁𝕄𝔽 (talk) 22:57, 25 September 2023 (UTC)[reply]
Nobody is saying that Jstor should not be cited. The dispute here is whether a link to the Jstor page should be included in the URL parameter. For me this emerges from the really pointless practice of adding the "archive" version of Jstor links so you can get the glory of gazing upon a blank page. Removing the |url= entry would prevent "archive" links from being added. It is a dispute between whether a reference should look like this:
{{Cite journal |last=Steel |first=Catherine |date=2014 |title=The Roman senate and the post-Sullan "res publica" |journal=Historia: Zeitschrift für Alte Geschichte |volume=63 |issue=3 |pages=323–339 |doi=10.25162/historia-2014-0018 |jstor=24432812 |s2cid=151289863 |issn=0018-2311 }}
Or, by almost inevitable accretion through inaction, like this:
{{Cite journal |last=Steel |first=Catherine |date=2014 |title=The Roman senate and the post-Sullan "res publica" |journal=Historia: Zeitschrift für Alte Geschichte |volume=63 |issue=3 |pages=323–339 |doi=10.25162/historia-2014-0018 |jstor=24432812 |s2cid=151289863 |issn=0018-2311 |url=https://www.jstor.org/stable/24432812 |access-date=26 May 2022 |archive-date=26 May 2022 |archive-url=https://web.archive.org/web/20220526152815/https://www.jstor.org/stable/24432812 |url-status=live }}
The portions at the end after |url= entirely duplicate existing links in the citation and, regardless, add nothing for the unprivileged reader while clogging up the mark up and making it difficult to do the edit part of "editor". Even if I have Ivy League library access and am able to read all full texts through proxies (eg Penn Libraries), that doesn't mean that linking the proxy page whole (like https://www-jstor-org.wikipedialibrary.idm.oclc.org/) does any good for readers without Penn or Wikipedia library privileges. Ifly6 (talk) 23:37, 25 September 2023 (UTC)[reply]
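
To make the difference between the two forms concrete, here is a minimal sketch of the cleanup being discussed, assuming a flat template with no nested templates or piped links. It is not Citation bot's code: the function names and the `REDUNDANT_PARAMS` list are hypothetical, and real wikitext would need a proper parser rather than a split on "|".

```python
import re

# Hypothetical sketch. Parameters dropped when |url= merely repeats |jstor=.
REDUNDANT_PARAMS = ("url", "access-date", "archive-url", "archive-date", "url-status")
JSTOR_URL_RE = re.compile(r"https?://(?:www\.)?jstor\.org/stable/([^/?#]+)/?")


def parse_template(text):
    """Naively split '{{Name |k=v |k=v }}' into (name, ordered dict)."""
    stripped = text.strip()
    if not (stripped.startswith("{{") and stripped.endswith("}}")):
        raise ValueError("not a template")
    parts = stripped[2:-2].split("|")
    params = {}
    for part in parts[1:]:
        key, sep, value = part.partition("=")
        if sep:  # ignore unnamed (positional) parameters in this sketch
            params[key.strip()] = value.strip()
    return parts[0].strip(), params


def drop_duplicate_jstor_url(text):
    """Remove url/archive parameters only if |url= points at the same |jstor= id."""
    name, params = parse_template(text)
    match = JSTOR_URL_RE.fullmatch(params.get("url", ""))
    if match and match.group(1) == params.get("jstor"):
        for key in REDUNDANT_PARAMS:
            params.pop(key, None)
    body = " ".join(f"|{key}={value}" for key, value in params.items())
    return "{{" + name + " " + body + " }}"
```

Applied to the longer template above, this sketch would yield the shorter one; a |url= that points anywhere else (for example, a separately hosted full text) is left untouched along with its archive parameters.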
The number of Wikipedians who potentially have access to JSTOR sources that are hidden by paywalls may be larger than you think. "Veteran" Wikipedians (I believe the cut-off is 500 life-time edits) can avail themselves of access to JSTOR (and many other sources barred to the hoi polloi) via the Wikipedia library. So I think for these relatively "privileged" people giving a link to a page that contains a doi is still useful. I have no problem doing it, also for sources like Cambridge U.P and the like. Ereunetes (talk) 23:37, 25 September 2023 (UTC)[reply]