User talk:Citation bot: Difference between revisions

From Wikipedia, the free encyclopedia
ClueBot III (talk | contribs)
m Archiving 1 discussion to User talk:Citation bot/Archive 34. (BOT)
Line 82:
<!-- Discussion starts below this line -->
That might be hard, let me think about it. [[User:AManWithNoPlan|AManWithNoPlan]] ([[User talk:AManWithNoPlan|talk]]) 20:05, 8 November 2022 (UTC)

== Remove idm.oclc.org proxy URLs ==

{{bot bug
| status = {{fixed}} - will remove if DOI present
| reported by = [[User:Nemo_bis|Nemo]] 16:05, 26 January 2023 (UTC)
| what happens = [https://web-a-ebscohost-com.wikipedialibrary.idm.oclc.org/ehost/pdfviewer/pdfviewer?vid=10&sid=5368485a-1caf-4c9a-94d5-11f836ad591a%40sessionmgr4008 An idm.oclc.org proxy URL] for [[doi:10.2307/144831]] remains in a citation template.
<!-- and/or: --> | what should happen = Proxy URLs should be removed.
| link showing what happens = [[special:diff/1092664388]], [[special:search/insource:"idm.oclc.org"]]
| how to replicate the bug = <!-- If not obvious from the description or the link -->
}}
<!-- Discussion starts below this line -->


== Convert unstructured citations with meaningless proxy URLs ==

Revision as of 07:32, 29 January 2023

You may want to increment {{Archive basics}} to |counter= 35 as User talk:Citation bot/Archive 34 is larger than the recommended 150Kb.

Note that the bot's maintainer and assistants (Thing 1 and Thing 2) can go weeks without logging in to Wikipedia. The code is open source, and interested parties are invited to assist with the operation and extension of the bot. Before reporting a bug, please note: Addition of DUPLICATE_xxx= to citation templates by this bot is a feature. When there are two identical parameters in a citation template, the bot renames one to DUPLICATE_xxx=. The bot is pointing out the problem with the template. The solution is to choose one of the two parameters and remove the other one, or to convert it to an appropriate parameter. A 503 error means that the bot is overloaded and you should try again later – wait at least an hour.
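
For readers unfamiliar with the DUPLICATE_xxx= behaviour described above, the following is a minimal Python sketch of the idea, not the bot's actual code. The function name `rename_duplicates` and the choice to rename the later occurrence are assumptions made for illustration; the notice only says that one of the two parameters is renamed.

```python
def rename_duplicates(params):
    """Rename repeated parameter names to DUPLICATE_<name>.

    `params` is a list of (name, value) pairs in template order.
    Assumption: the first occurrence is kept and later ones are renamed;
    the notice above does not say which of the two the bot picks.
    """
    seen = set()
    result = []
    for name, value in params:
        if name in seen:
            # Flag the repeat so a human editor can decide what to keep.
            result.append((f"DUPLICATE_{name}", value))
        else:
            seen.add(name)
            result.append((name, value))
    return result


if __name__ == "__main__":
    template = [("title", "A"), ("year", "2020"), ("title", "B")]
    for name, value in rename_duplicates(template):
        print(f"|{name}={value}")
```

Either way, the renamed parameter is only a flag: an editor still has to pick the correct value and remove or convert the other one.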

Submit a Bug Report

Or, for a faster response from the maintainers, submit a pull request with an appropriate code fix on GitHub, if you can write the needed code.


Only partially fixes malformed book-title-in-publisher citation

Status
new bug
Reported by
David Eppstein (talk) 18:55, 6 November 2022 (UTC)
What happens
In Special:Diff/1120339669, the bot found a citation to a paper in a conference proceedings that had title= the paper's title, and publisher= the proceedings title. The bot correctly moved the paper title to chapter=, added a form of the book's title to the title= parameter, and added correct series= and volume= parameters. So far, all good, and better than I would expect from GIGO. The only thing lacking was that the old publisher= was left in place instead of replacing it with the correct publisher of the proceedings. Would it be possible to fix the publisher, also, in such cases?
We can't proceed until
Feedback from maintainers


That might be hard, let me think about it. AManWithNoPlan (talk) 20:05, 8 November 2022 (UTC)

Convert unstructured citations with meaningless proxy URLs

Status
new bug
Reported by
Nemo 17:13, 26 January 2023 (UTC)
What happens
Sometimes references contain proxy URLs which are meaningless, in the sense that they don't contain any useful identifier that could be used for link recovery, so the bot doesn't yet know how to handle them. The reference may use templates with meaningless data such as a title "Shibboleth Authentication Request", or be unstructured.
What should happen
Any available information should be used to retrieve the correct identifier, and a structured citation generated from said identifier, throwing away all the garbage input. It might be possible to achieve this by screen-scraping the meaningless URL's target, or by searching the unstructured citation on Internet Archive Scholar (any result could be verified by searching its title, author, year etc. in the original reference to make sure they all match).
Relevant diffs/links
special:diff/1135750657, special:diff/1135747368
We can't proceed until
Feedback from maintainers
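
The two steps proposed in this report (spotting a proxy URL, then checking a search result against the original reference) could be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the bot's code: the helper names `is_proxy_url` and `candidate_matches` are hypothetical, only the idm.oclc.org suffix mentioned on this page is checked, and the matching is a plain case-insensitive substring test rather than anything a real search API returns.

```python
from urllib.parse import urlsplit

# Assumption: only the proxy domain discussed on this page is covered.
PROXY_SUFFIX = ".idm.oclc.org"


def is_proxy_url(url):
    """Return True if the URL's host is under the idm.oclc.org proxy domain."""
    host = urlsplit(url).hostname or ""
    return host == PROXY_SUFFIX.lstrip(".") or host.endswith(PROXY_SUFFIX)


def _normalise(text):
    # Lower-case and collapse whitespace so trivial differences do not matter.
    return " ".join(text.lower().split())


def candidate_matches(reference_text, title, author_surname, year):
    """Accept a candidate only if title, author and year all occur in the reference."""
    haystack = _normalise(reference_text)
    needles = (title, author_surname, str(year))
    return all(_normalise(needle) in haystack for needle in needles)


if __name__ == "__main__":
    url = "https://web-a-ebscohost-com.wikipedialibrary.idm.oclc.org/ehost/pdfviewer/pdfviewer?vid=10"
    print(is_proxy_url(url))
    # The reference text below is made up for the example.
    ref = "Smith, J. (2001). A Study of Things. Journal of Examples."
    print(candidate_matches(ref, "A study of things", "Smith", 2001))
```

Requiring all three fields to match is deliberately strict: a false negative leaves the reference untouched, while a false positive would replace it with the wrong citation.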


Bot makes cosmetic parameter-name-changing edits

Status
new bug
Reported by
Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 21:37, 26 January 2023 (UTC)
What happens
The bot changes work= to newspaper= without making any other changes.
What should happen
The bot should avoid making cosmetic parameter-name changes unless it's also making one or more visible changes in the same edit.
Relevant diffs/links
[1]
We can't proceed until
Feedback from maintainers
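
One way to express the requested check is to compare the template before and after the edit once parameter aliases are mapped to a common name. The Python below is a sketch only: the `ALIASES` table and both function names are hypothetical and do not come from the bot's source, and which parameter names really count as interchangeable depends on the citation template in use.

```python
# Hypothetical alias table: names treated here as equivalent to work=.
ALIASES = {
    "newspaper": "work",
    "journal": "work",
    "magazine": "work",
    "website": "work",
}


def canonical(params):
    """Map each parameter name to its canonical alias, keeping the values."""
    return {ALIASES.get(name, name): value for name, value in params.items()}


def is_cosmetic_only(before, after):
    """True if the edit changed something, but only by swapping alias names."""
    return before != after and canonical(before) == canonical(after)


if __name__ == "__main__":
    before = {"title": "Example", "work": "The Times"}
    renamed = {"title": "Example", "newspaper": "The Times"}
    dated = {"title": "Example", "newspaper": "The Times", "date": "2023-01-26"}
    print(is_cosmetic_only(before, renamed))  # rename only
    print(is_cosmetic_only(before, dated))    # rename plus a new date
```

An edit for which `is_cosmetic_only` returns True would be skipped; the same rename bundled with a visible change would still go through.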


This doesn't seem harmful to me per se, but it does add a revision to the page history that doesn't actually change anything. BhamBoi (talk) 06:37, 27 January 2023 (UTC)

Another example of this behavior in this edit. Also don't understand why my username is mentioned in the edit summary. GoingBatty (talk) 06:46, 28 January 2023 (UTC)

@GoingBatty:, Whoop whoop pull up asked the bot to run on all links found on your user page. She did the same with me a while back. It's a very weird use of the bot. Headbomb {t · c · p · b} 10:55, 28 January 2023 (UTC)
@Headbomb: That's she to you. Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 23:15, 28 January 2023 (UTC)
My bad, fixed. Headbomb {t · c · p · b} 23:40, 28 January 2023 (UTC)

Bot adds publication dates to undated material

Status
new bug
Reported by
Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 21:17, 27 January 2023 (UTC)
What happens
The bot's been adding and populating a date= parameter for sources with no listed publication date. There is no indication as to where the bot gets these dates from and no reason to think that they're accurate.
What should happen
The bot shouldn't add a publication date for sources that don't have one listed.
Relevant diffs/links
[2], [3]
We can't proceed until
Feedback from maintainers


These dates are stated in the web pages' HTML. You can check with Ctrl-U or another method of viewing the page source in your browser. Nemo 21:29, 27 January 2023 (UTC)

The first one is coming from one of the items in the HTML source (Ctrl + U in Firefox):

<meta property="og:updated_time" content="2019-06-17T16:47:08-04:00" />
<meta property="article:published_time" content="2018-07-17T11:12:23-04:00" />
<meta property="article:modified_time" content="2019-06-17T16:47:08-04:00" />
<meta name="dcterms.date" content="2018-07-17T11:12-04:00" />

The second one has similar:

<meta property="article:published_time" content="2017-01-25T14:57:52-05:00" />
<meta property="article:modified_time" content="2022-05-04T10:25:57-04:00" />

(The bot should probably prefer the modified_time/updated_time if it is the source responsible, and if it's getting it from Citoid or another external service, maybe an upstream notification would be valuable.)

This metadata is deliberately in that location for the purpose of bots and other systems. Izno (talk) 21:31, 27 January 2023 (UTC)
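
To illustrate how a program can read the tags quoted above, here is a small Python sketch using only the standard library. It is not how Citation bot or Citoid actually works (the thread leaves open which of them supplies the date); the class `MetaDates`, the function `extract_date` and the default ordering in `DATE_KEYS` are assumptions for the example. The order is a parameter, so the preference for modified_time suggested above can be tried as well.

```python
from html.parser import HTMLParser

# Assumed default order of preference among the tags quoted in this thread.
DATE_KEYS = (
    "article:published_time",
    "dcterms.date",
    "article:modified_time",
    "og:updated_time",
)


class MetaDates(HTMLParser):
    """Collect date-like <meta> tags keyed by their property/name attribute."""

    def __init__(self):
        super().__init__()
        self.found = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing <meta ... /> tags also reach this handler.
        if tag != "meta":
            return
        attributes = dict(attrs)
        key = attributes.get("property") or attributes.get("name")
        if key in DATE_KEYS and attributes.get("content"):
            # Keep the first value seen for each key.
            self.found.setdefault(key, attributes["content"])


def extract_date(html, preference=DATE_KEYS):
    """Return the YYYY-MM-DD part of the most preferred date tag, or None."""
    parser = MetaDates()
    parser.feed(html)
    for key in preference:
        if key in parser.found:
            return parser.found[key][:10]
    return None


if __name__ == "__main__":
    sample = """
    <meta property="og:updated_time" content="2019-06-17T16:47:08-04:00" />
    <meta property="article:published_time" content="2018-07-17T11:12:23-04:00" />
    <meta property="article:modified_time" content="2019-06-17T16:47:08-04:00" />
    <meta name="dcterms.date" content="2018-07-17T11:12-04:00" />
    """
    print(extract_date(sample))
    print(extract_date(sample, preference=("article:modified_time", "og:updated_time")))
```

With the first sample from this thread, the default order yields the published date (2018-07-17), while preferring the modified/updated tags yields 2019-06-17, which is the difference the parenthetical remark above is about.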

Hmmm. Well, this is interesting to me. Chiming in here as the person who originally added the cites to these articles. The dates that the Bot is adding to the cites would appear to be incorrect in that they are not published on the page with the source material. Also, the date that the Bot is finding would appear to be the date that the material was published onto the web but it might not be the actual date the material was written or the date that the material was published in print. In the case of the Archipedia material on the Ramsdell, that information seems to have originally been published in print in 2012. In any case, is a researcher/WP-editor expected or supposed to always look up the HTML dates if material is undated on the page? Shearonink (talk) 16:57, 28 January 2023 (UTC)
I usually check the date in the HTML if it's not stated, but one can be forgiven for not doing so. The date in {{cite web}} is usually the date of the web page itself. If the date of original publication of the work carried by the web page has some significance, you can instead use {{cite publication}} or other cite template with the date of the work, indicating that the URL is just one representation.
For the sake of WP:RS, I'd expect editors to know whether they're citing a website or some publication of which the website provides a copy, and ideally they'd use citation templates accordingly, but such details can be addressed if/when confusion arises. Nemo 17:21, 28 January 2023 (UTC)
I try to be SO scrupulous and careful when citing whatever reference... Does that "Control-U" thingy work with all laptops? (Yay yet another parameter to remember when info or a webpage "appears" to be undated...) I'd never heard about being able to see the date in the html before. Is it something that only works with PCs or Macs/whatever?... Shearonink (talk) 17:42, 28 January 2023 (UTC)
On Windows in Firefox: Ctrl + U is how Firefox does it. It should work in other browsers but the specific key combo may be different. A second way: right-clicking on a page also provides "View page source". The third way is to open the console, which is F12, or right-click and select Inspect.
Other browsers and platforms may have a slightly different way to access the page source. Izno (talk) 18:40, 28 January 2023 (UTC)
@Izno: Chrome uses Ctrl-U for that purpose too (testing with Chrome 109 in Ubuntu 22.04). Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 23:11, 28 January 2023 (UTC)
There is no requirement to hunt down information in the page source; it is simply another way to get the date, since many pages don't have a displayed date (but of course they all have a publication date). I would suggest leaving the dates if Citation bot adds one, so long as you can verify at least in the page source that the date didn't spontaneously poof into thin air. Izno (talk) 18:47, 28 January 2023 (UTC)