User talk:Citation bot



Note that the bot's maintainer and assistants (Thing 1 and Thing 2) can go weeks without logging in to Wikipedia. The code is open source and interested parties are invited to assist with the operation and extension of the bot.

Before reporting a bug, please note: Addition of DUPLICATE_xxx= to citation templates by this bot is a feature. When there are two identical parameters in a citation template, the bot renames one to DUPLICATE_xxx. The bot is pointing out the problem with the template. The solution is to choose one of the two parameters and remove the other one, or to convert it to an appropriate parameter.

Please click here to report an error.

Or, for a faster response from the maintainers, submit a pull request with an appropriate code fix on GitHub, if you know what code needs to be written and can write it.


Discussion: Help talk:Citation Style 1#Links to citation bot, revisited

Please comment there / support / object. Headbomb {t · c · p · b} 17:59, 28 August 2018 (UTC)

Any thoughts on this and the discussion? AManWithNoPlan (talk) 15:47, 25 October 2018 (UTC)

Discussion: Wish list

Please comment there / support / object. AManWithNoPlan (talk) 22:42, 17 November 2018 (UTC)

API: Silent/Verbose mode for category

Add a 'silent' mode. This would simplify the output to simply

--------------------------------------------------------------------------
[12:13:02] Processing page '[[2018 FFA Cup preliminary rounds]]' – [[edit]] – [[history]]
# No changes required.

when no changes are made, and

--------------------------------------------------------------------------
[12:13:02] Processing page '[[2018 FFA Cup preliminary rounds]]' – [[edit]] – [[history]]
# Updating the page ([[diff]]).

when there is a change made. This could probably be made the default for categories, with &silent=0 to disable it. Or alternatively, &verbose=1 to enable verbose logs. Headbomb {t · c · p · b} 12:25, 21 August 2018 (UTC)

  • difficult to fix: pages that take a while to process will cause an HTTP disconnect. AManWithNoPlan (talk) 13:13, 31 October 2018 (UTC)
    • @AManWithNoPlan: not sure what's that got to do with a simplified output in general? Headbomb {t · c · p · b} 13:36, 31 October 2018 (UTC)
      • perhaps output dots as the bot runs. let me think about it. AManWithNoPlan (talk) 13:50, 31 October 2018 (UTC)
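
A minimal sketch of what the proposed silent/verbose switch could look like, written in Python purely for illustration (the bot itself is not written this way). The function name format_log, its arguments and the separator width are all hypothetical; only the two summary lines and the "Processing page" line are taken from the request above.

```python
from datetime import datetime

SEPARATOR = "-" * 74  # width is arbitrary here, not taken from the bot


def format_log(page, changed, when, verbose_lines=(), verbose=False):
    """Return the log text for one processed page.

    In silent mode only the header and a one-line outcome are kept;
    in verbose mode the detailed lines are included as well.
    """
    lines = [
        SEPARATOR,
        f"[{when:%H:%M:%S}] Processing page '[[{page}]]' – [[edit]] – [[history]]",
    ]
    if verbose:
        lines.extend(verbose_lines)
    lines.append(
        "# Updating the page ([[diff]])." if changed else "# No changes required."
    )
    return "\n".join(lines)


if __name__ == "__main__":
    stamp = datetime(2018, 8, 21, 12, 13, 2)
    print(format_log("2018 FFA Cup preliminary rounds", False, stamp))
```

This does nothing about the HTTP disconnect concern raised above; it only illustrates the shape of the shortened output.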

API: New feature request, run from links on page

Let's have something like

  • https://tools.wmflabs.org/citations/list.php?linksonpage=User:Headbomb/Sandbox5

This would be super useful. We could build lists of pages with crappy citations with AWB's database scanner or with clever insource:// search (e.g. pages with raw GoogleBooks links, pages with raw DOI links, ...), then put the list of pages to be edited somewhere (e.g. User:Headbomb/Sandbox5), then tell the bot to run against those pages (follow redirects if they exist). Headbomb {t · c · p · b} 14:45, 22 August 2018 (UTC)

@Smith609: since you seem to be the one to ask about API features, how doable is this? Headbomb {t · c · p · b} 11:11, 24 August 2018 (UTC)
Does the new "run on multiple pages separated by pipes" functionality address this request? Martin (Smith609 – Talk) 07:46, 25 August 2018 (UTC)
@Smith609: not really. Those lists would have to be built and fed manually every time. It's OK for a one-time list, but the idea is that you could have a one-click way of running the bot on a list of links. Book:Canada would be a prime example (or cleanup-centric lists, like WP:JCW/J30, to fix a crap ton of capitalization mistakes in one click). If you could have something like https://tools.wmflabs.org/citations/list.php?linksonpage=Book:Canada, that would find all links on the page (likely direct links for simplicity) and run the bot on those pages, that would be great.

That is if you have [[Foobar|Barfoo]] somewhere on the page, get Foobar (follow redirects if there are any), and run the bot on that. Repeat for all other links it finds. Headbomb {t · c · p · b} 22:30, 26 August 2018 (UTC)

This is still something that would be incredibly useful. Headbomb {t · c · p · b} 15:02, 30 November 2018 (UTC)
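
For illustration, the link-gathering step described above could be sketched roughly as follows. This is Python rather than the bot's own code, the function name link_targets is hypothetical, and redirect resolution (which would need the MediaWiki API) is deliberately left out. Joining the result with pipes assumes the existing "multiple pages separated by pipes" input mentioned earlier in this thread.

```python
import re

# [[Target]], [[Target|label]] and [[Target#Section|label]]; only Target is captured.
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")


def link_targets(wikitext):
    """Return the distinct link targets in wikitext, in order of appearance."""
    seen = []
    for match in WIKILINK.finditer(wikitext):
        title = match.group(1).strip().replace("_", " ")
        if title and title not in seen:
            seen.append(title)
    return seen


if __name__ == "__main__":
    text = "See [[Foobar|Barfoo]], [[Foobar]] and [[Baz#History|history of Baz]]."
    pages = link_targets(text)
    # Pipe-separated list, assuming that is what the multi-page input expects.
    print("|".join(pages))
```

The sketch does not filter out File: or Category: links; a real implementation would probably want to.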

API: add &via= option (also what does &edit= do?)

In a call like https://tools.wmflabs.org/citations/process_page.php?edit=toolbar&user=Headbomb&page=Steve_Bieda, does edit=toolbar do anything? Because I'd like to have some ways to tell the bot that it was triggered via {{Draft article}} or citation expander, or similar. We might want to rename the parameter to allow something like

  • https://tools.wmflabs.org/citations/process_page.php?via=User%3AHeadbomb%2Fcitation.js&user=Headbomb&page=Steve_Bieda
  • https://tools.wmflabs.org/citations/process_page.php?via=the+%5B%5BWikipedia%3ACitation+expander%7Ccitation+expander%5D%5D&user=Headbomb&page=Steve_Bieda
  • https://tools.wmflabs.org/citations/process_page.php?via=%5B%5BTemplate%3ADraft_article%5D%5D&user=Headbomb&page=Steve_Bieda.

This way we could give a summary like

Headbomb {t · c · p · b} 03:43, 26 August 2018 (UTC)

I wonder what the audience of this additional message would be? To most users, what is important is the content and motivation of an edit, rather than the circumstances in which an editor came to make it. If I have a clear understanding of the motivation for this change, I'll be able to consider the best way to implement it. Martin (Smith609 – Talk) 08:56, 27 August 2018 (UTC)
The goal is mostly to have a way to see where Citation bot is used from. How many of those edits were triggered by the web interface? How many were from user scripts and from which userscript, or how many from templates and which templates (and do any need updating)? How many were done via the Citation Expander gadget? It's not necessarily to have 'official' stats (it would be nice though), but knowing where the bot is used from is nice, and could let us give help to newbies that run into issues with the bot. Headbomb {t · c · p · b} 10:32, 27 August 2018 (UTC)
For example, [1] was most likely triggered from {{Draft article}}, present on Draft:Lil ginger ale (we sadly can't feed who used the Template from the template because we don't have a {{CURRENTUSERNAME}} magicword/variable), but knowing it was triggered from the template means it has a fairly high chance of being used by a newbie, and was probably triggered by one of these people. So that lets us (or at least me) customize feedback to people. If I see someone doing something weird/unusual with the bot from {{Draft article}} vs Web Interface vs Gadget vs User Scripts, well you more or less have a continuum of likely noob vs likely noob/intermediate vs likely intermediate vs likely advanced user dealing with the bot. And you'd have an idea of who could have triggered the bot in that scenario. Headbomb {t · c · p · b} 10:45, 27 August 2018 (UTC)
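
As a rough illustration of how a calling script or template could build such a URL, here is a short Python sketch. The via parameter is only the proposal made in this thread, not an existing option, and the helper name process_page_url is made up for the example.

```python
from urllib.parse import urlencode

BASE = "https://tools.wmflabs.org/citations/process_page.php"


def process_page_url(page, user, via):
    """Build a process_page.php URL carrying the proposed via= parameter."""
    # urlencode percent-escapes ':' '/' '[' ']' '|' and turns spaces into '+'.
    return BASE + "?" + urlencode({"via": via, "user": user, "page": page})


if __name__ == "__main__":
    print(process_page_url("Steve_Bieda", "Headbomb", "User:Headbomb/citation.js"))
    print(process_page_url("Steve_Bieda", "Headbomb", "[[Template:Draft_article]]"))
```

On the receiving side the value would need to be sanitised before it ends up in an edit summary, since anyone can put arbitrary text in the URL.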

Request: Shove "additional information" stuff after the pipe in edit summaries

It would probably make more sense to shove "additional information" stuff after the pipe

or

if &via= and category mentions are implemented. Headbomb {t · c · p · b} 04:10, 26 August 2018 (UTC)

Request: Better Citoid like capabilities

Status: new bug
Reported by: (t) Josve05a (c) 21:10, 5 November 2018 (UTC)
What happens: Citation bot edit
What should happen: Citoid/Zotero edit
We can't proceed until: Feedback from maintainers


Do not automatically add Citeseerx

Status: new bug
Reported by: David Eppstein (talk) 16:52, 7 November 2018 (UTC)
What happens: Citeseerx links automatically added in violation of WP:COPYLINK and WP:ELNEVER
What should happen: These links can sometimes be ok, but they are often a violation of publisher copyright, so they can only be added if citeseer traces their provenance back to an author copy or a publisher-licensed copy. This needs to be checked by hand. Citation bot should never add such links automatically. There is currently a similar thread about Zenodo at WP:ANI likely to lead to a topic ban from modifying citations for the user incautiously adding such links. Do we want such a ban to be given to Citation bot? The edit is shown as "user activated" but is listed as being made by the bot and there is no responsibility assigned to a specific user for this bad edit.
Relevant diffs/links: Special:Diff/867705073
We can't proceed until: Feedback from maintainers


Users are always responsible for the edits of the bot, since they are the ones that asked the bot to make the edit in the first place, so nothing is automatically added. The best way to deal with (the very small number of) copyvios on CiteSeerX is to contact them to take down the offending file (and possibly put a comment in the citeseerx parameter such as |citeseerx=<!--Copyvio: 10.1.1.whatever/foobar-->, although the CiteSeerX page contains more than just the file and the metadata it gives is useful). Headbomb {t · c · p · b} 16:59, 7 November 2018 (UTC)
The number of copyvios is not small, because citeseerx copies all sorts of copies of papers — often copies made available for some course by someone else – that are neither author copies nor licensed from the publisher. They may be fair use for a course but that doesn't make them fair use for citeseerx and for us. And if the edit cannot be attributed to the specific user who caused it (and that user convinced or prevented from continuing to make bad edits) or if the process does not involve the user specifically vetting the edits that are made, with a big warning about COPYLINK, then it should not be happening at all. —David Eppstein (talk) 17:23, 7 November 2018 (UTC)
since we do not link to the PDF directly, does that make it okay? honest question about how close to the illegal copy do we need to be in order to be evil. AManWithNoPlan (talk) 18:00, 7 November 2018 (UTC)
I doubt it. We're linking to a site whose only purpose is to provide the link. WP:ELNEVER seems unambiguous: "If there is reason to believe that a website has a copy of a work in violation of its copyright, do not link to it." —David Eppstein (talk) 18:07, 7 November 2018 (UTC)
So, slightly better, but not better enough. AManWithNoPlan (talk) 18:20, 7 November 2018 (UTC)

Bug: doi's with plus signs

In this edit:

(t) Josve05a (c) 09:30, 12 November 2018 (UTC)

it is interesting that Wiley cannot handle the doi either. plus signs are a horrible choice. AManWithNoPlan (talk) 19:06, 12 November 2018 (UTC)
Any way to get the cite template to encode the url better so Wiley can resolve it, or is this up to crossref/Wiley to fix? (t) Josve05a (c) 22:56, 15 November 2018 (UTC)
waiting for bot to come alive to debug AManWithNoPlan (talk) 03:21, 13 November 2018 (UTC)
https://github.com/ms609/citation-bot/pull/1054 AManWithNoPlan (talk) 16:47, 14 November 2018 (UTC)
Not only converting existing doi's, but also adding bad doi's :/ (t) Josve05a (c) 22:53, 15 November 2018 (UTC)
That is no surprise. AManWithNoPlan (talk) 23:30, 15 November 2018 (UTC)
No, but still sad. A bit surprised though that it didn't add |doi-broken-date=, but I guess it tests if broken before parsing what to write. (t) Josve05a (c) 23:47, 15 November 2018 (UTC)
when it gets url encoded, the space becomes a plus sign. When people start using doi with spaces and emojis it is going to suck AManWithNoPlan (talk) 00:02, 16 November 2018 (UTC)
Ugggh! Horrible thoughts! Burn them before they end up in doi's! (t) Josve05a (c) 11:38, 16 November 2018 (UTC)
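
To make the plus/space confusion concrete, here is a small Python sketch using only the standard library. The DOI is a made-up placeholder (the real one from the edit is not reproduced here), and doi_url is a hypothetical helper, not anything from the bot.

```python
from urllib.parse import quote, unquote, unquote_plus


def doi_url(doi):
    """Build a resolver URL, percent-encoding everything except '/'."""
    # A literal '+' becomes %2B, so no decoder can mistake it for a space.
    return "https://doi.org/" + quote(doi, safe="/")


if __name__ == "__main__":
    doi = "10.1000/example+doi"  # hypothetical DOI containing a plus sign
    print(doi_url(doi))
    # Form-style decoding of the raw DOI silently turns '+' into a space:
    print(repr(unquote_plus(doi)))
    # Plain percent-decoding of the escaped form round-trips correctly:
    print(unquote(quote(doi, safe="/")) == doi)
```

Whether the publisher's own resolver then copes with %2B is a separate question, as noted above for Wiley.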

Do not touch any parameter with comments

In this edit.

  • Removed/touched a parameter with a comment <!-- some readers have trouble with the link generated by the doi= field? -->, which should "block out" the bot from touching it. (t) Josve05a (c) 09:30, 12 November 2018 (UTC)
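
A sketch of the guard being asked for, in Python for illustration only. The dictionary-of-parameters model and the names has_comment and set_param are assumptions made for the example and say nothing about how the bot stores templates internally.

```python
def has_comment(value):
    """True if the parameter value contains an HTML comment opener."""
    return "<!--" in value


def set_param(params, name, new_value):
    """Change a parameter unless its current value carries a comment.

    Returns True if the value was changed, False if it was left alone.
    """
    current = params.get(name, "")
    if has_comment(current):
        return False  # an editor left a note here; keep hands off
    params[name] = new_value
    return True


if __name__ == "__main__":
    cite = {
        "url": "http://example.org/x <!-- some readers have trouble with the doi link -->",
        "title": "Example",
    }
    print(set_param(cite, "url", ""))        # left untouched
    print(set_param(cite, "title", "Example title"))
```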

Request: handle non-escaped dx.doi.org URL

Status: new bug
Reported by: (t) Josve05a (c) 10:26, 15 November 2018 (UTC)
What happens: Adler, Robert F.; et al. (December 2003). <1147:TVGPCP>2.0.CO;2 "The Version-2 Global Precipitation Climatology Project (GPCP) Monthly Precipitation Analysis (1979–Present)". Journal of Hydrometeorology. 4 (6): 1147–1167. Bibcode:2003JHyMe...4.1147A. CiteSeerX 10.1.1.1018.6263. doi:10.1175/1525-7541(2003)004<1147:TVGPCP>2.0.CO;2.
What should happen: The bot should remove the |url=
Relevant diffs/links: https://en.wikipedia.org/w/index.php?title=User:Josve05a/cite-sandbox&oldid=868937431
We can't proceed until: Feedback from maintainers


https://github.com/ms609/citation-bot/pull/1072 AManWithNoPlan (talk) 16:44, 15 November 2018 (UTC)


Status: new bug
Reported by: (t) Josve05a (c) 11:36, 16 November 2018 (UTC)
What should happen: Remove |url=
Relevant diffs/links: https://en.wikipedia.org/w/index.php?title=User:Josve05a/cite-sandbox&diff=869097807&oldid=869097784
We can't proceed until: Feedback from maintainers



and https://en.wikipedia.org/w/index.php?title=Instar&type=revision&diff=872352613&oldid=872351168
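
One way the redundancy check could be expressed, sketched in Python. It assumes the |url= in the report is simply the dx.doi.org resolver followed by the raw DOI (the report does not show the URL itself), and url_duplicates_doi is a hypothetical name.

```python
import re
from urllib.parse import unquote

DOI_URL = re.compile(r"^https?://(?:dx\.)?doi\.org/(.+)$", re.IGNORECASE)


def url_duplicates_doi(url, doi):
    """True if url is just a doi.org resolver link for the same DOI.

    Works whether or not characters such as '<' and '>' were escaped,
    because the URL tail is percent-decoded before comparing.
    """
    match = DOI_URL.match(url.strip())
    if not match:
        return False
    # DOIs are case-insensitive, so compare in lower case.
    return unquote(match.group(1)).lower() == doi.strip().lower()


if __name__ == "__main__":
    doi = "10.1175/1525-7541(2003)004<1147:TVGPCP>2.0.CO;2"
    print(url_duplicates_doi("http://dx.doi.org/" + doi, doi))
```

If the function returns True, the |url= adds nothing beyond the |doi= and could be dropped.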

Discussion: non-functional jstor dois

Any thoughts on which of these is better:

Cartwright, Jane (1999). "Early and Medieval Literature". The Year's Work in Modern Language Studies. 61: 556–60. doi:10.2307/25833172 (inactive 2018-10-20). JSTOR 25833172.
Cartwright, Jane (1999). "Early and Medieval Literature". The Year's Work in Modern Language Studies. 61: 556–60. JSTOR 25833172.

Should the bot remove the non-functional doi when it is the same as the jstor link with 10.2307 added in front of it? AManWithNoPlan (talk) 16:30, 18 November 2018 (UTC)

I prefer the second version only, or at least not displaying inactive doi's if other IDs exist. (t) Josve05a (c) 21:16, 19 November 2018 (UTC)
Non-functional DOI links of the form 10.2307/<JSTORID> can be removed if they are broken. Working JSTOR dois, or JSTOR dois of a different form should be left alone. I believe JSTOR used to have internal redirects, but no longer do, so that's why we've got a bunch of crap 10.2307/<JSTORID> DOIs laying around. Headbomb {t · c · p · b} 21:49, 19 November 2018 (UTC)
Anecdotally, sometimes the works where the JSTOR ID doesn't correspond to a working DOI actually have another DOI from a publisher. I'm not sure if these DOIs were never issued or what. Nemo 23:04, 20 November 2018 (UTC)
That is correct, some do not actually have the doi issued. Some have one from the publisher and one from jstor (and maybe one from researchgate and who knows who else). AManWithNoPlan (talk) 01:22, 21 November 2018 (UTC)
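
The rule Headbomb describes can be written down in a few lines; the Python below is only a sketch. Whether a DOI resolves has to be checked over the network, so it is passed in as a plain boolean here, and the function name is hypothetical.

```python
def is_redundant_jstor_doi(doi, jstor, doi_works):
    """True if the DOI may be dropped under the rule discussed above.

    Only a broken DOI of the exact form 10.2307/<JSTORID> qualifies;
    working DOIs and DOIs of any other form are left alone.
    """
    if doi_works:
        return False
    return doi.strip() == "10.2307/" + jstor.strip()


if __name__ == "__main__":
    print(is_redundant_jstor_doi("10.2307/25833172", "25833172", doi_works=False))
```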

Cite arXiv should have capital X

Status: new bug
Reported by: (t) Josve05a (c) 12:52, 21 November 2018 (UTC)
What happens: The bot converts <ref>https://arxiv.org/pdf/quant-ph/0512078.pdf</ref> to {{Cite arxiv}}.
What should happen: It should be {{Cite arXiv}} (capital X)
Relevant diffs/links: https://en.wikipedia.org/w/index.php?title=Consciousness&diff=869954364&oldid=869954280
We can't proceed until: Feedback from maintainers


Request: clean up sciencedirect URLs

Status: new bug
Reported by: (t) Josve05a (c) 13:08, 21 November 2018 (UTC)
What should happen: Remove ?via%3Dihub from sciencedirect URLs
Relevant diffs/links: https://en.wikipedia.org/w/index.php?title=User:Josve05a/cite-sandbox&diff=next&oldid=869955914
We can't proceed until: Feedback from maintainers


Search results for ?via%3Dihub . Perhaps remove all "via" URL-parameters for that link. (t) Josve05a (c) 13:08, 21 November 2018 (UTC)
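
A possible shape for this cleanup, sketched in Python with the standard library. The function name strip_via and the article URL in the demo are placeholders. Note that the "=" in ?via%3Dihub is itself percent-encoded, so each query piece is decoded before it is tested.

```python
from urllib.parse import unquote, urlsplit, urlunsplit


def strip_via(url):
    """Drop any via=... query piece from a sciencedirect.com URL."""
    parts = urlsplit(url)
    if not parts.netloc.lower().endswith("sciencedirect.com"):
        return url  # leave other sites alone
    kept = [
        piece
        for piece in parts.query.split("&")
        if piece and not unquote(piece).lower().startswith("via=")
    ]
    return urlunsplit(parts._replace(query="&".join(kept)))


if __name__ == "__main__":
    # Placeholder article identifier, not a real paper.
    print(strip_via("https://www.sciencedirect.com/science/article/pii/S0000000000000000?via%3Dihub"))
```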

Unexpected data found in parse_plain_text_reference. Citation bot cannot parse.

Status: new bug
Reported by: Lithopsian (talk) 14:18, 21 November 2018 (UTC)
What happens: Message Unexpected data found in parse_plain_text_reference. Citation bot cannot parse. Please report. A&A 619, A49 (2018)
Relevant diffs/links: [2], will that work?
Replication instructions: Running CitationBot against Hyperion proto-supercluster should give the message.
We can't proceed until: Feedback from maintainers


Thank you for the report. This comes from arXiv data. We support about a dozen formats that they use. This helps us decode new ones (or in some cases detect and not decode). AManWithNoPlan (talk) 15:33, 21 November 2018 (UTC)

Request: Process website dates more

Status: new bug
Reported by: (t) Josve05a (c) 09:29, 23 November 2018 (UTC)
What happens: |date=30/11/2011
Relevant diffs/links: https://en.wikipedia.org/w/index.php?title=User%3AJosve05a%2Fcite-sandbox&diff=prev&oldid=870225631
We can't proceed until: Feedback from maintainers


https://en.wikipedia.org/w/index.php?title=School-to-prison_pipeline&diff=872325415&oldid=872325081

https://en.wikipedia.org/w/index.php?title=Standard_of_living_in_Israel&diff=870417470&oldid=868413825

https://github.com/ms609/citation-bot/pull/1098 AManWithNoPlan (talk) 23:08, 23 November 2018 (UTC)
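
The report does not say what the bot should turn |date=30/11/2011 into, so the following Python sketch is only one possible handling and is not taken from the pull request. It assumes an ISO yyyy-mm-dd target and refuses to guess when day and month could be swapped; the function name is hypothetical.

```python
import re
from datetime import date


def normalise_slash_date(text):
    """Convert d/m/yyyy or m/d/yyyy to ISO form when the order is unambiguous.

    Returns None for anything ambiguous (e.g. 03/04/2011) or invalid.
    """
    match = re.fullmatch(r"(\d{1,2})/(\d{1,2})/(\d{4})", text.strip())
    if not match:
        return None
    first, second, year = map(int, match.groups())
    if first > 12 and 1 <= second <= 12:
        day, month = first, second
    elif second > 12 and 1 <= first <= 12:
        day, month = second, first
    else:
        return None  # both readings possible, or neither is
    try:
        return date(year, month, day).isoformat()
    except ValueError:
        return None  # e.g. 31/02/2011


if __name__ == "__main__":
    print(normalise_slash_date("30/11/2011"))
```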

Bot renames parameters to create duplicate alias of existing "work" parameter

Status: new bug
Reported by: DferDaisy (talk) 19:26, 23 November 2018 (UTC)
What happens: Bot renames "publisher" parameter to "newspaper". However, "website" parameter is already present. This creates a duplicate parameter error since both "website" and "newspaper" are aliases of "work".
What should happen: Don't convert any parameter to any alias of "work" if any alias of "work" (e.g., journal, newspaper, magazine, periodical, website) is already present.
Relevant diffs/links: Robert Stephens diff and Frank Williams (actor) diff and Spinal disease diff
We can't proceed until: Feedback from maintainers


https://github.com/ms609/citation-bot/pull/1100 AManWithNoPlan (talk) 23:07, 23 November 2018 (UTC)
Not sure how to tell if this is fixed, but if it was, it didn't work: edit at 19:57, 29 November 2018, see citation with title beginning "USA cyclist Tejay van Garderen". DferDaisy (talk) 01:31, 30 November 2018 (UTC)
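
The requested rule could be sketched like this in Python. The alias set is limited to the names listed in the report (the full list in the CS1 templates may be longer), and the dictionary model and function name are assumptions for the example, not the code in the pull request.

```python
# Aliases named in the report above; possibly incomplete.
WORK_ALIASES = {"work", "journal", "newspaper", "magazine", "periodical", "website"}


def rename_to_work_alias(params, old, new):
    """Rename old to new unless that would create a second work alias.

    Returns True if the rename happened, False if it was refused.
    """
    if old not in params:
        return False
    other_alias_present = any(
        key in WORK_ALIASES for key in params if key != old
    )
    if new in WORK_ALIASES and other_alias_present:
        return False
    params[new] = params.pop(old)
    return True


if __name__ == "__main__":
    cite = {"publisher": "Example Times", "website": "example.com"}
    print(rename_to_work_alias(cite, "publisher", "newspaper"), cite)
```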

Reuters

Status: new bug
Reported by: wumbolo ^^^ 14:45, 24 November 2018 (UTC)
What happens: adds |newspaper=Reuters when |agency=Reuters is already present
What should happen: nothing; Reuters is a news agency
Relevant diffs/links: [3]
We can't proceed until: Feedback from maintainers


When the actual website is Reuters.com, it should be the work (such as |newspaper=), but when Reuters is the author of an article on another website (such as theguardian/nytimes) it should be |agency=. In this case |agency=Reuters should be removed. Both |agency=Reuters and |newspaper=Reuters should not be present. (t) Josve05a (c) 14:59, 24 November 2018 (UTC)
Same problem as with Associated Press AManWithNoPlan (talk) 17:54, 24 November 2018 (UTC)
https://github.com/ms609/citation-bot/pull/1102 AManWithNoPlan (talk) 19:31, 24 November 2018 (UTC)

be less exact with agency

  • In the same edit, it did not add an extra parameter for the Associated Press of Pakistan and for Agence France-Presse. All these agencies can often be called a couple of different names (e.g. AP, the Associated Press, or Associated Press), so that might be an issue. wumbolo ^^^ 19:44, 24 November 2018 (UTC)
I have added to pull 1102 some code to make it less exact. AManWithNoPlan (talk) 23:16, 24 November 2018 (UTC)
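
For illustration, "less exact" matching could be done by normalising names before comparing them. The Python below is a sketch only and is not the code added to pull 1102; the lookup table is a tiny made-up sample and the function names are hypothetical.

```python
import re

# Small illustrative table: normalised spelling -> canonical agency name.
AGENCIES = {
    "reuters": "Reuters",
    "ap": "Associated Press",
    "associated press": "Associated Press",
    "afp": "Agence France-Presse",
    "agence france-presse": "Agence France-Presse",
}


def canonical_agency(name):
    """Map a loosely written agency name to its canonical form, or None."""
    key = re.sub(r"^the\s+", "", name.strip().lower())
    return AGENCIES.get(key)


def work_duplicates_agency(work, agency):
    """True if the work value names the same agency as the agency value."""
    canonical = canonical_agency(agency)
    return canonical is not None and canonical == canonical_agency(work)


if __name__ == "__main__":
    print(canonical_agency("the Associated Press"))
    print(work_duplicates_agency("Reuters", "reuters"))
```

Because the lookup is exact after normalisation, a distinct agency such as Associated Press of Pakistan is not folded into Associated Press.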

Running bot twice (again)

Status: new bug
Reported by: (t) Josve05a (c) 15:14, 24 November 2018 (UTC)
Relevant diffs/links: https://en.wikipedia.org/w/index.php?title=Malnutrition&diff=prev&oldid=870400649
We can't proceed until: Feedback from maintainers


https://github.com/ms609/citation-bot/pull/1101 AManWithNoPlan (talk) 17:13, 24 November 2018 (UTC)

Discussion: RFC on cite journal publisher and location

I have begun an RFC at Help talk:CS1 regarding this bot's activity for cite journal publisher and location. Please provide input. --Izno (talk) 16:01, 27 November 2018 (UTC)

A cupcake for you!

That's really great Jackwilliam2 (talk) 12:20, 3 December 2018 (UTC)

Newspapers with multiple names

Status: new bug
Reported by: wumbolo ^^^ 22:03, 5 December 2018 (UTC)
What happens: changes |website=[[The Daily Telegraph]] to |newspaper=The Telegraph
What should happen: don't change the newspaper name if it's wikilinked
Relevant diffs/links: [4]
We can't proceed until: Feedback from maintainers


BBC Sport

Status: new bug
Reported by: Mattythewhite (talk) 16:01, 6 December 2018 (UTC)
What happens: changes |publisher=BBC Sport to |newspaper=BBC Sport
What should happen: Nothing; |publisher=BBC Sport is the preferred format
Relevant diffs/links: [5]
We can't proceed until: Feedback from maintainers


Why do you say that? AManWithNoPlan (talk) 15:33, 7 December 2018 (UTC)

Please refer to this discussion. Mattythewhite (talk) 13:51, 8 December 2018 (UTC)
I will think about the solution since bbc (not bbc sports) is the publisher. Newspaper is one of the many work aliases. AManWithNoPlan (talk) 21:16, 8 December 2018 (UTC)

Fails to convert urls with library proxies

Status: new bug
Reported by: Headbomb {t · c · p · b} 13:22, 7 December 2018 (UTC)
What happens: [6]
What should happen: [7]
We can't proceed until: Feedback from maintainers


chapter= added to Cite encyclopedia without removing title=

Status: new bug
Reported by: Jonesey95 (talk) 18:53, 8 December 2018 (UTC)
What happens: chapter= was added to Cite encyclopedia without removing title=, causing there to be one quoted version of the chapter name and one italicized version.
What should happen: Bot should not operate on a citation formatted in this way
Relevant diffs/links: https://en.wikipedia.org/w/index.php?title=User%3AJonesey95%2Fsandbox3&diff=prev&oldid=872714555
We can't proceed until: Feedback from maintainers


|chapter= is not a documented parameter in {{cite encyclopedia}}. |title= is supposed to be used for the encyclopedia entry. The bot should probably not add chapter at all when title is present, and it definitely should not add chapter and leave title in place. – Jonesey95 (talk) 18:53, 8 December 2018 (UTC)

The bot's edit summary was also partially incorrect in this edit, in that it claimed to have "Removed parameters", but it did not do so. – Jonesey95 (talk) 18:54, 8 December 2018 (UTC)
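
A sketch of the guard suggested above, in Python for illustration. The function name may_add_chapter and the dictionary model of a template are assumptions; the only rule encoded is the one stated in this report, that |chapter= should not be added to {{cite encyclopedia}} when |title= is already filled in.

```python
def may_add_chapter(template_name, params):
    """Return False when adding |chapter= would clash with |title=.

    In {{cite encyclopedia}}, |title= already holds the entry name, so a
    chapter should not be added alongside it.
    """
    name = template_name.strip().lower().replace("_", " ")
    if name == "cite encyclopedia" and params.get("title", "").strip():
        return False
    return True


if __name__ == "__main__":
    print(may_add_chapter("Cite encyclopedia", {"title": "Some entry"}))
    print(may_add_chapter("cite book", {"title": "Some book"}))
```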