User talk:Citation bot

From Wikipedia, the free encyclopedia


Note that the bot's maintainer and assistants (Thing 1 and Thing 2) can go weeks without logging in to Wikipedia. The code is open source and interested parties are invited to assist with the operation and extension of the bot.

Before reporting a bug, please note: Addition of DUPLICATE_xxx= to citation templates by this bot is a feature. When there are two identical parameters in a citation template, the bot renames one to DUPLICATE_xxx. The bot is pointing out the problem with the template. The solution is to choose one of the two parameters and remove the other one, or to convert it to an appropriate parameter. Also, if the bot did something that you disagree with, check the discussion links to see whether it is already under discussion.
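A minimal Python sketch of the duplicate-flagging behaviour described above, for anyone wondering what happens mechanically. It is not taken from the bot's PHP source: the list-of-pairs representation and the choice to flag the later copy (rather than the earlier one) are assumptions.

```python
def flag_duplicates(params):
    """Rename every repeat of a parameter name to DUPLICATE_<name>.

    `params` is a hypothetical representation: a list of (name, value)
    pairs in the order they appear in the template.
    """
    seen = set()
    flagged = []
    for name, value in params:
        key = name.strip()
        if key in seen:
            # Keep the value so an editor can decide which copy to keep.
            flagged.append(("DUPLICATE_" + key, value))
        else:
            seen.add(key)
            flagged.append((name, value))
    return flagged


print(flag_duplicates([("title", "A"), ("year", "2000"), ("title", "B")]))
# [('title', 'A'), ('year', '2000'), ('DUPLICATE_title', 'B')]
```

The value is kept rather than discarded because only a human can tell which of the two copies is correct.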

Please click here to report an error.

Or, for a faster response from the maintainers, submit a pull request with an appropriate code fix on GitHub, if you can write the code that is needed.

Discussion Links – Please Provide Input

Should publisher be removed from journals

Request: Usage methods tracking

Please track how the bot is activated in edit summaries (toolbar, draft, website, other people's *.js files). Use a whitelist of approved methods, with 'toolbar' being grandfathered. AManWithNoPlan (talk) 19:38, 30 December 2018 (UTC)
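One hypothetical way to express the requested whitelist: unknown activation methods are refused rather than silently allowed. The names APPROVED_METHODS and activation_note are made up for illustration and do not come from the bot.

```python
# Hypothetical whitelist; 'toolbar' is grandfathered as the request suggests.
APPROVED_METHODS = {"toolbar"}


def activation_note(method: str) -> str:
    """Return the edit-summary note for an approved activation method."""
    if method not in APPROVED_METHODS:
        # Refuse rather than make an edit whose origin can't be tracked.
        raise ValueError(f"unapproved activation method: {method!r}")
    return f"Activated via {method}."
```

Other methods (draft, website, user scripts) would be added to the set only once approved.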

Page Ranges vs specific pages in journals

Hi. This edit changed a citation (Irish University Review) from the page number of the cited content (page 5) to the page range of the journal article (pages 5–21). Please make sure the bot isn't doing the same on other articles. Scolaire (talk) 10:18, 31 January 2019 (UTC)

The bot's actions are correct. If you want a specific page, you need to use |at=. AManWithNoPlan (talk) 13:52, 31 January 2019 (UTC)
{{notabug}}, but template documentation could be better—it is generic and not cite journal specific. Also, incorrectly putting in the first page is so incredibly common that expanding to a range is the right thing 99.99% of the time. AManWithNoPlan (talk) 14:50, 31 January 2019 (UTC)
Thanks for fixing it. I agree that it should be made clearer in the documentation. Scolaire (talk) 15:20, 31 January 2019 (UTC)
The template documentation is not at fault here. There have been ongoing discussions over the past many years about journal citations referring to the whole work of the article rather than the specific page. The bot should probably not be acting on these at all. --Izno (talk) 16:28, 31 January 2019 (UTC)
I wonder whether, if the bot finds a page that is within the page range, it should change it to |at= for the sake of precision. If the page is the first page, out of range, or blank, then update to the range. AManWithNoPlan (talk) 17:49, 31 January 2019 (UTC)
I'm still confused. If the "pages" parameter is meant for the page range of the article (which is different from its use in Cite book, for instance), why is there a "page" parameter as well as an "at" parameter? Scolaire (talk) 18:43, 31 January 2019 (UTC)
You are using the template correctly. The reason that you're still confused is because journals for some reason have a Special Case By Convention perhaps not obvious to everyone. You can search the talk archives of Help talk:CS1 to see that it has been discussed, with no obvious final resolution on the point. --Izno (talk) 19:35, 31 January 2019 (UTC)
I'd be okay with "if it finds the page inside the page range (inclusive), leave alone, otherwise update". --Izno (talk) 19:36, 31 January 2019 (UTC)
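A sketch of the inclusive-range rule proposed just above, in Python. The function name and the decision to leave non-numeric pages alone are illustrative assumptions, not code from the bot.

```python
def should_replace_with_range(current_page: str, first: int, last: int) -> bool:
    """Inclusive-range rule: keep a page that falls inside the article's range."""
    if not current_page.strip():
        return True  # blank |page=: nothing to lose by filling in the range
    try:
        page = int(current_page)
    except ValueError:
        return False  # non-numeric (e.g. "xii"): leave it for a human
    return not (first <= page <= last)
```

Under this rule the Irish University Review case (page 5 of an article spanning 5–21) would have been left alone.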

https://github.com/ms609/citation-bot/pull/1301 AManWithNoPlan (talk) 21:22, 7 February 2019 (UTC)

Citation bot continues to violate WP:ELNEVER by adding CiteSeerX links of dubious provenance

Status
new bug
Reported by
David Eppstein (talk) 18:58, 8 February 2019 (UTC)
What happens
Citation bot, apparently in automatic mode rather than editor-initiated, is adding CiteSeerX links to articles. In many cases this is useful but in a significant minority of the cases the links violate WP:ELNEVER. Links should only be added when they derive either from official and open publisher copies or from copies placed online by the original author of the work (directly or indirectly e.g. via an institutional repository that the author has contributed to). Many of the links in CiteSeerX instead derive from course reading lists, researchers' collections of related works, or other material that, in their original form, may meet the legal requirements for fair use but DO NOT meet Wikipedia's stricter requirements for links. Citation bot is unable to evaluate the provenance of its CiteSeerX links so it should never add them without human supervision.
What should happen
Citation bot disables the automatic addition of CiteSeerX links, or it gets blocked the next time I see another bad link of this type.
Relevant diffs/links
Special:Diff/882384271. In this example, two CiteSeerX links were added, one for "Efficient planarity testing" (Hopcroft/Tarjan) and one for "LEDA" (Mehlhorn/Naher). Following the CiteSeerX links shows that the Mehlhorn/Naher link is ok — at least one of its original sources is from a web page under the control of one of the authors, Mehlhorn. The Hopcroft/Tarjan link is not ok — it has two original source links, both of which are personal web pages under the control of David P. Dobkin, who is not an author. Whether those links are online is between Dobkin, the authors, and the publisher, not our concern. But adding this link here is in violation of WP:ELNEVER.
We can't proceed until
Feedback from maintainers


Address the problem at its root source, report the violation to CiteSeerX, if it is, in fact, a violation. Headbomb {t · c · p · b} 19:01, 8 February 2019 (UTC)
No. CiteSeerX has different constraints than we do on what we can link. WP:ELNEVER is very clear. It is not the responsibility of CiteSeerX to prevent you from adding violating links. And it should not be the responsibility of human editors to hand-check each and every one of these thousands of edits the bot is making. If you are not willing to stop the bot from adding these bad links, I will block it. —David Eppstein (talk) 19:04, 8 February 2019 (UTC)
CiteSeerX is a big boy site, and they have their big boy pants on. If they are hosting papers in violation of copyright, they're the ones exposing themselves to lawsuits, not Wikipedia. We also do not link to the paper directly; we link to CiteSeerX metadata, which is not a copyright violation. Headbomb {t · c · p · b} 19:10, 8 February 2019 (UTC)
CiteSeerX does not appear to be violating copyright law in any way. https://en.wikipedia.org/wiki/User_talk:Citation_bot/Archive_13#Do_not_automatically_add_Citeseerx AManWithNoPlan (talk) 19:11, 8 February 2019 (UTC)
That earlier discussion was about user-activated instantiations of the bot. When a user does this, they implicitly take responsibility for checking the results and making sure that they are not introducing link policy violations. The discussion here is about fully automatic instantiations, when there is no user but Citation bot itself to blame. —David Eppstein (talk) 19:34, 8 February 2019 (UTC)
Also of note, if you block the bot, very likely you will get in trouble for violating WP:INVOLVED. I know I'd start proceedings against you if you did take an admin action in such a matter. Headbomb {t · c · p · b} 19:13, 8 February 2019 (UTC)
[edit conflict] That may or may not be true but it is irrelevant. What part of "External links to websites that display copyrighted works are acceptable as long as the website is manifestly run, maintained or owned by the copyright owner; the website has licensed the work from the owner; or it uses the work in a way compliant with fair use." is unclear? Note the complete absence of whether the link host is in violation of law from those conditions. None of those conditions appear to be true for the link in question, unless you interpret "fair use" so broadly as to make any link anywhere ok. And how am I involved? I have not participated in bot development and have merely watched the bot make many dubious changes and reacted to them, the same as I would for any other editor making rapid-fire dubious changes. —David Eppstein (talk) 19:15, 8 February 2019 (UTC)
You want to change how the bot operates, and have tried repeatedly to do so, and now are using your admin bit as a bludgeon, rather than demonstrating via consensus that there is a problem with the bot's actions, or that linking to CiteSeerX metadata via bot is legally problematic despite the evidence to the contrary. And I'll also point out that there is a very easy way to prevent the bot from repeating mistakes. Headbomb {t · c · p · b} 19:19, 8 February 2019 (UTC)
I have no idea whether it is legally problematic, neither do you, and that is in any case a red herring. The issue is that it is clearly in violation of Wikipedia's external link guidelines. They have different and stricter standards than the law, and it is those standards we must live up to here. —David Eppstein (talk) 19:23, 8 February 2019 (UTC)
Again, we are not linking to a copy of the paper; we are linking to CiteSeerX metadata (e.g. CiteSeerX 10.1.1.54.9556). Headbomb {t · c · p · b} 19:24, 8 February 2019 (UTC)
You are continuing to Wikilawyer but a sentence only a couple lines down is again clear: "If there is reason to believe that a website has a copy of a work in violation of its copyright, do not link to it." That does not have any exception for Playboy-style "we're only reading it for the articles, not the porn" excuses. —David Eppstein (talk) 19:31, 8 February 2019 (UTC)
The bot doesn't run in a fully automated way. However, see below. Headbomb {t · c · p · b} 19:42, 8 February 2019 (UTC)
So your position is now "it's someone else's fault but I can't tell you whose"? That's not good enough. We need these bad link additions to stop. —David Eppstein (talk) 07:02, 9 February 2019 (UTC)

Meanwhile I wonder what's the point of removing an identifier which at CiteSeerX doesn't even have a PDF. Nemo 18:59, 10 February 2019 (UTC)

The only point of linking to that identifier is to follow its links to copies of a paper hosted elsewhere which, if pointed to directly, would certainly violate WP:ELNEVER. —David Eppstein (talk) 20:04, 10 February 2019 (UTC)
That's not my understanding. Exposing metadata and a citation graph is the/a primary aim of the linked CiteSeerX pages, per http://csxstatic.ist.psu.edu/home . Nemo 20:52, 10 February 2019 (UTC)

This entire discussion is misguided. The external-links standard (WP:ELNEVER) does NOT apply to references; https://en.wikipedia.org/w/index.php?title=Wikipedia:COPYLINK is the correct standard. AManWithNoPlan (talk) 01:36, 20 February 2019 (UTC)

COPYLINK is also pretty clear... "if you know or reasonably suspect that an external Web site is carrying a work in violation of the creator's copyright, do not link to that copy of the work" ... "Knowingly and intentionally directing others to a site that violates copyright" (and I don't want to see you trotting out the old "they've never been convicted so it's not copyright violation" bullshit). —David Eppstein (talk) 02:38, 20 February 2019 (UTC)
Not trying to dodge the discussion; I just wanted to make sure that we were reading the correct documentation: not WP:ELNEVER but WP:COPYLINK. AManWithNoPlan (talk) 03:30, 20 February 2019 (UTC)

'User-activated'

Status
new bug
Reported by
Headbomb {t · c · p · b} 19:23, 8 February 2019 (UTC)
What happens
[1]
What should happen
The username should be reported.
We can't proceed until
Feedback from maintainers


Link to the Issue on GitHub with links to documentation someone needs to read and use https://github.com/ms609/citation-bot/issues/948 AManWithNoPlan (talk) 19:45, 8 February 2019 (UTC)

Not really sure what I'm looking at there, but activating via the API with the username specified should still be allowed, e.g. https://tools.wmflabs.org/citations/process_page.php?edit=toolbar&slow=1&user=Headbomb&page=FOOBAR Headbomb {t · c · p · b} 19:49, 8 February 2019 (UTC)

But activating without one is also allowed. AManWithNoPlan (talk) 19:55, 8 February 2019 (UTC)
I think it's time we revise that. Edits must be attributable to those who activate the bot. Headbomb {t · c · p · b} 20:01, 8 February 2019 (UTC)
If the bot is not running in autonomous mode, that means an editor is choosing to activate it on a specific article. If that editor's name can't be given credit for the edit itself (with Citation Bot named in the edit summary), that editor's name should clearly be recorded in the edit summary. This should be a no-brainer. We don't let editors run AWB as User:AWB with a summary of "AWB general edits"; this is very similar. – Jonesey95 (talk) 21:20, 8 February 2019 (UTC)
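A sketch of the attribution requirement: read the user parameter from a request URL like the one above and refuse to build an edit summary without it. Only the query parameters in that example URL come from this discussion; the function names and the summary wording are hypothetical.

```python
from urllib.parse import parse_qs, urlparse


def activating_user(request_url: str) -> str:
    """Return the value of the `user` query parameter, or refuse if it is missing."""
    query = parse_qs(urlparse(request_url).query)  # blank values are dropped by default
    users = query.get("user", [])
    if not users or not users[0].strip():
        raise PermissionError("no activating user: the edit would be unattributable")
    return users[0]


def edit_summary(base: str, request_url: str) -> str:
    # Hypothetical wording; the point is only that a username is always recorded.
    return f"{base} | Activated by User:{activating_user(request_url)}"
```

A query string that is not verified (for example via OAuth) can be spoofed, so a check like this only records a claimed name.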
A first pass at some code. https://github.com/ms609/citation-bot/pull/1313 probably doesn't work and will need a key from the wiki overlords AManWithNoPlan (talk) 21:31, 8 February 2019 (UTC)
This should be a no-brainer. True, but writing the code and getting it to work is not a no-brainer. We could really use some help on that. AManWithNoPlan (talk) 21:45, 8 February 2019 (UTC)
I agree that this should be a high priority. I checked all the further ELNEVER violations that the bot is continuing to add (beyond the one in my report above: Special:Diff/882439506 (Brent), Special:Diff/882429046 (Szekely), and Special:Diff/882394979 (Fiat and Shamir)), and all are marked as "User-activated". So I would like to be able to track whoever is doing this down in order to get them to stop. However, stopping this from happening takes priority over making sure that the most blameworthy party takes the blame for it, and without being able to identify a responsible user the blame is currently resting on Citation bot for introducing these bad links, and the obvious way to prevent it from happening is to block the bot. —David Eppstein (talk) 06:59, 9 February 2019 (UTC)
An edit filter for 'user-activated' would be much preferable to a block. Headbomb {t · c · p · b} 21:44, 9 February 2019 (UTC)
Aren't you the same guy that thought letting anonymous users run it on all pages that a page linked to was a good idea? AManWithNoPlan (talk) 23:22, 9 February 2019 (UTC)
No? I emphatically agreed with the need to restrict that feature. Headbomb {t · c · p · b} 23:48, 9 February 2019 (UTC)
https://github.com/ms609/citation-bot/pull/1319 work in progress AManWithNoPlan (talk) 23:45, 9 February 2019 (UTC)
Ok, as long as you're working on this I'll hold off on pushing for a block. I appear to have been mistaken in thinking this was the bot running in automatic mode; there were five more ELNEVER-violating links added on my watchlist this morning, but all of them continue to be marked as user-activated. This needs to stop, but if we can track down the user responsible for these problems then it won't be necessary to block the bot from doing its other useful work. —David Eppstein (talk) 00:21, 10 February 2019 (UTC)
But the longer this goes on the more my patience is getting stretched thin. I am having to check dozens of edits per day and finding many many violations of ELNEVER. (Another eight violations just from this morning, just from my watchlist — I don't have the patience to check the bot's entire contribution list.) Please, as a show of good faith, disable the CiteSeerX feature until these bad edits can be attributed to a human editor. —David Eppstein (talk) 19:01, 10 February 2019 (UTC)
The edit filter is that way: WP:EDITFILTER. Headbomb {t · c · p · b} 19:24, 10 February 2019 (UTC)
You suggest I should request that CiteSeerX edits be blocked by edit filter rather than allowing them to continue when a human editor takes responsibility for them and takes the block if repeatedly failing to do so? That seems drastic. In the meantime, I am taking your response as a WONTFIX, and rescinding my statement above that I am holding off pushing for a block. These unattributed ELNEVER violations must stop. —David Eppstein (talk) 19:37, 10 February 2019 (UTC)
No, I suggest you make an edit filter to filter out the 'user-activated' edits of the bot. I'll also point out I'm neither maintainer nor operator of Citation bot. Headbomb {t · c · p · b} 19:39, 10 February 2019 (UTC)
You are suggesting that if I close my eyes and stop seeing it making link violations in my watchlist, the problem will go away? As long as "user activated" cannot be attributed to an actual user, it is entirely the responsibility of Citation bot to stop making bad edits. If it can't stop and can't pass the blame to a specific user, it needs to be blocked to prevent ongoing harm to the encyclopedia. —David Eppstein (talk) 19:43, 10 February 2019 (UTC)
I suggest that you use your brain and implement a solution that prevents the problematic unattributed edits of Citation bot without throwing the baby out with the bathwater. The edit filter achieves that. Blocking the bot is a net negative, especially when you've got an alternative. Headbomb {t · c · p · b} 19:46, 10 February 2019 (UTC)
So block all edits from Citation Bot that add a CiteSeerX link, rather than blocking the bot outright? I don't have the permissions to do that directly but I suppose it's worth a try requesting it. —David Eppstein (talk) 19:50, 10 February 2019 (UTC)
The combination of 'user-activated' + 'citeseerx' would be much better. Or 'user-activated' alone. Headbomb {t · c · p · b} 20:08, 10 February 2019 (UTC)
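Modelled in Python, the combination suggested here is a simple conjunction of three conditions. This is not AbuseFilter rule syntax; the three inputs are hypothetical stand-ins for whatever variables a real filter would use.

```python
def matches_requested_filter(user_name: str, edit_summary: str, added_text: str) -> bool:
    """Flag Citation bot edits that are user-activated AND add a CiteSeerX identifier."""
    return (
        user_name == "Citation bot"
        and "user-activated" in edit_summary.lower()
        and "citeseerx" in added_text.lower()
    )
```

Dropping the third condition gives the broader "'user-activated' alone" variant.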

───────────────────────── Ok, see Wikipedia:Edit filter/Requested#CiteSeerX and Citation bot. —David Eppstein (talk) 20:45, 10 February 2019 (UTC)

Should the documentation for the citation templates be updated to say that this should only be added if the citation lacks other identifiers? There are plenty of documents that only exist there, since they are unpublished. AManWithNoPlan (talk) 16:11, 15 February 2019 (UTC)
Link to the OAuth code work: https://github.com/ms609/citation-bot/pull/1335 AManWithNoPlan (talk) 21:00, 15 February 2019 (UTC)
This is still not deployed. Or if it's deployed, the bot is still not attributing the edits to the activator. Headbomb {t · c · p · b} 21:33, 20 February 2019 (UTC)
Still playing with it, but I cannot debug it myself without some Smith help. AManWithNoPlan (talk) 21:36, 20 February 2019 (UTC)

Capitalization: German Mit

Status
new bug
Reported by
Headbomb {t · c · p · b} 07:45, 16 February 2019 (UTC)
What should happen
[2]
We can't proceed until
Feedback from maintainers


Request: If there's no URL, remove via

Status
new bug
Reported by
Headbomb {t · c · p · b} 08:36, 16 February 2019 (UTC)
What happens
[3]
What should happen
[4]
We can't proceed until
Feedback from maintainers


WP:SAYWHERE is explicitly not WP:SAYHOW. While |via= may let readers know where a link points to when it's unusual, it's pointless to have when you have no link to go with. Headbomb {t · c · p · b} 08:36, 16 February 2019 (UTC)

This is a bad change. I don't need a URL to be reproduced by a secondary organization. --Izno (talk) 13:34, 16 February 2019 (UTC)
Agreed about it being a bad change. |via= is often used in lieu of a URL precisely because EBSCOhost and other repositories don’t have permanent URLs, but it’s still useful to let people know where they got the article. Umimmak (talk) 14:09, 16 February 2019 (UTC)
And that's exactly what WP:SAYWHERE says not to do. That you read a journal article through an EBSCO database versus PASCAL, Project MUSE, or Google Scholar is irrelevant. This is also undue promotion/publicity of paid databases. Headbomb {t · c · p · b} 16:09, 16 February 2019 (UTC)
But we still have parameters like |jstor= because it can be helpful to let the reader know that an online version of the article exists on JSTOR. I don’t see how this use of |via= is different; it lets the reader know the article can be found in a particular database which they might have access to. If you don’t want to use |via= at all, that’s one move you could try to gain consensus for but I don’t see the benefit of removing it only when there isn’t a URL. (Surely the URL tells you where you’re going, making |via= redundant, no?) Umimmak (talk) 21:01, 16 February 2019 (UTC)
JSTOR is very different: it's an identifier, which was also assigned to publications never digitised before and lacking a DOI. (Although some JSTOR IDs have also become DOIs, while other publications with a JSTOR ID have later been assigned a DOI by a publisher.) Nemo 21:19, 16 February 2019 (UTC)
|jstor= gives you a link to the specific paper on the JSTOR repository. We don't just add |via=JSTOR with no link to JSTOR. There is no reason for the reader to care that you've personally accessed the article via an EBSCOhost vs PASCAL vs ProQuest vs Whatever database; the only thing that matters is what article you read. How you've accessed the material is irrelevant. Headbomb {t · c · p · b} 21:20, 16 February 2019 (UTC)
Yes and if EBSCO had convenient URLs or identifiers those would be used instead of |via=; I still think it’s helpful to say a resource is available online, particularly when it’s a resource Wikipedia editors have access to via WP:LIB. Umimmak (talk) 21:30, 16 February 2019 (UTC)
Which makes it extra pointless to readers (and even harmful, since you're directing them to manually search for sources they cannot access), completely unneeded for WP:V, promotes a specific commercial service, and goes against WP:SAYWHERE. Headbomb {t · c · p · b} 01:28, 17 February 2019 (UTC)
Amazon links and Google links to books are supposed to be removed unless they link to a free preview of the referenced information. This is because we are not supposed to promote individual content providers. This is relevant to consider. AManWithNoPlan (talk) 15:11, 16 February 2019 (UTC)
Basically, if via was valid without a url, then |via=Google it you dumbass would basically be the right answer on 99% of {{cite web}}. AManWithNoPlan (talk) 15:25, 16 February 2019 (UTC)
No? It has legitimate use even with a URL. See the documentation for the parameter. This bot should not touch via whatsoever. --Izno (talk) 15:27, 16 February 2019 (UTC)

When we lose URLs we obviously get rid of it, and in most cases it is probably best to get rid of it when there is no URL, but this seems like something that should only be removed by an automatic process if there are obvious other links such as doi/pmc. Having a bot just remove them in general seems dubious. Although I do laugh when via says Google search, and remove it. AManWithNoPlan (talk) 15:58, 16 February 2019 (UTC)

Also, it is amazing how often I find ‘my local library’ as the |via= AManWithNoPlan (talk) 17:19, 16 February 2019 (UTC)
Unless |via= provides some unique source of information, it is simply promoting a specific database. Unrelated side note: EBSCO so-called URLs do suck. ProQuest at least gives every document a single number (actually, they sometimes give it more than one) that you can simply use. AManWithNoPlan (talk) 22:38, 16 February 2019 (UTC)
───────────────────────── Headbomb conveniently forgets that the RfC on |via= also closed with consensus against their position (getting to be a bit of a pattern that). However, they are correct that the parameter should not be used to indicate that a paper might be available from a particular database: it's to indicate the database through which it was actually accessed, and some judgement and common sense is required regarding its use (do not blindly add it). There is no good reason to include |via= when the specific source accessed cannot possibly have any differences from a conceptual perfect master: a prime example is a paper journal accessed in a physical copy in your local library (and the same, roughly, goes for an electronic copy on the publisher's own website: both are effectively to be considered perfect copies of record, and only third-party republishing/access should be indicated in |via=). However, this does not in any way require a link or identifier: you may have accessed the article through a random website or database, identified in |via=, but omitted to include the link or identifier. That makes the flaw in the citation the absence of links or identifiers, not the presence of |via= (in fact, in that case |via= may be essential to enable locating the source in question, or determining equivalency between copies of indeterminate provenance). The matter of when |via= serves a purpose and when it is just pointless clutter is not a clear-cut one, which is why it should not be treated mechanistically: it is not something that can be determined by a bot based on a simplistic rule like presence of |url=. However, there are probably a few (a very few) "blacklist" type cases that could constructively be detected, along the lines of |via=my local library or |via=web search.
Things like |via=Google do not qualify: it may be trying to identify a Google Books preview or similar that needs human judgement to determine whether it makes sense or not (that it will most likely not make sense does not change that fact). And "human judgement" is not the few self-selected people here: it requires looking individually at each specific instance, and is subject to local consensus processes at each article. You can't make a consensus here that decides what happens over there. Case in point, Headbomb's favourite bugaboo, |via=EBSCOhost, can be argued both ways (include vs. leave out) and thus needs to employ the same consensus processes as all other such issues (SAYWHERE describes what you are not required to do, not what you are prohibited from doing). Removing it with a bot is simply an attempt to circumvent those processes. --Xover (talk) 06:53, 17 February 2019 (UTC)
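A sketch of the narrow blacklist idea: only exact matches against a short list of clearly meaningless values are flagged, and anything else (including |via=Google and |via=EBSCOhost) is left for human judgement. The two entries are the examples given above; the list itself is hypothetical, not an agreed one.

```python
# Example entries from the discussion; not an agreed list.
VIA_BLACKLIST = {"my local library", "web search"}


def via_is_blacklisted(via: str) -> bool:
    """True only for exact (case- and spacing-insensitive) blacklist matches."""
    normalized = " ".join(via.lower().split())
    return normalized in VIA_BLACKLIST
```

Exact matching is deliberate: substring or fuzzy matching would start making the kind of judgement calls the comment above says a bot should not make.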
There was an RFC on via? When? As for your example "There is no good reason to include |via= when the specific source accessed cannot possibly have any differences from a conceptual perfect master". That's exactly what those databases offer: perfect reproductions of the published versions of record, with no material differences, save perhaps for a preamble page unique to the database. Headbomb {t · c · p · b} 07:41, 17 February 2019 (UTC)
Given you were the one that started the RfC, your question here now is pretty disingenuous. And your further argument is a salient one to make when discussing one specific use in one specific citation. I might even conceivably agree with you in such a discussion (or not; it would depend on the situation). --Xover (talk) 09:17, 17 February 2019 (UTC)
Again, what RFC? Headbomb {t · c · p · b} 09:45, 17 February 2019 (UTC)
I do remember Headbomb getting some KFC, but I have no memory of an RFC. AManWithNoPlan (talk) 20:31, 17 February 2019 (UTC)

Maybe it is this https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(policy)/Archive_146#Should_WP:TWL_be_allowed_to_acknowledge_the_services_they_have_partnership_with_in_our_articles? There, via was generally considered worthless and often harmful, but in some situations did have value (in all the discussions it was generally assumed that a URL was present). AManWithNoPlan (talk) 21:23, 17 February 2019 (UTC)

New bug from R8R

Status
new bug
Reported by
R8R (talk) 18:27, 20 February 2019 (UTC)
What happens
First of all, many |url= instances are changed into |chapter-url=, even though many references refer not just to a chapter, but to an exact page within that chapter. Second, capitalization of the names of the French journals was entirely unnecessary; the French don't do that and both references had |language=fr. Third, |pages=IE-87 did not need to replace a hyphen with an en dash. Such a change would make sense in many cases when people don't know how to or simply don't care about the proper page ranges, but letters should signal this isn't a regular case.
Relevant diffs/links
[5]
We can't proceed until
Feedback from maintainers


“capitalization of the names of the French journals” Wikipedia style guides follow the English formatting rules. AManWithNoPlan (talk) 18:35, 20 February 2019 (UTC)
Changing pages makes the template text match the displayed text. If that is wrong, then fix the underlying text: please see the bot's main page for a description. AManWithNoPlan (talk) 18:35, 20 February 2019 (UTC)
Not sure what you mean about url changes since chapter is closer to pages than the book. AManWithNoPlan (talk) 18:35, 20 February 2019 (UTC)
Thank you for your second response, I have read it and added {{hyphen}} in the citation instead. I decided to leave the issue covered by your third response be, maybe I'm making a problem out of nothing. As for the first, I don't quite follow. The French don't capitalize names of their journals; therefore, their proper names are not capitalized. If capitalization of such proper names changes depending on the surrounding language, could you provide a link to a rule that says that? I haven't found anything of this sort in en.wiki citation or CS1 rules.--R8R (talk) 18:51, 20 February 2019 (UTC)
WP:CAPITALS? --Izno (talk) 18:54, 20 February 2019 (UTC)
Ah, I see it at MOS:FOREIGNTITLE: Capitalization in foreign-language titles varies, even over time within the same language. Retain the style of the original for modern works. For historical works, follow the dominant usage in modern, English-language, reliable sources.
So the change the bot made is more-or-less incorrect. --Izno (talk) 19:02, 20 February 2019 (UTC)
The url change seems reasonable as there is no |page-url=. As for capitalization, we don't use French rules. |language= does not refer to the language title of the work(s) or where the work was produced but the content within, so that reason for not changing is straight bogus in the context of the template. As for hyphens and endashes, that's a hard problem. I'm not sure what the best behavior is for that. --Izno (talk) 18:52, 20 February 2019 (UTC)
Hyphens and dashes are annoying. We just change the data to match the display. Is 7-8 pages 7 to 8, or page 8 of section 7? And 3-7–3-9 is really ugly. AManWithNoPlan (talk) 18:58, 20 February 2019 (UTC)
Re the capitalization I think MOS:FOREIGNTITLE is the most relevant guideline. It says to respect the French capitalization. I think this is also in agreement with WP:COMMONNAME (a policy!). For instance, when we have articles on journals or magazines with French-language titles, they should be capitalized in the French way; e.g. Revue politique et littéraire. —David Eppstein (talk) 19:02, 20 February 2019 (UTC)
I get this, and I wouldn't have complained if I had "7-8". I, however, had "IE-87": since what stands before the hyphen is not a number, but a string of letters, there is no ambiguity here. Based on what I know from my shallow coding skills, this shouldn't be too hard to check? I get that few people might have encountered this so far, but the fix presumably shouldn't be too hard to make, either. If you indeed decide to alter the bot to change the foreign title capitalization (thank you Izno and David for finding the appropriate guideline), wouldn't it be a good idea to change this, too?--R8R (talk) 19:17, 20 February 2019 (UTC)
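The check described here can be sketched as: convert a hyphen to an en dash only when both sides are plain numbers, so values like IE-87 or 3-7–3-9 pass through unchanged. This is an illustrative Python sketch, not the bot's or the templates' actual logic; note that it still converts 7-8, which may be ambiguous as discussed above.

```python
import re

# Both sides must be plain digits; anything else is left alone.
NUMERIC_RANGE = re.compile(r"(\d+)\s*-\s*(\d+)")


def normalize_page_range(pages: str) -> str:
    match = NUMERIC_RANGE.fullmatch(pages.strip())
    if match:
        return f"{match.group(1)}\u2013{match.group(2)}"  # en dash
    return pages


print(normalize_page_range("7-8"))    # 7–8
print(normalize_page_range("IE-87"))  # IE-87
```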
(EC) With respect to the French, we do capitalize journal titles (Annales de la Société Entomologique de France). We also don't. Title casing or sentence casing is a matter of preference. Likewise how to capitalize foreign title in English is also a matter of preference. Some style guides use title casing, some use sentence/original casing. The bot isn't necessarily wrong to capitalize them, but sadly WP:CITEVAR is a thing. Best way to deal with this at the moment is to add a comment in the journal name, but exceptions could also be added at the bot-level for the more common journals. Headbomb {t · c · p · b} 19:18, 20 February 2019 (UTC)
Surely the best way to handle issues that reasonably fall under CITEVAR is for the bot not to violate CITEVAR by gratuitously changing everything to its own preferred style? —David Eppstein (talk) 21:13, 20 February 2019 (UTC)
As for page numbers, we are converting the metadata to match what users see. See the template documentation. AManWithNoPlan (talk) 21:17, 20 February 2019 (UTC)
99%+ of journals cited are in English, and the bot brings it in line with MOS. Again, if you don't want the bot to change something from one style to the other, either a) don't use the bot, b) be prepared to tell it to not touch something specific, or c) report the problematic journal here so it can be added to the capitalization exceptions (La Revue scientifique, not being valid in either title/sentence casing variation). The page number thing above is a bug, though. Headbomb {t · c · p · b} 21:18, 20 February 2019 (UTC)
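A sketch of the exception-list approach: titles on the list are returned exactly as listed, and everything else gets simple title casing. The exception entry is the one named above; the small-word list and function name are assumptions, and real title casing has many more edge cases (including the foreign-title question raised above).

```python
# Hypothetical exception list: titles to leave exactly as given.
CAPITALIZATION_EXCEPTIONS = {
    "la revue scientifique": "La Revue scientifique",
}

# Short words kept lowercase unless they start the title (an assumed, minimal list).
SMALL_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to"}


def journal_title_case(title: str) -> str:
    key = " ".join(title.lower().split())
    if key in CAPITALIZATION_EXCEPTIONS:
        return CAPITALIZATION_EXCEPTIONS[key]
    words = title.split()
    cased = []
    for i, word in enumerate(words):
        if i > 0 and word.lower() in SMALL_WORDS:
            cased.append(word.lower())
        else:
            cased.append(word[:1].upper() + word[1:])
    return " ".join(cased)
```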
If the Bot's changes are correct and conform to Wikipedia's policies and guidelines in 99% of cases, that still means that there are roughly 60,000 articles that it is likely to fuck up. Bots can do a lot more damage than humans so they should be much more circumspect. Human editors should not be expected to have to tag every article they edit with bot exclusions to prevent damage by Bots with badly-estimated views of the level of edits they are competent to make. —David Eppstein (talk) 01:13, 21 February 2019 (UTC)
Again, exceptions which can be bypassed at the individual levels, or at the bot level. Or by simply not activating the bot. Headbomb {t · c · p · b} 01:18, 21 February 2019 (UTC)

I stand corrected, the citation templates now detect pages of IE-7 and don’t convert the dash. AManWithNoPlan (talk) 21:35, 20 February 2019 (UTC)

https://github.com/ms609/citation-bot/pull/1354 AManWithNoPlan (talk) 02:39, 21 February 2019 (UTC)

Pointless removal of urls, incorrect edit summary

Status
new bug
Reported by
♦ J. Johnson (JJ) (talk) 22:14, 20 February 2019 (UTC)
Relevant diffs/links
diff
We can't proceed until
Feedback from maintainers


As seen in this edit: why are the urls being removed?

Also: The edit summary says: "Alter: title. Removed accessdate with no specified URL. Removed parameters." But the first change removed accessdate AND the url simultaneously, other changes removed urls but not any accessdates. Ergo, the edit summary is incorrect. Is that due to operator inattention? Or is that a bot problem? ♦ J. Johnson (JJ) (talk) 22:14, 20 February 2019 (UTC)

It's accurate in the sense that it first removes redundant URLs ('removed parameters'), then since there was no URL left in that citation, it also removed the accessdate. The edit summary could be clearer though. "Removed redundant URLs" would be much better than "Remove parameters". Headbomb {t · c · p · b} 22:30, 20 February 2019 (UTC)
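The two-step behaviour described here can be sketched as follows, with an edit summary that names each change instead of a generic "Removed parameters". The dictionary representation, the DOI comparison used to decide that a URL is redundant, and the summary wording are all assumptions for illustration, not the bot's code.

```python
def tidy_citation(params: dict) -> tuple:
    """Drop a URL that only repeats the DOI link, then drop an orphaned access-date.

    Returns the updated parameters and an edit summary that names each change.
    """
    params = dict(params)  # work on a copy
    changes = []
    doi = params.get("doi", "")
    if doi and params.get("url", "").lower() == ("https://doi.org/" + doi).lower():
        del params["url"]
        changes.append("Removed URL that duplicated the DOI")
    has_url = any(params.get(key) for key in ("url", "chapter-url"))
    if "access-date" in params and not has_url:
        del params["access-date"]
        changes.append("Removed access-date with no specified URL")
    summary = "; ".join(changes) + "." if changes else ""
    return params, summary
```

The ordering matters: the access-date check runs after URL removal, which is why a single edit can legitimately drop both.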
That's nonsense. If after removing a URL "there was [then] no URL left in that citation", the URL was not redundant. That's sheer ludicrousity. ♦ J. Johnson (JJ) (talk) 00:49, 21 February 2019 (UTC)
If I link you to the same thing twice in the same citation, then yes, that's exactly what the word redundant means. Headbomb {t · c · p · b} 01:17, 21 February 2019 (UTC)
Edit summaries are not intended to tell everything. Historically, their length has been limited, too. AManWithNoPlan (talk) 02:28, 21 February 2019 (UTC)

JSTOR improvements

Status
new bug
Reported by
(t) Josve05a (c) 02:16, 21 February 2019 (UTC)
What should happen

Relevant diffs/links
https://en.wikipedia.org/w/index.php?title=User%3AJosve05a%2Fcite-sandbox&diff=prev&oldid=884350777
We can't proceed until
Feedback from maintainers


Ah yes. We currently limit them to 100000 and up to avoid GIGO AManWithNoPlan (talk) 02:27, 21 February 2019 (UTC)
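A sketch of that sanity check, assuming only purely numeric JSTOR IDs are being considered (IDs in other forms would need separate handling). The 100000 threshold is the one stated above; everything else is illustrative.

```python
import re


def plausible_jstor_id(value: str, minimum: int = 100000) -> bool:
    """Accept only plain ASCII-digit JSTOR IDs at or above `minimum`."""
    value = value.strip()
    return re.fullmatch(r"[0-9]+", value) is not None and int(value) >= minimum
```

Using `[0-9]` rather than `str.isdigit()` avoids accepting characters such as superscript digits, which `int()` cannot parse.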