User talk:Citation bot



Note that the bot's maintainer and assistants (Thing 1 and Thing 2) can go weeks without logging in to Wikipedia. The code is open source and interested parties are invited to assist with the operation and extension of the bot.

Before reporting a bug, please note: Addition of DUPLICATE_xxx= to citation templates by this bot is a feature. When there are two identical parameters in a citation template, the bot renames one to DUPLICATE_xxx=. The bot is pointing out the problem with the template. The solution is to choose one of the two parameters and remove the other one, or to convert it to an appropriate parameter.
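The renaming described above can be illustrated with a short Python sketch. This is not the bot's actual code: the function name `mark_duplicates`, the list-of-pairs representation of a template, and the choice to rename the later occurrence (the note above only says "one") are assumptions made for this example.

```python
def mark_duplicates(params):
    """Rename repeated parameter names to DUPLICATE_<name>, keeping order.

    `params` is a list of (name, value) pairs standing in for a parsed
    citation template (a hypothetical representation for this sketch).
    """
    seen = set()
    result = []
    for name, value in params:
        if name in seen:
            # A later occurrence of an already-seen name: flag it so a
            # human editor can decide which value to keep.
            result.append(("DUPLICATE_" + name, value))
        else:
            seen.add(name)
            result.append((name, value))
    return result


citation = [("title", "A"), ("year", "2019"), ("title", "B")]
print(mark_duplicates(citation))
```

The editor then keeps one of the two values and deletes the other, or moves the flagged value to a more appropriate parameter.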

Or, for a faster response from the maintainers, submit a pull request with an appropriate code fix on GitHub, if you can write the needed code.

Request: Capitalize linked journals

Status
Feature Request
Reported by
Headbomb {t · c · p · b} 17:09, 30 April 2019 (UTC)
What should happen
[1]
We can't proceed until
Feedback from maintainers


That is very dangerous territory. We would have to verify that the old page did not exist at all and that the new page did exist. We really have not ever got in the business of fixing red links. AManWithNoPlan (talk) 15:01, 2 May 2019 (UTC)

It's not a matter of fixing redlinks, it's a matter of capitalization. E.g. Journal of physics vs Journal of Physics or INTERNATIONAL JOURNAL OF SYSTEMATIC AND EVOLUTIONARY MICROBIOLOGY vs International Journal of Systematic and Evolutionary Microbiology. Or Developmental neuroscience vs Developmental Neuroscience. Headbomb {t · c · p · b} 15:43, 2 May 2019 (UTC)
And in the rare case that the capitalized version links to a different page, it will link to the correct page instead of the wrong one. Headbomb {t · c · p · b} 15:48, 2 May 2019 (UTC)
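A minimal sketch of the capitalization-only idea, assuming the bot had access to a list of known journal article titles. The list `KNOWN_JOURNALS` and the function `canonical_journal` are hypothetical; the sketch never checks whether a page exists, it only normalizes case against names it already knows.

```python
# Hypothetical list of canonical journal titles (taken from the examples above).
KNOWN_JOURNALS = [
    "Journal of Physics",
    "International Journal of Systematic and Evolutionary Microbiology",
    "Developmental Neuroscience",
]

# Case-insensitive lookup table: folded name -> canonical capitalization.
_CANONICAL = {title.casefold(): title for title in KNOWN_JOURNALS}


def canonical_journal(name):
    """Return the canonical capitalization if the name is known, else the input."""
    return _CANONICAL.get(name.strip().casefold(), name)


print(canonical_journal("Journal of physics"))
```

Names that are not in the list are returned unchanged, so the sketch cannot turn a working link into a red link on its own.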

Strip semicolons

Status
feature request
Reported by
Headbomb {t · c · p · b} 00:13, 18 June 2019 (UTC)
What should happen
[2], [3]
We can't proceed until
Feedback from maintainers


This should perhaps not apply to |title= however. Also might not be safe to do in some identifiers. Headbomb {t · c · p · b} 00:14, 18 June 2019 (UTC)

And as always, title's good friend |chapter= too. AManWithNoPlan (talk) 13:57, 23 June 2019 (UTC)
And contribution and other aliases. Headbomb {t · c · p · b} 15:49, 23 June 2019 (UTC)
And NOT & a m p ; and his friends. AManWithNoPlan (talk) 01:25, 30 June 2019 (UTC)
Rather than a blacklist, we would want a white list of parameters. AManWithNoPlan (talk) 15:17, 6 July 2019 (UTC)
Get the list of all parameters and remove those then. Headbomb {t · c · p · b} 20:07, 6 July 2019 (UTC)
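The whitelist approach discussed above could look roughly like the following Python sketch. It assumes the request is about trailing semicolons (the linked diffs are not reproduced here), and the whitelist `SAFE_PARAMS` is purely illustrative; |title=, |chapter=, |contribution= and identifiers are deliberately left out. Values ending in an HTML entity such as &amp; are left alone.

```python
import re

# Hypothetical whitelist of parameters where a trailing semicolon is assumed
# to be junk. Titles, chapters, contributions and identifiers are not listed.
SAFE_PARAMS = {"journal", "publisher", "volume", "issue", "pages"}

# Matches a value that ends in something shaped like an HTML entity (&amp; &#124; ...).
ENTITY_AT_END = re.compile(r"&#?\w+;$")


def strip_semicolons(name, value):
    """Strip trailing semicolons from whitelisted parameters only."""
    if name not in SAFE_PARAMS:
        return value
    text = value.rstrip()  # trailing whitespace is trimmed for whitelisted parameters
    # Remove one semicolon at a time so that an entity's own ';' survives.
    while text.endswith(";") and not ENTITY_AT_END.search(text):
        text = text[:-1].rstrip()
    return text


print(strip_semicolons("journal", "Nature;"))
print(strip_semicolons("title", "Nature;"))
```

The entity check is intentionally conservative: anything that merely looks like an entity at the end of a value is kept as-is.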

Idea: Usage stats

Not really anything pressing, but now that we have OAuth in, it would be neat to have usage statistics. Who makes use of the bot. If the bot is activated via the web interface, scripts, etc... Or whatever else is trackable. Headbomb {t · c · p · b} 06:15, 27 June 2019 (UTC)

I guess one could sort the bot contributions based on if the edit summary said “category” and one could query Wikipedia and search for edit summaries with the “use this tool” text in them. AManWithNoPlan (talk) 13:40, 27 June 2019 (UTC)
Having a &via=... in the API would likely be a better way of tracking things, but right now I'm mostly thinking about something very non-critical. I'll take any bug fix and things that actually affect the edits of the bot over usage stats, though. Just figured if one of the talk page stalkers felt like compiling stats, or building a sub-module that would export information into an external database after every edit, well, that's a nice little project. Headbomb {t · c · p · b} 17:17, 27 June 2019 (UTC)
We currently have no logging, so any logging would have to be done in the edit summaries. AManWithNoPlan (talk) 14:32, 28 June 2019 (UTC)
"Currently", yup. But if there was logging, we could have graphs/stats like [4], except for citation bot usage, instead of pageviews.
Anyway, it's an idea more than anything. Not critical by far, and I'd rather have someone else work on that if that ever gets done (unless we suddenly run out of edit-related bug fixes and feature requests). Headbomb {t · c · p · b} 15:15, 28 June 2019 (UTC)
Could it be enough to add a hashtag and rely on toolforge:hashtags/? Nemo 15:37, 28 June 2019 (UTC)

Question about handles

I'm building a list of various handle links, e.g.

What do you need to know to implement hdl conversion? Do you need to know all root paths

domains? Or just

or even just

? Headbomb {t · c · p · b} 07:24, 27 June 2019 (UTC)

Also does knowing http vs https matter? Headbomb {t · c · p · b} 07:25, 27 June 2019 (UTC)

http and https is irrelevant. Right now, each and every URL path is specific. I should change it to be hosts and paths separate. Hosts is probably enough, unless you find a new file path beyond the usual suspects. Please verify each host actually works though; http://oasis.postech.ac.kr/handle/2014.oak/9965 is not a handle 🙄. AManWithNoPlan (talk) 13:45, 27 June 2019 (UTC)

@AManWithNoPlan:, well, I'm building a massive list with the help of others (e.g. [5]), so I want to know what's the most useful format. Right now, if I have something like

I'll eliminate things that only differ after the /handle/ part, and have something like

and currently have 2169 such paths. Which I could reduce to (after checking that they indeed work inside a {{hdl}})

But I was wondering if there was a way to trim that down further to something more manageable/less redundant. Headbomb {t · c · p · b} 17:05, 27 June 2019 (UTC)

While it is true that some of them probably do not have all these possibilities, I doubt that we would run into a case where http://digilib.gmu.edu/dspace/handle/ works, but http://digilib.gmu.edu/bitstream/handle/ is not a handle but something else. So, what I need are three lists:

  1. Protocol: http and https (short list)
  2. Host names (HUGE list)
  3. Suffix list (/handle/, /bitstream/handle/, ....) (medium sized list).

The code can then accept and convert any combination. AManWithNoPlan (talk) 17:22, 27 June 2019 (UTC)
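The three-list approach can be sketched in Python as follows. This is an illustration, not the bot's code: the host list contains only the one host mentioned above as a placeholder, the handle value `1920/1234` in the usage lines is made up, and the assumption that a handle is the first two path segments after the suffix is mine.

```python
import re

PROTOCOLS = ["http", "https"]
HOSTS = ["digilib.gmu.edu"]  # placeholder; the real list would be huge
SUFFIXES = ["/handle/", "/bitstream/handle/", "/dspace/handle/"]


def _alternation(items):
    """Escape each item and join into a regex alternation, longest first."""
    ordered = sorted(items, key=len, reverse=True)
    return "|".join(re.escape(item) for item in ordered)


def build_handle_regex(protocols, hosts, suffixes):
    """Combine the three lists into one pattern accepting any combination."""
    return re.compile(
        f"^(?:{_alternation(protocols)})://"
        f"(?:{_alternation(hosts)})"
        f"(?:{_alternation(suffixes)})"
        r"(?P<hdl>[^/?#]+/[^/?#]+)",  # assumed shape: prefix/suffix
        re.IGNORECASE,
    )


HANDLE_URL = build_handle_regex(PROTOCOLS, HOSTS, SUFFIXES)


def url_to_hdl(url, pattern=HANDLE_URL):
    """Return the handle for a recognized URL, or None if it does not match."""
    match = pattern.match(url)
    return match.group("hdl") if match else None


print(url_to_hdl("http://digilib.gmu.edu/dspace/handle/1920/1234"))
print(url_to_hdl("http://example.org/handle/1920/1234"))
```

Because unknown hosts return `None`, a host that uses a /handle/ path without being a real handle server is never converted unless someone adds it to the host list, which is why each host needs to be verified first.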

That works. Headbomb {t · c · p · b} 17:29, 27 June 2019 (UTC)

The easy stuff

Protocols: https?
Suffix: \/((dspace|dspace-law|jspui|repository|xmlui)\/)?(bitstream\/)?handle\/

Going to build the host names list. It's in the ballpark of 1228 domains. Headbomb {t · c · p · b} 17:55, 27 June 2019 (UTC)

currently we use a single Regex. I will need to change that. I already have a plan for some simple fast code. AManWithNoPlan (talk) 18:58, 27 June 2019 (UTC)
Code written, now for testing. AManWithNoPlan (talk) 21:19, 27 June 2019 (UTC)
https://github.com/ms609/citation-bot/pull/1856 AManWithNoPlan (talk) 21:20, 27 June 2019 (UTC)
More https://github.com/ms609/citation-bot/pull/1857 AManWithNoPlan (talk) 23:37, 27 June 2019 (UTC)
when you have a host list post the link. AManWithNoPlan (talk) 03:49, 28 June 2019 (UTC)
A preview is in User:Headbomb/Sandbox. User:Betacommand will run a script to see which handle links resolve when put into a {{hdl}}. I'll then be able to give you a list of domains that could be converted. It likely won't cover everything, but it'll probably cover 95%+ of cases. Headbomb {t · c · p · b} 03:56, 28 June 2019 (UTC)
Headbomb Got a final list yet? AManWithNoPlan (talk) 15:31, 3 July 2019 (UTC)

───────────────────────── Still chugging at it. The list of HDL urls that didn't work needs manual review still, because some of the servers were only temporarily down and the list was not in the most convenient of formats. Should have it by the end of the week though. Headbomb {t · c · p · b} 15:47, 3 July 2019 (UTC)

Headbomb Got a final list yet? AManWithNoPlan (talk) 14:27, 19 July 2019 (UTC)
Still working on it. Not forgotten though. I was travelling for a while, then had computer issues (dead PSU) which prevented me from working on it. Hoping to have it done this weekend. Headbomb {t · c · p · b} 16:08, 19 July 2019 (UTC)
Headbomb any progress? AManWithNoPlan (talk) 17:28, 16 August 2019 (UTC)

"Removed URL that duplicated unique identifier"

[6] I'm getting these all the time now and I think they arguably make the citation sections worse. There's no way that [edit: general readers] know to click on the linked "doi" when the citation's title itself is unlinked. I'll note that the {{cite journal}} documentation examples keep the url parameter even when a doi is provided.

Where is the consensus to make this edit en masse? czar 13:26, 20 July 2019 (UTC)

I'm going out now but I'll leave a quick answer to one of your points: a lot of people do, in fact, know to click the DOI. We know for sure from CrossRef data: https://www.crossref.org/blog/https-and-wikipedia/ https://www.crossref.org/blog/real-time-stream-of-dois-being-cited-in-wikipedia/ Nemo 13:36, 20 July 2019 (UTC)
unless the url is free to download without logging in, you should not add it unless there are no other links out. AManWithNoPlan (talk) 13:39, 20 July 2019 (UTC)
there is even movement afoot to remove the automatic linking of titles when a PMC is present. AManWithNoPlan (talk) 13:47, 20 July 2019 (UTC)
My question was where this consensus has been established, or if this is just a practice localized to editors using this bot/tool. czar 15:17, 20 July 2019 (UTC)
I don’t have time to look it up, but the links are in the talk archives somewhere—hopefully someone not in an auto parts store can respond better. AManWithNoPlan (talk) 15:31, 20 July 2019 (UTC)
The general idea is that these links are redundant with the DOI/other identifiers, which are clear about where they take you (doi: version of record, jstor: JSTOR repository, etc.; if you don't know what those are, we have the wikilinks). |url= is then freed up to be used for freely-available full text versions-of-record of the paper hosted on an author's website, or similar. If the DOI version is free, you can use |doi-access=free to mark it as free, etc. Headbomb {t · c · p · b} 17:29, 20 July 2019 (UTC)

Please see the usage page for why {{notabug}} AManWithNoPlan (talk) 15:50, 21 July 2019 (UTC)

@AManWithNoPlan, sorry, where on the usage page is the consensus/discussion to remove url parameters when a doi is provided? czar 23:21, 21 July 2019 (UTC)
I thought someone added it. Weird. AManWithNoPlan (talk) 23:51, 21 July 2019 (UTC)
It's long standing practice to do this, for the reasons outlined above. Many bots have been approved for this sort of cleanup too, e.g. User:CitationCleanerBot. If you want the title always linked, go to Help talk:CS1 and request that |url= is automatically set to https://doi.org/10.1234/1234567890 whenever a DOI is present. Likewise for other identifiers of record. Headbomb {t · c · p · b} 00:10, 22 July 2019 (UTC)
So is the answer to my question that there is no documented discussion of consensus? czar 12:44, 27 July 2019 (UTC)
the answer is that people are too busy to dig it up. AManWithNoPlan (talk) 14:35, 27 July 2019 (UTC)
Another answer might just be that there has been no 'formal' discussion because formal discussion is not a requirement for something that, it would appear, has silent consensus. I would guess that thousands of edits of this type have been made by the bot and by individual editors (I am one). As far as I know, there has been little to no discussion about removing urls that duplicate the named identifiers. I've done it a lot and have seen quite a few where the url had rotted on the vine while the named-identifier link worked properly.
Trappist the monk (talk) 15:04, 27 July 2019 (UTC)
And bots like User:CitationCleanerBot which has explicit approval for such things. Headbomb {t · c · p · b} 20:18, 27 July 2019 (UTC)
Per Trappist, I would ask Czar to find any past discussion (with a few users who argued) against this established practice. I'm sure you can find some and it would help focus the discussion, because there are various ways to look at it.
I've searched a bit at the village pump and I couldn't find any, although I did find a rather surreal discussion of 2010 on the relationship between DOI and promotion to publishers (you can presumably find many variants of that argument in discussions on Wikipedia:Credo and other similar schemes) plus a few discussions with relevant comments in passing such as "urls to dois should generally not be placed in |url= when there is |doi= because that constitutes overlinking and because most most dois are behind paywalls"
In general, in my opinion policies and guidelines contain two signs that the removal of URL redundant with DOI is desirable.
  1. The very fact that there is consensus on adding a parameter for a certain identifier in {{cite journal}} or others proves that there is a desire to have that identifier presented in a structured way (see Citation templates now support more identifiers, 2011). It follows logically that there is a desire for the identifier information/link to be moved to the structured parameter rather than left lingering in N other ways it can be inserted (the |id= and |url= parameter, free text after the citation template, other templates after the citation templates etc. etc.). Nobody ever complained of people removing links to PubMed or CiteSeerX to use the corresponding identifier parameters instead, after they were introduced: it was the logical expectation. The same for the DOI, especially when doi-access was introduced to give more granular information about it and its target.
  2. At Help:Citation Style 1#Online sources elsewhere you can see that the URL parameter is generally expected to point to a full text of the cited document, open for everyone to see. So strong is the assumption, that in a few places you find a note that yes, a paywalled URL is acceptable if necessary for verifiability: it's clearly considered an exception, because nowhere will you find a general statement that paywalled and commercial copies are preferred over the others. (Such notes were added relatively late in the life cycle of the citation guidelines, around 2009; see also 2010, 2011, 2014 discussions.) The official publisher URL (to which the DOI leads when resolved with doi.org) is generally paywalled so it would by default not be the ideal content of a URL parameter even if the DOI parameter didn't exist.
Nemo 09:09, 28 July 2019 (UTC)
As I was pointed here after I had reverted a "Dup URL" edit, I think if there is consensus, then another change should be made to the templates, particularly cite journal: that both doi= and url= should not be present, that doi= takes precedence and should automagically populate the URL field with the correct DOI URL, and that this can be flagged in red text in the reflist as other errors are. You can still have the bot go around cleaning it up, too, but this helps users to clean it faster (those red errors are easy to spot). I do note that even for paywalled URLs, you still get that the cited journal article exists, its abstract, and sufficient citation details to meet WP:V, but ideally the DOI URL should get you there too. --Masem (t) 17:26, 2 August 2019 (UTC)
  • Found this thread from 2015:

    I have also seen it argued that readers are more likely to click on a linked article title (which |url= provides) than on an obscure series of letters and numbers and symbols following a cryptic initialism. I haven't done A/B usability testing with readers to find out if this is true, but it seems reasonable to me. Jonesey95 (talk) 05:49, 21 March 2015 (UTC)
    — Help talk:Citation Style 1/Archive 7 § Additional link to doi, bibcode, arxiv, etc. via the url parameter

    But yes, not an easy discussion topic to query, hence why I thought I'd have better luck with meatspace. My concern is essentially the same one I'm quoting (and as Masem alludes). I have no strong opinion on removing |url= when it is a total duplicate of the |doi=, but from my experience watching people use Wikipedia, when the citation's title is unlinked, readers with no knowledge of DOIs aren't going to click through the links unless they're interested in figuring out what a DOI is (same for ISSNs, ISBNs, and similar identifiers). Maybe that makes this more of a CS1 discussion now? There is also a separate discussion to be had re: the edit I first cited above, which removed a |url= that linked to the full text but no |doi-access=free was added in its stead. czar 21:47, 3 August 2019 (UTC)
    • Currently |doi-access=free does not turn the title into a link, so I'm personally not especially motivated to add it. Nemo 19:02, 4 August 2019 (UTC)
    • Another discussion was Wikipedia:Bots/Requests for approval/DOI bot 2#Adding URLs to nonfree articles? "the usual style in articles I edit is that url= is reserved for articles where the entire text is freely readable, and that url= is not used for articles where just the abstract is readable (for that, you can just live with the DOI or PMID or whatever)". Nemo 13:12, 6 August 2019 (UTC)
If that is the common sentiment, shouldn't it be added to the CS1 documentation? It's hard to have a discussion for/against the practice because the current standard isn't documented in a central location. czar 20:40, 11 August 2019 (UTC)

If you remove firstn/lastn, also remove author-linkn/authorn-link

Status
new bug
Reported by
Headbomb {t · c · p · b} 12:45, 24 July 2019 (UTC)
What happens
[7]
What should happen
[8]
We can't proceed until
Feedback from maintainers


That's more annoying than it sounds since we have to check a lot of name parameters. AManWithNoPlan (talk) 18:25, 9 August 2019 (UTC)
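To make the request concrete, here is a rough Python sketch of removing an author together with the matching link parameters. The dictionary representation and the function name `remove_author` are hypothetical, and only the aliases named in the heading are covered; as noted above, the real template has more name parameters that would need checking.

```python
def remove_author(params, n):
    """Drop firstN/lastN and the matching author-link aliases for author n.

    `params` is a dict of parameter name -> value (a hypothetical
    representation of a parsed citation template).
    """
    n = str(n)
    doomed = {
        "first" + n,
        "last" + n,
        "author-link" + n,        # author-linkN
        "author" + n + "-link",   # authorN-link
    }
    # Build a new dict so the caller's template is not modified in place.
    return {name: value for name, value in params.items() if name not in doomed}


template = {
    "last1": "Smith", "first1": "J.", "author-link1": "John Smith",
    "last2": "Doe", "author2-link": "Jane Doe",
    "title": "X",
}
print(remove_author(template, 1))
```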

Surgeon General of the United States

Status
Reported by
QuackGuru (talk) 23:03, 28 July 2019 (UTC)
What happens
Added "National Center for Chronic Disease Prevention Health Promotion (US) Office on Smoking Health". The url goes to the CDC website but it is a copy of the Surgeon General of the United States report.
What should happen
The bot should remove "National Center for Chronic Disease Prevention Health Promotion (US) Office on Smoking Health".[9]
We can't proceed until
Feedback from maintainers


The bot should undo this edit. QuackGuru (talk) 18:56, 5 August 2019 (UTC)

According to the website, the preferred citation includes that information: U.S. Department of Health and Human Services. The Health Consequences of Smoking: 50 Years of Progress. A Report of the Surgeon General. Atlanta, GA: U.S. Department of Health and Human Services, Centers for Disease Control and Prevention, National Center for Chronic Disease Prevention and Health Promotion, Office on Smoking and Health, 2014. AManWithNoPlan (talk) 18:18, 11 August 2019 (UTC)
The bot is listing "National Center for Chronic Disease Prevention Health Promotion (US) Office on Smoking Health" as an "author". QuackGuru (talk) 18:34, 11 August 2019 (UTC)
if you do not like who the publisher lists as the author, then block the bot with author1= <!-- no real author --> AManWithNoPlan (talk) 17:34, 12 August 2019 (UTC)
I thought "author" parameters were reserved for a person's name. Is there a publisher1 and publisher2 and so on for co-publishers or could this be created for co-publishers when other authors are not a person's name? QuackGuru (talk) 18:29, 12 August 2019 (UTC)
Well, that's an interesting question. There is a loose hierarchy publisher > editor > author, similar to series > title > chapter in books. But that is still pretty vague, and non-humans can be authors. AManWithNoPlan (talk) 23:50, 12 August 2019 (UTC)
See Template:Cite book. I could not find where it mentions "author" for non-humans. There is no solution for when there are multiple co-publishers. Just let it be or someone could propose creating new parameters for co-publishers. QuackGuru (talk) 00:35, 13 August 2019 (UTC)
The use of |author= for organizational authors is permitted. --Izno (talk) 23:25, 13 August 2019 (UTC)
I prefer the creation of new parameters for |publisher1= and |publisher2= and so on. QuackGuru (talk) 23:38, 13 August 2019 (UTC)
I have seen cases of co-publishers, but that is not the case here. Looking at the edit QG links to, it appears to me that the essential problem is citing the "Surgeon General of the United States" as the publisher. It is quite unlikely that the Surgeon General has personally published that item. (I can conceive of the office "of the Surgeon General" doing so, but unlikely.) Someone should take a closer look at the source to sort out who the (possibly "corporate" or institutional) author is, and who actually managed the publication. At any rate, this case does not warrant citing multiple publishers when the real issue is who is the (singular) publisher. And certainly does not warrant multiple publisher parameters. ♦ J. Johnson (JJ) (talk) 23:55, 13 August 2019 (UTC)
It is a report of the SG. Adding "National Center for Chronic Disease Prevention Health Promotion (US) Office on Smoking Health" is fine. But what is the best way to add it? How can I list it as a co-publisher? QuackGuru (talk) 16:19, 14 August 2019 (UTC)
"Report of the SG" is ambiguous as to authorship (responsibility), publisher, etc. You want the "best way to add" something, where I would say it is not clear as to exactly what should be added. (And unless the document says "co-published" I rather doubt that is the case.) What you need to do is examine the document closely, perhaps with the help of a medical librarian. Or look at how other publications cite it. But be cautious. E.g., I would not go with citoid's identification of the author as "General". ♦ J. Johnson (JJ) (talk) 23:22, 14 August 2019 (UTC)
if there is only a single non-human author from citoid, we reject it. This author is from the pubmed API based upon the PMID. AManWithNoPlan (talk) 00:00, 15 August 2019 (UTC)
So is the Surgeon General a non-human author? [Caution! lots of sharp edges in that question; handle with care.] ♦ J. Johnson (JJ) (talk) 19:04, 15 August 2019 (UTC)
See how other publications cite it. For example, see "While the most recent Surgeon General's Report on the "Health Consequences of Smoking"..."[10] QuackGuru (talk) 19:38, 15 August 2019 (UTC)
Isn't that just what I said? ("Or look at how other publications cite it.")
Note that what you just quoted is not a citation. A citation – more precisely, a full citation – has bibliographic details, etc. Which medical journals tend to pare down to what is minimally sufficient (such as leaving off the publisher), but if you search for this report on Google Scholar you should find lots of hits, and quite likely some useful examples.
There is no bot issue here, so I think we're done. ♦ J. Johnson (JJ) (talk) 21:36, 16 August 2019 (UTC)
It is not a bot issue unless there is a new way to format it for organizational authors in the future. For now this is the way to cite it. QuackGuru (talk) 21:42, 16 August 2019 (UTC)
Not quite; cs1|2 has |chapter= and |chapter-url=; use them:
{{cite book |chapter-url=https://stacks.cdc.gov/view/cdc/21569/Share |title=The Health Consequences of Smoking—50 Years of Progress: A Report of the Surgeon General |chapter=Nicotine |date=2014 |pages=107–138 |publisher=[[Surgeon General of the United States]] |pmid=24455788 |archive-url=https://web.archive.org/web/20150915172434/http://www.surgeongeneral.gov/library/reports/50-years-of-progress/sgr50-chap-5.pdf |archive-date=15 September 2015 |author1=National Center for Chronic Disease Prevention Health Promotion (US) Office on Smoking Health}}
National Center for Chronic Disease Prevention Health Promotion (US) Office on Smoking Health (2014). "Nicotine" (PDF). The Health Consequences of Smoking—50 Years of Progress: A Report of the Surgeon General. Surgeon General of the United States. pp. 107–138. PMID 24455788. Archived from the original on 15 September 2015.
I left |pages=107–138 but do your readers a favor: for in-line citations like this one, use an appropriate in-source location parameter and value to identify where in the source the supporting information is; don't make readers search through 32ish pages to find the supporting information.
Trappist the monk (talk) 22:06, 16 August 2019 (UTC)
For the page numbers I had to re-format it. QuackGuru (talk) 23:12, 16 August 2019 (UTC)
Four things about that:
  1. SGUS is not an author listed in Safety of electronic cigarettes § Bibliography so readers who might read a printed copy of the article won't be able to find it without a special decoder-ring that tells them that SGUS = National Center for Chronic Disease ...
  2. items in §Bibliography should be listed in alpha order by author
  3. clicking this title link The Health Consequences of Smoking—50 Years of Progress: A Report of the Surgeon General I don't expect to land at "Nicotine". Don't astonish readers.
  4. why is it |url=https://stacks.cdc.gov/view/cdc/21569/Share (not a dead link) but |archive-url=https://web.archive.org/web/20150915172434/http://www.surgeongeneral.gov/library/reports/50-years-of-progress/sgr50-chap-5.pdf? The root url should be the same in both.
Trappist the monk (talk) 23:59, 16 August 2019 (UTC)
SGUS stands for Surgeon General of the United States. They are the publisher. The National Center for Chronic Disease... is a co-publisher/author. I listed it by year. I removed the archived link. The other link has a PDF file. QuackGuru (talk) 00:29, 17 August 2019 (UTC)

Category/batch whitelist

Category/batch runs are being abused. Possibilities on dealing with this are:

  • a) a whitelist of people allowed to ask for unlimited category/batch runs
    • This could just be something like extended confirmed.
  • b) a whitelist for limited category/batch runs (say ~250 pages at once, tops)
    • This could just be something like autoconfirmed/confirmed.
  • c) a way to kill inappropriate category/batch runs

And have category/batch runs disabled/greatly limited (~25 articles) for non-confirmed/whitelisted users. Headbomb {t · c · p · b} 17:48, 7 August 2019 (UTC)

A/B may also prevent sock puppets and "suspicious" new users that may intend to use the bot in undesired ways from doing so. Users with no edits or very few edits might not check their edits or won't see possible mistakes by the bot and as such won't report them. Proposal B seems like a good one to go forward with in any case, in my opinion. For proposal C it might be good to define who could use that option (only maintainers and the operator, or also some "trusted" users), and we would also need to define what is considered inappropriate. For option A it might also be an idea to allow extended confirmed users up to 1000 pages, and then have a further whitelist of users who can do unlimited runs, i.e. bureaucrats, administrators and "normal" users who have proven to understand what the bot does and the impact of extremely large runs (i.e. don't run during high usage times), and who possibly are also actively reporting bugs and joining in discussion here. Just a few things to think about. -Redalert2fan (talk) 20:05, 16 August 2019 (UTC)
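The tiers floated in this thread could be expressed as a small lookup, sketched here in Python. Nothing like this is confirmed to exist in the bot: the group labels, the `LIMITS` table, and both function names are hypothetical, and the numbers (25, 250, 1000, unlimited) are simply the figures suggested in the comments above.

```python
UNLIMITED = None  # sentinel meaning "no cap"

# Hypothetical group labels mapped to the page limits suggested in the thread.
LIMITS = {
    "whitelisted": UNLIMITED,
    "extendedconfirmed": 1000,
    "autoconfirmed": 250,
}
DEFAULT_LIMIT = 25  # non-confirmed / non-whitelisted users


def max_batch_size(user_groups):
    """Return the most generous limit any of the user's groups grants."""
    limit = DEFAULT_LIMIT
    for group in user_groups:
        if group not in LIMITS:
            continue
        if LIMITS[group] is UNLIMITED:
            return UNLIMITED
        limit = max(limit, LIMITS[group])
    return limit


def trim_batch(pages, user_groups):
    """Cut a requested page list down to what the user is allowed to run."""
    limit = max_batch_size(user_groups)
    pages = list(pages)
    return pages if limit is UNLIMITED else pages[:limit]


print(max_batch_size(["autoconfirmed"]))
print(len(trim_batch(range(5000), ["autoconfirmed", "extendedconfirmed"])))
```

Option (c), killing a run that is already in progress, is a separate mechanism and is not covered by this sketch.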

Do not add titles with | (or its HTML representation)

Status
new bug
Reported by
Jonatan Svensson Glad (talk) 18:50, 9 August 2019 (UTC)
What happens
When converting from a bare URL or when adding a title, if |title= includes a & #124;, then nuke it or strip what is behind it.
Relevant diffs/links
https://en.wikipedia.org/w/index.php?title=User:Josve05a/cite-sandbox&diff=910110345&oldid=910110119
We can't proceed until
Feedback from maintainers


It's often part of the title. AManWithNoPlan (talk) 18:53, 9 August 2019 (UTC)

I've never seen a single place where this was part of a title (and worth keeping; I always strip this). Jonatan Svensson Glad (talk) 18:55, 9 August 2019 (UTC)
Titles with pipes are better than no titles. Headbomb {t · c · p · b} 19:37, 9 August 2019 (UTC)
If done properly, yes. But I'd rather the bot not make any changes, or only "correct" changes. Not 'better than nothing' changes, since a human still needs to clean it up. Jonatan Svensson Glad (talk) 21:15, 9 August 2019 (UTC)
Pipes should convert to {{!}}. A literal pipe is a reserved character in CS1|2 and shouldn't exist at all. HTML pipes could ideally be converted to {{!}}. -- GreenC 20:31, 9 August 2019 (UTC)
I would prefer not to add “fixing this” to the bot's tasks. AManWithNoPlan (talk) 21:33, 9 August 2019 (UTC)
The website cannot seem to make up its mind about what the better title is. AManWithNoPlan (talk) 21:50, 9 August 2019 (UTC)
<title>One Direction Tour Tickets Sell Out In Minutes | MTV UK</title>
<meta property="twitter:title" content="One Direction Tour Tickets Sell Out In Minutes | MTV UK" /> 
<script type="application/ld+json">{"@context":"http:\/\/schema.org","@type":"NewsArticle","headline":"One Direction Tour Tickets Sell Out In Minutes","url":"http:\/\/www.mtv.co.uk\/one-direction\/news\/one-direction-tour-tickets-sell-out-in-minutes","keywords":["one direction"],"dateCreated":"2013-05-25T12:32:44+01:00","articleSection":"One Direction"}</script>
<meta property="og:title" content="One Direction Tour Tickets Sell Out In Minutes | MTV UK" />
reFill and Citoid give the same title we do. AManWithNoPlan (talk) 21:52, 9 August 2019 (UTC)
there is no reliable way to determine if the stuff after the pipe is part of the title or not. It is actually more of a philosophical question than a factual question. AManWithNoPlan (talk) 18:04, 11 August 2019 (UTC)
I would return {{!}} or the HTML string &#124; rather than avoiding fixing this, whether it's part of a title or elsewhere (except in URLs). --Izno (talk) 18:10, 11 August 2019 (UTC)
Just for the record, Citoid also blindly adds titles with pipes, resulting in stray text without a parameter. At least escaping the pipes as Izno says should be uncontroversial. I have no opinions on stripping them (I often do so manually as Josve says). Nemo 09:18, 12 August 2019 (UTC)
one problem we run into is websites that pipe parts in the opposite direction: host|section|title AManWithNoPlan (talk) 10:53, 12 August 2019 (UTC)
Converting to {{!}} is all that should happen here. Headbomb {t · c · p · b} 11:44, 12 August 2019 (UTC)
is there any reason to prefer the pseudo template over html? AManWithNoPlan (talk) 11:18, 20 August 2019 (UTC)
It's more wikitextish, but beyond that, no. --Izno (talk) 12:22, 20 August 2019 (UTC)
It's more recognizable in the edit window. It's rather cosmetic and not really a critical issue though. Headbomb {t · c · p · b} 17:00, 20 August 2019 (UTC)
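The escaping option that several commenters favor is simple to sketch. The Python below is an illustration only: the function name `escape_pipes` is hypothetical, it assumes the value is a plain-text title (no wikilinks, not a URL), and besides the decimal entity mentioned above it also accepts the equivalent hexadecimal form &#x7C;.

```python
import re

# Numeric HTML entities for the pipe character: decimal 124 or hex 7C.
HTML_PIPE = re.compile(r"&#124;|&#[xX]7[cC];")


def escape_pipes(title):
    """Replace literal and HTML-encoded pipes in a title with {{!}}."""
    # Normalize encoded pipes to a literal pipe first, then escape them all.
    title = HTML_PIPE.sub("|", title)
    return title.replace("|", "{{!}}")


print(escape_pipes("One Direction Tour Tickets Sell Out In Minutes | MTV UK"))
```

This keeps the whole title, so it sidesteps the question of which side of the pipe is the "real" title.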

location vs publication-place

Status
new bug
Reported by
Johannes Schade (talk) 14:51, 10 August 2019 (UTC)
We can't proceed until
Feedback from maintainers


Dear Programmer. Nice tool. I tried your bot on the page Antoine Hamilton. It verified the doi very well. However, it also replaced all the '|publication-place=' parameters in the citation template with '|location=' parameters. My understanding was that location is now old and should be replaced by publication-place. Finally it said it could not find the isbn 9780198613741, which is however the 13-digit version of the isbn 0-19-861374-1 marked in the book. Johannes Schade (talk) 14:51, 10 August 2019 (UTC)
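On the ISBN point, a short Python sketch of the standard ISBN-10 to ISBN-13 conversion may help. It is offered as a possible explanation, not a diagnosis of what the bot did; the function names are hypothetical. The key detail is that the check digit has to be recalculated rather than carried over: under the ISBN-13 rule, 0-19-861374-1 converts to 9780198613749, and the 13-digit string quoted in the report fails the check-digit test.

```python
def _weighted_sum(digits):
    """ISBN-13 weighting: 1 for even positions (0-based), 3 for odd ones."""
    return sum(int(ch) * (1 if i % 2 == 0 else 3) for i, ch in enumerate(digits))


def isbn10_to_isbn13(isbn10):
    """Convert an ISBN-10 to ISBN-13, recomputing the check digit."""
    chars = isbn10.replace("-", "").replace(" ", "")
    if len(chars) != 10:
        raise ValueError("expected 10 characters after removing hyphens")
    core = "978" + chars[:9]  # drop the old ISBN-10 check character
    check = (10 - _weighted_sum(core) % 10) % 10
    return core + str(check)


def is_valid_isbn13(isbn13):
    """Check length, digits, and the ISBN-13 check digit."""
    digits = isbn13.replace("-", "").replace(" ", "")
    if len(digits) != 13 or not digits.isdigit():
        return False
    return _weighted_sum(digits) % 10 == 0


print(isbn10_to_isbn13("0-19-861374-1"))   # 9780198613749
print(is_valid_isbn13("9780198613741"))    # False
```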

Garbage archive-url cleanup

Status
new bug
Reported by
Headbomb {t · c · p · b} 15:26, 14 August 2019 (UTC)
What should happen
[11]
We can't proceed until
Feedback from maintainers


Remove via=domain when adding work for BBC News

Status
new bug
Reported by
Jonatan Svensson Glad (talk) 21:27, 18 August 2019 (UTC)
What should happen
Remove |via=www.bbc.co.uk when adding |work=BBC News
Relevant diffs/links
https://en.wikipedia.org/w/index.php?title=Pentecostalism&diff=prev&oldid=911441255
We can't proceed until
Feedback from maintainers