User talk:Citation bot


Note that the bot's maintainer can go weeks without logging in to Wikipedia and can no longer devote extensive time to bot maintenance. A major bug may therefore go unnoticed for some time; as such, important matters may warrant an e-mail. Breaking changes to templates maintained by the bot will be more readily addressed if advance notice can be given.

Please click here to report an error.

This bot is only periodically maintained and new feature requests are no longer being considered. The code is open source and interested parties are invited to assist with the operation and extension of the bot. The source code is at https://github.com/ms609/citation-bot

Standardize and Customize Journal Capitalization

Status
feature request
Reported by
Saimondo (talk) 16:21, 3 August 2014 (UTC)
Type of bug
Improvement
Actual / expected output
Bot writes for example "Molecular and cellular biology" instead of "Molecular and Cellular Biology" by autofilling with PMID 9858585
Link
https://en.wikipedia.org/w/index.php?title=Template%3ACite_pmid%2F9858585&diff=619550325&oldid=604044373
Replication instructions
autocomplete with PMID 9858585
We can't proceed until
Agreement on the best solution
Requested action from maintainer


Extended content

Data on NCBI seems to be ok: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC83919/ where the Journal is written as "Mol Cell Biol." on the webpage and as "MOLECULAR AND CELLULAR BIOLOGY" in the full text pdf.

What to do in those cases? Include "Molecular and Cellular Biology" in: https://en.wikipedia.org/wiki/User:Citation_bot/capitalisation_exclusions in such cases?

The same with

-"The Journal of biological chemistry" e.g. PMID 9858585

-"The Journal of cell biology" e.g. PMID 9763423

and other cases seen in https://en.wikipedia.org/wiki/Special:RecentChangesLinked/Category:Cite_doi_templates ? Thanks--Saimondo (talk) 16:21, 3 August 2014 (UTC)

Actually PubMed lists the journal as "Molecular and cellular biology" in the webpage meta data. A very minor case of GIGO. AManWithNoPlan (talk) 02:31, 4 August 2014 (UTC)
Perhaps it's worth quoting the University of Chicago Manual of Style (14th ed.) on this matter:
"In regular title capitalization, also known as headline style, the first and last words and all nouns, pronouns, adjectives, verbs, adverbs, and subordinating conjunctions (if, because, as, that, etc.) are capitalized. Articles (a, an, the), coordinating conjunctions (and, but, or, for, nor), and prepositions, regardless of length are lowercased unless they are the first or last word of the title or subtitle. The to in infinitives is also lowercased."
On the other hand, it is common in library cataloging following MARC format to capitalize only the initial word, proper nouns, and, if the title begins with an article, that article and the following noun.
Wikipedia citations should follow citation style, rather than library cataloging style. In this case, the appropriate form would be "Molecular and Cellular Biology". The Wikipedia Manual of Style provides much the same advice on the capitalization of titles. SteveMcCluskey (talk) 18:40, 4 August 2014 (UTC)
I am not very familiar with PHP (the language that Citation Bot is coded in), but it would appear that there is a mb_convert_case function:
 $str = mb_convert_case($str, MB_CASE_TITLE, "UTF-8");
that can transform a string into title case (i.e., capitalize the first and last words of the title and all nouns, pronouns, adjectives, verbs, adverbs, and subordinating conjunctions). This function would probably work well for most journal names. Boghog (talk) 19:15, 4 August 2014 (UTC)
This should be easy to implement, but I anticipate that some time down the line it will upset someone. Before I implement it, could we establish consensus and file a bot approval request if necessary? Thanks. Martin (Smith609 – Talk) 08:49, 25 August 2014 (UTC)
How about you implement it for adding journal titles, but don't implement it for changing existing entries? Eventually, the list of titles that violate the rules will be built up, and then you can make it a fix for existing journal titles. AManWithNoPlan (talk) 01:48, 4 September 2014 (UTC)

You are of course right, it's no error, it's the catalog style NCBI is using. I don't have a complete overview of which capitalization format is obtained by the doi or issn vs pmid queries. But if you use the cite-> templates-> cite journal option here in the edit window and use autofill with the doi:10.1128/MCB.00698-14 you get "Molecular and Cellular Biology"; if you use the same publication's PMID 25022755 with autofill you get "Molecular and cellular biology". If capitalization also means harmonization, I think few Wikipedians would be against it.

Furthermore, as far as I understand https://en.wikipedia.org/wiki/Wikipedia:Manual_of_Style#Titles_of_works the capitalization format like above should be ok (I have the impression that most journals use capitalization for their own names on their homepages/pdfs). Should we ask on the Manual of style talk page to see if there's a consensus for capitalization? In case someone is interested, here is a recent reply to an email I (re-)sent to NCBI some time ago:

"...Standard cataloging requires that the first word in the full journal title begins with an upper case letter and remaining words (except for proper nouns) begin with lower case. Journal title abbreviations begin with all upper-case letters. I checked the XML data for several journals and found that each of the title listed in this manner. You can see several examples at the bottom of this document:

Fact Sheet: Construction of the National Library of Medicine Title Abbreviations http://www.nlm.nih.gov/pubs/factsheets/constructitle.html Sincerely, Ellen M. L. ...

-Original Message-

Dear NCBI Team, in the xml data of a specific article https://www.ncbi.nlm.nih.gov/pubmed/9858585?dopt=Abstract&report=xml&format=text the journal name is written "Molecular and cellular biology" and the abbreviation is "Mol Cell Biol.". I think the correct journal name should be "Molecular and Cellular Biology" as written on the journal homepage http://mcb.asm.org/content/19/1/612.long ." Saimondo (talk) 17:29, 10 September 2014 (UTC)

https://en.wikipedia.org/wiki/User:Citation_bot/capitalisation_exclusions seems to be being ignored by the latest bot also. AManWithNoPlan (talk) 16:12, 27 September 2015 (UTC)
So, the conclusion is
  1. convert journal names to title case when adding them
  2. Add back in exclusions list support https://en.wikipedia.org/wiki/User:Citation_bot/capitalisation_exclusions

AManWithNoPlan (talk) 20:43, 2 January 2016 (UTC)

I think the solution is to change add_if_new in objects.php like this:

changing

      case "periodical": case "journal":
        if ($this->blank("journal") && $this->blank("periodical") && $this->blank("work")) {
          return $this->add($param, sanitize_string($value));
        }
        return false;

into

      case "periodical": case "journal":
        if ($this->blank("journal") && $this->blank("periodical") && $this->blank("work")) {
          return $this->add($param, format_title_text(sanitize_string($value)));
        }
        return false;

AManWithNoPlan (talk) 20:58, 6 August 2016 (UTC)
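
As an illustration of the two conclusions above, here is a minimal sketch of headline-style capitalisation with an exclusions list. Note that mb_convert_case() with MB_CASE_TITLE on its own capitalises every word, including "and", so the small words have to be handled separately. The function name journal_title_case, the list of small words, and the way exclusions are passed in are all assumptions made for this example; none of them is taken from the bot's code.

```php
<?php
// Hypothetical helper (not part of Citation bot): headline-style capitalisation
// for journal names, with an exclusions list for names that must be kept as listed.
function journal_title_case(string $name, array $exclusions = []): string {
    // Exclusions are matched case-insensitively and returned exactly as listed.
    foreach ($exclusions as $exact) {
        if (mb_strtolower($exact, 'UTF-8') === mb_strtolower($name, 'UTF-8')) {
            return $exact;
        }
    }
    // Articles, coordinating conjunctions and a few prepositions (illustrative list only).
    $small = ['a', 'an', 'the', 'and', 'but', 'or', 'for', 'nor', 'of', 'in', 'on', 'to', 'at', 'by'];
    $words = explode(' ', $name);
    $last = count($words) - 1;
    foreach ($words as $i => $word) {
        $lower = mb_strtolower($word, 'UTF-8');
        if ($i !== 0 && $i !== $last && in_array($lower, $small, true)) {
            $words[$i] = $lower; // small word in the middle of the title stays lowercase
        } else {
            $words[$i] = mb_convert_case($lower, MB_CASE_TITLE, 'UTF-8');
        }
    }
    return implode(' ', $words);
}
```

Names with internal capitals would be flattened by this approach unless they are listed as exclusions, which is one reason the exclusions page matters.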

Bot does not handle aliases

Status
new bug
Reported by
It Is Me Here t / c 11:31, 3 September 2014 (UTC)
Type of bug
Inconvenience: Humans must occasionally make immediate edits to clean up after the bot
Actual / expected output
If an instance of {{cite journal}} has no |issue=φ, the Bot adds it, even if the {{cite journal}} already has |number=φ, throwing up a red error in read mode.
It should do nothing (bypass {{cite journal}}s with |number=φ).
Link
http://en.wikipedia.org/w/index.php?title=Template:Cite_doi/10.2307.2F1477803&diff=623994852&oldid=623994152
We can't proceed until
Agreement on the best solution
Requested action from maintainer



The bot also adds "pages=" when there is already a "p=" or "pp=" or "page=" card. AManWithNoPlan (talk) 21:00, 26 December 2015 (UTC)

...or "at=". Lithopsian (talk) 16:36, 29 December 2015 (UTC)
https://en.wikipedia.org/w/index.php?title=Calutron&type=revision&diff=688482098&oldid=688481880 Hawkeye7 (talk) 07:02, 1 November 2015 (UTC)
Another example of pages= being added when page= already exists: https://en.wikipedia.org/w/index.php?title=Inflow_%28meteorology%29&diff=next&oldid=698774275
Issue and number too: https://en.wikipedia.org/w/index.php?title=Ferruccio_Busoni&diff=prev&oldid=724431335 AManWithNoPlan (talk) 20:04, 16 July 2016 (UTC)
Here it hoses things by replacing a page range with just the first page: https://en.wikipedia.org/w/index.php?title=History_of_Kentucky&diff=731670614&oldid=731670055 Stevie is the man! TalkWork

The solution is to edit the add_if_new() function in objects.php, adding the missing alias checks, such as changing

        if (( $this->blank("pages") && $this->blank("page"))

into

        if (( $this->blank("pages") && $this->blank("page")  && $this->blank("pp")  && $this->blank("p"))

Also will need to add some, like this:

      case 'issue':
        if ($this->blank("issue") && $this->blank("number")) {        
          return $this->add($param, $value);
        } 
      return false;

since they are caught in the catch all:

      default:
        if ($this->blank($param)) {        
          return $this->add($param, sanitize_string($value));
        }
    }

AManWithNoPlan (talk) 15:06, 9 August 2016 (UTC)
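
Rather than adding one blank() test per alias in each case branch, the alias groups could be kept in one table. The following is only a sketch under stated assumptions: the citation is modelled as a plain name => value array instead of the bot's template object, the names ALIAS_GROUPS, aliases_of and add_if_all_aliases_blank are hypothetical, and the groups list only the aliases mentioned in this thread.

```php
<?php
// Illustrative alias groups; the real templates accept more aliases than these.
const ALIAS_GROUPS = [
    ['issue', 'number'],
    ['pages', 'page', 'pp', 'p', 'at'],
];

// Returns every parameter name that means the same thing as $param (including itself).
function aliases_of(string $param): array {
    foreach (ALIAS_GROUPS as $group) {
        if (in_array($param, $group, true)) {
            return $group;
        }
    }
    return [$param];
}

// $citation is a plain name => value array standing in for the bot's template object.
function add_if_all_aliases_blank(array $citation, string $param, string $value): array {
    foreach (aliases_of($param) as $alias) {
        if (isset($citation[$alias]) && trim($citation[$alias]) !== '') {
            return $citation; // an alias already carries a value: leave the citation alone
        }
    }
    $citation[$param] = $value;
    return $citation;
}
```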

See this diff, which results in a slew of citation errors for having both pages and pp, and note that in many of those entries, it munges the page range into an (inaccurate) single page. Squeamish Ossifrage (talk) 13:18, 18 October 2016 (UTC)

This is really a bug in the citation templates for allowing a bazillion different ways to say the same thing, but the bot needs to deal with it. The code to fix it is in the git repository. No one with the power to upload it to wmflabs has done so. So, it's also a bug in us meat bags too. AManWithNoPlan (talk) 14:25, 18 October 2016 (UTC)

Bot is running but bug is not fixed. [1] Bot must be shut down until bugs can be fixed. Hawkeye7 (talk) 06:53, 5 April 2017 (UTC)

Anyone with the power to stop the bot probably has the power to upload the fixes. AManWithNoPlan (talk) 15:16, 5 April 2017 (UTC)

issue vs. volume confusion for journals with no volumes

Status
feature request
Reported by
All the best: Rich Farmbrough 01:38, 11 November 2014 (UTC).
Type of bug
Inconvenience
Actual / expected output
For the journal ZooKeys, the bot changes the issue number to a volume number.
Should understand that this number is an issue number with this particular journal
Link
https://en.wikipedia.org/w/index.php?title=Aegista_diversifamilia&diff=630393100&oldid=629974617 - see discussion here
Replication instructions
A similar ZooKeys doi template
We can't proceed until
Bot operator's feedback on what is feasible
Requested action from maintainer
Build in specific knowledge of this journal's numbering scheme. Possibly a list of one, unless and until other similar items are found.


http://search.crossref.org/?q=10.3897/zookeys.445.7778 The cross-ref data is wrong. So, it is not a bot bug, but the bot could easily fix it. AManWithNoPlan (talk) 19:15, 2 October 2015 (UTC)

The bot needs special code for journals like this, and should internally store a list of such journals. AManWithNoPlan (talk) 00:13, 3 January 2016 (UTC)

The solution is to add code to objects.php in the public function add_if_new($param, $value) AManWithNoPlan (talk) 02:10, 7 August 2016 (UTC)

      case 'volume':
        if ($this->blank($param)) {
          if (  $this->get('journal') == "ZooKeys" ) $this->add_if_new('issue',$value) ; // This journal has no volume
          return $this->add($param, $value);
        }
      return false;

And change this code:

        if ($this->blank("journal") && $this->blank("periodical") && $this->blank("work")) {
          return $this->add($param, sanitize_string($value));
        }

to

        if ($this->blank("journal") && $this->blank("periodical") && $this->blank("work")) {
          if ( sanitize_string($value) == "ZooKeys" ) $this->forget("volume") ; // No volumes, just issues. (blank() only tests a parameter; forget() removes it)
          return $this->add($param, sanitize_string($value));
        }

Might be best long term to have a global array of such journals rather than having to keep adding them one by one.
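
A sketch of what such a global array might look like follows. The constant and function names are hypothetical, and ZooKeys is the only entry because it is the only journal named in this thread.

```php
<?php
// Hypothetical list of journals that number issues only; stored lowercase for comparison.
const JOURNALS_WITHOUT_VOLUMES = ['zookeys'];

function journal_has_no_volumes(string $journal): bool {
    return in_array(strtolower(trim($journal)), JOURNALS_WITHOUT_VOLUMES, true);
}

// Decide which parameter a number reported as a "volume" should be stored in.
function volume_param_for(string $journal): string {
    return journal_has_no_volumes($journal) ? 'issue' : 'volume';
}
```

New journals could then be handled by adding one string to the list instead of another special case in add_if_new().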

Edits citations inside of nowiki tags

Status
new bug
Reported by
Izno (talk) 20:55, 29 April 2015 (UTC)
Type of bug
Inconvenience
Actual / expected output
The bot removed an accessdate from a citation without a URL (correctly) where the citation was used as an example (and in this case happened to be wrapped in <nowiki>...</nowiki>).
I'm not sure, but I think my suggestion is that the bot should not touch citations inside <nowiki>...</nowiki>.
Link
//en.wikipedia.org/w/index.php?title=Help_talk:Citation_Style_1&curid=34112310&diff=659936244&oldid=659925010
We can't proceed until
Agreement on the best solution
Requested action from maintainer


The solution is to deal with this at the same time that the code escapes out comments AManWithNoPlan (talk) 04:42, 6 August 2016 (UTC) In objects.php add these lines right after equivalent comment lines:

    $comments = $this->extract_object(Comment);
    $nowiki   = $this->extract_object(Nowiki);

    $this->replace_object($comments);
    $this->replace_object($nowiki);


class Comment extends Item {
  const placeholder_text = '# # # Citation bot : comment placeholder %s # # #';
  const regexp = '~<!--.*-->~us';  // Note from AManWithNoPlan:  this regex is wrong---it is greedy: see other bot bugs on this talk page
  const treat_identical_separately = FALSE;
  
  public function parse_text($text) {
    $this->rawtext = $text;
  }
  
  public function parsed_text() {
    return $this->rawtext;
  }
}
class Nowiki extends Item {
  const placeholder_text = '# # # Citation bot : no wiki placeholder %s # # #';  // Have space in nowiki so that it does not through some crazy bug match itself recursively
  const regexp = '~<nowiki>.*?</nowiki>~us'; 
  const treat_identical_separately = FALSE;
  
  public function parse_text($text) {
    $this->rawtext = $text;
  }
  
  public function parsed_text() {
    return $this->rawtext;
  }
}

AManWithNoPlan (talk) 16:08, 9 August 2016 (UTC)
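
The placeholder idea can be tried out in isolation. This is a standalone sketch, not the bot's Item class machinery: hide_nowiki and restore_nowiki are hypothetical names, and the placeholder text is simplified. It uses the same non-greedy pattern as the Nowiki class above.

```php
<?php
// Replace each <nowiki>...</nowiki> span with a numbered placeholder.
// Returns [masked text, list of saved spans].
function hide_nowiki(string $text): array {
    $saved = [];
    $masked = preg_replace_callback(
        '~<nowiki>.*?</nowiki>~us', // non-greedy, so two separate spans are not merged into one
        function (array $m) use (&$saved): string {
            $saved[] = $m[0];
            return sprintf('# # # nowiki placeholder %d # # #', count($saved) - 1);
        },
        $text
    );
    return [$masked, $saved];
}

// Put the saved spans back once the citations have been processed.
function restore_nowiki(string $masked, array $saved): string {
    foreach ($saved as $i => $original) {
        $masked = str_replace(sprintf('# # # nowiki placeholder %d # # #', $i), $original, $masked);
    }
    return $masked;
}
```

Any citation editing would happen on the masked text between the two calls, so templates inside nowiki spans are never seen by the expansion code.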

Duplicating jstor

Status
new bug
Reported by
Frietjes (talk) 14:11, 23 May 2015 (UTC)
Type of bug
Deleterious
Actual / expected output
Bot replaces a jstor url with a jstor parameter, but does not check to see if there is already a jstor parameter in the citation. hence, if there is already a blank jstor parameter, the jstor link is effectively deleted.
Bot should first remove the empty jstor parameter, and/or any completely duplicate jstor parameters (i.e., jstor parameters with the exact same value).
Link
http://en.wikipedia.org/w/index.php?title=Noye's_Fludde&type=revision&diff=663532320&oldid=637085644
Replication instructions
create a citation with both a jstor url and a jstor parameter in the citation template
We can't proceed until
Requested action from maintainer



The bad code is in these lines of get_identifiers_from_url() in objects.php:

          $this->rename("url", "jstor", $match[1]);

          $this->rename("url", "bibcode", urldecode($bibcode[1]));

          $this->rename("url", "pmc", $match[1] . $match[2]);

            $this->rename('url', 'asin', $match['id']);

They should match the doi code, which is a forget followed by a set:

          $this->forget('url');
          $this->set("doi", urldecode($match[1]));

I can't explain why one works and the other does not, but that is what happens. AManWithNoPlan (talk) 03:13, 9 August 2016 (UTC)
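
Independently of how rename() behaves internally, the behaviour requested in the report can be written down as a small function. This sketch models the template as a list of [name, value] pairs so that duplicate parameter names can be represented; that model and the name move_url_to_identifier are assumptions for illustration, not the bot's data structures.

```php
<?php
// $params is a list of [name, value] pairs, so duplicate names can be represented.
function move_url_to_identifier(array $params, string $name, string $value): array {
    // If the identifier is already present with a different value, leave everything for a human.
    foreach ($params as [$n, $v]) {
        if ($n === $name && trim($v) !== '' && trim($v) !== $value) {
            return $params;
        }
    }
    $result = [];
    foreach ($params as [$n, $v]) {
        if ($n === 'url' || $n === $name) {
            continue; // drop the URL and every empty or identical copy of the identifier
        }
        $result[] = [$n, $v];
    }
    $result[] = [$name, $value]; // exactly one copy of the identifier remains
    return $result;
}
```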

Special characters in data need escaped

Status
feature request
Reported by
Jonesey95 (talk) 03:42, 22 September 2015 (UTC)
Type of bug
Inconvenience
Actual / expected output
Link
https://en.wikipedia.org/w/index.php?title=Latamoxef&type=revision&diff=682190504&oldid=682190396
We can't proceed until
Bot operator's feedback on what is feasible
Requested action from maintainer


This is a pretty obscure bug, but if someone wanted to fix it, they could run the title through a regex to look for "[[" and replace it with "[<!-- -->[" (as was done on that article). Kaldari (talk) 20:56, 22 September 2015 (UTC)

And pipes too: https://en.wikipedia.org/w/index.php?title=User%3AJonesey95%2Fsandbox2&diff=prev&oldid=694077824

The problem is that the source of the metadata, http://adsabs.harvard.edu/abs/1991bsc..book.....H, has a vbar within an author's name, I think erroneously as the author in question doesn't use a middle name or initial, and the bot doesn't recognize it and quote it to prevent it becoming a parameter delimiter. So I think there are really two issues here: (1) bad data elsewhere that we can't do much about, and (2) better bot handling of special characters in external data. —David Eppstein (talk) 21:39, 6 December 2015 (UTC)
I have added a diff in the bug description above. When vertical bars occur in URLs, replace each vertical bar with %7c. When vertical bars occur in parameter values that are not URLs, replace each vertical bar with &#124;. – Jonesey95 (talk) 23:46, 6 December 2015 (UTC)
Yes that's it. Sounds like a sensible solution. I've not seen one of these where the vertical bar is anything other than a mistake, but I suppose it is possible in some cases. Even for a mistake, it is perhaps best for the bot to keep the character, without breaking the formatting, and someone to take it out by hand if it is really obnoxious. Lithopsian (talk) 12:25, 7 December 2015 (UTC)
Sometimes for news site or web site sources, the pipe character or spaced dash may come up in |title= values, where it should really be treated as a field delimiter between title and publisher. I'm not sure if citationbot checks for that, but certainly there are some other tools that are getting it wrong. It would be good if citationbot caught and corrected those errors, rather than just converting the character to have a less-obvious error. LeadSongDog come howl! 17:06, 7 December 2015 (UTC)

Need to add the second line here in expandFns.php AManWithNoPlan (talk) 15:22, 9 August 2016 (UTC)

function format_title_text($title) {
   $title = sanitize_string($title);

Also, in objects.php, many occurrences of this need changing:

          return $this->add($param, $value);

to this:

          return $this->add($param, sanitize_string($value));

within these areas:

      case "editor": case "editor-last": case "editor-first":
 .............
      case "first90": case "first91": case "first92": case "first93": case "first94": case "first95": case "first96": case "first97": case "first98": case "first99":
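
For reference, the escaping rules suggested in this thread (vertical bars become %7c in URLs and &#124; elsewhere, and "[[" is split with an empty comment) can be sketched as a single function. The name escape_parameter_value and the $is_url flag are hypothetical; whether sanitize_string() already does any of this is not established in this discussion.

```php
<?php
// Hypothetical helper: make an externally supplied value safe inside a template parameter.
function escape_parameter_value(string $value, bool $is_url): string {
    if ($is_url) {
        return str_replace('|', '%7c', $value); // percent-encode pipes inside URLs
    }
    $value = str_replace('|', '&#124;', $value); // HTML entity for pipes elsewhere
    // Split "[[" so it is not parsed as the start of a wikilink.
    return str_replace('[[', '[<!-- -->[', $value);
}
```

As LeadSongDog notes above, a pipe in a title sometimes marks a real boundary between title and publisher, which this kind of mechanical escaping cannot detect.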

Google books data is sometimes rubbish

Status
new bug
Reported by
Jonesey95 (talk) 04:42, 23 September 2015 (UTC)
Type of bug
Inconvenience
Actual / expected output
Bot puts journal name into title=
Bot should put journal name into journal=
Link
https://en.wikipedia.org/w/index.php?title=Ataye_River&type=revision&diff=682349962&oldid=545633253
We can't proceed until
Bot operator's feedback on what is feasible
Requested action from maintainer


Also: https://en.wikipedia.org/w/index.php?title=Homing_pigeon&diff=prev&oldid=682284024

the bot thinks it can interpret Google Books metadata, and fails badly for journal articles that are published within journal issues listed as books by Google Books. —David Eppstein (talk) 04:56, 23 September 2015 (UTC)
(EC) I think you have to propose a solution if you want this fixed - the bot took the "title" from the Google books link, which is generally appropriate. Example of solution: ask the bot to leave the title untouched IF the template type is "cite journal" AND the url contains "books.google" AND the citation is not retrievable through crossref/pmid/etc databases, but still fix the title if the template is "cite book"? (I admit this criterion is somewhat too complex.). Materialscientist (talk) 04:57, 23 September 2015 (UTC)
In my experience the metadata at Google books is too unreliable to ever use without human intervention. It's often a good starting point, but it regularly does things like replacing the actual publisher name with the name of a business entity that later bought the publisher, using publication years that are much later than the actual publication year, mangling author names, listing minor contributors (e.g. the author of a preface) as the author of a whole book, listing multiple book series for a book only one of which is correct, listing publisher names as authors and author names as publishers, filling in the "edition" field with descriptive text instead of the edition number, listing only one author or editor for a book that has more than one, etc. —David Eppstein (talk) 05:33, 23 September 2015 (UTC)
Yes. I think we should avoid any automated, or even semi-automated, extractions from Google metadata. Even having a human pass on such extractions is too slack, as, at best, such data is in no way authoritative, and suitable only as hints for further research. ~ J. Johnson (JJ) (talk) 21:51, 23 September 2015 (UTC)
I think manual extractions are ok as long as they are doublechecked against either the preview or a hardcopy. And editors who don't have a preview or a hardcopy shouldn't be adding the citation at all. But the bot can't do any of that, it can only copy what Google already has wrong, and that's not good enough. —David Eppstein (talk) 03:04, 30 September 2015 (UTC)
In such cases we are not doublechecking the metadata; we're using it to find an authoritative instance from which to extract the data directly. At any rate, I think we are agreed that a bot should not be making any changes or additions based on the Google metadata. ~ J. Johnson (JJ) (talk) 22:01, 2 October 2015 (UTC)


In objects.php AManWithNoPlan (talk) 02:49, 7 August 2016 (UTC) Change

      foreach ($xml->dc___creator as $author) {
        $this->add_if_new("author" . ++$i, formatAuthor(str_replace("___", ":", $author)));
      }

to:

      foreach ($xml->dc___creator as $author) {
        if( $author != "Hearst Magazines" ) {  // Catch common google bad authors
           $this->add_if_new("author" . ++$i, formatAuthor(str_replace("___", ":", $author)));
        }
      }
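
If the bot keeps reading Google Books authors at all, the single hard-coded comparison could become a list. A minimal sketch, with hypothetical names and with "Hearst Magazines" as the only entry taken from this thread:

```php
<?php
// Illustrative list, stored lowercase; "Hearst Magazines" is the only entry named in this thread.
const KNOWN_BAD_AUTHORS = ['hearst magazines'];

// Drop any author string that is really a publisher or other known bad value.
function filter_authors(array $authors): array {
    return array_values(array_filter($authors, function (string $author): bool {
        return !in_array(mb_strtolower(trim($author), 'UTF-8'), KNOWN_BAD_AUTHORS, true);
    }));
}
```

This only addresses one of the failure modes listed above; it does nothing about journal names landing in title=.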

Unknown is not a journal name

Status
feature request
Reported by
(t) Josve05a (c) 06:51, 24 September 2015 (UTC)
Type of bug
Inconvenience
Actual / expected output
Link
https://en.wikipedia.org/w/index.php?title=Digital_object_identifier&diff=prev&oldid=682510640
We can't proceed until
Bot operator's feedback on what is feasible
Requested action from maintainer


In this case it looks like bad data at ADS rather than the bot's fault. —David Eppstein (talk) 06:56, 24 September 2015 (UTC)
Yes, but I think that the bot can have one line of code that refuses to add a journal name that is unknown. AManWithNoPlan (talk) 15:22, 1 October 2015 (UTC)
I think this fix is needed in objects.php: add the second and fourth lines below. AManWithNoPlan (talk) 20:59, 6 August 2016 (UTC)
        $this->add_if_new("bibcode", (string) $xml->record->bibcode);
        if ( strcasecmp( (string) $xml->record->bibcode, "unknown") )  {  // Returns zero if the same
        $this->add_if_new("title", (string) $xml->record->title);
        }
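
The "one line of code that refuses to add a journal name that is unknown" could be a small predicate applied to the journal string itself. This is a sketch only; is_usable_journal_name is a hypothetical name and nothing here is taken from the bot's ADS handling.

```php
<?php
// True when $journal looks like a real name rather than an empty or "unknown" filler value.
function is_usable_journal_name(string $journal): bool {
    $journal = trim($journal);
    return $journal !== '' && strcasecmp($journal, 'unknown') !== 0; // strcasecmp returns 0 on a match
}
```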

Erroneously reports DOI as broken

Status
new bug
Reported by
AManWithNoPlan (talk) 00:42, 18 November 2015 (UTC)
Type of bug
Improvement:
Actual / expected output
marks a DOI as invalid even if it works if there is no crossref entry
We can't proceed until
Agreement on the best solution
Requested action from maintainer
Only mark DOI invalid if dx.doi.org also fails


I thought this was fixed and marked it as so. Currently, a doi is flagged as invalid if crossref fails, which is reasonable, but the bot also needs to check whether dx.doi.org failed. AManWithNoPlan (talk) 00:42, 18 November 2015 (UTC)

I encounter this bug quite often and find it annoying, because in my naive thinking it should be easy to make the bot check the dx.doi.org/xxx link for a "broken" doi. A fresh example: run doi bot on Africanized bee, it will mark doi:10.3265/Nefrologia.pre2010.May.10269 as inactive. Materialscientist (talk) 03:34, 5 February 2016 (UTC)
Maybe the solution is to change this code. I think this code only adds the broken date if there is no redirect information in the dx.doi.org headers (lack of a redirect implies a dead doi):
        $this->add_if_new('doi_brokendate', date('Y-m-d'));

to:

        $url_test = "http://dx.doi.org/".$doi ;
        $headers_test = get_headers($url_test, 1);
        if(empty($headers_test['Location']))
                $this->add_if_new('doi_brokendate', date('Y-m-d'));

and change this code:

      $this->set("doi_brokendate", date("Y-m-d"));

to:

        $url_test = "http://dx.doi.org/".$doi ;
        $headers_test = get_headers($url_test, 1);
        if(empty($headers_test['Location']))
              $this->set("doi_brokendate", date("Y-m-d"));

AManWithNoPlan (talk) 16:28, 9 August 2016 (UTC)
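
Since the same header test appears twice in the proposal, it could live in one helper. The sketch below separates the decision from the network call so the decision can be checked offline. The function names are hypothetical, and it assumes, as the proposal does, that a Location header from dx.doi.org means the DOI is registered. get_headers() returns false on failure, which is treated as "does not resolve".

```php
<?php
// Pure decision function: $headers is whatever get_headers($url, 1) returned
// (an associative array, or false on failure).
function doi_resolves($headers): bool {
    if (!is_array($headers)) {
        return false;
    }
    foreach ($headers as $name => $value) {
        // Header names are compared case-insensitively.
        if (is_string($name) && strcasecmp($name, 'Location') === 0 && !empty($value)) {
            return true; // dx.doi.org redirects somewhere
        }
    }
    return false;
}

// Network wrapper; only this part talks to dx.doi.org.
function doi_is_broken(string $doi): bool {
    return !doi_resolves(@get_headers('http://dx.doi.org/' . $doi, 1));
}
```

A transient network failure would still be reported as a broken DOI by this sketch, so a retry or a second source may be worth considering.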

Is this the same bug? Another editor reverted before I could act, but I checked and the doi is not broken at all. Hawkeye7 (talk) 22:25, 4 October 2016 (UTC)

Hard to tell. It works now. Probably a transient cross-ref failure. AManWithNoPlan (talk) 23:49, 4 October 2016 (UTC)

Bot created arXiv= parameter error

Status
new bug
Reported by
Jonesey95 (talk) 03:56, 7 December 2015 (UTC)
Type of bug
Inconvenience
Actual / expected output
Bot changed a valid |eprint= parameter into an invalid one by removing the class
Bot should leave valid parameters alone
Link
Search for 0508091 in this diff
We can't proceed until
Bot operator's feedback on what is feasible
Requested action from maintainer
Modify code to match {{cite arxiv}}


The bot removed the class portion of the arXiv parameter value in {{cite arxiv}}. It should not have done so. There are two kinds of arXiv parameters, explained in the documentation as follows:

  • arxiv or eprint (Mandatory): arXiv/Eprint identifier, without any "arXiv:" prefix. Prior to April 2007, the identifiers included a classification, an optional two-letter subdivision, and a 7-digit YYMMNNN year, month, and sequence number of submission in that category. E.g. gr-qc/0610068 or math.GT/0309136. After April 2007, the format was changed to a simple YYMM.NNNN. Starting in January 2015, the identifier was changed to be 5 digits: YYMM.NNNNN.
  • class: arXiv classification, e.g. hep-th. Optional. To be used only with new-style (2007 and later) eprint identifiers that do not include the classification.

The bot should not modify valid |arxiv= or |eprint= parameters. – Jonesey95 (talk) 03:56, 7 December 2015 (UTC)

Here's a minimal diff showing this problem. Lithopsian (talk) 00:02, 29 December 2015 (UTC)
This is still happening. – Jonesey95 (talk) 13:30, 27 July 2016 (UTC)

Here is an example of one that gets broken. {{cite arXiv|eprint=astro-ph/0409583 | title = Exploring the Divisions and Overlap between AGB and Super-AGB Stars and Supernovae | last1 = Eldridge | first1 = J. J. | last2 = Tout | first2 = C. A.|class=astro-ph|date=2004 }} AManWithNoPlan (talk) 15:49, 9 August 2016 (UTC)

Here is the offending source code from objects.php:

    $eprint = str_ireplace("arXiv:", "", $this->get('eprint') . $this->get('arxiv'));
    if ($class && substr($eprint, 0, strlen($class) + 1) == $class . '/')
      $eprint = substr($eprint, strlen($class) + 1);
    $this->set($arxiv_param, $eprint);

that should be:

    $eprint = str_ireplace("arXiv:", "", $this->get('eprint') . $this->get('arxiv'));
    //if ($class && substr($eprint, 0, strlen($class) + 1) == $class . '/')
    //  $eprint = substr($eprint, strlen($class) + 1);
    $this->set($arxiv_param, $eprint);

AManWithNoPlan (talk) 15:56, 9 August 2016 (UTC)

This only occurs if class is set AManWithNoPlan (talk) 00:26, 14 October 2016 (UTC)
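
The two identifier formats quoted from the documentation above can be told apart with two patterns, which would let the bot leave old-style identifiers (where the class is part of the value) untouched. This is a sketch: the function names are hypothetical, the patterns are written from the description in this thread only, and version suffixes are not handled.

```php
<?php
// Classify an arXiv identifier as 'old' (class/YYMMNNN), 'new' (YYMM.NNNN or YYMM.NNNNN) or 'unknown'.
function arxiv_id_style(string $id): string {
    if (preg_match('~^[a-z-]+(\.[A-Z]{2})?/\d{7}$~', $id)) {
        return 'old'; // e.g. gr-qc/0610068 or math.GT/0309136
    }
    if (preg_match('~^\d{4}\.\d{4,5}$~', $id)) {
        return 'new';
    }
    return 'unknown';
}

// Remove only an "arXiv:" prefix; the class portion of an old-style identifier is kept.
function normalise_eprint(string $eprint): string {
    return trim(str_ireplace('arXiv:', '', $eprint));
}
```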

Link at top of results page leads to error

Status
new bug
Reported by
Lithopsian (talk) 13:27, 26 December 2015 (UTC)
Type of bug
Cosmetic
Actual / expected output
After expanding citations for a page containing spaces in the title, the results page shows a link to the article at the top and bottom of the page. The link at the top does not lead to the article, but to an error page.
We can't proceed until
Agreement on the best solution
Requested action from maintainer


This code in objects.php :

    quiet_echo ("\n<hr>[" . date("H:i:s") . "] Processing page '<a href='http://en.wikipedia.org/wiki/" . addslashes($this->title) . "' style='text-weight:bold;'>{$this->title}</a>' &mdash; <a href='http://en.wikipedia.org/?title=". addslashes(urlencode($this->title))."&action=edit' style='text-weight:bold;'>edit</a>&mdash;<a href='http://en.wikipedia.org/?title=" . addslashes(urlencode($this->title)) . "&action=history' style='text-weight:bold;'>history</a> <script type='text/javascript'>document.title=\"Citation bot: '" . str_replace("+", " ", urlencode($this->title)) ."'\";</script>");

needs to be changed to

    quiet_echo ("\n<hr>[" . date("H:i:s") . "] Processing page '<a href='http://en.wikipedia.org/?title=" . addslashes(urlencode($this->title)) . "' style='text-weight:bold;'>{$this->title}</a>' &mdash; <a href='http://en.wikipedia.org/?title=". addslashes(urlencode($this->title))."&action=edit' style='text-weight:bold;'>edit</a>&mdash;<a href='http://en.wikipedia.org/?title=" . addslashes(urlencode($this->title)) . "&action=history' style='text-weight:bold;'>history</a> <script type='text/javascript'>document.title=\"Citation bot: '" . str_replace("+", " ", urlencode($this->title)) ."'\";</script>");

AManWithNoPlan (talk) 21:10, 6 August 2016 (UTC)
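
Building the three links from one encoded title would keep them consistent. A minimal sketch with hypothetical helper names, using rawurlencode() for the query value and htmlspecialchars() for the HTML attribute instead of addslashes():

```php
<?php
// Build view/edit/history URLs for a page title; spaces and apostrophes are percent-encoded.
function page_links(string $title): array {
    $base = 'http://en.wikipedia.org/?title=' . rawurlencode($title);
    return [
        'view'    => $base,
        'edit'    => $base . '&action=edit',
        'history' => $base . '&action=history',
    ];
}

// Wrap a URL in an anchor tag, escaping both the attribute value and the label for HTML.
function html_link(string $url, string $label): string {
    return "<a href='" . htmlspecialchars($url, ENT_QUOTES) . "'>"
         . htmlspecialchars($label, ENT_QUOTES) . "</a>";
}
```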

Error converting url to arxiv parameter

Status
new bug
Reported by
Hawkeye7 (talk) 21:48, 27 December 2015 (UTC)
Type of bug
Inconvenience: Humans must occasionally make immediate edits to clean up after the bot
Actual / expected output
A Bot inserted an arxiv= into a template on Metallurgical Laboratory creating a red "Check |arxiv= value" error message. Corrected by removing the ".pdf" from the end of the arxiv. see https://en.wikipedia.org/w/index.php?title=Metallurgical_Laboratory&type=revision&diff=697034978&oldid=697034297
We can't proceed until
A specific edit to the bot's code is requested below.
Requested action from maintainer


Just need to strip the .pdf off of url when converting url to eprint. Super easy code change. AManWithNoPlan (talk) 19:19, 9 January 2016 (UTC)

Change in objects.php

        $this->add_if_new("arxiv", $match[1]);
        if (strpos($this->name, 'web')) $this->name = 'Cite arxiv';

to

        $match[1] = str_replace ( ".pdf" , "" , $match[1] );
        $this->add_if_new("arxiv", $match[1]);
        if (strpos($this->name, 'web')) $this->name = 'Cite arxiv';

and change this:

    return "{{Cite arxiv | eprint={$match[1]} }}";

to:

    $match[1] = str_replace ( ".pdf" , "" , $match[1] );
    return "{{Cite arxiv | eprint={$match[1]} }}";
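
A slightly stricter variant would remove ".pdf" only when it is a suffix, so that the string is not touched anywhere else. This is a sketch with a hypothetical function name:

```php
<?php
// Remove a trailing ".pdf" (any letter case) from an arXiv identifier taken from a URL.
function strip_pdf_suffix(string $id): string {
    return preg_replace('~\.pdf$~i', '', $id);
}
```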

JSTOR plant link mistaken for journal

Status
new bug
Reported by
Josh Milburn (talk) 15:10, 7 February 2016 (UTC)
Type of bug
Deleterious
Actual / expected output
The bot is changing a JSTOR link to the JSTOR Global Plants project to an unrelated link to a JSTOR journal article. It falsely believes that the "JSTOR=" link on {{cite journal}} (admittedly, this is probably not the template which should have been used in the article) can be used in this case, when it cannot, as the citation is to a different part of the JSTOR website.
Link
https://en.wikipedia.org/w/index.php?title=Persoonia_terminalis&type=revision&diff=703656019&oldid=703655944
We can't proceed until
Agreement on the best solution
Requested action from maintainer
detect plant jstor urls and ignore


That's annoying that JSTOR has chosen to add a new type of stable link (although it does start with plant) AManWithNoPlan (talk) 19:21, 7 February 2016 (UTC)

The fix needs to go in objects.php: add the elseif branch (the third and fourth lines below).

      if (strpos($url, "sici")) {
        #Skip.  We can't do anything more with the SICI, unfortunately.
      } elseif (strpos($url, "plants")) {
        #Skip.  We can't do anything more with the plants, unfortunately.
      }  else

AManWithNoPlan (talk) 21:00, 6 August 2016 (UTC)

citing using pmid creates author1 instead of last1

Status
improvement
Reported by
Ihaveacatonmydesk (talk) 21:33, 30 May 2016 (UTC)
Type of bug
Inconvenience: Humans must occasionally make immediate edits to clean up after the bot
Actual / expected output
{{cite journal|pmid=12858711 |year=2003 |author1=Lovallo |first1=D |title=Delusions of success. How optimism undermines executives' decisions |journal=Harvard business review |volume=81 |issue=7 |pages=56–63, 117 |last2=Kahneman |first2=D }}
{{cite journal|pmid=12858711 |year=2003 |last1=Lovallo |first1=D |title=Delusions of success. How optimism undermines executives' decisions |journal=Harvard business review |volume=81 |issue=7 |pages=56–63, 117 |last2=Kahneman |first2=D }}
Replication instructions
use a pmid and click the button to autocomplete - also does the same thing when inputting a url into cite book, like {{cite book|url=https://books.google.com/?id=FI7l8O1tlkkC}}
We can't proceed until
Bot operator's feedback on what is feasible
Requested action from maintainer


|author1= is an alias of |last1=. This would be a cosmetic fix (in the code) only. – Jonesey95 (talk) 22:55, 31 May 2016 (UTC)

Agreed, but since it's such a simple fix it would be a shame not to do it. Also I actively search for "author" when most of the refs are |lastn=/|firstn= to edit them for consistency, and that creates false positives. Ihaveacatonmydesk (talk) 08:28, 1 June 2016 (UTC)
I think the solution is to change this code in DOItools.php
    foreach ($authors as $no => $auth) {
      $names = explode (', ', $auth);
      $newp["author" . ($no + 1)] = $names[0];
      $newp["first" . ($no + 1)] = $names[1];
    }

to this

    foreach ($authors as $no => $auth) {
      $names = explode (', ', $auth);
      $newp["last" . ($no + 1)] = $names[0];
      $newp["first" . ($no + 1)] = $names[1];
    }

AManWithNoPlan (talk) 21:00, 6 August 2016 (UTC)
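
A self-contained sketch of the proposed loop, assuming each entry is a "Last, First" string as in the snippet above. The function name and the guard for authors without a comma are additions for illustration only:

```php
<?php
// Hypothetical standalone version of the loop above. Input: a list of
// "Last, First" strings. Output: lastN/firstN parameters keyed by name.
function split_authors(array $authors): array {
    $params = [];
    foreach (array_values($authors) as $no => $auth) {
        // Limit of 2 keeps any later ", " inside the first-name part.
        $names = explode(', ', $auth, 2);
        $params['last' . ($no + 1)] = $names[0];
        // Assumption: an author with no comma gets a last name only.
        if (isset($names[1])) {
            $params['first' . ($no + 1)] = $names[1];
        }
    }
    return $params;
}

print_r(split_authors(['Lovallo, D', 'Kahneman, D']));
```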

When bibcodes ends with a dot, it leaves the dot out

Status
new bug
Reported by
Headbomb {talk / contribs / physics / books} 17:53, 19 June 2016 (UTC)
Type of bug
Inconvenience: Humans must occasionally make immediate edits to clean up after the bot
Actual / expected output
When a bibcode ends with a dot, the bot leaves the dot out (2010Natur.464...59)
The bot should retrieve the full 19-character bibcode (2010Natur.464...59.)
Link
[2] [search for the string "| bibcode = 2010Natur.464...59" in the diff]
We can't proceed until
Bot operator's feedback on what is feasible
Requested action from maintainer


I think the solution is to modify objects.php to add a special case for bibcodes, to sit above the catch all code:

      default:
        if ($this->blank($param)) {        
          return $this->add($param, sanitize_string($value));
        }

such as:

      case 'bibcode':
        if ($this->blank($param)) { 
          $bibcode_pad = 19 - strlen($value);  // Number of missing trailing periods
          if ($bibcode_pad > 0) {  // Paranoid: don't want a negative count if bibcodes get longer
            $value = $value . str_repeat(".", $bibcode_pad);  // Add back the trailing periods
          }
          return $this->add($param, $value);
        } 
        return false;

AManWithNoPlan (talk) 21:34, 6 August 2016 (UTC)
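
The padding step can also be sketched with PHP's built-in str_pad, which leaves strings that are already 19 characters or longer unchanged. The function name is hypothetical:

```php
<?php
// Hypothetical helper: restore trailing dots so that a bibcode is
// 19 characters long. Longer values pass through untouched.
function pad_bibcode(string $bibcode): string {
    return str_pad($bibcode, 19, '.', STR_PAD_RIGHT);
}

echo pad_bibcode('2010Natur.464...59'), "\n"; // 2010Natur.464...59.
```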

Here's another diff showing this bug. – Jonesey95 (talk) 16:17, 10 August 2016 (UTC)
And another diff showing this bug. GoingBatty (talk) 13:38, 19 August 2016 (UTC)

Removes access-date when chapter URLs specified

Status
new bug
Reported by
Dhtwiki (talk) 05:12, 20 June 2016 (UTC)
Type of bug
Improvement
Actual / expected output
Link
https://en.wikipedia.org/w/index.php?title=Earth&action=historysubmit&type=revision&diff=726104484&oldid=725978381
We can't proceed until
Agreement on the best solution
Requested action from maintainer
notice |chapter-url=


The bug is that the bot does not seem to notice that |chapter-url= could have access date. The bot is looking for |url=. AManWithNoPlan (talk) 19:59, 16 July 2016 (UTC)

The solution is to change the code in objects.php from:

    if ($this->has('accessdate') && $this->lacks('url')) $this->forget('accessdate');

To:

    if ($this->has('accessdate') && $this->lacks('url') && $this->lacks('chapter-url') && $this->lacks('chapterurl') && $this->lacks('contribution-url') && $this->lacks('contributionurl')) $this->forget('accessdate');

AManWithNoPlan (talk) 21:15, 6 August 2016 (UTC)
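
The same check can be sketched with a list of URL-bearing parameters, which is easier to extend than a chain of lacks() calls. This is only an illustration: the plain array stands in for the bot's template object, and the function and constant names are hypothetical.

```php
<?php
// Parameter names taken from the proposed fix above.
const URL_PARAMS = ['url', 'chapter-url', 'chapterurl', 'contribution-url', 'contributionurl'];

// $params is a hypothetical name => value map, not the bot's real object.
// Returns true only when an access date is set and no URL parameter is.
function should_forget_accessdate(array $params): bool {
    if (!isset($params['accessdate']) || trim($params['accessdate']) === '') {
        return false; // nothing to remove
    }
    foreach (URL_PARAMS as $name) {
        if (isset($params[$name]) && trim($params[$name]) !== '') {
            return false; // some URL is present, so keep the access date
        }
    }
    return true;
}

var_dump(should_forget_accessdate(['accessdate' => '2016-06-20', 'chapter-url' => 'http://example.org/ch1']));
```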

That bug doesn't appear to have been fixed. Schwede66 04:20, 8 February 2017 (UTC)
No one seems willing or able to grab the new code from github and upload it to the dev or normal version. As sure as I am about all of my fixes that are on github, I think development version is best. Also, my second merge request to git never was accepted. We need someone with tool server power. AManWithNoPlan (talk) 16:56, 8 February 2017 (UTC)

I just ran into this bug today here. Stevie is the man! TalkWork 18:22, 23 February 2017 (UTC)

Here as well. J947 18:08, 19 March 2017 (UTC)
Someone needs to upload the latest git version to the wiki development version (with the unmerged git pull at https://github.com/ms609/citation-bot/pulls) and let us test it out. Then someone needs to upload it to the mainline version. We need an administrator person. AManWithNoPlan (talk) 17:36, 20 March 2017 (UTC)

Yet another diff showing this bug. TheDragonFire (talk) 04:30, 26 June 2017 (UTC)

And another. SounderBruce 23:24, 30 June 2017 (UTC)

Public notice

All bug fixes reported on this page as of this moment have been submitted to github for inclusion in the bot code base. Hopefully, they will soon be operational on wikipedia, and we can close almost all these bugs. AManWithNoPlan (talk) 16:05, 25 August 2016 (UTC)

Wow, great work! Let us know when they are included so that we can test these bugs. – Jonesey95 (talk) 16:18, 25 August 2016 (UTC)
Source code merged into github. Next steps are updating dev bot, testing, fixing any bugs, and finally making it the default bot. AManWithNoPlan (talk) 15:51, 5 September 2016 (UTC)
oh yeah. Not that I can upload the code to Wikipedia. Also, does the development version even work? AManWithNoPlan (talk) 01:46, 6 September 2016 (UTC)
Fhocutt (WMF) Jonesey95 Can someone upload code to wmflabs? AManWithNoPlan (talk) 14:54, 26 September 2016 (UTC)
If you have not yet tried sending an e-mail to Smith609, the bot's owner, that is the next step at this point. He is not on WP much these days but has responded to my e-mails in the past. – Jonesey95 (talk) 20:46, 26 September 2016 (UTC)
The email is now sent. Probably someone else could do it, if they had write privileges to the wiki wmflabs. AManWithNoPlan (talk) 03:03, 27 September 2016 (UTC)
Have the updates been rolled out yet, or is this still in the pipeline? If so, ETA? Headbomb {talk / contribs / physics / books} 12:52, 5 October 2016 (UTC)

Someone with write privileges to wmflabs needs to do it. Preferably to dev version first. No eta or clue. AManWithNoPlan (talk) 12:54, 5 October 2016 (UTC)

Some bugs are fixed and uploaded. Some are not fixed. I wonder if it was a partial upload or if my fixes did not actually fix the bug. The code is not 100% obvious. AManWithNoPlan (talk) 15:40, 12 October 2016 (UTC)

The archiving of this talk page is now fixed too. AManWithNoPlan (talk) 14:30, 17 October 2016 (UTC)
Someone with wmflabs power needs to do the upload of the rest of the last change: https://github.com/ms609/citation-bot/commit/199babcc3f0d6581638c1ee2e2fbf20dba679378 Maybe to the development version first, just to verify that I did not introduce a horrible bug. AManWithNoPlan (talk) 14:30, 17 October 2016 (UTC)

Comments cause trouble

Status
new bug
Reported by
Jonesey95 (talk) 02:54, 9 November 2014 (UTC) & 2 years later, Wikid77 (talk) 14:29, 30 September 2016 (UTC)
Type of bug
Inconvenience: Humans must occasionally make immediate edits to clean up after the bot
Actual / expected output
Bot changed |publisher= to |DUPLICATE_publisher= in the absence of a duplicate publisher parameter
Bot should not do that.
Link
https://en.wikipedia.org/w/index.php?title=Fathima_Beevi&diff=629715024&oldid=610463414
Replication instructions
Two years later, on 29 September 2016, Citation_bot still confused by comment "<!-- -->" and put DUPLICATE_title & DUPLICATE_url when only one title/url, in page "Mary Babnik Brown" (dif135). -Wikid77 (talk) 14:29, 30 September 2016 (UTC)
We can't proceed until
Bot operator's feedback on what is feasible
Requested action from maintainer


As far as I can tell, there were no duplicated parameters when the bot did its edit. – Jonesey95 (talk) 02:54, 9 November 2014 (UTC)

How did you get this? The bot is not currently working.--Auric talk 13:49, 9 November 2014 (UTC)
The edit is date-stamped 15 October 2014. I just discovered it yesterday while going through Category:Pages with citations using unsupported parameters. – Jonesey95 (talk) 15:48, 9 November 2014 (UTC)
Here's another similar one, adding DUPLICATE to |archiveurl= and |archivedate=. – Jonesey95 (talk) 19:53, 10 November 2014 (UTC)
This looks like it is related to comments in the references in all cases. This appears to be a common thread in bot bugs on this page. AManWithNoPlan (talk) 04:45, 1 February 2015 (UTC)

Adding bogus |year= https://en.wikipedia.org/w/index.php?title=Wealden_Line&diff=629805699&oldid=629545497

DUPLICATE_ added: https://en.wikipedia.org/w/index.php?title=509th_Composite_Group&diff=636859536&oldid=636220208

DUPLICATE_ added: https://en.wikipedia.org/w/index.php?title=Shapley%E2%80%93Folkman_lemma&diff=655089982&oldid=651991293

This bug appears to still be present in the current version, as of this date stamp. Pinging Fhocutt (WMF). – Jonesey95 (talk) 03:46, 22 September 2015 (UTC)
Give it another try? I tested the dev version (now the actual version) on testwiki and it didn't add DUPLICATE: https://test.wikipedia.org/w/index.php?title=User%3AFhocutt_%28WMF%29%2FCitation_bot_test&type=revision&diff=243602&oldid=243601 . --Fhocutt (WMF) (talk) 23:04, 9 October 2015 (UTC)
It's still doing it here on en.WP. – Jonesey95 (talk) 23:15, 9 October 2015 (UTC)
Here is a very simple reproducer

{{cite book|publisher=Europa<!-- -->}}{{cite news<!-- -->|publisher=The}}

Here are a variety of lines from the bot source code (I might have missed one)

  const regexp = '~<!--.*-->~us';
  $comment_regexp = "~(<!--.*?)\|(.*?-->)~";
  while(preg_match("~<!--.*?-->~", $c, $match)) {
  if (preg_match_all("~<!--[\s\S]*?-->~", $page_code, $match)) {

I think the problem is the first one. It is greedy. The .* needs to be .*? like number three. AManWithNoPlan (talk) 20:41, 7 August 2016 (UTC)
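
The difference can be checked outside the bot with the reproducer above. This is a standalone sketch rather than the bot's code; only the two pattern strings are taken from the lines quoted above, with the second being the first pattern plus the lazy quantifier.

```php
<?php
$text = '{{cite book|publisher=Europa<!-- -->}}{{cite news<!-- -->|publisher=The}}';

$greedy = '~<!--.*-->~us';   // first pattern quoted above
$lazy   = '~<!--.*?-->~us';  // same pattern with a lazy quantifier

preg_match_all($greedy, $text, $g);
preg_match_all($lazy, $text, $l);

// Greedy: a single match running from the first "<!--" to the last "-->",
// swallowing the "}}{{cite news" between the two comments.
echo count($g[0]), "\n"; // 1
// Lazy: each comment is matched on its own.
echo count($l[0]), "\n"; // 2
```

With the greedy pattern the two templates are effectively glued together, which would explain why parameters of the second template look like duplicates of the first.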

Some bibcode data is crap and bot should do better job dealing with it

Status
new bug
Reported by
David Eppstein (talk) 08:30, 4 December 2016 (UTC)
Type of bug
Inconvenience: Humans must occasionally make immediate edits to clean up after the bot
Actual / expected output
See Special:Diff/752907744. In the "Handbook of massive data sets" citation (a chapter in a book), the bot found and added a bibcode referring to the whole book (not particularly useful, but not harmful either) but then added a bogus journal= parameter for the book title, causing the citation template to fail to display the chapter title (because they are not allowed for journal citations). In the "Modern hierarchical, agglomerative clustering algorithms" (an arXiv preprint), the bot found and added a bibcode for the preprint (redundant, but not harmful) but then added bogus pages= and volume= parameters derived from the preprint number, causing malformatting of the reference and requiring human intervention to remove the bad parameters.
The bot should, at the very least, recognize that the citation template already contains a contribution= or chapter= parameter that would be incompatible with adding a journal= parameter, and not change the citation in a way that would cause it to have incompatible parameters. More accurately recognizing these types of citations, or recognizing that the data coming from the bibcode doesn't match the type of citation, would be better.
We can't proceed until
Agreement on the best solution
Requested action from maintainer


The BibCode people are now importing all the arXiv stuff in bulk. We need to add code to not add journals of "eprint arXiv:1109.2377" type. The BibCode people are using the "Publication" field for that now, which in the past was always the journal name. arXiv:1109.2377 yields this on the bibcode site: http://adsabs.harvard.edu/cgi-bin/nph-bib_query?bibcode=2011arXiv1109.2378M&data_type=BIBTEX&db_key=PRE&nocookieset=1 AManWithNoPlan (talk) 21:38, 4 December 2016 (UTC)
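
A rough sketch of such a filter, assuming the placeholder always looks like the "eprint arXiv:" example above. The function name is hypothetical, and accepting a bare "arXiv:" prefix is an extra assumption:

```php
<?php
// Hypothetical check: true when a bibcode "Publication" value is an arXiv
// eprint placeholder rather than a journal name, so that the caller can
// skip adding |journal=.
function is_arxiv_placeholder(string $publication): bool {
    // Assumption: optional "eprint " prefix, then "arXiv:", case-insensitive.
    return preg_match('~^\s*(?:eprint\s+)?arXiv:~i', $publication) === 1;
}

var_dump(is_arxiv_placeholder('eprint arXiv:1109.2377')); // bool(true)
var_dump(is_arxiv_placeholder('Nature'));                 // bool(false)
```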

Just for the record, the section title here is not mine and is stated more directly than I would have. But I agree with the sentiment: in situations like this where we know that an outside source has bad metadata, we should not blindly import the same low quality into our own metadata. —David Eppstein (talk) 17:48, 7 December 2016 (UTC)
I take credit for section title AManWithNoPlan (talk) 23:47, 7 December 2016 (UTC)

Google data is not always right, and the bot is not telepathic

Status
new bug
Reported by
Stevie is the man! TalkWork 15:57, 22 January 2017 (UTC)
Type of bug
Inconvenience
Actual / expected output
1) the first cite change sets params to "|author1=Inc |first1=Time"; 2) the third cite change sets params to "|author1=Friedwald|first1=Will|date=2010-11-02"
1) should be something like "|author1=Time Inc." or perhaps don't have an author; 2) should be "|last1=Friedwald|first1=Will|date=November 2, 2010" (the date part didn't respect {{Use mdy dates}})
Link
diff
We can't proceed until
Requested action from maintainer


The date is grabbed from Google and not massaged at all. AManWithNoPlan (talk) 00:40, 23 January 2017 (UTC)

URL in the website field instead of the URL field (common newbie error)

Status
feature request
Reported by
Kerry (talk) 06:48, 1 March 2017 (UTC)
Type of bug
Deleterious: Human-input data is deleted
Actual / expected output
the bot is removing the accessdate from citations saying "Removed accessdate with no specified URL"

when the citation does contain a URL but it is in the website field (a common mistake made by newbies, especially those who don't understand the jargon "URL" -- in my experience of doing training in public libraries, many people call these "web addresses" and not "URL")
Ideally, if a citation has a URL in the website field and the URL field is empty, move the URL into the correct field and empty the website field. If that's not possible for the bot to do, then don't delete the accessdate, but try to warn in some way. (In a super-ideal world, the editor software would not use the term URL but say "address of web page", but I assume this is out of scope here).

Link
[3]
Replication instructions
undo it and run the bot again
We can't proceed until
Agreement on the best solution
Requested action from maintainer


This is probably the wrong bot to use. I should note that the access date that is deleted is not actually shown to humans. AManWithNoPlan (talk) 15:05, 27 April 2017 (UTC)

Bot generated invalid cite data "# # # comment"

Status
still bug
Reported by
Wikid77 (talk) 22:27, 26 March 2017 (UTC)
Type of bug
Bad, invalid cite parameter
Actual / expected output
While bot changes Google Books links in "A. C. Benson" (dif276), a commented <!-- --> archive url became "# # # citation bot : comment # # #" or such.
We can't proceed until
Agreement on the best solution
Requested action from maintainer


This is because the search and replace is case sensitive, which is fine and dandy 99.9% of the time. Obviously, 0.1% of the time it fails. AManWithNoPlan (talk) 15:16, 5 April 2017 (UTC)

Added invalid date

Status
feature request
Reported by
Keith D (talk) 10:58, 27 April 2017 (UTC)
Type of bug
Inconvenience
Actual / expected output
The BOT added an invalid date causing a citation error. It should not add dates of the format yyyy–mm, which are ambiguous, but of mmm yyyy or yyyy–yyyy format.
Link
https://en.wikipedia.org/w/index.php?title=Gunpowder&diff=777432809&oldid=777250884
We can't proceed until
Agreement on the best solution
Requested action from maintainer


Not a bug. The date is grabbed from Google and not massaged at all. The date is not invalid, just a date that a template complains about. The page is better after the bot's edit, because the reference now has a date (and a template warning too). AManWithNoPlan (talk) 15:02, 27 April 2017 (UTC)

Incorrect DOI removal

Status
new bug
Reported by
TheDragonFire (talk) 06:17, 22 July 2017 (UTC)
Type of bug
Deleterious
Actual / expected output
The bot incorrectly removes a DOI from a citation, and inexplicably renames a blank url parameter to DUPLICATE_url.
Link
https://en.wikipedia.org/w/index.php?title=Referred_itch&diff=791742279&oldid=774741740
We can't proceed until
Agreement on the best solution
Requested action from maintainer


This is the comments bug. The bot uses a greedy search for comments. AManWithNoPlan (talk) 13:13, 22 July 2017 (UTC)

I just proved it by undoing the previous edit, then removing the comments, and finally running the bot again. https://en.wikipedia.org/w/index.php?title=Referred_itch&diff=prev&oldid=791790626 AManWithNoPlan (talk) 14:21, 22 July 2017 (UTC)

lowercasing "the" as the first word in a subtitle?

Is it correct for this bot to remove capitalization from the word "the" when it's immediately following a colon as the first word in a subtitle? That's a wordy sentence, and might be confusing, so I'll also ask: is it correct for this bot to do this: [4]? — fourthords | =Λ= | 15:11, 12 August 2017 (UTC)

That is a good question. I edited https://en.wikipedia.org/wiki/User:Citation_bot/capitalisation_exclusions to make this Star Trek magazine have a capital The. Generally, "the" is not capitalized in the middle of a sentence, but this is a weird case where a colon really is being used more like a period than a colon. AManWithNoPlan (talk) 13:44, 13 August 2017 (UTC)
This is in general true of all words following colons and dashes, not just 'The'. 'A' and "An" are very common, as are many others. The bot should leave the capitalization of all follow-up words alone. Headbomb {t · c · p · b} 14:40, 13 August 2017 (UTC)
The convention I've usually seen is that the word following a colon in a complete English sentence is not capitalized (although I think in earlier styles it might have been) but the word following a colon in the title of a publication is capitalized. For instance the mathematics publication database MathSciNet, which aggressively lowercases even words after the first in titles of books (unlike most other bibliographic sources), nevertheless follows this convention. —David Eppstein (talk) 18:23, 13 August 2017 (UTC)

Flags non-duplicate title as duplicate

Status
new bug
Reported by
Headbomb {t · c · p · b} 15:13, 15 August 2017 (UTC)
Type of bug
Deleterious: Human-input data is deleted or articles are otherwise significantly affected. Many bot edits require undoing.
Actual / expected output
[5]
Not that.
Link
https://en.wikipedia.org/w/index.php?title=Alpha_particle&diff=795641460&oldid=795641155
Replication instructions
Run on [6]
We can't proceed until
Bot operator's feedback on what is feasible
Requested action from maintainer


This is the comment bug. Can someone please grab the source on GitHub and update the development version so we can test it out. AManWithNoPlan (talk) 17:39, 15 August 2017 (UTC)

Authors must be people, not companies

Status
new bug
Reported by
 Stepho  talk  09:29, 16 August 2017 (UTC)
Type of bug
Inconvenience: Humans must occasionally make immediate edits to clean up after the bot
Actual / expected output
Bot is adding company names in author fields, eg '|author1=Magazines |first1=Hearst'
Link
https://en.wikipedia.org/w/index.php?title=Internal_combustion_engine&type=revision&diff=795735634&oldid=795700442
We can't proceed until
Bot operator's feedback on what is feasible
Requested action from maintainer


Perhaps the bot could look for keywords like 'magazine', 'journal', 'newspaper', etc and common variations (eg upper/lowercase, plurals).  Stepho  talk  09:29, 16 August 2017 (UTC)
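
A minimal sketch of that keyword check, using only the three example words from the suggestion. The function name is hypothetical and a real list would need to be longer:

```php
<?php
// Hypothetical heuristic: true when an "author" string contains a word that
// suggests a publication or company rather than a person. Case-insensitive,
// with simple plural forms covered by the optional "s".
function looks_like_organization(string $name): bool {
    return preg_match('~\b(?:magazines?|journals?|newspapers?)\b~i', $name) === 1;
}

var_dump(looks_like_organization('Hearst Magazines')); // bool(true)
var_dump(looks_like_organization('Will Friedwald'));   // bool(false)
```

A check like this would only tell the bot to skip the last/first split. It would not catch names such as "Time Inc" from the earlier report unless more keywords were added.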