Module talk:Citation/CS1

From Wikipedia, the free encyclopedia

id = ISBN

If I'm understanding this correctly, when the ISBN is given as |id=ISBN ..., the module will not check the ISBN for errors. Should the 4,964 articles that contain |id=ISBN be converted by a bot to use |isbn=? The article count was obtained from April's database dump. Bgwhite (talk) 07:16, 27 April 2014 (UTC)

If there is an ISBN in |id=, it is not completely checked for format, but it is linked if it passes some coarse format checking. If it is 13 digits and starts with 978 or 979, it is linked (e.g. ISBN 9781234567890), but it is not linked if it does not start with those digits (e.g. ISBN 9801234567890). If it is 10 digits (with X as a possible 10th character), it is linked (e.g. ISBN 123456789X). If it is not 10 or 13 digits, it is not linked (e.g. ISBN 12345678901). [NOTE: I have not looked at the code for this, which is part of MediaWiki, not the citation templates.]
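As a rough illustration only, the linking behaviour described above can be approximated with one regular expression. This is a hypothetical Python sketch reconstructed from the four examples in the comment, not MediaWiki's actual magic-link code; the name would_autolink is made up for the example.

```python
import re

# Hypothetical approximation of the behaviour described above; NOT MediaWiki's code.
# "ISBN", whitespace, then either 978/979 followed by ten more characters (13 total),
# or ten characters whose last may be X. Hyphens or spaces may separate the digits.
_ISBN_LINK = re.compile(r"ISBN[ \t]+(?:97[89][- ]?)?(?:[0-9][- ]?){9}[0-9Xx]\b")


def would_autolink(text: str) -> bool:
    """Return True if `text` contains an ISBN that the described rules would link."""
    return _ISBN_LINK.search(text) is not None


print(would_autolink("ISBN 9781234567890"))  # True
print(would_autolink("ISBN 9801234567890"))  # False
print(would_autolink("ISBN 123456789X"))     # True
print(would_autolink("ISBN 12345678901"))    # False
```

The trailing \b is what rejects an 11-digit run or a 13-digit run without the 978/979 prefix: the match would otherwise stop partway through the digits.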
At a minimum, there should be some additional logic before moving ISBNs out of |id= into |isbn=. In many cases |id= was used because |isbn= was already occupied and would generate an error if it contained more than one ISBN. In addition, if the editor wanted additional text before or after the ISBN, it may have been placed in |id= for that reason. The |isbn= parameter accepts nothing other than a strictly formatted ISBN; no other text is permitted. If |isbn= is already occupied, then obviously an additional ISBN should not be moved out of |id= into |isbn=. If there is additional text in |id=, then it is a contextual edit where human editorial judgement should be applied, and it should not be performed by a bot.
If |isbn= does not exist and |id= contains an ISBN with no additional text other than "ISBN", then yes, it should be moved into |isbn=. The contents of |id= are not included in the COinS data, but |isbn= is. NOTE: This is contrary to the documentation stating that "any of the identifiers" are included in the COinS data. However, |isbn= is included in the COinS without any format corrections, which, I assume, is why it has been programmed to generate an error if the value is not strictly compliant as an ISBN (i.e. no other characters are tolerated).
In my opinion, it would be better for us to relax the formatting required in the |isbn= parameter somewhat. We could easily strip out all non-numeric characters before performing the ISBN format and check-digit verifications, and pass that stripped version in the COinS. This would result in fewer errors, both for our editors and in the COinS data, at the cost of a single regular expression substitution. In effect we would be permitting additional non-numeric text in the |isbn= value. If desired, the regular expression could also strip a preceding "1[03]:", as that sequence is fairly commonly used by editors, for some reason, to indicate that it is a 10- or 13-digit ISBN. — Makyen (talk) 08:42, 27 April 2014 (UTC)
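A minimal sketch of the proposed relaxation, written in Python rather than the module's Lua and assuming that the displayed text stays untouched while only the normalized value is used for checking and COinS. Two details go slightly beyond "strip all non-numeric characters" and are my assumptions: bracketed descriptors such as "(pbk)" are dropped first so that any digits inside them (e.g. "(2nd ed.)") are not merged into the ISBN, and a final X is kept because an ISBN-10 check digit can be X. The function name normalize_isbn is hypothetical.

```python
import re

# Hypothetical sketch: reduce a free-text |isbn= value to bare ISBN characters
# before check-digit verification and COinS output. Not the module's code.
_PREFIX = re.compile(r"^\s*(?:ISBN[-\s]*)?(?:1[03]\s*:\s*)?", re.IGNORECASE)
_BRACKETED = re.compile(r"[(\[{][^)\]}]*[)\]}]")  # "(pbk)", "{paperback}", ...


def normalize_isbn(raw: str) -> str:
    s = _PREFIX.sub("", raw, count=1)   # drop a leading "ISBN", "10:", "13:"
    s = _BRACKETED.sub("", s)           # drop bracketed descriptors
    core = re.sub(r"[^0-9Xx]", "", s).upper()
    # An ISBN-10 check digit may be X, so keep an X only in the final position.
    return core[:-1].replace("X", "") + core[-1:]


print(normalize_isbn("13: 978-0-306-40615-7 (pbk)"))  # 9780306406157
```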
Why do we need additional text? Do you have an example where this is needed? And multiple ISBNs or other identifiers are always suspect. I have only seen multiple ISBNs where someone is trying to identify multiple versions of a source, not the particular source they are using.
It is not a question about when I think additional text is needed. My personal opinion is that it is a very rare occasion when it is actually needed. The one occurrence which I recall was on an author's Wikipedia page. The {{Cite book}} templates were used to format a list of the author's works. As part of the list, the ISBNs were supplied for all of the different versions of each book. A brief piece of text was supplied inline to describe the version of the book for each ISBN. I'm not sure I would make the same editorial choice, but I respect the fact that they had made that choice on that page.
The additional text issue is a question of when a significant number of editors consider it appropriate to include such text, and how we should handle the fact that it happens a significant amount of the time. Our checking for strict formatting on the ISBN appears to be driven by its use in COinS, rather than by verifying that the provided ISBN text would enable a human to find the book, or that linking the ISBN to Special:BookSources will function. Special:BookSources appears to strip all non-numeric characters from what is passed to it. Humans can handle a much wider variety than the strict requirements we are currently applying to this field. We are imposing much stricter requirements than are needed to accomplish the primary task of enabling someone to find the reference. The strict format requirement makes the template less user friendly, even though being a bit more user friendly (tolerant of a somewhat wider range of formats) costs very little and actually improves the quality of the data we pass via COinS (i.e. we strip any extraneous text instead of only flagging an error).
In going through Category:Pages with ISBN errors, the most common additional text that actually has some meaning is a short descriptor appended to say which version of the book the ISBN is for. For example: "{paperback}", "(pbk)", "(hardback)", "(hdb)", etc. Are these strictly necessary for identifying the book, assuming the ISBN is actually correct? No. As a human looking to acquire the exact book, is it helpful information to know? Yes.
There are also a significant number of citations where effectively useless information is provided, for example prefixing the value with "10:", "13:", "ISBN", etc.
I question why we consider the additional text an "error" when it is not in fact an actual error, merely a deviation from the strict formatting of this specific parameter. Strict formatting is not needed for the value to function in the way the information is primarily used (a link to Special:BookSources), and most deviations from it are trivially handled in the module to provide good data via COinS in most cases. The processing necessary to provide good data via COinS is a single regular expression replacement, and we at least come close to doing this already: even for a properly formatted ISBN we have to strip out the "-" or " " characters in order to calculate the checksum.
To address a specific issue: I am not suggesting that we change what we display in the citation (except that values which currently produce an error would no longer do so). We currently display all text supplied in the |isbn= value, and we should continue to do so.
As to multiple ISBNs in the same citation: yes, of course, that is suspect. However, please note that what I said about multiple ISBNs was that the proposed bot, which would move the ISBN from |id= to |isbn=, should not create an error where none currently exists. It could do so either by creating a duplicate |isbn= or by moving into |isbn= a second ISBN that an editor had already placed in |id=, where it does not create an error. I made no comment about the editorial choice to have multiple ISBNs in the citation, only that the bot should be programmed not to create errors in the citation when it comes across situations that are known to exist. — Makyen (talk) 13:57, 27 April 2014 (UTC)
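To show how little processing the checksum itself needs once "-" and " " are stripped, here is a Python sketch of the standard ISBN check-digit rules (ISBN-10: weights 10 down to 1, X counts as 10, total divisible by 11; ISBN-13: alternating weights 1 and 3, total divisible by 10). It is an illustration, not the module's Lua implementation; the function name isbn_check_ok is hypothetical. The two ISBNs from the Shell Model citation further down this thread both pass.

```python
import re


def isbn_check_ok(value: str) -> bool:
    """Validate an ISBN-10 or ISBN-13 check digit after removing hyphens and spaces."""
    s = value.replace("-", "").replace(" ", "").upper()
    if re.fullmatch(r"[0-9]{9}[0-9X]", s):
        # ISBN-10: weights 10..1, a final X counts as 10, total must divide by 11.
        total = sum((10 - i) * (10 if c == "X" else int(c)) for i, c in enumerate(s))
        return total % 11 == 0
    if re.fullmatch(r"[0-9]{13}", s):
        # ISBN-13: weights 1, 3, 1, 3, ..., total must divide by 10.
        total = sum(int(c) * (3 if i % 2 else 1) for i, c in enumerate(s))
        return total % 10 == 0
    return False


print(isbn_check_ok("978-0-306-47708-9"))  # True
print(isbn_check_ok("978-0-306-40615-8"))  # False
```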
|type= is the proper parameter for your examples "{paperback}", "(pbk)", "(hardback)", "(hdb)" – without the brackets.
Trappist the monk (talk) 16:07, 27 April 2014 (UTC)
@Trappist the monk: I both agree and disagree that |type= is most appropriate. When making these changes I have no history on the page, and no knowledge of any possible agreement about format. In my opinion, changes to correct citation errors should remain as close to the original editor's intent as possible. Thus, in many cases I feel that it is more important to retain the intent of the original editor than to use the "correct" parameter, |type=.
Here is an example which I encountered today:
As originally in the page:
J. W. Negele and E. W. Vogt, ed. (2003). Fifty Years of the Shell Model — The Quest for the Effective Interaction. Advances in Nuclear Physics, Volume 27. Springer-Verlag. doi:10.1007/b100519. ISBN 978-0-306-47708-9 (Print) 978-0-306-47916-8 (Online) Check |isbn= value (help). 
Using |type= and |id= (location of "Print" disassociates it from the ISBN):
J. W. Negele and E. W. Vogt, ed. (2003). Fifty Years of the Shell Model — The Quest for the Effective Interaction (Print). Advances in Nuclear Physics, Volume 27. Springer-Verlag. doi:10.1007/b100519. ISBN 978-0-306-47708-9. ISBN 978-0-306-47916-8 (Online). 
Using |id=:
J. W. Negele and E. W. Vogt, ed. (2003). Fifty Years of the Shell Model — The Quest for the Effective Interaction. Advances in Nuclear Physics, Volume 27. Springer-Verlag. doi:10.1007/b100519. ISBN 978-0-306-47708-9. (Print) ISBN 978-0-306-47916-8 (Online). 
In my opinion, the version which does not use |type= is closer to what the original editor intended.
Note that this citation has other problems and would likely be better as (retaining the 2 ISBN numbers):
Talmi, Igal (2003). "Fifty Years of the Shell Model — The Quest for the Effective Interaction". In Negele, J. W.; Vogt, E. W. Advances in Nuclear Physics, Volume 27. Advances in the Physics of Particles and Nuclei (APPN) 27. Springer-Verlag. pp. 1–275. doi:10.1007/0-306-47916-8_1. ISBN 978-0-306-47708-9. (Print) ISBN 978-0-306-47916-8 (Online) ISSN 0065-2970. 
— Makyen (talk) 23:58, 27 April 2014 (UTC)
Yeah, as you show it, |type= doesn't work so well in your example, not because |type= is wrong but because the original editor is wrong. The CS1 templates are designed to provide information about a single source. Here, the editor is trying to cite two versions of the same source in a single template. We should be glad that he didn't want to include the softcover version as well (ISBN 978-1-4757-8801-3). Perhaps the better solution to the multiple isbn problem is to choose one to use in the template and include the other(s) parenthetically outside the template. This at least avoids the error, includes an isbn in the COinS metadata, and still keeps the rest available:
Talmi, Igal (2003). "Fifty Years of the Shell Model — The Quest for the Effective Interaction". In Negele, J. W.; Vogt, E. W. Advances in Nuclear Physics (hardback). Advances in the Physics of Particles and Nuclei (APPN) 27. Springer-Verlag. doi:10.1007/0-306-47916-8_1. ISBN 978-0-306-47708-9. ISSN 0065-2970.  (alternate: ISBN 978-0-306-47916-8 (Online); 978-1-4757-8801-3 (softcover))
I took out |url=, |chapter-url=, |pages=, and removed the external link from |series=. |doi= gets the reader to the same place as |chapter-url=, where all you get is a sample of the table of contents and part of the introduction, a teaser that is part of the publisher's effort to sell you a copy of the book; |url= and the external link in |series= are more of the same selling. There is no point in listing a chapter and all of the pages that make up the chapter; that does nothing to help a reader find the cited information.
Trappist the monk (talk) 10:50, 28 April 2014 (UTC)

id = ISBN should not be changed wholesale to ISBN =, for the reasons noted above. I think it would be reasonable for an editor using AWB to convert instances of id = ISBN that contain plain ISBNs with no extraneous text, in citations where an ISBN is not present.

Some data: I have fixed about 3,000 of the 8,000 articles in Category:Pages with ISBN errors using an AutoEd script in the past couple of months. I have about 2,500 more articles to examine. The script has been able to fix about 60% of the articles I have examined. The most common fixable error, by far, is two ISBNs separated by a comma. These two ISBNs are usually the 10-digit ISBN followed by the 13-digit ISBN.

As for extra text, the examples given above are often present. Sometimes a "printing" or "edition" is present, though it is almost always redundant with |year=. Sometimes multiple volumes, each with its own ISBN, are specified; I don't touch those.

When I am done going through the category, I expect there to be about 2,500 articles left. The large majority of those errors will be legitimate errors: ISBNs with too few or too many numbers. There will be somewhere under 1,000 "low-hanging fruit" still left, primarily ASINs, multiple ISBNs that were too strange or ambiguous for my scripting skills to handle, ISSNs, publisher names, and other easy fixes. After those are fixed, I expect we'll have under 2,000 actual ISBN problems to track down.

Anyone who would like to contribute to clearing out this category is welcome to do so. I recommend starting at the end of the alphabet, since the remaining articles that my script hasn't touched are in the A–N portion of the alphabet (I've been working my way from Z to A). – Jonesey95 (talk) 17:39, 27 April 2014 (UTC)

@Jonesey95: I have been working on them from "A" forward. I was splitting multiple ISBNs in |isbn= into |isbn= and |id= until Redrose64 commented that a large number of them were just the 10- and 13-digit ISBNs for the same book and expressed a belief that the 10-digit one should be removed. I don't agree that there is consensus for us to wholesale override the choice of editors to put both a 10- and a 13-digit ISBN into the citation. I fully agree that it is not needed, and I would not do so myself. I just don't think that there is a wide enough consensus for us to remove them from thousands of articles. I have not been splitting them wholesale since that point; my intent was to go back through once it was clearer how to handle them. I also have not yet ported to AWB (the tool I use) the JavaScript code I wrote for a different purpose, which decodes, formats, and checks ISBNs. Something that actually compares the two and verifies that they are 10/13 duplicates would be needed.
Looking at your script: Your script appears to delete the first ISBN unless it starts with 97[89] without any checks to see that this occurrence is actually a 10/13 duplicate. I consider this to be inappropriate. You may be deleting a non-duplicate. In addition, even in the case where it is a 10/13 duplicate, the editor has made the choice to include both. While I don't agree with that choice, I have not seen something that indicates a wide consensus for removing 10/13 duplicates from thousands of articles.
I disagree with your choice to comment out any ISBN starting with 977. I have seen a good number of ISBNs which have had "97[89]" mistyped as "977". In these cases, changing the 977 to 97[89] was sufficient for the ISBN to be valid and find the correct book.
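The two checks asked for above, confirming that a pair really is a 10/13 duplicate and testing whether a 977 prefix was a typo for 978 or 979, can be sketched as follows. This is a hypothetical Python sketch with made-up function names, not Makyen's JavaScript or the AutoEd script. Converting an ISBN-10 means prefixing 978 to its first nine digits and recomputing the check digit, which is why simply adding "978-" gives a wrong number. An ISBN-13 beginning 979 has no ISBN-10 form. Because the third digit carries weight 1, at most one of the 978/979 repairs can pass the checksum; a passing candidate still needs a human to confirm it is the right book.

```python
import re


def _clean(isbn: str) -> str:
    return isbn.replace("-", "").replace(" ", "").upper()


def isbn13_check_digit(first12: str) -> str:
    total = sum(int(c) * (3 if i % 2 else 1) for i, c in enumerate(first12))
    return str((10 - total % 10) % 10)


def isbn10_to_13(isbn10: str) -> str:
    """Prefix 978 to the first nine digits and RECOMPUTE the check digit."""
    body = "978" + _clean(isbn10)[:9]
    return body + isbn13_check_digit(body)


def is_10_13_pair(a: str, b: str) -> bool:
    """True only if one value is the ISBN-10 form of the other (978-prefixed) ISBN-13.
    Note: this does not separately validate the ISBN-10's own check digit."""
    a, b = _clean(a), _clean(b)
    if len(a) == 13:
        a, b = b, a  # put the 10-digit candidate first
    return (re.fullmatch(r"[0-9]{9}[0-9X]", a) is not None
            and re.fullmatch(r"978[0-9]{10}", b) is not None
            and isbn10_to_13(a) == b)


def repair_977(value: str) -> list[str]:
    """Return the 978/979 variants of a 977-prefixed value whose check digit is valid."""
    s = _clean(value)
    if not re.fullmatch(r"977[0-9]{10}", s):
        return []
    return [p + s[3:] for p in ("978", "979")
            if isbn13_check_digit(p + s[3:12]) == s[12]]


print(isbn10_to_13("0-306-40615-2"))       # 9780306406157
print(repair_977("977-0-306-40615-7"))     # ['9780306406157']
```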
I am not familiar with scripts for AutoEd. However, the replacements you are performing appear to be applied to the complete text of the article, not limited to citations. For the |isbn= parameter this might be sufficiently specific; on the other hand, it might not. You might want to consider adding to or changing your regular expressions to limit them more specifically to citation templates. I use the following (or a variation on it):
It also prevents matches with any parameters within one level of sub-template within the citation template. It could be more specific and prevent low probability matches within wiki-links (within citation templates), but a wiki-link with the displayed portion being the format of a parameter, "|\s*isbn\s*=\s*", is a low probability and these are not intended for unattended operation. Note that if there is more than one |isbn= in the citation this will match the one furthest from the {{\s*[Cc]it[ae].
As to ASINs: you change any that are explicitly called out as ASINs. I would suggest adding additional cases to that. My experience so far is that a sequence matching B0[0-9A-Za-z]{8} can safely be considered an ASIN even when not explicitly labeled as an "ASIN". However, I have been actually clicking on the links created to verify that it is an ASIN and is valid. I have not found a formal specification for ASIN numbers, but aside from those which are also ISBNs, that format has fit the ones I have seen. — Makyen (talk) 23:58, 27 April 2014 (UTC)
Looks like I spoke a bit too soon about using B0[0-9A-Za-z]{8} as an indicator of an ASIN. I just encountered four on a page, and three of them were invalid as ASINs. That said, I had not previously encountered any that turned out to be invalid when changed to |asin= based on that criterion. — Makyen (talk) 00:07, 28 April 2014 (UTC)
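Given the follow-up above, the B0 pattern is best treated as a way to collect candidates for manual checking rather than as a rule for automatic conversion. A hypothetical Python sketch (asin_candidates is a made-up name); note it does not catch book ASINs that are the same as an ISBN-10.

```python
import re

# Heuristic only: the "B0 + 8 alphanumerics" shape described above.
# A match is a candidate to verify by hand, not proof of a valid ASIN.
_ASIN_LIKE = re.compile(r"\bB0[0-9A-Za-z]{8}\b")


def asin_candidates(value: str) -> list[str]:
    """Return every ASIN-shaped token in a parameter value."""
    return _ASIN_LIKE.findall(value)


print(asin_candidates("ASIN B00EXAMPLE (Kindle)"))  # ['B00EXAMPLE']
```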
Thanks for the tips. I will see if I can incorporate some of them into my editing.
My answer to most of your concerns is that I visually inspect each article's ISBN errors before running my script, and then I visually inspect each of the script's proposed edits before saving. There are plenty of articles that I skip because I can see in advance or after running the script (but before saving) that the script will produce undesirable results.
I believe that I am commenting out only 13-digit "977" numbers, which are typically UPC bar codes; I don't see many of these. I look at the citation to confirm that it does not appear to be a book before doing so, but I comment it out instead of deleting it because I can't be sure. There is a particular editor who has inserted many "977" numbers, allegedly for Billboard Brasil, as ISSNs and ISBNs. I did a ton of research to try to find a valid ISSN for these, and failed, so I resorted to commenting them out.
ASINs: There are a couple hundred apparent ASINs in the category. I didn't feel comfortable changing them without checking each one manually, so I have saved them for a second pass.
As for removing a 10-digit ISBN when a 13-digit ISBN is also present, my understanding is that they contain identical information and lead the reader to the same book (at, for example) when clicked. The CS1 error help text explicitly says to "Use the 13-digit ISBN when it is available" and that "Only one ISBN is allowed in this field" because it breaks the metadata and breaks the link to Special:BookSources. – Jonesey95 (talk) 01:08, 28 April 2014 (UTC)
Including multiple ISBNs, such as for print and online is an issue, since we can not definitively determine which version was consulted. Fixing these has the same problem, where we cannot determine the definitive source. --  Gadget850 talk 01:11, 28 April 2014 (UTC)
Multiple ISBNs may be useful outside of references, in a list of works. For example, the subject of an article might be the editor of a multi-volume encyclopedia where each volume has its own ISBN. In that case, putting all of the ISBNs into |isbn= is not appropriate, but neither is removing all but one ISBN. Using |id= or putting the ISBNs outside of the citation template might work; I haven't given it enough thought yet, since I've been working on the easy fixes. – Jonesey95 (talk) 01:17, 28 April 2014 (UTC)
WP:SAYWHEREYOUGOTIT applies. If we can't tell which version was seen due to multiple ISBNs, we imply they are equivalent (down to pagination). In that case it might be cleaner to cite OCLC 70752232 or OL9534802M. LeadSongDog come howl! 13:49, 28 April 2014 (UTC)
13-digit numbers beginning 977 are the EAN-13 representation of an ISSN, but they are not ISSNs: a true ISSN has eight digits. It is not always easy to convert an EAN-13 to an ISSN: for example, The Railway Magazine is ISSN 0033-8923 and the barcode is 977-0033-89229-3 - clearly seven digits correspond, but I don't know about the rest. --Redrose64 (talk) 17:48, 28 April 2014 (UTC)
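For the remaining digits of the barcode: as commonly described for ISSN barcodes, a 977 EAN-13 is 977, then the first seven ISSN digits (without the ISSN's own check digit), then a two-digit add-on (often described as a price or issue-variant code), then the EAN check digit. Under that assumption, the ISSN can be recovered by recomputing its mod-11 check digit (weights 8 down to 2, with 10 written as X). A hypothetical Python sketch; with The Railway Magazine's barcode it gives back ISSN 0033-8923.

```python
import re
from typing import Optional


def issn_from_ean13(barcode: str) -> Optional[str]:
    """Recover an ISSN from a 977-prefixed EAN-13, or return None if it is not one."""
    s = barcode.replace("-", "").replace(" ", "")
    if not re.fullmatch(r"977[0-9]{10}", s):
        return None
    # EAN-13 check digit: weights 1, 3, 1, 3, ... over the first 12 digits.
    total = sum(int(c) * (3 if i % 2 else 1) for i, c in enumerate(s[:12]))
    if (10 - total % 10) % 10 != int(s[12]):
        return None
    seven = s[3:10]  # ISSN without its check digit; digits 11-12 are the add-on
    # ISSN check digit: weights 8..2, mod 11, a result of 10 is written as X.
    total = sum(int(c) * w for c, w in zip(seven, range(8, 1, -1)))
    check = (11 - total % 11) % 11
    return f"{seven[:4]}-{seven[4:]}{'X' if check == 10 else check}"


print(issn_from_ean13("977-0033-89229-3"))  # 0033-8923
```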
If a multi-volume work has an ISBN for each volume, then I recommend listing each volume individually with the appropriate ISBN. Otherwise, there is no connection between the volume and the ISBN. --  Gadget850 talk 13:03, 29 April 2014 (UTC)

Multiple ISBNs

Would it be feasible to have multiple instances of {{{isbn}}}, each associated with a {{{type}}}? For example, the above example could be converted to {{cite book |chapter=Fifty Years of the Shell Model — The Quest for the Effective Interaction |date=2003 |publisher=[[Springer-Verlag]] |doi=10.1007/0-306-47916-8_1 |title=Advances in Nuclear Physics |volume=27 |first=Igal |last=Talmi |editor1-first=J. W. |editor1-last=Negele |editor2-first=E. W. |editor2-last=Vogt |isbn1 = 978-0-306-47708-9 |type1=hardback |issn=0065-2970 |series = Advances in the Physics of Particles and Nuclei (APPN)|isbn2 = 978-0-306-47916-8 | type2 = Online | isbn3 = 978-1-4757-8801-3 | type3 = softcover}} We would default to {{{isbn1}}} or simply {{{isbn}}} for generating COinS metadata, just like at present. HTH HAND —Phil | Talk 17:40, 15 May 2014 (UTC)

No. Where would it stop? Some books have many more than one ISBN - paperback/hardback; audio; USA/UK/Australia/etc. publisher; separate volumes or all-in-one; special coffee-table binding. How many do you need? The answer to that is: give the ISBN of the edition that you actually consulted, and no other. --Redrose64 (talk) 17:52, 15 May 2014 (UTC)
There should only be one - the one the page numbers were taken from. Keith D (talk) 18:40, 15 May 2014 (UTC)
We should not be encouraging storing a significant list of different ISBN numbers. The one which should be selected is the one, without modification, which is printed in the book actually being referenced. If there is more than one printed, use the one that matches the version of the book in-hand. If there is both a 10-digit and a 13-digit version printed in the book, the 13 digit version is preferred. Do not convert from a 10-digit version to a 13-digit version by just adding the 978-; it will be wrong. Do not convert a 13-digit version to a 10-digit version by removing the 978-; it will also be wrong. Use the version as printed in the book.
There are ways to have more than one ISBN if the |id= is used, but that should be an exception, not a rule. If we were going to start listing all of the different identifiers for every edition/version of a book, as Redrose64 said "where would it stop?" As an example: a reference on which I was attempting to fix the ISBN earlier today was citing Magic and Mystery in Tibet. Should we be listing identifiers for all of the 60 versions listed in WorldCat?
If the citing editor has actually checked multiple versions to find that the page numbers and text are exactly the same, then it is reasonable for them to list more than one identifier. The |id= parameter can be used for this purpose and as long as the text "ISBN" precedes a valid format ISBN it will be linked to Special:BookSources by the MediaWiki software. (see Help:Magic links)
  • On the other hand, we should not generate badly formed COinS data if there are extraneous non-numeric characters in the |isbn= parameter. Removing everything other than digits is trivial.
I also believe that we should not generate an error if there is extraneous non-numeric text in the ISBN parameter. All non-numeric text can be removed prior to processing with a single regular expression substitution. We are already performing one regular expression substitution to remove the "-" marks. Given the ease with which all extraneous non-numeric text can be removed (particularly given that we are already removing some such text, the hyphens), it feels like we are going out of our way to make the requirements for this parameter more stringent than is needed to meet the goals of an accurate link to Special:BookSources and valid COinS data. In fact, we appear to be choosing to provide bad COinS data when providing good COinS data in a larger percentage of cases would be trivial. Just removing such extraneous text prior to checksum verification and forwarding to COinS is slightly easier, from a processing point of view, than what is currently done, and it would make that parameter much more user friendly while providing good COinS data in a higher percentage of citations. — Makyen (talk) 02:35, 16 May 2014 (UTC)


|website= is listed on the Whitelist; however, if used in conjunction with |archiveurl=, it generates an error message:

  • {{cite book |archiveurl=// |archivedate=May 15, 2014 |deadurl=no |author=Johnson, Malcom |title=Sample Title |website=}}
Johnson, Malcom. "Sample Title". Archived from the original on May 15, 2014 |archiveurl= requires |url= (help). 

If |url= is populated, the error is resolved, but a bare url displays in the citation. (This is also what happens if |archiveurl= isn't populated.)

  • {{cite book |url= |archiveurl=// |archivedate=May 15, 2014 |deadurl=no |author=Johnson, Malcom |title=Sample Title |website=}}
Johnson, Malcom. "Sample Title". Archived from the original on May 15, 2014. 

Is this the desired behavior of this parameter, or is it a glitch? I would think that |website= is an alias of |url=; the AWB renaming script currently replaces it with |url=. Should it continue to do so? Or is this a glitch that will be fixed? I'm about to post a lengthy list of valid alias parameters that the script is currently replacing on that talk page; if the script shouldn't continue to replace |website= with |url=, please be sure to comment there. Thanks!—D'Ranged 1 VTalk 00:14, 28 May 2014 (UTC)

|website= is an alias of |work=, for the name of a website, not the address. Imzadi 1979  00:17, 28 May 2014 (UTC)
Imzadi1979 Thank you; I was mistaken, the script is currently replacing |website= with |work=, which is unnecessary. Sorry I was confused; I've straightened it out over there. Thanks again!—D'Ranged 1 VTalk
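To make the distinction concrete: because |website= resolves to |work= (a name), it can never satisfy the check that produces "|archiveurl= requires |url=". A hypothetical Python sketch limited to the parameters in this thread; the alias table and function names are illustrative, and the real module has many more aliases and checks.

```python
# Hypothetical sketch: a tiny alias table plus the archive check, limited to the
# parameters discussed in this thread. Not the module's Lua code.
ALIASES = {"website": "work"}


def canonical(params: dict[str, str]) -> dict[str, str]:
    """Rename aliases to their canonical parameter names."""
    return {ALIASES.get(name, name): value for name, value in params.items()}


def archive_errors(params: dict[str, str]) -> list[str]:
    p = canonical(params)
    if p.get("archiveurl") and not p.get("url"):
        return ["|archiveurl= requires |url="]
    return []


print(archive_errors({"archiveurl": "https://archive.example/x", "website": "Example News"}))
```

A renaming script built on the same idea would map |website= to |work= (or leave it alone, since it is already a valid alias), never to |url=.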

Valid parameters missing from Whitelist

These parameters are not on the Whitelist, but are used successfully in templates.

  • |eprint= in {{cite arXiv}}; a valid alias for |arxiv=, which is required.
  • |class= in {{cite arXiv}}; optional
  • |pmc-embargo-date= in {{arxiv}} coding; however, it doesn't appear in the documentation. Possibly an alias for |embargo= on the Whitelist.

I thought this was supposed to be a current, complete listing of all approved parameters for the templates. If it isn't, one is needed. I've made changes to the list of parameters for Citation bot based on this list in an attempt to avoid the bot making errors; I now have to undo some of those changes.—D'Ranged 1 VTalk 16:58, 30 May 2014 (UTC)

I think you meant {{Cite arXiv}}, which does not yet use the CS1 Lua module to render its citation. The Whitelist is only for cite templates that use the module, and |eprint= is not used in any of those cite templates.
It's confusing, but the Whitelist is correct without those parameters. If {{Cite arXiv}} is migrated to use the module, |eprint= may need to be added to the Whitelist. There is a list of cite templates that use the module at Help:Citation Style 1; they are highlighted in light green in the left column. – Jonesey95 (talk) 19:25, 30 May 2014 (UTC)
Yes, I meant {{cite arXiv}}; I've changed my original post to reflect that; thank you. I'm still confused, however. Help:Citation Style 1#Specific source states: "There are a number of templates that are CS1 compliant but are tied to a specific source; these are listed in Category:Citation Style 1 specific-source templates." (emphasis mine) It then goes on to specifically list {{cite arXiv}}; however, that template is not in the stated category, but is in the "core" category, Category:Citation Style 1 templates. Either way, if a template is considered to be "CS1 compliant", I would expect its parameters to be part of the Whitelist. Additionally, Template:Citation Style documentation/cs1, which is transcluded on {{cite arXiv}} and other modules that are not highlighted in the list of modules, doesn't distinguish between those using Module:Citation/CS1 and those using {{citation/core}}. I thought since they were listed, they were using CS1. So, questions: Should {{cite arXiv}}'s category be changed? Should the language "CS1 compliant" be modified on the Help page? Should the templates listed at Template:Citation Style documentation/cs1 indicate their source? Sorry to be a bother; thanks for your patience—I really appreciate the help.—D'Ranged 1 VTalk 20:31, 30 May 2014 (UTC)
{{Cite arXiv}} is essentially a special case of {{cite journal}}, where some of the parameters (like |journal= |work= and |publisher=) put the page into an error category, and a few extra parameters are recognised. These, |eprint= (and its alias |arxiv=), |version= and |class= are used to construct special links. To cope with these variations, it still uses the older {{Citation/core}} method instead of Module:Citation/CS1. --Redrose64 (talk) 21:39, 30 May 2014 (UTC)

Author check

Could there be a check on the author field to detect when it contains date-type information, such as |author=Published on Mon May 19 14:30:16 BST 2008, which occurs in numerous articles? We probably need to add a tracking category when this occurs so that they can be fixed by removal or by transferring the information to the |date= field. Regards. Keith D (talk) 00:46, 20 June 2014 (UTC)

Interesting idea. Do you have ideas for patterns that would detect erroneous author values while preventing false positives? It seems possible that a valid author value might contain a date, like "May 2014 Conference Organizing Committee" or something like that.
The root of this problem, in many cases, is lazy use of Reflinks. Reflinks could be programmed to be more clever about sites that put bad data in author fields. I don't know if Dispenser is interested in fixing this problem with Reflinks. People who fix citations could provide a list of the most common web sites that have this bad data, like and some web sites in India. – Jonesey95 (talk) 01:50, 20 June 2014 (UTC)
Probably need to start with picking up some and then expanding as other ones are found. I was thinking of something like " BST nnnn", " EST nnnn", " GMT nnnn" as a starting point. Keith D (talk) 10:43, 20 June 2014 (UTC)
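A starting-point check along those lines might look like this hypothetical Python sketch. It uses the suggested " BST nnnn", " EST nnnn", " GMT nnnn" patterns; the extra hh:mm:ss clock-time alternative is my own assumption for illustration. Jonesey95's "May 2014 Conference Organizing Committee" example is not flagged.

```python
import re

# Suggested patterns (" BST nnnn", " EST nnnn", " GMT nnnn") plus an hh:mm:ss
# clock time, which is an extra assumption for illustration.
_DATE_IN_AUTHOR = re.compile(r"\b(?:BST|EST|GMT)\s+\d{4}\b|\b\d{1,2}:\d{2}:\d{2}\b")


def author_has_timestamp(author: str) -> bool:
    """Flag |author= values that look like a timestamp rather than a name."""
    return _DATE_IN_AUTHOR.search(author) is not None


print(author_has_timestamp("Published on Mon May 19 14:30:16 BST 2008"))  # True
print(author_has_timestamp("May 2014 Conference Organizing Committee"))    # False
```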
I suggest also looking for the name of the publication at the end of the |title= parameter - usually separated by an HTML entity for a hyphen, dash or suchlike. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 11:02, 20 June 2014 (UTC)
This looks like a great task for someone with AWB skills who can search the whole database of articles periodically for questionable patterns. A bot could even be set up to update a report page periodically. If we find certain patterns that never produce false positives, those patterns could be added to the CS1 module and used to create an error category. – Jonesey95 (talk) 16:02, 20 June 2014 (UTC)
@Keith D, Jonesey95: See my BattyBot 24 task for author fixes.
@Pigsonthewing: - User:Ohconfucius/script/Sources removes some publications from the end of the |title= parameter. GoingBatty (talk) 16:47, 20 June 2014 (UTC)
All - it appears that Reflinks might go away at the end of the month - see Wikipedia:Village pump (technical)/Archive 127#Migrating Reflinks, Dab solver, and User:Dispenser's other tools to Tool Labs. GoingBatty (talk) 16:47, 20 June 2014 (UTC)
@GoingBatty: thanks - looking at status page it looks as though task 24 has not been run - could it be run? Keith D (talk) 18:09, 20 June 2014 (UTC)
@Keith D: I added the last run column this month, because I realized that it's been a while since I've run some of the bot tasks. I last ran that task in March, so I'll run it again soon. Thanks! GoingBatty (talk) 22:42, 20 June 2014 (UTC)
@Keith D: I would like to run task 24 to fix the authors with task 31 in early July once the Toolserver is taken down - see User:Dispenser/Toolserver migration. If you can think of any patterns for author fixes you'd like me to include, please let me know. Thanks! GoingBatty (talk) 23:11, 22 June 2014 (UTC)
You could try the patterns " BST nnnn", " EST nnnn", " GMT nnnn" that I indicated above, but you would need to check that the date info was in the |date= field before removing. Keith D (talk) 23:18, 22 June 2014 (UTC)
I specifically mentioned in my BRFA that I wouldn't be removing dates from the author field, since it's beyond my bot creation ability to ensure that the same date is also in the |date= field. GoingBatty (talk) 02:22, 23 June 2014 (UTC)
I did start the thread by suggesting a tracking category for the field so that they could be tackled manually if a BOT cannot do this. Keith D (talk) 12:46, 23 June 2014 (UTC)
@Keith D: My bot completed its run. Once Reflinks gets restabilized, I would like to ask Dispenser if there is a possibility to update Reflinks so incorrect author parameters don't get added in the first place. GoingBatty (talk) 04:43, 10 July 2014 (UTC)
Thanks for the BOT run. As Reflinks looks as though it is staying, it needs to be updated in several ways, not just for the author parameter. Options for date format would also be useful, so that we do not have to run around after users setting the dates to the appropriate format. Using a publisher name rather than a website would be another. Keith D (talk) 12:30, 10 July 2014 (UTC)


doix is used in {{Cite doi/preload}} to prefill a version of the DOI that the citation bot fixes up; it then changes doix to doi. doix was added to the whitelist but subsequently removed in an update from the sandbox. I think that was unintentional. Would an administrator please add doix back into Module:Citation/CS1/Whitelist, maybe with a comment to keep it from disappearing again? Thank you.

 – Minh Nguyễn (talk, contribs) 21:55, 6 July 2014 (UTC)

Not clear to me why |doix= should be included in Module:Citation/CS1/Whitelist. I think that including parameters that have nothing to do with CS1 blurs a very distinct line that should not be blurred. Without some discussion that shows that |doix= must be included, I'm not ready to accommodate this request.
Trappist the monk (talk) 22:56, 6 July 2014 (UTC)
|doix= is an unsupported parameter used temporarily by Citation Bot, I believe. The bot removes it during creation of cite doi templates. The bot is currently blocked, which is why you are seeing |doix=. It should never be seen by a human editor under normal circumstances. – Jonesey95 (talk) 23:12, 6 July 2014 (UTC)
Thanks for your feedback. Another user asked me why their Cite doi subpages were showing a CS1 error. In the meantime, they had "corrected" the error by changing doix to doi, but without decoding the .2F escape, the link was broken. I went ahead and reimplemented the conversion from doix to doi in Module:Cite doi. Let me know if you see any bugs. – Minh Nguyễn (talk, contribs) 23:17, 6 July 2014 (UTC)
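One design point about that conversion: decoding only the ".2F" escape back to "/" is safer than a general ".XX" hex decode, because every DOI begins with "10." and usually contains dot-digit runs such as ".1007" that a general decoder would mangle. A hypothetical Python sketch, not the code in Module:Cite doi; it assumes "/" is the only escape that needs undoing, which may not hold for every DOI.

```python
import re


def doix_to_doi(doix: str) -> str:
    """Hypothetical: turn the ".2F" escape used in |doix= back into "/"."""
    return doix.replace(".2F", "/")


def naive_decode(s: str) -> str:
    """Counter-example: decoding every ".XX" hex pair corrupts DOIs."""
    return re.sub(r"\.([0-9A-Fa-f]{2})", lambda m: chr(int(m.group(1), 16)), s)


print(doix_to_doi("10.1007.2Fb100519"))         # 10.1007/b100519
print(repr(naive_decode("10.1007.2Fb100519")))  # '10\x1007/b100519'
```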

non-italic titles

Books published in Chinese have titles in Chinese characters, romanized titles in pinyin, and translated titles in English, e.g.

  • Wang, Li (1985). Hànyǔ Yǔyīn Shǐ 汉语语音史 [History of Chinese Phonetics] (in Chinese). Beijing: China Social Sciences Press. ISBN 978-7-100-05390-7. 

While the romanized title should be italicized, it is recommended at WP:MOS-ZH that characters not be italicized. However {{noitalic}} is not allowed within citation templates. A common expedient is an extra set of italic quotes, e.g. Hànyǔ Yǔyīn Shǐ ''汉语语音史'' in the above, but this seems brittle. So could these templates have an additional parameter, say |noitalic_title= or something, for a title in a non-roman script that should not be italicized? This would be in addition to |title=, which could be used for the romanized form (which would be italicized), and |trans_title=, for the English translation. Kanguole 10:57, 10 July 2014 (UTC)
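A sketch of how the proposed parameter might be rendered, emitting wikitext in the same order as the Wang Li example above: italic romanized title, upright original-script title, then the bracketed translation. |noitalic_title= is the hypothetical name from this proposal, and render_title is an illustrative Python function, not part of the module.

```python
def render_title(title: str, noitalic_title: str = "", trans_title: str = "") -> str:
    """Hypothetical rendering for the proposed |noitalic_title= parameter."""
    parts = [f"''{title}''"]              # romanized title: italic
    if noitalic_title:
        parts.append(noitalic_title)      # original script: left upright
    if trans_title:
        parts.append(f"[{trans_title}]")  # translated title in brackets
    return " ".join(parts)


print(render_title("Hànyǔ Yǔyīn Shǐ", "汉语语音史", "History of Chinese Phonetics"))
```

Keeping the non-Latin title in its own parameter means the module, not the editor, controls the italic markup, which avoids the brittle nested-quotes workaround.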