Wikipedia talk:WikiProject Academic Journals/Journals cited by Wikipedia


WikiProject Academic Journals (Rated Project-class)
This page is within the scope of WikiProject Academic Journals, a collaborative effort to improve the coverage of Academic Journals on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This page does not require a rating on the project's quality scale.

Disambigs needed

Moved from [1] Headbomb {talk / contribs / physics / books} 23:16, 27 February 2017 (UTC)

For Lancet and some other entries. Can this page be edited manually, or would this break the bot updates? --Piotr Konieczny aka Prokonsul Piotrus| reply here 22:58, 19 January 2014 (UTC)

Diacritics

Moved from [2] Headbomb {talk / contribs / physics / books} 23:17, 27 February 2017 (UTC)

A lot of these are Latin characters, with diacritics. —innotata 22:35, 29 September 2010 (UTC)

interesting list

Moved from [3] Headbomb {talk / contribs / physics / books} 23:18, 27 February 2017 (UTC)

Very interesting list. Thanks for your work. emijrp (talk) 14:35, 30 November 2011 (UTC)

Next dump numbers will probably change a lot

If you're wondering why there's a big jump next time for the late April/early May update, that's because all the {{cite doi}} and {{cite pmid}} templates have been deleted. Headbomb {talk / contribs / physics / books} 00:57, 29 April 2016 (UTC)

Color-coding categorization

@JLaTondre, DGG, Randykitty, Mark viking, Everymorning, and Steve Quinn: It always annoyed me that we can't easily tell if a link points to its intended target or not. So I figure if we color code some things, that would help. For instance, when everything behaves nicely, we would have

Rank | Journal1 | Type2 | Target1 | Type2 | Citations | Articles | Citations / article | Search
0 | ISO 4 | ISO | ISO 4 | ISO | 50000 | 20000 | 2.500 |
1 | Academic journal | J | Academic journal | J | 40000 | 20000 | 2.000 |
2 | Magazine | Mag | Magazine | Mag | 20000 | 15000 | 1.333 |
3 | Newspaper | News | Newspaper | News | 15000 | 10000 | 1.500 |
4 | Website | Web | Website | Web | 10000 | 5000 | 2.000 |
5 | Book | Book | Book | Book | 5000 | 2500 | 2.000 |
6 | Database | Data | Database | Data | 5000 | 2000 | 2.500 |
7 | Publisher | Pub | Publisher | Pub | 5000 | 1000 | 5.000 |
8 | Uncategorized | ? | Uncategorized | ? | 1000 | 750 | 1.333 |

But if there is a mismatch, we would have two different colors on the same line.

This would let us immediately see if there is an issue with where the links point, or if we forgot to categorize some articles. What's the feeling on this? Headbomb {talk / contribs / physics / books} 23:15, 18 February 2017 (UTC)

Discussion

JLaTondre (talk · contribs), what's doable here? Could we get a mockup of, say, Popular1 and A1 to see what this would look like in practice? If you find "journal", "magazine", "website", "publisher" in the categories, can you use that to populate |display-type= and |target-type=? (Also, if you find {{R from ISO 4}}, you can use that to set |display-type=journal.) Headbomb {talk / contribs / physics / books} 23:21, 18 February 2017 (UTC)

  • At the moment, I am not sure what this is about. Is this a link detector? I am guessing it is a good idea, but I am not sure what the intent is. And I am not sure what this is. However, maybe if I see it in action with specific journals or magazines I would understand it. If you want to implement it - feel free to do so. I am sure I will catch on. I notice links to Nature (journal) in the right hand links. And then another journal below that. ---Steve Quinn (talk) 00:51, 19 February 2017 (UTC)

Let's use a real example (from E1), so you'll see better what I mean. Currently Earth (magazine) redirects to American Geosciences Institute. This would be presented as Earth = Magazine, which can be inferred from the title, and American Geosciences Institute = Publisher, which can be inferred from a category with "societies" in it. Visually, this would look like

Journal1 | Type2 | Target1 | Type2 | Citations | Articles | Citations / article | Search
Earth | Mag | American Geosciences Institute | Pub | 11 | 11 | 1.000 |

We clearly see there is a mismatch between the two links. One is from a magazine, the other is to a publisher. Consider instead if the redirect Earth (magazine) wasn't created. What we would have is

Journal1 | Type2 | Target1 | Type2 | Citations | Articles | Citations / article | Search
Earth | ? | Earth | ? | 11 | 11 | 1.000 |

This would tell us neither article is categorized, and both would need to be reviewed. Headbomb {talk / contribs / physics / books} 01:13, 19 February 2017 (UTC)
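As a rough illustration of the check the color coding is meant to support, here is a minimal Python sketch. It is not the bot's code: the `Row` class, its field names, and `row_status` are hypothetical, and the sketch simply reports any difference between the two types. Whether some pairs (for example an ISO 4 abbreviation pointing to a journal) should count as a mismatch is not settled in this thread.

```python
from dataclasses import dataclass


@dataclass
class Row:
    """One table row; all field names are hypothetical."""
    display: str       # link as written in the citation
    display_type: str  # e.g. "magazine", or "?" when uncategorized
    target: str        # page the link ends up on
    target_type: str


def row_status(row):
    """Classify a row the way a reader would by comparing the two colors."""
    if row.display_type == "?" or row.target_type == "?":
        return "uncategorized"  # at least one side needs review
    if row.display_type != row.target_type:
        return "mismatch"       # two different colors on the same line
    return "ok"
```

With the Earth example above, the row with types magazine and publisher comes out as "mismatch", and the row with "?" on both sides comes out as "uncategorized".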

Headbomb This is really good. I think we should use it - make it part of this project. Thanks for doing this. Steve Quinn (talk) 03:33, 19 February 2017 (UTC)
I will look into it. It sounds doable. The biggest issue will probably be recognizing the article types. It may be we start with some easy to recognize cases and then add onto it as we go. -- JLaTondre (talk) 01:31, 20 February 2017 (UTC)
Yes. You might want to externalize a keyword list or something so it's easier to update things without your involvement, but it's probably simplest if we just iterate a few times until we get something 'good enough', and then refine the logic from dump to dump as we find corner cases. Headbomb {talk / contribs / physics / books} 18:48, 20 February 2017 (UTC)

Implementation

Implemented in the current run that is updating now. However, it looks like you only did the coloring in a sandbox version of the template? If you update the main template, it should show up. Currently, it is using the following logic for detecting types:

  1. First checks if title is "SOMETHING (journal)", "SOMETHING (magazine)", etc. and bases it on that
  2. Second checks for {{R from ISO 4}} and declares it a journal
  3. Third checks for "Category: SOMETHING journals", "Category: SOMETHING magazines", etc. and bases it on that

It will first check if it is a journal, then magazine, then newspaper, then website, and then publisher so if it has multiple categories, it will use the one it finds first in that order. Let me know if anything looks weird. -- JLaTondre (talk) 01:42, 17 March 2017 (UTC)
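For readers who want to follow the order of checks, the three steps above could be sketched in Python roughly as follows. This is an illustration under assumptions, not the bot's actual code: the function name `detect_type`, its arguments, and the regular expressions are all hypothetical.

```python
import re

# Order in which types are tried, as described above.
TYPE_ORDER = ["journal", "magazine", "newspaper", "website", "publisher"]


def detect_type(title, wikitext, categories):
    """Return a type name, or None if nothing matches (hypothetical helper)."""
    # 1. Title of the form "SOMETHING (journal)", "SOMETHING (magazine)", etc.
    m = re.search(r"\((\w+)\)\s*$", title)
    if m and m.group(1).lower() in TYPE_ORDER:
        return m.group(1).lower()
    # 2. A literal {{R from ISO 4}} in the page text means a journal.
    if re.search(r"\{\{\s*R from ISO 4\s*[|}]", wikitext, re.IGNORECASE):
        return "journal"
    # 3. "Category: SOMETHING journals", "... magazines", etc.,
    #    trying the types in TYPE_ORDER so the first one found wins.
    for kind in TYPE_ORDER:
        plural = re.compile(r"\b" + kind + r"s\b", re.IGNORECASE)
        if any(plural.search(c) for c in categories):
            return kind
    return None
```

A page in both a "journals" and a "magazines" category comes out as a journal here, because journal is tried first.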

Looks like you did make changes to {{JCW-row}}, but they aren't working. The first two tables above are using {{JCW-row/sandbox}} and they are showing the colors. But the next two (the Earth examples) are using the non-sandbox version and are not showing the colors. -- JLaTondre (talk) 02:02, 17 March 2017 (UTC)

@JLaTondre: Looks great for a first pass. Some comments/refinements. Logic order should be

  1. Foobar = {ISO > Journal > Magazine > Newspaper > Website > Database > Book > Publisher} (i.e. add an ISO and a Book type)
    1. Journal = {Abhandlungen, Abh., Abh, Annals, Ann., Ann, Berichte, Bulletin, Bull., Bull, Cahiers, Comptes Rendus, C. R., C R, C.R., CR, Journal, J., J, Letters, Lett., Lett, Notices, Not., Not, Proceedings, Proc., Proc, Publications of <...>, Publ., Publ, Reviews, Review, Rev., Rev, Transactions, Trans., Trans, Zeitschrift, Z., Z}
    2. Magazine = {Digest, Fanzine, Magazine, Mag., Mag, Newsletter, Newsl., Newsl, Webzine}
    3. Newspaper = {Chronicle, Courier, Daily, Echo, Gazette, Herald, Mail, Newspaper, Post, Standard, Star, Sun, Sunday, Tabloid, Telegraph, Times, Tribune}
    4. Website = {Website, www., .com, .gov, .org}
    5. Book = {Anthology, Book, Dictionary, Encyclopaedia, Encyclopædia, Encyclopedia, Handbook}
    6. Database = {Catalog, Catalogue, Database}
    7. Publisher = {Academy, Agency, Association, Books, Commission, Committee, Company, Co., Co, Corporation, Editions, Éditions, École, GmbH, Group, Inc., Inc, Imprint, Institute, Ltd., Ltd, Museum, Organisation, Organization, Press, Presses, <...> Publications<ENDMATCH>, Publisher, Publishers, Publishing, School, Society, Sons, University}
    8. All synonyms should be exact matches, but not case sensitive
  2. Look for Category:Redirects from ISO 4, if so, the type is ISO
  3. Look for "\(.* foobar\)" disambiguator in page title
  4. Look for "foobar" in category. Restrict synonyms to
    1. Journal = {Annals, Journals, Proceedings, Transactions}
    2. Magazine = {Digests, Newsletters, Magazines, Fanzines, Webzines}
    3. Newspaper = {Gazettes, Newspapers, Tabloids}
    4. Website = {Websites}
    5. Book = {Anthologies, Books, Dictionaries, Encyclopedias, Handbooks}
    6. Database = {Catalogs, Catalogues, Databases}
    7. Publisher = {Academies, Agencies, Associations, Commissions, Committees, Companies, Corporations, Imprints, Institutes, Museums, Organisations, Organizations, Presses, Publishers, Schools, Societies, Universities}
  5. Look for "foobar" in title

Also, if you could use |d-type= and |t-type= instead of |display-type= and |target-type= it would save some KBs. And instead of "journal", "magazine", etc., just "j", "m", "w", etc. See [4] and [5]. Headbomb {talk / contribs / physics / books} 18:48, 17 March 2017 (UTC)
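To make the proposed order in the numbered list concrete, here is a minimal Python sketch of it. It is only an illustration under assumptions: `classify`, `tokens`, and `first_match` are hypothetical names, the keyword tables are abbreviated subsets of the lists above, multi-word and substring keywords (such as "C. R." or ".com") are left out, and the disambiguator check looks only at the last word inside the parentheses.

```python
import re

# Priority order from the proposal; "iso" is handled separately in step 1.
PRIORITY = ["journal", "magazine", "newspaper", "website",
            "database", "book", "publisher"]

# Abbreviated subsets of the proposed keyword lists (illustrative only).
TITLE_KEYWORDS = {
    "journal": {"annals", "ann", "bulletin", "bull", "journal", "j",
                "letters", "lett", "proceedings", "proc", "review", "rev",
                "transactions", "trans", "zeitschrift", "z"},
    "magazine": {"digest", "fanzine", "magazine", "mag", "newsletter", "webzine"},
    "newspaper": {"chronicle", "daily", "gazette", "herald", "newspaper",
                  "times", "tribune"},
    "website": {"website"},
    "database": {"catalog", "catalogue", "database"},
    "book": {"anthology", "book", "dictionary", "encyclopedia", "handbook"},
    "publisher": {"academy", "association", "institute", "press", "publisher",
                  "publishing", "society", "university"},
}
CATEGORY_KEYWORDS = {
    "journal": {"annals", "journals", "proceedings", "transactions"},
    "magazine": {"digests", "newsletters", "magazines", "fanzines", "webzines"},
    "newspaper": {"gazettes", "newspapers", "tabloids"},
    "website": {"websites"},
    "database": {"catalogs", "catalogues", "databases"},
    "book": {"anthologies", "books", "dictionaries", "encyclopedias", "handbooks"},
    "publisher": {"academies", "associations", "institutes", "presses",
                  "publishers", "societies", "universities"},
}


def tokens(text):
    """Lower-cased words in order; punctuation such as the dot in 'Proc.' is dropped."""
    return re.findall(r"[^\W\d_]+", text.lower())


def first_match(found, table):
    """Return the highest-priority type whose keyword set intersects `found`."""
    for kind in PRIORITY:
        if found & table[kind]:
            return kind
    return None


def classify(title, categories):
    # Step 1: membership in Category:Redirects from ISO 4 wins outright.
    if any(c.strip().lower() == "redirects from iso 4" for c in categories):
        return "iso"
    # Step 2: last word of a trailing disambiguator, e.g. "Earth (magazine)".
    m = re.search(r"\(([^()]*)\)\s*$", title)
    if m:
        kind = first_match(set(tokens(m.group(1))[-1:]), TITLE_KEYWORDS)
        if kind:
            return kind
    # Step 3: restricted plural keywords anywhere in the category names.
    category_words = {w for c in categories for w in tokens(c)}
    kind = first_match(category_words, CATEGORY_KEYWORDS)
    if kind:
        return kind
    # Step 4 (last resort): keywords in the title itself.
    return first_match(set(tokens(title)), TITLE_KEYWORDS) or "?"
```

Matching on whole words rather than substrings is what keeps the synonyms "exact matches, but not case sensitive"; keeping the keyword tables as plain data also makes it easier to extend them from dump to dump.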

Updating now with the new logic. -- JLaTondre (talk) 17:56, 19 March 2017 (UTC)
  1. Doing a quick review, it seems I've missed a few keywords. I've put them in bold in the above list.
  2. In Wikipedia:WikiProject_Academic_Journals/Journals_cited_by_Wikipedia/Popular1, J. Biol. Chem. and Proc. Natl. Acad. Sci. U.S.A. and many others are categorized as journals. They should be categorized as ISO redirects. (Sidenote: looking through Category:Redirects from ISO 4 is safer than looking for templates.) It's also possible you're checking for ISO too late in the logic.
  3. The bot doesn't try to categorize red links. Not sure if it should since the above logic is kinda designed so that making sense of the title is the last resort. There would be a fair number of false categorizations if categories can't be relied on. On the other hand, many/most such categorizations will be valid. I think it'd be worth it on redlinks.
  4. "Default" should use "?" (or just nothing) instead of "d". This will free up d for "Database". Headbomb {talk / contribs / physics / books} 20:47, 19 March 2017 (UTC)
New version saving with all changes except these and red link parsing. Take a look at it and check the false positive rate. There are plenty of pages (like anything with 'not' in the title) that would get id'd as journals that aren't, but that might not be too much of an issue since they probably aren't ones that get used in citations. I'll add the new 'aliases' as well and think about how to do the red link parsing (have to refactor for that). I won't do another update until the April dump comes out. -- JLaTondre (talk) 00:30, 22 March 2017 (UTC)

Will review. But I'm betting there won't be too many negatives. I thought of that and did some "in title" searches, and the occurrences are few for those that would appear in the journal parameter. Headbomb {talk / contribs / physics / books} 00:43, 22 March 2017 (UTC)

JLaTondre (talk · contribs), something weird going on with the magazine logic. Headbomb {talk / contribs / physics / books} 01:19, 22 March 2017 (UTC)
Yeah, found the issue. Databases were also being assigned the wrong value. I've fixed those and added the new 'aliases'. I'll make an updated run tomorrow since the current results aren't correct. -- JLaTondre (talk) 01:44, 22 March 2017 (UTC)

Also

Headbomb {talk / contribs / physics / books} 01:57, 22 March 2017 (UTC)

Yes, all the same issue as above (syntax error in a regex). Fixed. Added 'organization' as well. -- JLaTondre (talk) 02:34, 22 March 2017 (UTC)
Fixed version saving now. Let me know if you see anything else. -- JLaTondre (talk) 02:12, 23 March 2017 (UTC)
JLaTondre (talk · contribs) Looks much better and near flawless!
No need to re-run for those things, but the logic should be tweaked in the next run. Headbomb {talk / contribs / physics / books} 03:45, 23 March 2017 (UTC)
Fixed. Organization was a typo in the regex and Catalog was being set to website vs. database by mistake. -- JLaTondre (talk) 01:25, 24 March 2017 (UTC)
JLaTondre (talk · contribs) Apparently the dumps happen on the 1st and 20th of the month. I guess they've increased their frequency. No need to run the new code on the 20th's dump since the 1st is just around the corner, but I'll be very eager to see the 'final' code roll out combined with the ISO-tagging I did in the past few days. Headbomb {t · c · p · b} 22:08, 29 March 2017 (UTC)
Dumps typically happen twice a month. Exact dates can vary if they have issues. Results from the 20170401 dump have been posted. -- JLaTondre (talk) 17:54, 5 April 2017 (UTC)

@JLaTondre: [6] gets classified as journal rather than ISO. Are you doing a category check for ISO, or trying to do a template parsing check? Headbomb {t · c · p · b} 16:48, 5 April 2017 (UTC)

Template matching since I'm working off the dump. I'll look at whether retrieving the category or expanding the templates would be easier. -- JLaTondre (talk) 17:54, 5 April 2017 (UTC)
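To illustrate the difference between the two checks being discussed, here is a hedged Python sketch. The function `is_iso4_redirect` and the regular expression are hypothetical, not what the bot uses. Matching the literal template name in dump wikitext can miss pages that end up in Category:Redirects from ISO 4 by another route (for example through a differently named template, which is an assumption here), so the sketch prefers a category list when one is available.

```python
import re

# Matches {{R from ISO 4}} with optional parameters, also when it sits
# inside another template's parameters. Hypothetical pattern.
ISO4_TEMPLATE = re.compile(
    r"\{\{\s*[Rr][ _]from[ _]ISO[ _]4\s*(\|[^{}]*)?\}\}"
)


def is_iso4_redirect(wikitext, categories=None):
    """Category check if categories are known, else template matching."""
    if categories is not None:
        return any(
            c.replace("_", " ").strip().lower() == "redirects from iso 4"
            for c in categories
        )
    return bool(ISO4_TEMPLATE.search(wikitext))
```

Working from a dump alone, only the template-matching branch is available, which is the trade-off mentioned above.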

Wikimania 2017

I will be making (assuming my proposal is accepted) a presentation on JCW at Wikimania 2017, in Montreal.

If you are interested in attending, please sign up! Headbomb {t · c · p · b} 12:55, 7 April 2017 (UTC)