User talk:Rjwilmsi/Archives/2012/March

AWB Persondata interwiki

Someone shoved persondata into a stub I knocked up from Hungarian. Thing is, the only data that got populated is the stuff that appeared in the stub; there's more data in the infobox over on the Hungarian article for that guy. I read your BRfA for persondata so I figured you'd be able to extend/fix the AWB module so that it interwiki populates - preferably in both directions (so, assuming hu: is missing persondata - which it seems to be, populate it in there - with appropriate permissions). Great idea, right? Josh Parris 11:13, 24 February 2012 (UTC)

Interesting idea. I think more value would come from synchronizing/cross-populating the infoboxes, since they hold more information than the persondata. Persondata completion from the infobox would then follow as currently. Rjwilmsi 17:59, 24 February 2012 (UTC)
Sure, but even more value from synchronizing/cross-populating the articles. Just code up a bot to do that, would you?
I figured persondata was a reasonable step up from article titles, as it's fairly constrained data that has some chance of being machine-readable from infoboxes and -translatable across languages. Infoboxes are written in the local language, with wikilinks to local articles, which might not exist in the target wiki... yuk.
You'd need a matrix of translation terms for something like "Rock musician"->"Rockmusiker"->"Rocker"->"ロックミュージシャン"->"El músico de rock" etc A self-taught matrix might be best, but one that receives audited human oversight. You could lean on our interwiki links for translation, but that isn't super-reliable - Rock musician goes to Rock music, and even then the interwiki links are best matches, not translations; as an example, most interwikis for Bassinette go to varieties of infant bed. The translation database ought to be stored on-wiki for human editing and machine sharing; perhaps on commons. Say commons:Persondata/en/Rock musician redirects to commons:Persondata/de/Rockmusiker (because that was the first one populated).
Translating dates will be fun too, because of the permissive format of the date fields. I have no idea how you're going to detect DD-MM-YYYY vs MM-DD-YYYY, but it's only going to be a problem on en.
And you've got the problem of "NAME=Spanish Dude | ALTERNATIVE_NAME = El Dude von Spanyard (Spanish) | SHORT DESCRIPTION = Rock musician" becoming "NAME=El Dude von Spanyard | ALTERNATIVE_NAME = Spanish Dude (English) | SHORT DESCRIPTION = El músico de rock", which doesn't look very database-y to me - more like localised metadata-y.
Surely this little task is hard enough? Josh Parris 21:07, 24 February 2012 (UTC)
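The DD-MM-YYYY versus MM-DD-YYYY problem can be made concrete: a purely numeric date can only be resolved when one of its first two fields is greater than 12. Below is a minimal Python sketch, assuming dates arrive as plain "a-b-YYYY" strings. The helper name and its return labels are hypothetical and not part of AWB.

```python
import re

# Hypothetical helper, not part of AWB: matches purely numeric "a-b-YYYY" dates
# with "-", "/" or "." as the separator.
NUMERIC_DATE = re.compile(r"^(\d{1,2})[-/.](\d{1,2})[-/.](\d{4})$")


def classify_numeric_date(text):
    """Classify a numeric date string by the field orders it could be read in.

    Returns "DMY", "MDY", "ambiguous" (both readings valid and different),
    "either" (both readings give the same date, e.g. 05-05-1970), or None.
    Month lengths are not checked, so 31-02-1970 still counts as "DMY".
    """
    m = NUMERIC_DATE.match(text.strip())
    if m is None:
        return None
    a, b = int(m.group(1)), int(m.group(2))
    dmy_ok = 1 <= a <= 31 and 1 <= b <= 12  # a = day, b = month
    mdy_ok = 1 <= a <= 12 and 1 <= b <= 31  # a = month, b = day
    if dmy_ok and mdy_ok:
        return "either" if a == b else "ambiguous"
    if dmy_ok:
        return "DMY"
    if mdy_ok:
        return "MDY"
    return None
```

For example, "25-12-1970" classifies as DMY, while "03-04-1970" comes back as ambiguous. A bot could not translate the ambiguous case safely without other evidence from the article.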

See also Wikipedia:Village pump (proposals)#Persondata backlog done by bot Josh Parris 04:23, 25 February 2012 (UTC)

There's been a request at Wikipedia:Village pump (proposals)#Persondata backlog done by bot that you re-run your BRfA for persondata Josh Parris 03:44, 4 March 2012 (UTC)

Idea for consolidating '08 archives

Not as nice as the archive for '06 and '07, but that looked like it took quite some work. Make the filled-out list above (all 12 months, easily picked up from here) the content of a page named User talk:Rjwilmsi/Archives/2008, then link to that new page as just "2008" (page already created by mistake in this initiative; to be deleted if unused) at the top of your archive box; that opens room for the 2012 months column. Motive: I wanted to know where our good recent exchange would end up being linked. ;-) A thought. Swliv (talk) 21:46, 27 February 2012 (UTC)

Any interest in this? I'd probably do it if "yes", to see if I can. Cheers. Swliv (talk) 20:27, 2 March 2012 (UTC)

Misuse of {{R from title without diacritics}}

Your bot recently created ʾUriʾel to redirect to ʾÛrîʾēl, which itself is a redirect to Uriel. Another bot fixed the double redirect, leaving the {{R from title without diacritics}}. The difference between "ʾUriʾel" and "Uriel" is not one of diacritics.

A related situation, ʾEḏom → ʾĔḏôm → Edom, is similar, but it assumes that "ḏ" does not have a diacritic.

The bot should check for double redirects, and not create them, or at least not mistag them. It should also have an updated list of characters with diacritics. Gorobay (talk) 15:59, 29 February 2012 (UTC)

rev 7971: additional known diacritic added. Rjwilmsi 19:09, 1 March 2012 (UTC)
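One way to test whether {{R from title without diacritics}} really applies is sketched below in Python. It assumes that "diacritic" means any Unicode combining mark (category Mn) exposed by NFD decomposition. AWB's own check may instead rely on a list of known characters, so this is an alternative technique, not AWB's implementation. Applied to the final target after double redirects are resolved, it accepts ʾUriʾel → ʾÛrîʾēl but rejects ʾUriʾel → Uriel, because the modifier letter ʾ is not a combining mark. It also rejects ʾEḏom → ʾĔḏôm, because ḏ decomposes to d plus a combining macron below.

```python
import unicodedata


def strip_diacritics(title):
    """Remove combining marks (category Mn) after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", title)
    stripped = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return unicodedata.normalize("NFC", stripped)


def is_title_without_diacritics(redirect, target):
    """True if `redirect` is exactly `target` with its diacritics removed.

    Titles are NFC-normalised first so precomposed and decomposed input compare
    equal. A redirect identical to its target is not a diacritics redirect.
    """
    redirect = unicodedata.normalize("NFC", redirect)
    target = unicodedata.normalize("NFC", target)
    return redirect != target and strip_diacritics(target) == redirect
```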

CiteComplete Request

Can you please run the bot on the Kolkata article to add the locations of the newspapers cited? Please add all the locations as "New Delhi". The same has been requested at the FARC. Please treat this as urgent. Amartyabag TALK2ME 04:52, 2 March 2012 (UTC)

Done Rjwilmsi 18:08, 2 March 2012 (UTC)
Thanks for the help. Amartyabag TALK2ME 01:48, 3 March 2012 (UTC)

This might interest you

If you have time, please check this out: Wikipedia:Village pump (miscellaneous)#DOI cleanup. Headbomb {talk / contribs / physics / books} 20:59, 6 March 2012 (UTC)

Yes, I'll work on this, may not be until the weekend though. Rjwilmsi 22:17, 7 March 2012 (UTC)

Континентаlьная хоккейная lига listed at Redirects for discussion


An editor has asked for a discussion to address the redirect Континентаlьная хоккейная lига. Since you had some involvement with the Континентаlьная хоккейная lига redirect, you might want to participate in the redirect discussion (if you have not already done so). Gorobay (talk) 17:20, 7 March 2012 (UTC)

Nice work on the pmid and doi updates

I see that you are adding new data that is not present, but are you able to check that existing data matches the PMID or DOI data from the external databases? -- Alan Liefting (talk - contribs) 23:44, 11 March 2012 (UTC)

In the last database dump I checked the volume and issue parameters against PubMed for the ~22k PubMed records I have downloaded, and fixed any genuine errors. There were a few, maybe 20, though I can't remember exactly; various other conflicts were due to different numbering/format conventions and have to be ignored. I don't currently have any DOI data directly from CrossRef. I'm not sure what level of "matches" you are looking to establish, and while I'm happy to answer your queries, I'm not sure where you are trying to go with this. There are > 700,000 records, so I assume you are looking for automated matching based on key parameters such as journal, author, volume, issue, pages and year, with a report of discrepancies (I myself have no appetite to manually review 700k records). I could do that with the right data, though I don't currently have a way to get a bulk dataset from PubMed or CrossRef. Perhaps a formal request from Wikipedia for a one-off dump of their data would be granted, if we provided a list of PMIDs or DOIs.
Largely, though, I would expect the accuracy of Wikipedia's data versus the published databases to be high, as much of it has come via Citation bot, which takes its data from those databases. That said, I've found and reported a handful of PubMed data errors to PubMed. Rjwilmsi 23:59, 11 March 2012 (UTC)
I am looking into how good the actual references in WP are and whether they can be improved by automated or semi-automated means.
Based on the work that you have done, it seems that 0.01% of the PMID and DOI refs may need a tweak for volume and issue. For a robust ref, the date, the author and maybe other fields should also be accurate. Year will be easy to check, but author might be a bit variable and throw up a lot of errors. Perhaps a formal request by the Wikimedia Foundation could be made to get the 700k records of data. Checking that all that data matches what is in WP is not something I would expect one editor to do. It would be a nice little task force for one of the cleanup WikiProjects.
I would like to see a similar system used to compare the ISBNs in a ref to the actual data that is entered in the ref. -- Alan Liefting (talk - contribs) 00:26, 12 March 2012 (UTC)
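The automated matching discussed above (key parameters compared, discrepancies reported) could look roughly like the Python sketch below. It assumes that both the Wikipedia citation and the external record (e.g. a PubMed entry) have already been parsed into dicts with the same field names. The parsing step, the field names and `find_discrepancies` are all hypothetical.

```python
# Hypothetical field names, loosely modelled on {{cite journal}} parameters.
KEY_FIELDS = ("journal", "volume", "issue", "pages", "year", "last1")


def normalise(value):
    """Lower-case and collapse whitespace so purely cosmetic differences are ignored."""
    return " ".join(str(value).lower().split())


def find_discrepancies(wiki_cite, reference):
    """Compare key fields of a Wikipedia citation against an external record.

    Both arguments are plain dicts. Returns {field: (wiki_value, reference_value)}
    for every key field present in both that still differs after normalisation.
    Fields missing from either side are skipped: filling them in is a completion
    job, not a conflict.
    """
    report = {}
    for field in KEY_FIELDS:
        wiki_value, ref_value = wiki_cite.get(field), reference.get(field)
        if wiki_value is None or ref_value is None:
            continue
        if normalise(wiki_value) != normalise(ref_value):
            report[field] = (wiki_value, ref_value)
    return report
```

Differences in page-range conventions (123-9 versus 123-129) would still be reported. As with the volume/issue check described above, some hits would therefore need to be dismissed by hand.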

Jozef Majoros (footballer born 1970) listed at Redirects for discussion


An editor has asked for a discussion to address the redirect Jozef Majoros (footballer born 1970). Since you had some involvement with the Jozef Majoros (footballer born 1970) redirect, you might want to participate in the redirect discussion (if you have not already done so). Cloudz679 22:10, 12 March 2012 (UTC)

AWB on the Mac

Hi RJ, I run a 21.5-inch iMac 3.60GHz Intel Core i5. Do you know if AWB can be run on it? It seems to hinge on whether .NET framework can work on the Mac. Thanks for your advice. --Ohconfucius ¡digame! 07:01, 13 March 2012 (UTC)

AWB won't run directly. It is usable under Wine (from Linux at least); any edits saved would be the same, though not all options work perfectly. If you have a spare Windows licence, running Windows in a virtual machine (e.g. VMware Player) would be the best option. Rjwilmsi 07:48, 13 March 2012 (UTC)
  • Thanks, I'll research that. Player is the free software, right? and I then install Windoze on top... --Ohconfucius ¡digame! 08:07, 13 March 2012 (UTC)

Fixing citation parameters

Hi Rjwilmsi! I see that your bot made some fixes to change {{cite web}} |fotmat= to |format= and |ast= to |last=. I've added these to WP:AWB/RTP so all AWB users can help clean these up. Happy editing! GoingBatty (talk) 23:19, 14 March 2012 (UTC)
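A fix of this kind can be expressed as a regular-expression replacement. Below is a minimal Python sketch, assuming a hand-maintained typo table and {{cite web}} calls with no nested templates inside them. It is not the rule format actually used on WP:AWB/RTP, and the names are hypothetical. Matching the whole parameter name keeps |last= from being rewritten when the table contains "ast".

```python
import re

# Hypothetical typo table (misspelt parameter -> intended parameter); the two
# entries are the ones mentioned in this thread.
PARAM_TYPOS = {"fotmat": "format", "ast": "last"}

# A template parameter: "|", optional spaces, a name, optional spaces, "=".
PARAM = re.compile(r"\|(\s*)([a-z0-9_-]+)(\s*)=")

# A {{cite web ...}} call. Non-greedy, so it assumes no nested templates inside.
CITE_WEB = re.compile(r"\{\{\s*[Cc]ite web\b.*?\}\}", re.DOTALL)


def _rename_param(match):
    """Swap a misspelt parameter name for the intended one, keeping spacing."""
    name = match.group(2)
    return "|" + match.group(1) + PARAM_TYPOS.get(name, name) + match.group(3) + "="


def fix_cite_web_params(wikitext):
    """Rename known misspelt parameters, but only inside {{cite web}} templates."""
    return CITE_WEB.sub(lambda m: PARAM.sub(_rename_param, m.group(0)), wikitext)
```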

Edit the Article

Moved from User talk:Headbomb, see original [1]

Hi Rjwilmsi, can you edit the article Alejandro Correa Rueda? [2]

Thank you very much.

Best regards,

The Old Wise
Preceding unsigned comment added by Old Wise (talk · contribs) 01:31, 15 March 2012 (UTC)

I moved this guy's comments to your talk page, since he seems to have been looking for you. I'll also note that Wikipedia:Articles for deletion/Alejandro Correa Rueda might be relevant. Headbomb {talk / contribs / physics / books} 14:10, 15 March 2012 (UTC)

Incorrect PubMed citation addition

Hello there! I just wanted to report that twice now the AWB-related edits have added a PMID reference (#309024) to the article on astronomer Dale Frail. Specifically, an ApJ paper from 2000 on Fireball Calorimetry (currently ref 9 on that page) keeps picking up that PubMed link, which points to an article on glaucoma originally published in 1978. Needless to say, there is no connection between them. The first time this happened was at 20:01 on 19 February 2011, and again at 21:18 on 19 October 2011. I don't know the algorithm that causes AWB to think they are related, but it may need some tweaking. :-) Thank you for all the time and care that you put into improving Wikipedia! (talk) 17:29, 22 March 2012 (UTC)

Thanks for letting me know. The source of the error was on Gamma-ray burst, now fixed. Rjwilmsi 07:53, 23 March 2012 (UTC)
Looking at the Gamma-ray burst article, I see other references with incorrect PMID links. For example, Akerlof 1999 links to a PubMed paper on acidosis in dairy cows. Galama 1998 has a PMID link to a paper on the effects of various drugs on rat organs. Sari 1998 is similar. The incorrect PMID numbers appear to be much lower in sequence (5-6 digits) than the ones that are correct (8 digits). So perhaps low-numbered PMID connections from articles in non-medical fields should be avoided. I don't know how common such links are, but this could affect more than just these few pages. ... Ah. Poking around very briefly using Wikipedia links from the pages under discussion here, I found the page on Cosmic-ray observatory, one reference in which has an 8-digit PMID that points to a paper on orangutan conservation. It appears the number of digits is irrelevant, and PMID links from other sciences are suspect more generally. Do you have a tool to trace where they appear? (talk) 06:30, 29 March 2012 (UTC)
The pattern on gamma-ray burst is that the PMID is the second part of the DOI, so is wrong. This is the list of that error, which I've now fixed in the articles:
Article | DOI | PMID
Bird | 10.1086/282459 | 282459
Dinosaur | 10.1086/282459 | 282459
Gamma-ray burst | 10.1038/27150 | 27150
Gamma-ray burst | 10.1086/309024 | 309024
Dale Frail | 10.1086/309024 | 309024
Beta Cassiopeiae | 10.1086/426704 | 426704
Template:Cite doi/10.1086.2F426704 | 10.1086/426704 | 426704
Gamma-ray burst | 10.1086/311269 | 311269
Gamma-ray burst emission mechanisms | 10.1086/311269 | 311269
Template:Cite doi/10.2307.2F108520 | 10.2307/108520 | 108520
Hydrogen | 10.1086/317138 | 317138
Gamma-ray burst | 10.1038/18837 | 18837
Gamma-ray burst emission mechanisms | 10.1038/18837 | 18837
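The pattern in the table, where the PMID equals everything after the slash in the DOI, is easy to test for mechanically. A Python sketch follows, assuming each citation's DOI and PMID values have already been extracted from the wikitext. It illustrates the check and is not the procedure actually used to produce the list. A genuine PMID could coincidentally equal a DOI suffix, so hits would still want a human look.

```python
def pmid_copied_from_doi(doi, pmid):
    """Flag the error pattern above: the PMID is just the DOI suffix.

    The suffix is everything after the first "/" of the DOI, e.g. "309024"
    for 10.1086/309024. Returns False when either value is missing.
    """
    if not doi or not pmid or "/" not in doi:
        return False
    suffix = doi.split("/", 1)[1].strip()
    return suffix == str(pmid).strip()
```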
As for the Cosmic-ray observatory error, the PMID links to a paper starting on the same page in the same issue of Science as the correct paper, so User:Citation bot picked it up as a match. This happens occasionally. I am working towards having an export from PubMed of all PMIDs used on Wikipedia, which will allow me to identify these sorts of errors (e.g. the title/author mismatch in this case). Rjwilmsi 07:10, 29 March 2012 (UTC)
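In a same-page, same-issue case like this, volume/issue/page matching cannot tell the two papers apart, but a title comparison can. A rough Python sketch follows, using `difflib.SequenceMatcher` from the standard library. It assumes the PubMed title for each PMID has already been fetched. The function names are hypothetical, and the 0.6 threshold is an arbitrary guess that would need tuning.

```python
from difflib import SequenceMatcher


def _normalise_title(title):
    """Lower-case and collapse whitespace before comparing."""
    return " ".join(title.lower().split())


def title_similarity(a, b):
    """Similarity ratio in [0, 1] between two titles (difflib's ratio())."""
    return SequenceMatcher(None, _normalise_title(a), _normalise_title(b)).ratio()


def looks_like_wrong_pmid(cited_title, pubmed_title, threshold=0.6):
    """Flag a PMID whose PubMed title is too unlike the cited title.

    The 0.6 threshold is an arbitrary assumption and would need tuning
    against real data (e.g. titles with subtitles or translated titles).
    """
    return title_similarity(cited_title, pubmed_title) < threshold
```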