
Wikipedia:WikiProject Chemistry/IRC discussions/10 February 2009

From Wikipedia, the free encyclopedia

[2009-02-10 16:57:19] -->| walkerma (n=chatzill@admin-151-108.potsdam.edu) has joined #wikichem
[2009-02-10 16:59:43] <ali_as> Hi Martin.
[2009-02-10 17:00:16] <CheMoBot> user:90.195.95.122 has edited chembox containing page Ammonia (no verified revid available) - http://en.wikipedia.org/w/index.php?diff=269787440&oldid=269636728 (+18) - Summary: ' /* Refrigeration - R717 */ '
[2009-02-10 17:00:46] <walkerma> Hi! Any news?
[2009-02-10 17:01:17] <CheMoBot> LOG saved to Wikipedia:WikiProject_Chemicals/Log/2009-02-10 for chembox.

[2009-02-10 17:01:19] <CheMoBot> user:Physchim62 has edited chembox containing page Lead_selenide (no verified revid available) - http://en.wikipedia.org/w/index.php?diff=269787622&oldid=269763055 (+67) - Summary: ' safety'
[2009-02-10 17:01:32] <ali_as> Not from me, we've had an unusual amount of snow so I've been occupied, aside from the dieldrin/endrin stuff I haven't done anything in a week.
[2009-02-10 17:01:52] <Physchim62> only six or seven inorganics left to go!
[2009-02-10 17:01:56] <CheMoBot> user:120.89.113.251 has edited chembox containing page PTC124 (no verified revid available) - http://en.wikipedia.org/w/index.php?diff=269787759&oldid=269786880 (+12) - Summary: ' /* See also */ '
[2009-02-10 17:02:03] <walkerma> (I have a student who has just arrived, but feel free to type)
[2009-02-10 17:02:04] <Physchim62> chemobot silent
[2009-02-10 17:02:05] <CheMoBot> CheMoBot is silent (use 'CheMoBot report' to make me report again).
[2009-02-10 17:02:12] <walkerma> Wow, PC, that's great!
[2009-02-10 17:02:40] <Physchim62> I'm currently on Tin selenide
[2009-02-10 17:03:42] <Physchim62> once that's finished, I'll have a closer look at the problems that are being listed on the project page
[2009-02-10 17:04:46] <Physchim62> Beetstra, how is the 'bot these days
[2009-02-10 17:04:48] <Physchim62> ?
[2009-02-10 17:06:24] <walkerma> I have done up to no. 750, and I've also done 900-1000 to keep Ambix busy
[2009-02-10 17:08:27] <ali_as> Ok, just checked the IRC page. Can I raise the question of structures on pages?
[2009-02-10 17:08:37] <walkerma> Sure
[2009-02-10 17:08:41] <Physchim62> of course!
[2009-02-10 17:09:03] <walkerma> First, ali_as, can you confirm you got my email with 900-1000?
[2009-02-10 17:09:14] <ali_as> The Dieldrin/Endrin brings up some issues, in that they use a valid, but rather different method of showing the stereochemistry.
[2009-02-10 17:09:18] <ali_as> I did Martin.
[2009-02-10 17:09:22] <Physchim62> walkerma, don't forget to post your verification problems: I can't believe there are no problems at all with 1–100
[2009-02-10 17:10:07] <walkerma> PC: Will do, when I get back to doing verification of articles
[2009-02-10 17:10:13] <ali_as> I'll start working on 900+ next, I have a few things to finish up with my last batch, updating the index and fixing one or two things.
[2009-02-10 17:12:33] <ali_as> I'm thinking about problems with the structures and what information can be included and what forms should be used. Is there a style guide for structures, and should/could there be a symbol to indicate something has been verified - in the image itself?
[2009-02-10 17:13:01] <Physchim62> Yes, the problem as I see it is that we cannot be sure that a reader will get the correct stereochemistry from http://commons.wikimedia.org/wiki/File:Endrin.png
[2009-02-10 17:13:30] <Physchim62> There is now a tag on Commons which you can use to show that an image has been verified
[2009-02-10 17:14:21] <Physchim62> http://commons.wikimedia.org/wiki/Template:Chemical_structure_verified
[2009-02-10 17:14:58] <ali_as> I'm thinking of what information would be helpful in a structure image. Numbering, chiral center labels, verification status. Put them all in, at least in a small image, and it would be a mess.
[2009-02-10 17:15:07] <Physchim62> for the moment, I am suggesting that we only use it for structures which have been specifically drawn for the verification project, or for structures that were particularly difficult to verify
[2009-02-10 17:15:32] <Beetstra> wow .. people are talking here .. :-)
[2009-02-10 17:15:54] <Physchim62> http://commons.wikimedia.org/wiki/Commons:WikiProject_Chemistry/Structure_validation is the commons page for structure verification
[2009-02-10 17:16:12] <ali_as> I've had a read of that I think.
[2009-02-10 17:16:18] <Beetstra> The bot is doing fine, I think
[2009-02-10 17:16:28] <Physchim62> we don't usually put numbering in the image for the chembox because, as you say, it quickly gets cluttered
[2009-02-10 17:16:44] <Beetstra> We are now monitoring pagemoves, and the index is/should be adapted automatically if the page is in the index
[2009-02-10 17:17:02] <Beetstra> and we can add/move index also from IRC now
[2009-02-10 17:17:02] <Physchim62> but some compounds have a separate image (shown later in the article) which gives the numbering scheme
[2009-02-10 17:17:24] <ali_as> Just as a long term goal, I'm wondering if chemboxes should include a bigger clickable image with extra information in, like numbering.
[2009-02-10 17:18:03] <Beetstra> ali_as, I would put that on a /Data page
[2009-02-10 17:18:18] <Physchim62> a clickable image would need changes to the MediaWiki software, so it's not likely to happen this side of the next Ice Age in Hell
[2009-02-10 17:18:33] <ali_as> Fair enough.
[2009-02-10 17:18:50] <walkerma> Physchim62: Don't forget one type of clickable image - image maps!
[2009-02-10 17:18:56] <walkerma> They work great!
[2009-02-10 17:19:06] <Physchim62> on the other hand, we could always have a link to another image, like we use now for IUPAC names
[2009-02-10 17:19:26] <ali_as> Ok.
[2009-02-10 17:19:48] <Beetstra> Still, I would put it in a section on the datapage, and then link it from the 'chembox supplement'
[2009-02-10 17:20:56] <NormWork> Holy cow, unending phone calls. Lemme review...
[2009-02-10 17:21:01] <Beetstra> Physchim62, you saw this: [15:55:15] <CheMoBot> user:Physchim62 has moved chembox containing page Tin(II)_selenide to Tin_selenide: not ionic:

[2009-02-10 17:21:05] <Physchim62> disagree, I think it should go in the main article, just not in the chembox. See pyrazole for a small molecule with numbering in the chembox
[2009-02-10 17:21:29] <Physchim62> yep, saw that, and several others recently
[2009-02-10 17:21:37] <Beetstra> Is the numbering not too specialised?
[2009-02-10 17:22:17] <Physchim62> not really. it's part of nomenclature, and we try to cover nomenclature
[2009-02-10 17:23:29] <Beetstra> That is true
[2009-02-10 17:23:31] <Beetstra> OK
[2009-02-10 17:24:11] <NormWork> The Endrin drawing is a "Haworth Projection" fwiw
[2009-02-10 17:24:14] <Physchim62> for an example, could we talk about steroids without discussing numbering?
[2009-02-10 17:25:13] <NormWork> I don't see how. Steroid numbering is a) inconsistent with everything else, and b) pretty vital
[2009-02-10 17:25:15] <ali_as> I don't think so Norm.
[2009-02-10 17:26:09] <ali_as> (Howerth)
[2009-02-10 17:26:31] <Physchim62> There are IUPAC recommendations on the depiction of stereochemistry in structural formulae: I haven't read them (obviously :) ) but I know they exist
[2009-02-10 17:27:14] <NormWork> ali_as: Wikipedia Haworth projection
[2009-02-10 17:28:30] <Physchim62> http://goldbook.iupac.org/H02749.html
[2009-02-10 17:30:19] <ali_as> That is not what I see when I look up Endrin. I see a 3D projection with indication of when lines pass behind other lines.
[2009-02-10 17:31:45] <NormWork> I've seen the references to monosaccharides, but also Carey and Sundberg (Advanced Organic Chemistry) define Haworth as any projection that displays a ring with Z-axis depicted as nearly orthogonal to the plane of the paper.
[2009-02-10 17:32:23] <NormWork> I'm not an advocate for the terminology.
[2009-02-10 17:33:03] <NormWork> I don't have my "March" here at work, so I don't know what they have to say about it.
[2009-02-10 17:33:43] <NormWork> I guess what I'm saying is that the Endrin depiction seems - to me - unambiguous.
[2009-02-10 17:34:03] <walkerma> FYI, when we discussed carbohydrates about a year ago, we decided to use chair-type projections, rather than Haworth or Fischer
[2009-02-10 17:34:13] <Physchim62> is it idiot-proof? I guess that's my question
[2009-02-10 17:34:33] <Physchim62> for example, ChemSketch does not recognise ANY of the stereocenters
[2009-02-10 17:35:32] <ali_as> It's not idiot proof, but then neither is what is in CAS.
[2009-02-10 17:36:36] <ali_as> The arrangement of the bonds in the CAS structure is automatically generated, ChemSketch understands it but a human would not.
[2009-02-10 17:37:01] <Physchim62> the CAS structure specifies all the stereocenters
[2009-02-10 17:37:30] <NormWork> Doesn't that rank up there with IUPAC names for compounds that are entirely unintelligible?
[2009-02-10 17:37:32] <ali_as> For example you have bonds coming out at you, that use the taper away bond because they are in an unnatural position.
[2009-02-10 17:37:37] <Physchim62> but is it as evocative for the top-right image on the article
[2009-02-10 17:37:39] <Physchim62> ?
[2009-02-10 17:38:43] <ali_as> The result is that the stereochemistry is preserved in the CAS image, but the structure is garbled.
[2009-02-10 17:39:05] <ali_as> (as far as human interpretation is concerned).
[2009-02-10 17:40:55] <NormWork> Don't chemboxes normally contain IUPAC or CAS names?
[2009-02-10 17:40:58] <ali_as> By which I mean 3D structure.
[2009-02-10 17:41:14] <NormWork> That would completely and unambiguously specify the stereochemistry
[2009-02-10 17:41:34] <ali_as> There are issues with that.
[2009-02-10 17:42:05] <ali_as> I don't know where we can find IUPAC names for example and know they are right.
[2009-02-10 17:42:25] <Physchim62> NormWork, I was just looking at that and yes, neither compound has its IUPAC name
[2009-02-10 17:42:42] <Physchim62> I can do that: I will translate them from the CAS names
[2009-02-10 17:43:19] <ali_as> How?
[2009-02-10 17:43:39] <NormWork> ali_as: "know they are right" as in, "the name describes the structure correctly" or as in "this is the official name"
[2009-02-10 17:43:44] <walkerma> ali_as: I think we need to standardize on a source for IUPAC names. ChemSpiderMan claims that the ACD Labs name generator is the best, and he has an independent literature study that shows that (COI: ChemSpiderMan wrote much of ChemSketch)
[2009-02-10 17:43:53] <ali_as> Norm, as in this is the official name.
[2009-02-10 17:44:13] <NormWork> Yeah, that's tough, among other reasons, because an "official name" may not exist.
[2009-02-10 17:44:32] <NormWork> IUPAC names are not monotonically connected.
[2009-02-10 17:44:44] <Physchim62> the problem is that (at the moment) there is not necessarily a single IUPAC name
[2009-02-10 17:44:51] <ali_as> Yeah.
[2009-02-10 17:45:05] <Physchim62> the PIN-recommendations still haven't been passed
[2009-02-10 17:45:13] <Physchim62> Preferred IUPAC Name
[2009-02-10 17:45:38] <ali_as> I hit this issue early on in the CAS verify work before I was aware we were dropping the name checking.
[2009-02-10 17:45:47] <Physchim62> the "final" draft had some fairly glaring inconsistencies, so it's been sent back to the commission
[2009-02-10 17:45:47] <NormWork> PC: That's an artifact of the fact that the IUPAC grammar is not well formed.
[2009-02-10 17:46:17] <Physchim62> NormWork, not at all, it comes directly from the spirit of IUPAC nomenclature
[2009-02-10 17:46:51] <Physchim62> IUPAC nomenclature has never pretended to have a one-to-one correspondence from structure to name, merely from name to structure
[2009-02-10 17:47:31] <Physchim62> this is the main difference between IUPAC nomenclature and CAS nomenclature (which is the inverse!)
[2009-02-10 17:47:38] <NormWork> I think we're saying the same thing, PC.
[2009-02-10 17:47:44] <Physchim62> so do I!
[2009-02-10 17:48:04] <walkerma> ali_as: I think I'd like to get ChemSpiderMan to generate the IUPAC names for us - he's very knowledgeable on this, and actually advises IUPAC and ACS on some of their policies
[2009-02-10 17:48:20] <Physchim62> but I get a bee in my bonnet when people (esp. school inspectors) pretend that there is a single IUPAC name for a compound
[2009-02-10 17:48:25] <walkerma> Most of the IUPAC names in the SDF are generated by him
[2009-02-10 17:48:38] <Physchim62> walkerma???
[2009-02-10 17:48:48] <Physchim62> which IUPAC names in the SDF?
[2009-02-10 17:49:05] <Physchim62> the names in the SDF are CAS names
[2009-02-10 17:50:08] <walkerma> OK, I'll have to look back - but they're all there in the "First 500 with CAS" file
[2009-02-10 17:50:17] <Physchim62> CSM has done us some in the past, and will no doubt be willing to do us some more, but I haven't seen his hand on the current SDF file
[2009-02-10 17:50:39] <Physchim62> yes, in the first 500 file, I seem to remember he generated them for us
[2009-02-10 17:51:02] <NormWork> Does he have an automaton to generate names?
[2009-02-10 17:51:12] <Physchim62> yes, in a word!
[2009-02-10 17:51:37] <Physchim62> he uses a souped-up version of the ACDLabs (ChemSketch) name generator
[2009-02-10 17:51:50] <walkerma> CSM is very busy with ChemSpider these days, but if we have a very specific request like this he will do it. If we send him an SDF of CAS structures, he can generate the most usual forms of IUPAC name for all of these - he will probably provide us with those within a few days
[2009-02-10 17:51:56] <walkerma> For the whole file
[2009-02-10 17:52:03] <walkerma> If we want it!
[2009-02-10 17:52:09] <ali_as> walkerma, is there any way IUPAC names can be checked with IUPAC?
[2009-02-10 17:53:12] <NormWork> I'm more of the opinion that if the IUPAC-style name accurately reflects the structure, then it is, in fact, an IUPAC name.
[2009-02-10 17:53:15] * Physchim62 has to go out for a moment: back in 5 min
[2009-02-10 17:53:16] <walkerma> ali_as: Good question! We actually have a very good relationship with them. Rifleman82 & I met up with Fabienne Meyers last year, and CSM knows them well too
[2009-02-10 17:54:33] <walkerma> NormWork: You want to try and be consistent, though. We had someone arguing last year that caesium should be spelt as cesium, because "the majority of English speakers are in the US where it is spelled cesium"
[2009-02-10 17:55:04] <walkerma> Even though IUPAC prefer the spelling caesium, he/she argued that "IUPAC accepts cesium"
[2009-02-10 17:55:39] <ali_as> I guess we lost sulphur and won caesium.
[2009-02-10 17:55:44] <walkerma> (The same arguments happen with sulphur and aluminum)
[2009-02-10 17:56:15] <ali_as> I was pretty surprised about caesium.
[2009-02-10 17:56:41] <NormWork> Well, there's a Kanji for caesium, one should argue that the most users use that "spelling"
[2009-02-10 17:57:20] <walkerma> So I think if we get one piece of software - allegedly the best - to generate all of our names for us, that should be the standard we use
[2009-02-10 17:57:26] <NormWork> I see that kind of "dispute" as right up there with "Is Pluto a Planet" -- an enormous waste of time.
[2009-02-10 17:57:46] <NormWork> If I come across a bottle of caesium sulphate, I'm bloody well going to know what it is.
[2009-02-10 17:58:11] <walkerma> http://en.wikipedia.org/wiki/Color_of_the_bike_shed
[2009-02-10 17:58:18] * NormWork is cranky this morning, and apologizes.
[2009-02-10 18:00:06] <ali_as> The problem Norm, is that to put together anything in a structured way, you need to know those sorts of annoying answers. :/
[2009-02-10 18:00:17] <NormWork> Maybe I should start posting to rec.chemistry.advocacy.caesium
[2009-02-10 18:00:49] <Physchim62> let's not forget that we had the President of the IUPAC inorganic nomenclature committee editing WP last September: he was not exactly helpful
[2009-02-10 18:00:51] <NormWork> ali_as: Norm's Solution: Flip a coin. Declare an "official" version.
[2009-02-10 18:01:19] <ali_as> Problem with that is that would be a Wikipedia official version, at best, if we did it.
[2009-02-10 18:01:27] <NormWork> In fact, the reason I inquired about CSM's automaton, is that the pie-in-the-sky answer is to declare that the IUPAC names so generated are the "right" ones, and sell the idea to IUPAC.
[2009-02-10 18:01:36] <ali_as> And it would differ from anyone else's official version.
[2009-02-10 18:01:41] <NormWork> ali_as: I see that as a problem in marketing.
[2009-02-10 18:01:47] <ali_as> Haha.
[2009-02-10 18:01:47] <Physchim62> ali_as, NormWork, that's pretty much what we did on WP, right at the start
[2009-02-10 18:02:24] <NormWork> Well, the "sell the idea to IUPAC" isn't a trivial part of that "solution"
[2009-02-10 18:02:38] <NormWork> But I bet it's not impossible, either, since they're dealing with a very intractable problem.
[2009-02-10 18:02:46] <walkerma> See http://www.mdpi.org/molecules/papers/11110915.pdf
[2009-02-10 18:03:28] <Physchim62> speaking of nomenclature, does anyone here have any strong feelings about Stock nomenclature? (ie "iron(II) chloride" etc)
[2009-02-10 18:03:36] <walkerma> NormWork, ali_as, I agree that we simply need to declare that these are the "right" ones as far as we are concerned
[2009-02-10 18:04:42] <walkerma> Physchim62: That old chestnut! AFAIK, the original recommendations suggested Stock for ionic, and "covalent" naming for covalent compounds, is that correct?
[2009-02-10 18:05:07] <walkerma> So we would have cerium(III) chloride, but sulfur dichloride
[2009-02-10 18:05:30] <walkerma> The problem comes with something in-between.
[2009-02-10 18:05:30] <ali_as> Does the naming program we would be using have a free version?
[2009-02-10 18:05:58] <walkerma> ali_as: No, but the free ones are nowhere near as reliable.
[2009-02-10 18:06:09] <ali_as> I see that as a problem.
[2009-02-10 18:06:28] <walkerma> But if CSM does all of them for us within a few days of our request, that's the next best thing
[2009-02-10 18:06:38] <Physchim62> walkerma, yes, but from my experience on CAS verification, I am going to update the naming conventions, with examples. I appear to have Smokefoot's support, which is nice (for a change!) but I wanted to take the opportunity to ask people here as well
[2009-02-10 18:06:47] <ali_as> But it excludes most of the people that would be reading the article.
[2009-02-10 18:06:50] <walkerma> We don't want to populate our articles with a lot of actually WRONG names
[2009-02-10 18:06:56] <Physchim62> the names can't be copyrighted
[2009-02-10 18:07:08] <Physchim62> and in any case, they're available on ChemSpider
[2009-02-10 18:07:20] <ali_as> At least if we used a free one that did not quite match IUPAC, we'd be consistent and we wouldn't be excluding anyone, that's more in the spirit of Wikipedia.
[2009-02-10 18:07:35] <Physchim62> the basic version of the software is shareware
[2009-02-10 18:07:48] <Physchim62> ali_as, you already have it ;)
[2009-02-10 18:08:32] <ali_as> I've tried to use it, it tells me I have to buy the full version for anything above a certain number of atoms.
[2009-02-10 18:08:44] <walkerma> Physchim62: Yes, they're on ChemSpider! Great point! And ChemSketch uses a basic version that works for most things. We'll just ask for help on http://en.wikipedia.org/wiki/Palytoxin
[2009-02-10 18:08:46] <Physchim62> and the IUPAC rules are freely available
[2009-02-10 18:09:10] <Physchim62> when we have disputes about names, we refer back to the rules, not the automatically generated name
[2009-02-10 18:09:43] <Physchim62> I have local copies of almost all the relevant IUPAC guidelines, including some which have yet to be approved
[2009-02-10 18:10:24] <walkerma> Physchim62: Regarding Stock names, my impression is that the "de facto" version of the IUPAC rules, in practice, is that metals usually use the Stock name, except in cases where a "covalent" name was already common (e.g., manganese dioxide)
[2009-02-10 18:10:29] <Physchim62> but CSM can do it a lot quicker, and with reasonable accuracy
[2009-02-10 18:11:12] <Physchim62> walkerma, IUPAC "tolerates" Stock nomenclature, but it has several defects even for ionic compounds
[2009-02-10 18:11:59] <walkerma> Look at a Strem or Alfa catalogue, Stock is widely used. I do accept that Pr6O11 or TlI3 can pose problems - but that's true in ANY system!
[2009-02-10 18:12:48] <walkerma> However, these are just my opinions; and certainly Smokefoot knows way more than an obscure organic chemist from Potsdam!
[2009-02-10 18:13:05] <Physchim62> we don't even have an article on Stock nomenclature, that will have to change! ;) Catalogues use it because it helps with alphabetization
[2009-02-10 18:14:13] <walkerma> It's certainly easier to find indium(III) oxide rather than diindium trioxide (Martin himself likes indium sesquioxide best!)
[2009-02-10 18:14:16] <Physchim62> I'll take the opportunity that Smokefoot and I are not sniping at each other ;) to get him to check the guidelines
[2009-02-10 18:14:51] <walkerma> PC: Don't worry, I'll just set Wim on you instead! :)
[2009-02-10 18:15:18] <Physchim62> at least that would be a change! I've not had a good row with Wim for months now!
[2009-02-10 18:17:45] <Physchim62> OK, so my plans for the next week are to finish the inorganics (probably tonight), revise the naming conventions and help out with the other validation problems (in particular endrin/dieldrin)
[2009-02-10 18:17:46] <walkerma> OK: My plans are to get the Excel file complete up to entry 1000, then to focus on getting those 1000 looking "nice". I will ask CSM to generate IUPAC names for the complete Union file, too
[2009-02-10 18:18:12] <Physchim62> when you ask CSM, can you get InChI and InChIKeys as well?
[2009-02-10 18:18:15] <walkerma> I should finish listing up to 1000 this week
[2009-02-10 18:18:26] <walkerma> Physchim62: Yes
[2009-02-10 18:18:46] <walkerma> Physchim62: Note that the rules for InChIKeys just changed very recently
[2009-02-10 18:19:07] <Physchim62> and ask him what is going on with the new version InChIs which are populating his website, and what we should do about them
[2009-02-10 18:19:53] <Physchim62> I could go and find out myself, but CSM will know anyway!
[2009-02-10 18:19:54] <walkerma> http://www.chemspider.com/blog/standard-inchis-and-inchikeys-populated-to-chemspider.html
[2009-02-10 18:20:07] <walkerma> http://www.iupac.org/inchi/release102final.html
[2009-02-10 18:20:57] <walkerma> Ali_as: Do you have enough to keep you busy?
[2009-02-10 18:21:10] <walkerma> I feel like I've struggled to keep up with you!
[2009-02-10 18:21:59] <ali_as> I have enough ;)
[2009-02-10 18:23:32] <Physchim62> walkerma, I noticed that the entry numbers in the Excel file don't quite match the entry numbers on the SDF file
[2009-02-10 18:23:51] <Physchim62> is this a deliberate feature?
[2009-02-10 18:24:07] <ali_as> Union has more compounds I think.
[2009-02-10 18:24:15] <walkerma> We should soon - perhaps by this weekend - have 700-900 ready to validate online. I already sent 901-1000 to ali_as, that set was deliberately done for him. I'll probably post the version with 1-800 and 901-1000 in the next day or so, so that people can get on with the online work, and then someone else can do 801-900 next week.
[2009-02-10 18:24:35] <walkerma> Physchim62: Can you give me an example?
[2009-02-10 18:24:47] <Physchim62> when I started at 400 on the Excel file, the cpd was 404 on the SDF file, or something like that
[2009-02-10 18:25:08] <walkerma> Which SDF file was that?
[2009-02-10 18:25:52] <walkerma> I hope I didn't accidentally miss three structures!
[2009-02-10 18:25:58] <walkerma> or four
[2009-02-10 18:26:57] <Physchim62> yes, that's it: entry 401 on the Feb3 Excel file is [67-47-0]: this is entry 404 on the SDF file
[2009-02-10 18:27:12] <ali_as> As far as I am aware the Union SDF file contains Wikipedia entries as well.
[2009-02-10 18:27:20] <Physchim62> the SDF file from December 2008, arranged in CASRN order
[2009-02-10 18:27:27] <walkerma> I don't have a copy of the CAS file on its own - I can't decompress it
[2009-02-10 18:27:49] <Physchim62> my SDF file has 6206 entries
[2009-02-10 18:28:19] <ali_as> Hmmm.
[2009-02-10 18:28:32] <Physchim62> am I using a different file from everyone else?
[2009-02-10 18:29:20] <walkerma> I think you may be! They are supposedly all based on the same data, so it shouldn't matter TOO much, though
[2009-02-10 18:30:15] <Physchim62> It's not a problem, we have a common unique-key
[2009-02-10 18:30:41] <walkerma> ali_as and I have been using the Union file "CAS-WikipediaSDF-Union.SDF" which has 9239 entries
[2009-02-10 18:31:03] <ali_as> I have based everything I've done on CAS-Wikipedia-SDF-Union.zip from 18th December.
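The entry-number mismatch discussed above (Excel entry 401 is SDF entry 404) is the kind of thing that can be traced by keying both lists on CASRN, the "common unique-key", instead of on position. What follows is only a minimal Python sketch under stated assumptions: the function name `compare_by_casrn` is hypothetical, each file is assumed to have already been reduced to an ordered list of CASRN strings, and the sample lists are toy data, not the contents of the real Excel or SDF files.

```python
def compare_by_casrn(excel_casrns, sdf_casrns):
    """Compare two ordered CASRN lists (hypothetical helper).

    Returns (only_in_excel, only_in_sdf, shifted), where `shifted` maps a
    CASRN to its (excel_entry, sdf_entry) pair when the 1-based entry
    numbers differ. If a CASRN is duplicated, its last position is used.
    """
    excel_pos = {cas: i for i, cas in enumerate(excel_casrns, start=1)}
    sdf_pos = {cas: i for i, cas in enumerate(sdf_casrns, start=1)}

    only_in_excel = [cas for cas in excel_casrns if cas not in sdf_pos]
    only_in_sdf = [cas for cas in sdf_casrns if cas not in excel_pos]
    shifted = {
        cas: (excel_pos[cas], sdf_pos[cas])
        for cas in excel_casrns
        if cas in sdf_pos and excel_pos[cas] != sdf_pos[cas]
    }
    return only_in_excel, only_in_sdf, shifted


# Toy data only: the second list has one extra entry, which shifts
# every later entry number by one.
excel = ["50-00-0", "64-17-5", "67-47-0"]
sdf = ["50-00-0", "56-81-5", "64-17-5", "67-47-0"]
only_excel, only_sdf, shifted = compare_by_casrn(excel, sdf)
print(only_excel, only_sdf, shifted)
```

Entries that appear in only one list are the candidates for "accidentally missed" structures; entries in `shifted` are merely numbered differently.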
[2009-02-10 18:31:06] <Physchim62> I have already thought of that when I was considering how to integrate the inorganics
[2009-02-10 18:31:17] <walkerma> It has everything from both the WP SDF and the CAS SDF
[2009-02-10 18:31:45] <Physchim62> OK, I have been working from the original CAS file (the .gz)
[2009-02-10 18:32:01] <walkerma> OK, that's good
[2009-02-10 18:32:09] <Physchim62> you can unzip it with 7-Zip
[2009-02-10 18:32:24] <walkerma> Physchim62: Can you post the complete file on pluto for us?
[2009-02-10 18:32:42] <Physchim62> the complete file is just over 10 MB
[2009-02-10 18:33:23] <walkerma> That's OK if your connection is good - I have at least 1 GB available
[2009-02-10 18:33:24] <Physchim62> 10.258 MB, created Dec 17, 2008
[2009-02-10 18:34:12] <Physchim62> my connexion is fine. I'll try, and email you if I have problems. I obviously can't email it to people, cos it's too large
[2009-02-10 18:34:33] [ERROR] Connection to irc://irc.freenode.net/ (irc://irc.freenode.net/) reset.
[2009-02-10 18:34:49] [INFO] Connecting to irc://irc.freenode.net/ (irc://irc.freenode.net/)…
[2009-02-10 18:34:51] === *** Looking up your hostname...
[2009-02-10 18:34:51] === *** Checking ident
[2009-02-10 18:34:51] === *** Found your hostname
[2009-02-10 18:34:51] === *** No identd (auth) response
[2009-02-10 18:34:51] === *** Your host is anthony.freenode.net[anthony.freenode.net/6667], running version hyperion-1.0.2b
[2009-02-10 18:34:58] -->| YOU (Physchim62) have joined #wikichem
[2009-02-10 18:34:58] =-= Topic for #wikichem is ``This is the central en.wikipedia channel for chemistry related subjects. CheMoBot is logging! type 'chemobot help' for help (or see User:CheMoBot). see User talk:CheMoBot/Data for a discussion on the data-format
[2009-02-10 18:34:58] =-= Topic for #wikichem was set by Beetstra on Tuesday, 11 November 2008 17:00:08
[2009-02-10 18:34:58] *ChanServ* [#wikichem] Welcome to the wikipedia chemistry channel
[2009-02-10 18:34:58] =-= Mode #wikichem +v Physchim62 by ChanServ
[2009-02-10 18:35:11] <walkerma> He found that Threonine (entry 733 in the Excel file) is completely absent from the Union SDF file we've been using
[2009-02-10 18:35:18] <Physchim62> obviously my connexion doesn't like being told that it's fine!
[2009-02-10 18:35:54] <Physchim62> it is entry 736 in the CAS SDF
[2009-02-10 18:35:56] <walkerma> Spooky!
[2009-02-10 18:36:06] <Physchim62> [80-98-2]
[2009-02-10 18:36:13] <ali_as> Curiouser and curiouser.
[2009-02-10 18:36:14] <Physchim62> sorry [80-68-2]
[2009-02-10 18:36:18] <walkerma> Physchim62: The CAS No. seems to be absent from the Union file.
[2009-02-10 18:36:42] <walkerma> I'm wondering if there are small differences between the XML file they sent us and the SDF - it's entirely possible
[2009-02-10 18:36:56] <Physchim62> I also have an Excel version (without structures, obviously)
[2009-02-10 18:37:05] <ali_as> The xml is from October?
[2009-02-10 18:37:27] <Physchim62> that's how I've managed to sort out the Inorganics: I can order the Excel file by formula
[2009-02-10 18:37:35] <walkerma> The other explanation would be that the Union command lost a couple of entries somehow - maybe there were typos or incorrect structure matches or something
[2009-02-10 18:38:07] <walkerma> Or the simplest explanation is that I'm an idiot and I just missed a few when I was scrolling through the list at 2am!
[2009-02-10 18:38:59] <ali_as> The xml is a little...glitchy too.
[2009-02-10 18:39:06] <Physchim62> I'll investigate further
[2009-02-10 18:39:20] <ali_as> Some of the entries look hand edited.
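The slip above from [80-98-2] to the corrected [80-68-2] is exactly what the CAS Registry Number check digit is meant to catch: the final digit equals the weighted sum of the preceding digits (rightmost digit times 1, next times 2, and so on) modulo 10. A minimal Python sketch of that check follows; the function name `casrn_is_valid` is hypothetical, and the code only validates the format and checksum, not whether the number is actually assigned to a substance.

```python
def casrn_is_valid(casrn: str) -> bool:
    """Check the format and check digit of a CAS Registry Number.

    Accepts forms such as "80-68-2" or "[80-68-2]". A valid checksum does
    not prove the number is assigned, only that it is not a simple typo.
    """
    parts = casrn.strip("[] ").split("-")
    if len(parts) != 3 or not all(p.isascii() and p.isdigit() for p in parts):
        return False
    first, second, check = parts
    # CASRN layout: 2 to 7 digits, then 2 digits, then 1 check digit.
    if not (2 <= len(first) <= 7) or len(second) != 2 or len(check) != 1:
        return False
    body = first + second
    # Weight digits 1, 2, 3, ... starting from the rightmost body digit.
    total = sum(int(d) * w for w, d in enumerate(reversed(body), start=1))
    return total % 10 == int(check)


for candidate in ("[80-98-2]", "[80-68-2]", "[67-47-0]"):
    print(candidate, casrn_is_valid(candidate))
```

For [80-68-2] the weighted sum is 8×1 + 6×2 + 0×3 + 8×4 = 52, giving check digit 2; for the mistyped [80-98-2] the sum is 58, which would require a check digit of 8, so the typo is detectable without consulting any file.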
[2009-02-10 18:41:41] <Physchim62> wouldn't surprise me, CAS have given us problems in the past with file formats
[2009-02-10 18:42:23] <walkerma> Well, it looks like we're moving ahead OK anyway. I think by the end of February we'll have most of the first 1000 verified, and we can start moving onto 1001-2000
[2009-02-10 18:42:47] <walkerma> Physchim62: How many inorganics will you have?
[2009-02-10 18:43:05] <Physchim62> 677 CASRNs
[2009-02-10 18:43:40] <Physchim62> I haven't calculated the full stats yet, but I will do
[2009-02-10 18:43:47] <walkerma> Excellent! So that will be well over 1000 pages verified.
[2009-02-10 18:43:51] <Physchim62> given that people liked the last report
[2009-02-10 18:44:12] <Physchim62> I'll add an appendix on safety validation as well
[2009-02-10 18:44:48] <walkerma> I thought I'd get CSM to generate a nice SDF version of the "first 1000" CAS file, too, though we'll have to find those missing few articles first
[2009-02-10 18:44:55] <Physchim62> otherwise, there are no new problems with the second half compared with the first half
[2009-02-10 18:45:13] <walkerma> So we can send them your report and the SDF, and I'll write a report too
[2009-02-10 18:45:22] <Physchim62> It takes time to write 150 or so articles, walkerma!
[2009-02-10 18:46:24] <Physchim62> by the time we've written them, I'll have finished my random selection as well
[2009-02-10 18:46:48] <walkerma> I'm guessing that out of the 1000, we will have "perfect match" status (level 3 or 4) for at least 600 articles - giving us well over 1000 including your inorganics
[2009-02-10 18:47:56] <walkerma> FYI: The Version 0.7 release could drag me away from chemistry work in March; I'm holding an IRC next week on that, and after then things will begin to hot up a lot as we move to publication
[2009-02-10 18:48:34] <walkerma> So I hope that you guys can maintain the momentum while I'm wading through a list of 31,000 articles
[2009-02-10 18:49:16] <walkerma> I should be able to do a little bit on chemistry, and then return more actively in April, probably
[2009-02-10 18:49:23] <Physchim62> we'll sort something out
[2009-02-10 18:49:44] <ali_as> Sorry, newbie question. 0.7?
[2009-02-10 18:50:19] <walkerma> http://en.wikipedia.org/wiki/Wikipedia:Version_0.7
[2009-02-10 18:50:39] <walkerma> It's the first DVD release of the English Wikipedia
[2009-02-10 18:50:50] <walkerma> organised by the WP community
[2009-02-10 18:51:20] <walkerma> (except for the Schools release, which is around 6000 articles)
[2009-02-10 18:51:25] <Physchim62> second, surely
[2009-02-10 18:51:39] <walkerma> The first (Version 0.5) was on CD :)
[2009-02-10 18:51:45] <Physchim62> WP0.5, I have that CD :)
[2009-02-10 18:51:47] <walkerma> That was only 2000
[2009-02-10 18:52:12] <walkerma> 2000 articles. Thanks for getting that, PC!
[2009-02-10 18:53:07] <Physchim62> the idea behind them is to have stable versions which can be used without the need for Internet access
[2009-02-10 18:53:08] <walkerma> It's taken us almost two years to get to 0.7, mainly due to the problem of scaling up from a hand-selection of 2000 to a bot-based selection of 30,000
[2009-02-10 18:53:25] <Physchim62> are you indexing 0.7 as well?
[2009-02-10 18:54:09] <walkerma> But now that work is done, and some of the other behind-the-scenes work (generating a useful index, search engine, article formatting, handling of images, etc) we will be able to make new releases VERY easily in the future [2009-02-10 18:54:47] <Physchim62> not to mention the generalization of WikiProject rating of articles [2009-02-10 18:54:50] <walkerma> Physchim62: Yes, we already have an index by location, and we're using the same type of code to generate a subject-based index [2009-02-10 18:55:07] <Physchim62> which I'm trying to extend to Featured Articles as well [2009-02-10 18:55:43] <Physchim62> (with my usual and well-known tact) [2009-02-10 18:56:02] <walkerma> Last autumn I worked through a list of 11,500 keywords derived from category names, assigning them to a Polish district here, a Hindu goddess there, etc [2009-02-10 18:56:23] <walkerma> We are now using that to generate the index [2009-02-10 18:56:47] <walkerma> PC: How do you mean extending it to FAs? [2009-02-10 18:56:47] <Physchim62> http://en.wikipedia.org/wiki/Wikipedia_talk:FAC#Once_again.2C_FAC_produced_featured_crud [2009-02-10 18:57:24] <Physchim62> walkerma, under my suggestion, WikiProjects would be responsible for nominating article for the Main Page [2009-02-10 18:58:52] <Physchim62> so chemistry would be pretty much guaranteed one every 2–3 months: up to the WikiProjects to create articles of a suitable quality or look rather silly in front of millions of people [2009-02-10 18:59:05] <walkerma> Aha! You should join us on IRC this weekend, we're going to debate the status of A-Class. 
Some are arguing to abolish it (saying that it serves no purpose as GA and FA are more widely used) but others such as myself have argued for an attempt to establish A-Class peer reviews at WikiProjects [2009-02-10 18:59:37] <Physchim62> can you email me with the details (so I don't lose them) [2009-02-10 18:59:40] <walkerma> Though maybe we should get our own house in order first..... [2009-02-10 18:59:51] <Physchim62> OUR house is in order [2009-02-10 19:00:02] <walkerma> http://en.wikipedia.org/wiki/Wikipedia_talk:Version_1.0_Editorial_Team/Assessment#Request_for_Comment_regarding_A-Class_assessments [2009-02-10 19:01:06] <Physchim62> ouch, that's recent debate, I'll have to read that properly ;) [2009-02-10 19:01:08] <walkerma> Physchim62: I disagree. Many of our A-Class articles would not pass muster in today's assessment climate, and even some of our B-Class are quite lame. This is because they were assessed in 2006 and we haven't reviewed them since [2009-02-10 19:02:08] <Physchim62> that simply means that the assessment climate is wrong ;) actually, that's my main complaint about the current WP:FAC, that they're a useless waste of time [2009-02-10 19:02:18] <Physchim62> they don't guarantee good articles [2009-02-10 19:02:20] <walkerma> And this page has been horribly inactive for about two years; shame on all of us, except Wim [2009-02-10 19:02:22] <walkerma> http://en.wikipedia.org/wiki/Wikipedia:WikiProject_Chemicals/List_of_A-Class_articles [2009-02-10 19:02:58] <Physchim62> not at all, A-class articles were supposed to be those that didn't need anything urgent doing to them [2009-02-10 19:03:52] <Physchim62> if you had the energy spent on FA devoted to improving stubs, we would have a better encyclopedia [2009-02-10 19:03:53] <walkerma> PC: I see your point about FAC - which you made beautifully - but I'd argue that we need both aspects right for FA. 
Both aspects = Style (which is covered well at GAN and FAC) and Content (covered well by WikiProject reviews) [2009-02-10 19:04:19] <Physchim62> I think WikiProjects can handle style as well [2009-02-10 19:04:24] <walkerma> Physchim61: True, but you won't get that energy moved around - people work on what they like [2009-02-10 19:04:50] <Physchim62> especially if they have to make their nominations public well before they go on the Main Page (to allow the style experts to check them) [2009-02-10 19:04:53] <walkerma> (I was speaking to your predecessor there!) [2009-02-10 19:05:27] <walkerma> Some of our A-Class would be laughed at at FAC today, though they looked great to us in 2006! [2009-02-10 19:05:59] <Physchim62> It is pointless putting a science article to WP:FAC [2009-02-10 19:06:08] <Physchim62> it will come out of the process WORSE [2009-02-10 19:06:09] <walkerma> I do think that any community needs to have some "showcase" articles, to show what can be done. [2009-02-10 19:06:29] <Physchim62> yes, but let each community choose what they want to put on show [2009-02-10 19:06:36] <walkerma> Physchim62: I see your point there about science and FAC, that's why I've argued for A-Class [2009-02-10 19:06:54] <walkerma> When would be a good time for you this weekend, to talk on IRC? [2009-02-10 19:07:26] <Physchim62> evenings CET, mornings EST [2009-02-10 19:07:28] <walkerma> I'll try to choose a meeting time when you can make it... [2009-02-10 19:07:44] <walkerma> Sunday OK? 
[2009-02-10 19:08:05] <Physchim62> yep, although I might have a class 1500-1600UTC [2009-02-10 19:08:15] <Physchim62> (1600-1700CET) [2009-02-10 19:08:20] <walkerma> I'm thinking something like 2pm US EST, 2000h CET [2009-02-10 19:08:30] <Physchim62> no problem [2009-02-10 19:08:58] <Physchim62> email me with the details, I must step out again [2009-02-10 19:09:04] <walkerma> Perhaps you could post your ideas at http://en.wikipedia.org/wiki/Wikipedia_talk:Version_1.0_Editorial_Team/Assessment [2009-02-10 19:09:25] <walkerma> so we can look at them first [2009-02-10 19:09:29] <walkerma> digest them [2009-02-10 19:09:53] -->| Beetstra_ (n=djbeetst@Wikimedia/Beetstra) has joined #wikichem [2009-02-10 19:09:53] =-= Mode #wikichem +v Beetstra_ by ChanServ [2009-02-10 19:10:04] <walkerma> ali_as: You may be interested that the 1.0 assessment scheme - now used on 1.6 million articles - started on WP:Chem [2009-02-10 19:10:13] <Physchim62> Yes, I will have to produce a viable proposal, instead of *just* insulting people: Ho hum, such is life ;) [2009-02-10 19:10:33] |<-- Beetstra has left irc.freenode.net (Nick collision from services.) [2009-02-10 19:10:48] <Physchim62> TTFN everyone, and thanks as always for your contributions [2009-02-10 19:10:50] <walkerma> Yes!