Wikipedia talk:Chemical infobox

From Wikipedia, the free encyclopedia
WikiProject Chemistry (Rated Template-class)
This page is within the scope of WikiProject Chemistry, a collaborative effort to improve the coverage of chemistry on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
 Template: This page does not require a rating on the project's quality scale.
 

Pronunciation

We have pronunciation in Template:Infobox_drug. It would be useful to have it in this one as well. Doc James (talk · contribs · email) 14:59, 1 September 2016 (UTC)

Agree. There was a brief prior discussion on this: Wikipedia_talk:Chemical_infobox/Archive_9#Pronounce_parameter? Sizeofint (talk) 17:30, 1 September 2016 (UTC)
@Sizeofint and Doc James: I can build something. Please specify the location for that data row, e.g. using these demos. Options would be "right under the Names subheader", "under the top infobox name", etc.
Note that the pronunciation should be unambiguously close to the name it refers to. There should be no confusion, since there are multiple names and name types in {{Chembox}}. (Note: in {{Drugbox}}, I have proposed these order changes. When accepted, the pronunciation data row will be the top text row (under the images), near the infobox title.) -DePiep (talk) 14:45, 11 November 2016 (UTC)
I would put it under the "names" heading. Doc James (talk · contribs · email) 18:15, 11 November 2016 (UTC)
  • Done: |pronounce=(any text) will show in section Names. -DePiep (talk) 17:37, 19 November 2016 (UTC)
Not working at glucose, User:DePiep. Doc James (talk · contribs · email) 17:44, 29 November 2016 (UTC)
Fixed. Good point. As I said in the edit summary: /* top */ fix: |pronounce= should be outside of any {{Chembox|SectionN=}} (yes, I know, I know. But hey, this template now handles 575+ parameters :-) ). -DePiep (talk) 18:48, 29 November 2016 (UTC)

ECHA InfoCard ID

What about adding ECHA's InfoCard ID to the chembox? InfoCards were introduced in January and are available for 120,000 chemicals. According to ECHA, an InfoCard serves as a high-level summary for a broad public, consisting of the information most relevant to consumers, downstream users and professionals active in the chemical industry.
The InfoCard ID of DEHP is 100.003.829 and the full URL is http://echa.europa.eu/substance-information/-/substanceinfo/100.003.829.
As a side note, the same ID also works for other ECHA databases, e.g.:

BTW: On Wikidata, I made a bot request to import the data into chemicals' items using the newly created property. Once this is done, the IDs can be obtained from there. --Leyo 00:15, 1 March 2016 (UTC)
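A minimal Python sketch of turning an InfoCard ID into the link quoted above. The ID shape `100.ddd.ddd` is an assumption inferred from the single example (100.003.829), not a documented ECHA rule; the base URL is the one given in the post, and the function name is hypothetical.

```python
import re

# Assumed ID shape, inferred only from the example 100.003.829.
INFOCARD_ID = re.compile(r"100\.\d{3}\.\d{3}")

# Base URL as quoted in the post.
BASE_URL = "http://echa.europa.eu/substance-information/-/substanceinfo/"


def infocard_url(infocard_id: str) -> str:
    """Return the ECHA InfoCard URL for an ID, rejecting IDs of another shape."""
    infocard_id = infocard_id.strip()
    if not INFOCARD_ID.fullmatch(infocard_id):
        raise ValueError(f"not an InfoCard-style ID: {infocard_id!r}")
    return BASE_URL + infocard_id


if __name__ == "__main__":
    print(infocard_url("100.003.829"))  # the DEHP example from the post
```

A format check like this would also reject an EC-number-style string such as `204-211-0`, which matters because the InfoCard ID is a separate identifier from the EC number.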

I've put a note at d:Property_talk:P2566 on why the ID is not the same as the EC number (/list number). -DePiep (talk) 07:49, 1 March 2016 (UTC)
I replied there. --Leyo 08:51, 1 March 2016 (UTC)
Concerning your comment “Job for wikidata then, useless route to add this locally.”: Yes, the IDs should be added to the Wikidata items. What I am talking about here is to obtain these IDs from Wikidata in order to show them (incl. the link to the ECHA InfoCard) in the chembox. --Leyo 15:17, 2 March 2016 (UTC)
We understand. It looks like this data point should be retrieved from Wikidata only; there is no need for a local parameter option.
Back to the prime question: add this external link? Yes, it looks like a very useful and helpful link. (In the longer term, this should go into an "External links" box. Example: Gout). -DePiep (talk) 19:28, 5 March 2016 (UTC)

Update

The Wikidata property is now present in a few thousand items. By adding the following code to the chembox, the ECHA InfoCard ID is shown in each article whose Wikidata item has the property.

{{#if: {{#invoke:Wikidata|claim|P2566}} |
{{!}} [[European Chemicals Agency|ECHA]] InfoCard
{{!}} [http://echa.europa.eu/de/substance-information/-/substanceinfo/{{#invoke:Wikidata|claim|P2566}} {{#invoke:Wikidata|claim|P2566}}]
}}

We may either add it to {{Chembox Identifiers}} or to {{Chembox Hazards}} (InfoCards contain relevant regulatory information). --Leyo 08:36, 3 October 2016 (UTC)
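For comparison, the same logic can be written in Python against Wikidata's entity JSON (the `claims` → property → statement → `mainsnak` → `datavalue` shape that Special:EntityData returns). This is a sketch, not how Module:Wikidata works internally. It simply takes the first statement with a value and ignores ranks, which is an assumption. The helper names are hypothetical, and it uses the English URL path from Leyo's first post rather than the `/de/` path in the wikitext.

```python
from typing import Optional

# English path from the first post; the wikitext above uses /de/ instead.
ECHA_BASE = "http://echa.europa.eu/substance-information/-/substanceinfo/"


def first_claim_value(entity: dict, prop: str) -> Optional[str]:
    """Return the first plain value of `prop` in a Wikidata entity JSON, or None."""
    for statement in entity.get("claims", {}).get(prop, []):
        snak = statement.get("mainsnak", {})
        # Skip "novalue"/"somevalue" snaks, which carry no datavalue.
        if snak.get("snaktype") == "value":
            return snak["datavalue"]["value"]
    return None


def echa_row(entity: dict) -> str:
    """Mimic the {{#if:}} above: emit a wikitext table row only when P2566 is set."""
    infocard_id = first_claim_value(entity, "P2566")
    if infocard_id is None:
        return ""
    return (
        "| [[European Chemicals Agency|ECHA]] InfoCard\n"
        f"| [{ECHA_BASE}{infocard_id} {infocard_id}]"
    )
```

As in the template, an item without P2566 produces no row at all rather than an empty one.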

I would just source this from WD; there is no need to first make this local while we are already discussing moving some information there (and, depending on how 'comparable' WD and local data really are, I am in favour of sourcing over local data). Support adding the code to {{Chembox Identifiers}} and/or {{Chembox Hazards}} (there is different information at different URLs). --Dirk Beetstra T C 09:02, 3 October 2016 (UTC)
I am not sure if I got your first sentence right. The suggestion is to transclude the ID from Wikidata, not to copy it. --Leyo 09:31, 3 October 2016 (UTC)
That is what I meant as well: transclusion, Leyo. --Dirk Beetstra T C 10:31, 3 October 2016 (UTC)
Since the InfoCard (example) contains a summary and links to other interesting pages, it is probably enough to link only this one for now. --Leyo 20:35, 3 October 2016 (UTC)

I have boldly implemented it in Chembox Identifiers, see diff. Please revert if it breaks anything; it is rather difficult to test this in the testcases. --Dirk Beetstra T C 05:14, 4 October 2016 (UTC)

Thank you. I haven't spotted any issues so far. --Leyo 08:22, 4 October 2016 (UTC)

Add option for local input?

Of the 10,600 {{Chembox}} articles, some 6,000 now show this d:ECHA InfoCard ID value. Should we add a local (enwiki) parameter |ECHA InfoCard ID= to allow local overriding or adding? These situations may occur: 1. The Wikidata value may be incorrect for the article, or may need detailing. This requires a local source to be used (|ECHA InfoCard ID ref=), and checking these articles is a maintenance task (using a tracking category). 2. The en article title may not match the Wikidata item exactly, so the Wikidata lookup fails and returns a blank. Useful? -DePiep (talk) 08:22, 17 November 2016 (UTC)
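The two situations above amount to a simple resolution rule: a local value wins, Wikidata is the fallback, and mismatches are tracked. Here is a minimal Python sketch under those assumptions. All three category names are hypothetical placeholders, and `resolve` is an illustrative helper, not template code.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical tracking-category names, for illustration only.
CAT_FROM_WD = "Category:ECHA InfoCard ID from Wikidata"
CAT_LOCAL = "Category:ECHA InfoCard ID from local input"
CAT_LOCAL_DIFFERS = "Category:ECHA InfoCard ID differs from Wikidata"


@dataclass
class Resolved:
    value: Optional[str]
    categories: List[str] = field(default_factory=list)


def resolve(local: Optional[str], wikidata: Optional[str]) -> Resolved:
    """Prefer a local |ECHA InfoCard ID= value; fall back to the Wikidata value."""
    local = (local or "").strip() or None
    wikidata = (wikidata or "").strip() or None
    if local is None:
        # No local input: show the Wikidata value, if any, and track that.
        return Resolved(wikidata, [CAT_FROM_WD] if wikidata else [])
    cats = [CAT_LOCAL]
    if wikidata is not None and wikidata != local:
        # Situation 1: the Wikidata value is wrong or needs detailing -> maintenance.
        cats.append(CAT_LOCAL_DIFFERS)
    # Situation 2 (Wikidata blank) falls through: the local value simply fills the gap.
    return Resolved(local, cats)
```

Keeping the "differs" category separate from the plain "local input" category lets editors review only the real conflicts.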

If an InfoCard ID is missing on Wikidata, it should be added there. Or could you provide an example where it wouldn't work? --Leyo 13:04, 17 November 2016 (UTC)
I don't have an example. I was thinking of the situation where the article title differs from the WD item. This category lists ~500 articles with multiple chemicals (using the indexed |CASNo1=, |CASNo2= etc.). These indexed chemicals could certainly have their own ECHA ID (which could be indexed too, or could use the expensive fetch of an item other than the page's own, if I'm correct). -DePiep (talk) 13:14, 17 November 2016 (UTC)

Also in {Drugbox} then?

Today, {{Infobox drug}} does not have a parameter like |ECHA Infocard=. Is it worth adding, automated via Wikidata? -DePiep (talk) 21:26, 19 November 2016 (UTC)

Well, drugs are not so much within the scope of REACH and CLP, and thus of ECHA. However, I assume that there are still many compounds with a drugbox that have an ECHA InfoCard. --Leyo 18:40, 20 November 2016 (UTC)
You might be able to verify that using the lists of articles with a Wikidata item containing that property (irrespective of the infobox used in the articles). --Leyo 23:26, 23 November 2016 (UTC)
To learn what? If we (enwiki editors) decide that we should add the ECHA data points to the drugbox, then we'll do it (whether from Wikidata or by local input does not matter). It is not the right route to say "Wikidata has it, so let's add it". -DePiep (talk) 10:41, 24 November 2016 (UTC)
It's about the availability of InfoCards at ECHA. The InfoCard IDs (if existing) were added by a bot to all Wikidata items with a CAS RN. --Leyo 22:40, 24 November 2016 (UTC)
Ergh, no. If we, editors, decide an ENCHA data link should be in {{Infobox drug}}, it will be in there, either from Wikidata or by local input. BUT: today ENCHA is not an accepted input option, so ECHA is not relevant input. -DePiep (talk) 22:49, 24 November 2016 (UTC)
What's ENCHA? --Leyo 22:58, 24 November 2016 (UTC)
"ENCHA" is a common, generic name for "any data thing you want it to mean". By context, that should be clear already. If you need extra individual advice by example, just read: "ECHA". -DePiep (talk) 23:10, 24 November 2016 (UTC)
Well, if the lists mentioned above contained very few articles with a drugbox, it would probably not be worth adding such a parameter to drugbox. That is the potential use of these lists. --Leyo 23:17, 24 November 2016 (UTC)
That is exactly my question: how does this reasoning make it "worth" inclusion or exclusion? If few drugs have an ECHA, these must be remarkable exceptions. If many have one, it's quite common to have an ECHA. Either way: good to add. Be sure, this can be said about dozens of data points (already today: E number for food additives). A good reason not to add ECHA (either local or from Wikidata) is: "ECHA is not relevant (enough) for a drug" (I do not know this myself, to be clear. It is what I'm asking here). -DePiep (talk) 06:59, 25 November 2016 (UTC)

There are currently 2579 out of 6529 articles (40%) with drugbox that have an InfoCard ID on Wikidata. --Leyo 08:08, 30 November 2016 (UTC)

Leyo. I've formally put the question at Template_talk:Infobox_drug#Add ECHA InfoCard?. (I'd rather not spend time advocating, I'll just see what happens. I want to spend my wikitime on introducing Wikidata big time in these templates). -DePiep (talk) 16:23, 1 December 2016 (UTC)
Added to drugbox, see talklink. Note: before this edit, the category listed 6084 articles (from the ~10,200 Chemboxes only). -DePiep (talk) 10:48, 8 December 2016 (UTC)

Tracking category

Article ECHA InfoCard needed

We need an article on ECHA InfoCard. Today, 6000 articles link to it. A link to ECHA is not enough. -DePiep (talk) 19:40, 27 November 2016 (UTC)

Wikidata tracking categories

Proposal as of now: write 'Wikidata', not 'wd'

Through the ECHA InfoCard, {{Chembox}} has entered the Wikidata world. I propose to add tracking categories for all our Wikidata (wd) entries. When done in a structured way, they can be helpful and even support maintenance (corrections). Inspiring example: {{Authority control}} categorisation.

  • About the category name pattern:
1. Use the wd property name ('ECHA Infocard ID') for the data point.
2. Write the pattern "[wd name] from Wikidata" ('ECHA Infocard ID from Wikidata'). Rationale: the name comes first, and the name is short (it is a tracking category, not a reader category).

Comments? -DePiep (talk) 21:13, 11 November 2016 (UTC)

Having a tracking category is probably a good idea. I would, however, propose to write “Wikidata” instead of just “wd”. --Leyo 13:07, 17 November 2016 (UTC)
Well, expect dozens of categories (CAS, InChI, ...) in one article. Tracking categories they are. I boldly dare to say: these are editor-only categories, not reader-space category names! A description can go in the category lede (on the category page). With dozens of tracking categories, editors will understand shortened names (maybe write "WD" not "wd"?).
For a bad example, see here. Not WP:IAR, i.e. not all rules, but let's improve that rule for tracking categories. -DePiep (talk) 23:51, 17 November 2016 (UTC)
“WD” is surely better than “wd”. --Leyo 22:03, 18 November 2016 (UTC)
Thanks. I am even pondering something like 'cat:ECHA Infocard ID from wikidata'. But definitely not "cat:Articles using Chembox and having ECHA Infocard ID fetched from wikidata". -DePiep (talk) 22:54, 18 November 2016 (UTC)
Category:ECHA Infocard ID from Wikidata looks good. -DePiep (talk) 23:29, 18 November 2016 (UTC)
Wikidata has a capital “W”. --Leyo 23:37, 18 November 2016 (UTC)
Not in this proposal. DePiep -23:41, 18 November 2016 (UTC)
OK, "W" it is. -DePiep (talk) 12:53, 19 November 2016 (UTC)
About ready to go live. -DePiep (talk) 22:13, 21 November 2016 (UTC)

Jmol live in WP

I have mentioned this suggestion at the Village pump (technical). If you know more, please add there. My knowledge of Java/JavaScript is very limited. -DePiep (talk) 23:34, 17 November 2016 (UTC)

Archived here for future reference. Sizeofint (talk) 09:41, 25 November 2016 (UTC)

E number from Wikidata

I have prepared the E number (food additive codes) data row to get the E number from Wikidata. Articles doing so will be tracked in Category:E number from Wikidata. With this, old |E number= and |E number Comment= are deprecated; there is no option to enter local data. Go? -DePiep (talk) 22:19, 21 November 2016 (UTC)

Chemical articles without CAS Registry Number

Not sure that this goes here but can someone please undelete Category:Chemical_articles_without_CAS_Registry_Number? --Project Osprey (talk) 23:55, 27 November 2016 (UTC)

See Category:Chemical_articles_without_CAS_registry_number. The category has been renamed to follow Wikidata writing (note the lowercase r, n). The overview is in Category:CAS registry number tracking categories.
Also will change: Category:Chembox maintenance categories → Category:Chembox tracking categories
Also will change: Category:Infobox drug maintenance categories → Category:Infobox drug tracking categories
Before we can make {{Chembox}} and {{Drugbox}} a fullblown Wikidata machine, I needed to clean up some details. Call me if something is wrong. -DePiep (talk) 00:20, 28 November 2016 (UTC)
Will do. Many thanks --Project Osprey (talk) 00:52, 28 November 2016 (UTC)
Expect a Wikidata pilot shortly (using CAS registry number). Dozens of details still need to be researched (e.g., what about {{Drugbox}}?). We want multiple tracking categories for a single data point, especially "Wikidata value ≠ local value". Follow this page. -DePiep (talk) 01:13, 28 November 2016 (UTC)
"CAS registry number" is spelled "CAS Registry Number". So should the category. Christian75 (talk) 05:30, 28 November 2016 (UTC)
As explained, Wikidata wrote in lowercase (at the time of decision). -DePiep (talk) 13:29, 28 November 2016 (UTC)

Proposal: add label for indexed identifiers

Indexed identifiers

  • (index can be: <blank>, 1, 2, 3, 4, 5)
  • CASNo
  • ChEBI
  • ChEMBL
  • ChemSpiderID
  • DrugBank
  • InChI, InChIKey
  • IUPHAR_ligand
  • KEGG
  • PubChem
  • SMILES, Jmol
  • UNII

In {{Chembox}}, we have some 13 indexed identifiers (see box). Using these, the {{Chembox}} can handle up to six different compounds (by entering values for CASNo=, CASNo1=, CASNo2= etc.; see the linalool example below). At the moment, over 500 {{Chembox}}es use more than one CAS number.

Currently, the different numbers are distinguished by an extra _Comment= input for each identifier.

I propose to add these six parameters to the {{Chembox}}:

|index_label=
|index1_label=
|index2_label=
|index3_label=
|index4_label=
|index5_label=

This will happen: when used, the same label will be added before each of the values with that index (so index2_label will precede the CASNo2, PubChem2, and other ...2 values). This simplifies the sub-identification, and it encourages the editor to align the indexed values (i.e., make sure that same index == same substance, or: index2_label == CASNo2 == PubChem2 etc.).
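The labelling rule can be sketched in Python. This is not the template's actual code. The parameter names follow the proposal, the identifier list is the one in the box above, and the "label: value" display copies the sandbox demo. The function name is hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Indexed identifiers from the box; the optional digit 1-5 is the index ("" = unindexed).
INDEXED = re.compile(
    r"^(CASNo|ChEBI|ChEMBL|ChemSpiderID|DrugBank|InChI|InChIKey|"
    r"IUPHAR_ligand|KEGG|PubChem|SMILES|Jmol|UNII)([1-5]?)$"
)


def labelled_rows(params: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (identifier, displayed value) pairs, prefixing each value with its index label."""
    rows = []
    for name, value in params.items():
        m = INDEXED.match(name)
        if not m or not value.strip():
            continue  # not an indexed identifier (e.g. index1_label), or empty
        ident, index = m.groups()
        # index "" -> "index_label", "2" -> "index2_label"
        label = params.get(f"index{index}_label", "").strip()
        shown = f"{label}: {value.strip()}" if label else value.strip()
        rows.append((ident, shown))
    return rows
```

Because the label is looked up by index alone, setting `|index2_label=(S)` once labels CASNo2, PubChem2 and every other ...2 value, which is the alignment the proposal aims for.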

Example:


|index1_label=(R)
|index2_label=(S)
|CASNo1_Comment = (R)
|PubChem1_Comment = (R)
|CASNo2_Comment = (S)
|PubChem2_Comment = (S)
Linalool (sandbox)
Identifiers
Wikidata
2469
PubChem 6549
(R): 443158
(S): 67179
Linalool (live version)
Identifiers
78-70-6 Y
126-91-0 (R) N
126-90-9 (S) N
2469
PubChem 6549
443158 (R)
67179 (S)

Minor notes: using these labels is also simpler than having to add a separate comment for each data row, so the comment input options may be less needed. And above all: aligning the indexed identifiers this way is good preparation for the Wikidata changes to come. It allows systematic data loading from Wikidata (more on that later). More demos are in /testcases5.

Any comments, or support right away? -DePiep (talk) 21:38, 30 November 2016 (UTC)

Funny: I am also testing Wikidata things, so the demo looks strange. For now, just look at the (R) and (S) texts, please. -DePiep (talk) 21:03, 2 December 2016 (UTC)

Major proposal: add Wikidata *external link* to the Identifiers

This Wikidata topic is for {Chembox} and {Drugbox} together. Each and every data row will be treated alike. (Bear with me). -DePiep (talk) 21:55, 1 December 2016 (UTC)

I propose to add a data row "Wikidata: [Wikidata item external link]" to {{Chembox}} and {{Drugbox}}, section Identifiers. By default, the link should be provided automatically. Example: Carbon monoxide: d:Q2025

In practice, nearly every {Chem/Drug infobox} will show this link. Articles will be tracked (categorized) usefully. Later on, we will use that Wikidata link for properties, like PubChem numbers. -DePiep (talk) 21:55, 1 December 2016 (UTC)

I would propose not to show d:, i.e. Q2025. --Leyo 23:34, 1 December 2016 (UTC)
Good. If this is the only note ... ;-). (Just think of my elation this morning when I woke up thinking not to use WD for say property CAS number, but as an external link itself! Made my day today). -DePiep (talk) 23:58, 1 December 2016 (UTC)

Searching for duplicated pages

With ~16k compounds listed in chembox and drugbox, I've sometimes wondered if there are duplicated pages. Obscure compounds could be named all sorts of things, particularly those in Category:Substituted_amphetamines and other designer drugs, where many people are trying to access every possible simple analogue but where indexing is understandably poor. I presume the best way to do this would be to scan all of the identifiers (CAS, PubChem, etc.) for duplicates. There has not previously been a tool for doing that, and building one seems like a pretty poor use of time as it might not find anything. Now that these values have been moved to Wikidata, would searching for duplicates be any easier? --Project Osprey (talk) 23:38, 5 December 2016 (UTC)

I'm working to (first) get the Wikidata link into the infoboxes, and the CAS number from WD. PubChem is next (because it is highly present in WD). Drugbox in tandem. This needs some smart categorisation (cat:no Wikidata item/value, cat:enwiki local input differs from Wikidata value, etc.). These categories are a maintenance job (why no item?, why different CAS values?, PubChem values?). How many pages will be in those categories, I cannot guess. I also expect more systematic issues: Wikidata truly wants one compound = one item, yet nicotine seems to have two CAS numbers. These might need Wikidata involvement. Also, we have 500+ articles with multiple compounds (indexed CAS numbers) to handle and categorise with respect to Wikidata.
But that does not address your question. For now, I cannot even think of a smart query setup. May I suggest: within days/weeks, there will be enough articles listed in these other categories to walk through and check against Wikidata. Did you have any plans for the holidays? -DePiep (talk) 23:58, 5 December 2016 (UTC)
A brute-force approach is to use a database dump. One can pre-select a certain category, if there were one that tracked all pages using chembox, but even the raw whole WP would be usable. If we have actual values in the WP chembox templates, it's easy to parse out a list of [pagetitle]->[chembox_casno] pairs and then look for dups in the [chembox_casno] values. If the values are imported from WD, then it needs to be done on a WD export (and I don't know how to parse that!) and also cross-checked against any that are still hardcoded in WP. But it might be possible to dump the rendered pages or something like that, with the WD values pre-imported to WP. DMacks (talk) 00:06, 6 December 2016 (UTC)
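The parsing step of the dump approach could look roughly like this in Python. It assumes the dump has already been reduced to a title → wikitext mapping (reading the XML itself is not shown). The regex only finds hard-coded `|CASNo=` / `|CASNo1=` … `|CASNo5=` values of CAS RN shape, so, as noted above, values fetched from Wikidata will not appear. The names are hypothetical.

```python
import re
from typing import Dict, Iterator, Tuple

# |CASNo= and indexed |CASNo1= ... |CASNo5=, with a value shaped like a CAS RN.
# Parameters such as |CASNo_Ref= or |CASNo1_Comment= do not match.
CASNO_PARAM = re.compile(r"\|\s*CASNo[1-5]?\s*=\s*(\d{2,7}-\d{2}-\d)\b")


def cas_pairs(pages: Dict[str, str]) -> Iterator[Tuple[str, str]]:
    """Yield (page title, CAS RN) for every hard-coded CASNo parameter in each page."""
    for title, wikitext in pages.items():
        for match in CASNO_PARAM.finditer(wikitext):
            yield title, match.group(1)
```

The resulting pairs are exactly the [pagetitle]->[chembox_casno] list described in the post, ready for a duplicate check.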
That does sound like a truly awful way to spend Christmas... If a convenient tool doesn't exist now I'm sure it will later. Any large database needs a way of detecting duplicated entries. --Project Osprey (talk) 00:12, 6 December 2016 (UTC)
(ec) Still, wouldn't it be better to have a primary cleanup first: remove/fix all erroneous CAS numbers (categorised as: conflicting local:CASNo versus d:CASnumber)? -DePiep (talk) 00:16, 6 December 2016 (UTC)
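One cheap first pass at catching erroneous CAS numbers, independent of Wikidata, is the CAS RN check digit. Weight the digits before the check digit 1, 2, 3, … starting from the right, sum them, and take the result modulo 10. For 78-70-6 this gives 0×1 + 7×2 + 8×3 + 7×4 = 66, so the check digit is 6. This is a swapped-in technique, not something the templates are said here to do. It only catches typos: a well-formed number attached to the wrong compound still passes, so the local-versus-Wikidata comparison remains the primary check. The function name is hypothetical.

```python
import re

# CAS RN shape: 2-7 digits, 2 digits, 1 check digit.
CAS_RN = re.compile(r"(\d{2,7})-(\d{2})-(\d)")


def cas_checksum_ok(cas: str) -> bool:
    """True if `cas` is shaped like a CAS RN and its check digit is consistent."""
    m = CAS_RN.fullmatch(cas.strip())
    if not m:
        return False
    body = m.group(1) + m.group(2)
    # Weights 1, 2, 3, ... from the rightmost digit before the check digit.
    total = sum(int(d) * w for w, d in enumerate(reversed(body), start=1))
    return total % 10 == int(m.group(3))
```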

There are Parnaparin sodium vs. Bemiparin sodium or Tinzaparin sodium vs. Semuloparin sodium vs. Dalteparin sodium sharing the same CAS RN. --Leyo 08:52, 6 December 2016 (UTC)

You could make a container category Category:Chemical duplicates by CAS-number, categorise all by [[:Category:<CAS number>]], and have a bot fill all the [[:Category:<CAS number>]] pages with '{{hidden category}} Category:Chemical duplicates by CAS-number'. One could then just browse through the pages of Category:Chemical duplicates by CAS-number and see all the subcategories; each subcategory would show how many pages are in it. All that have more than one are suspect. It is, however, going to be an awful tree with no further use. The only other thing I can think of is to have a script parse out all the CAS numbers, put them in a .csv with the page name, load the .csv into a spreadsheet program and sort by CAS number. Add a column with =IF(A2=A1,"duplicate","") in each cell, and see which rows show up. The former solution is easy to implement before Christmas; the latter is indeed an awful way to spend Christmas. --Dirk Beetstra T C
The latter solution should, by the way, be done on Wikidata data, making a listing like I said. That could then also nicely include all pages on other wikis, so one also kills the other bird: chemicals which exist elsewhere but not here locally (but could be ported). --Dirk Beetstra T C 10:32, 6 December 2016 (UTC)
I won't spend any time on this, I promise, provided that this exercise does not interfere with the developments I am working on. Especially the sandbox stacks and |CASNo= param handling are off-limits. Deal? -DePiep (talk) 11:54, 6 December 2016 (UTC)
We might only ask you to categorise chem/drugboxes by CAS number by adding a little piece of code (well, it is already there, just the other half of the 'IF there is no CASNo THEN categorise as a compound without CASNo'). I will not touch it myself either (without consultation). --Dirk Beetstra T C 13:04, 6 December 2016 (UTC)
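The sort-and-compare approach described above (the spreadsheet formula checks each row against the row before it) translates directly into Python. This is a sketch that assumes (page title, CAS RN) pairs as input. The function name is hypothetical. A shared CAS RN is only a hint, not proof, of a duplicate page, as the heparin-type sodium salts mentioned above show.

```python
from typing import Iterable, List, Tuple


def flag_duplicates(pairs: Iterable[Tuple[str, str]]) -> List[Tuple[str, str, str]]:
    """Return (CAS RN, page A, page B) for adjacent rows sharing a CAS RN after sorting."""
    # Sort by CAS RN first (then title), like sorting the spreadsheet on column A.
    rows = sorted((cas, title) for title, cas in pairs)
    hits = []
    for (prev_cas, prev_title), (cas, title) in zip(rows, rows[1:]):
        # Same test as =IF(A2=A1, ...), but ignore one page listing a number twice.
        if cas == prev_cas and title != prev_title:
            hits.append((cas, prev_title, title))
    return hits
```

If three pages share a number, the result has two adjacent hits (A with B, B with C). That is the same behaviour as the spreadsheet column.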
Already today there is Category:CAS registry number tracking categories. (Today there is no Wikidata check at all).
And as I wrote a few posts earlier, within days/weeks the first WD + CASNo tracking cats will go live. Any expansion of the logic is a headache and bad development practice. I see no need to speed up this particular dups-check when the primary check (align d:CASnumber with local CASnumber) is about to be available. -DePiep (talk) 13:23, 6 December 2016 (UTC)
  • Meanwhile, over at Wikidata: this. -DePiep (talk) 15:18, 8 December 2016 (UTC)
    I was going to suggest such, but the constraint vios cannot be filtered to en.wp-specific content ([un]fortunately). --Izno (talk) 15:20, 8 December 2016 (UTC)

Tracking category "mass overwritten" dropped

Tracking Category:Chemical articles having calculated molecular weight overwritten has been removed from the populating infoboxes {Chembox}, {Drugbox}. Empty now, and will not be populated again. The original intention was: compare entered mass (|MolarMass=) with calculated mass. Well, that did not work. -DePiep (talk) 22:26, 10 December 2016 (UTC)

Added parameter | Drug_class to pharmacology section

Similar to Infobox drug, I have added |Drug_class= to section {{Chembox Pharmacology}}. It shows at the top, right above the ATC code. Also, I have moved the Legal_status data to the bottom of that section, outside of the medical (clinical etc.) data. -DePiep (talk) 11:13, 17 December 2016 (UTC)