Wikipedia:WikiProject Chemistry/CAS validation

From Wikipedia, the free encyclopedia

NOTE: Chemical Abstracts have agreed to perform this work for us. Information on this process will be posted once details are available.

This topic is the main agenda item at the February 5th IRC Meeting. Please join us!

There are two types of validation that can be done with CAS numbers. One is a mathematical validation, designed to detect mistyped CAS numbers. The other is to confirm, by lookup, that the number is actually assigned to a chemical.

Mathematical validation (using a check digit):

The Chemical Abstracts Service (CAS) registry number system was designed with error detection in mind: the last digit of every CAS number is a check digit that makes it possible to detect many mistyped numbers. The check is a simple, repetitive calculation that is well suited to software. Note that a number that passes this check can still be absent from the CAS database; mathematical validation only says that a CAS number could be valid based on its format.[1]
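Written as a formula (the symbols here are our own shorthand, not CAS notation): let R be the check digit, and read the remaining digits from right to left as N_1, N_2, …, N_k, ignoring the hyphens. The number passes the check when

```latex
R = \left( \sum_{i=1}^{k} i \cdot N_i \right) \bmod 10
```

The sample code on this page computes the same sum by giving the check digit itself a weight of 0, which leaves the total unchanged.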

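Here is sample code for this validation (Ruby):

module CAS

  # Returns true if cas_number has the form NNNNNNN-NN-N (2 to 7 digits in
  # the first group) and its last digit equals the computed check digit.
  def self.validate(cas_number)
    return false unless cas_number && cas_number.match?(/\A[0-9]{2,7}-[0-9]{2}-[0-9]\z/)
    check_digit = cas_number[-1].to_i
    sum = 0
    # Walk the digits right to left: the check digit gets weight 0,
    # the digit before it weight 1, the next weight 2, and so on.
    cas_number.reverse.scan(/[0-9]/).each_with_index do |digit, i|
      sum += digit.to_i * i
    end
    check_digit == sum % 10
  end

end

# Read CAS numbers interactively; a blank line or end of input stops the loop.
loop do
  print "CAS Number: "
  line = gets
  break if line.nil?
  cas_number = line.strip
  break if cas_number.empty?
  puts CAS.validate(cas_number) ? "valid" : "invalid"
end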

Validation by lookup:

CAS numbers need to be validated for the roughly 4,000 chemical pages. Since the only authoritative source is the American Chemical Society, SciFinder looks like the best bet. For various reasons (see previous IRC discussions), it is not practical for one editor to validate them all, hence the division of labor:

ChemSpiderMan (talk · contribs) will be in charge of the distribution. Help is wanted! To contribute, simply request the block (number of entries) you would like to handle, and sign with ~~~~ when you are done. It may help to start with a smaller block before making further requests.

Use the authoritative source for CAS numbers, either SciFinder or STN, to search for and validate the CAS number of the structure shown. There ARE multiple CAS numbers associated with a single compound in some cases, so the CAS number itself might need to be annotated. For most complex organics I don't think this will be a problem, but it will be for the inorganics and a number of the more common organics. --ChemSpiderMan (talk) 05:15, 23 January 2008 (UTC)
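When working through a block, the check-digit test and the lookup can be combined into one status per entry. The sketch below is one possible way to do that, assuming the volunteer copies the CAS numbers that SciFinder or STN lists for the structure into a plain list by hand. The method name `review_cas` and the status symbols it returns are hypothetical; no SciFinder or STN API is involved.

```ruby
# Hypothetical helper for reviewing one article's CAS number.
#   article_cas   - the CAS number currently shown in the article
#   registry_hits - CAS numbers listed by SciFinder/STN for the same
#                   structure, typed in by hand (assumed input format)
def review_cas(article_cas, registry_hits)
  # Format check: 2 to 7 digits, 2 digits, 1 check digit.
  return :malformed unless article_cas.match?(/\A[0-9]{2,7}-[0-9]{2}-[0-9]\z/)

  # Check-digit test: rightmost non-check digit has weight 1, and so on.
  digits = article_cas.delete("-").chars.map(&:to_i)
  check = digits.pop
  sum = digits.reverse.each_with_index.sum { |d, i| d * (i + 1) }
  return :bad_check_digit unless sum % 10 == check

  # Lookup step: the number must be one the registry gives for this structure.
  return :not_listed unless registry_hits.include?(article_cas)

  # Several numbers for one structure: the article may need an annotation.
  registry_hits.uniq.size > 1 ? :annotate : :confirmed
end
```

For example, `review_cas("58-08-2", ["58-08-2"])` returns `:confirmed`, while `review_cas("58-08-3", ["58-08-2"])` returns `:bad_check_digit` because the check digit is wrong before any lookup is needed. The lists passed in here are made-up inputs for illustration, not registry data.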

CAS Number Legality

Please see the Ruby code posted by Rich Apodaca at http://depth-first.com/articles/2008/07/23/validating-cas-numbers[1]. This code demonstrates how the check digit (the last digit) is calculated.

As an example, caffeine is 58-08-2. The check digit 2 is calculated as (8*1) + (0*2) + (8*3) + (5*4) = 8 + 0 + 24 + 20 = 52, and then taking the result modulo 10: 52 mod 10 = 2.
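The same arithmetic can be run in Ruby. This is a minimal sketch: `check_digit_for` is a hypothetical name, not part of the CAS module on this page, and it takes only the part of the number before the last hyphen.

```ruby
# Compute the CAS check digit for the digits before the last hyphen,
# e.g. "58-08" for caffeine (58-08-2).
def check_digit_for(base)
  digits = base.delete("-").chars.map(&:to_i)
  # Rightmost digit gets weight 1, the next weight 2, and so on.
  terms = digits.reverse.each_with_index.map { |d, i| d * (i + 1) }
  terms.sum % 10
end

puts check_digit_for("58-08")    # caffeine: terms 8, 0, 24, 20; sum 52; prints 2
puts check_digit_for("7732-18")  # water, 7732-18-5: sum 105; prints 5
```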

Note that this only shows that a sequence of the form (2 to 7 digits)-(2 digits)-(1 digit) is legal to use as a CAS number; it doesn't show that the number is in use in the CAS Registry. --Underscore bruce (talk) 18:07, 18 September 2008 (UTC)

Pending requests

Allocated blocks

  1. 50 entries, to get a feel for it. --Rifleman 82 (talk) 17:29, 22 January 2008 (UTC)
    1. 50 entries have been uploaded for you here: http://www.chemspider.com/docs/wikipedia/Structures_1_to_25.pdf
  2. 50 entries for me, as a trial, I'll see if I can get the access somehow. Walkerma (talk) 20:49, 22 January 2008 (UTC) Note that I may not get to these for a couple of weeks, it will involve a special trip to another college. But I want to see what's involved. Walkerma (talk) 07:43, 26 January 2008 (UTC)
    1. http://www.chemspider.com/docs/wikipedia/Structures_26_to_50.pdf

      Let's try this format; if it works we will stick with it. I will not generate any more until you've worked through these, but no pressure on you. I will then generate in blocks of 25. --ChemSpiderMan (talk) 00:49, 24 January 2008 (UTC)

  3. 150 entries for validation. I will put the project on hold until we have worked our way through these. All are here:

Enjoy.--ChemSpiderMan (talk) 03:38, 4 February 2008 (UTC)

I have now updated the links above so you can access the files without passwords.--ChemSpiderMan (talk) 01:20, 6 February 2008 (UTC)

References

See also