Cambridge Structural Database

Research center: Cambridge Crystallographic Data Centre
Data format: .cif
Web service URL: webcsd.ccdc.cam.ac.uk
Web tool: WebCSD
Standalone tools:
  • CSD System
  • PreQuest
  • CSD (the database)
  • ConQuest
  • Mercury
  • IsoStar
  • Mogul

The Cambridge Structural Database (CSD) is both a repository and a validated and curated resource for the three-dimensional structural data of molecules that generally contain at least carbon and hydrogen, comprising a wide range of organic, metal-organic and organometallic molecules. Its entries complement those of other crystallographic databases such as the PDB, ICSD and PDF. The data, typically obtained by X-ray crystallography and less frequently by neutron diffraction, are submitted by crystallographers and chemists from around the world and are freely accessible (as deposited by authors) on the Internet via the website of the CSD's parent organization (CCDC, Repository[1]). The CSD is overseen by the Cambridge Crystallographic Data Centre (CCDC), a not-for-profit incorporated company.

The CSD is widely recognized as the world’s repository for small-molecule organic and metal-organic crystal structures and has become an essential resource for scientists around the world. Structures deposited with the CCDC are made publicly available for download at the point of publication or with the consent of the depositor. They are also scientifically enriched and included in the CSD, which underpins a range of software solutions offered by the CCDC. Targeted subsets of the CSD are also freely available to support teaching and other activities.[2]

History

The CCDC grew out of the activities of the crystallography group led by Dr Olga Kennard OBE FRS in the Department of Organic, Inorganic and Theoretical Chemistry of the University of Cambridge. In 1965, the group began to collect published bibliographic, chemical and crystal structure data for all small molecules studied by X-ray or neutron diffraction. With the rapid developments in computing taking place at the time, this collection was encoded in electronic form and became known as the Cambridge Structural Database (CSD).

The CSD was one of the first numerical scientific databases to begin operations anywhere in the world. It received academic grants from the UK Office for Scientific and Technical Information and then from the UK Science and Engineering Research Council. These funds, together with subventions from National Affiliated Centres, enabled the development of the CSD and its associated software during the 1970s and 1980s. The first releases of the CSD System to the USA, Italy and Japan occurred in the early 1970s. By the early 1980s the CSD System was being distributed in more than 30 countries worldwide. As of 2014, the CSD System was distributed to academics in 70 countries.

During the 1980s, interest in the CSD System from pharmaceutical and agrochemical companies increased significantly. This led to the establishment of the CCDC as an independent company in 1987, with the legal status of a non-profit charitable institution and with its operations overseen by an international Board of Governors. The CCDC moved into purpose-built premises on the site of the University Department of Chemistry in 1992.

Dr Kennard retired as Director in 1997 and was succeeded by Dr David Hartley (1997-2002) and Dr Frank Allen (2002-2008). Dr Colin Groom was appointed as Executive Director from 1 October 2008.

CCDC software products have now diversified to make maximum use of crystallographic data in applications in the life sciences and crystallography. Much of this software development and marketing is now carried out by CCDC Software Limited (founded in 1998), a wholly owned subsidiary which covenants all of its profits back to the CCDC.

Although the CCDC is now a self-administering organization, it retains close links with the University of Cambridge, and is a University Partner Institution that is qualified to train postgraduate students for higher degrees (PhD, MPhil).

The CCDC established applications and support operations in the USA in October 2013 at Rutgers, the State University of New Jersey, where it is co-located with the RCSB Protein Data Bank.

Contents

The CSD is continually updated with new structures (>50,000 new structures each year)[3] and with improvements to existing entries. Entries (structures) in the repository are released for public access shortly after the corresponding structure has appeared in the peer-reviewed scientific literature.

Periodically, general statistics about the breadth of CSD holdings are reported.[4] As of 6 January 2014, the summary statistics are as follows:

Query                                     Structures  % of CSD
Total # of structures                        686,944     100.0
# of different compounds                     628,684         -
# of literature sources                        1,578         -
Organic structures                           292,661      42.6
Transition metal present                     369,682      53.8
Alkali or alkaline earth metal present        34,433       5.0
Main group metal present                      41,711       6.1
3D coordinates present                       643,032      93.3
Error-free coordinates                       630,329      98.0
Single crystal X-ray studies                 682,398      99.4
Neutron studies                                1,616       0.2
Powder diffraction studies                     2,930       0.4
Low/high temp. studies                       306,809      44.7
Absolute configuration determined             14,752       2.1
Disorder present in structure                158,127      23.0
Polymorphic structures                        20,753       3.0
R-factor < 0.100                             645,809      94.0
R-factor < 0.075                             585,333      85.2
R-factor < 0.050                             378,391      55.1
R-factor < 0.030                              78,594      11.4
No. of atoms with 3D coordinates          53,563,990         -

As of 6 January 2014, the top 25 scientific journals in terms of publication of structures in the CSD repository were:

1. 55,119 structures were reported in Inorg.Chem.
2. 41,850 structures were reported in Dalton & J.Chem.Soc.,Dalton Trans.
3. 41,761 structures were reported in Organometallics
4. 38,148 structures were reported in Acta Crystallogr.,Sect.E
5. 35,943 structures were reported in J.Am.Chem.Soc.
6. 25,062 structures were reported in J.Organomet.Chem.
7. 24,676 structures were reported in Acta Crystallogr.,Sect.C
8. 22,552 structures were reported in Inorg.Chim.Acta
9. 21,663 structures were reported in Chem.Commun. & J.Chem.Soc.
10. 18,454 structures were reported in Polyhedron
11. 18,089 structures were reported in Angew.Chem.,Int.Ed.
12. 17,303 structures were reported in Chem.-Eur.J.
13. 16,747 structures were reported in Eur.J.Inorg.Chem.
14. 16,134 structures were reported in J.Org.Chem.
15. 13,756 structures were reported in Acta Crystallogr.,Sect.B
16. 12,228 structures were reported in Z.Anorg.Allg.Chem.
17. 12,069 structures were reported in Cryst.Growth Des.
18. 11,614 structures were reported in CrystEngComm
19. 10,312 structures were reported in Tetrahedron
20. 8,995 structures were reported in Tetrahedron Lett.
21. 8,597 structures were reported as Private Communication to the CSD
22. 8,097 structures were reported in Organic Letters
23. 7,280 structures were reported in J.Mol.Struct.
24. 6,847 structures were reported in Z.Naturforsch.,B
25. 5,807 structures were reported in Eur.J.Org.Chem.

These 25 journals account for 499,109 of the 686,944 structures in the CSD, or 72.7%.

These data show that most structures are determined by single-crystal X-ray diffraction, with less than 1% of structures being determined by neutron diffraction or powder diffraction. The number of structures with error-free coordinates is given as a percentage of the structures for which 3D coordinates are present in the CSD, not of the whole database.
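The two kinds of percentage can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the counts are copied from the statistics above, while the variable and function names are made up for this example.

```python
# Counts copied from the summary statistics (6 January 2014).
TOTAL_STRUCTURES = 686_944
WITH_3D_COORDINATES = 643_032
ERROR_FREE_COORDINATES = 630_329
TOP_25_JOURNALS = 499_109


def percentage(part: int, whole: int) -> float:
    """Return part as a percentage of whole, rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Error-free coordinates are measured against entries that have 3D
# coordinates, not against the whole database.
error_free_pct = percentage(ERROR_FREE_COORDINATES, WITH_3D_COORDINATES)

# The journal share is measured against the whole database.
journal_pct = percentage(TOP_25_JOURNALS, TOTAL_STRUCTURES)

print(error_free_pct, journal_pct)
```

Run as written, this prints 98.0 and 72.7, matching the error-free coordinates figure and the journal share quoted above.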

The significance of structure factor files is that, for CSD structures determined by X-ray diffraction that have a structure factor file, a crystallographer can verify the interpretation of the observed measurements.

Growth trend

Historically, the number of structures in the CSD has grown at an approximately exponential rate, passing 25,000 structures in 1977, 50,000 in 1983, 125,000 in 1992, 250,000 in 2001 and 500,000 in 2009.[5][6] It is estimated to reach 1,000,000 structures in 2017.
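One way to make "approximately exponential" concrete is to fit a straight line to the logarithm of the milestone counts. The following is a rough sketch under that assumption, using only the five milestones quoted above; it is not the method behind the CCDC's own estimate, and its extrapolation need not agree with the 2017 figure.

```python
import math

# Milestones quoted in the text: (year, approximate number of structures).
MILESTONES = [
    (1977, 25_000),
    (1983, 50_000),
    (1992, 125_000),
    (2001, 250_000),
    (2009, 500_000),
]


def fit_exponential(points):
    """Least-squares fit of ln(count) = intercept + rate * year."""
    n = len(points)
    years = [year for year, _ in points]
    logs = [math.log(count) for _, count in points]
    mean_year = sum(years) / n
    mean_log = sum(logs) / n
    sxy = sum((x - mean_year) * (y - mean_log) for x, y in zip(years, logs))
    sxx = sum((x - mean_year) ** 2 for x in years)
    rate = sxy / sxx
    intercept = mean_log - rate * mean_year
    return rate, intercept


rate, intercept = fit_exponential(MILESTONES)

# Under exponential growth the doubling time is ln(2) divided by the rate.
doubling_time = math.log(2) / rate

# Year at which the fitted curve reaches one million structures.
year_of_million = (math.log(1_000_000) - intercept) / rate

print(f"doubling time: {doubling_time:.1f} years")
print(f"fitted year for 1,000,000 structures: {year_of_million:.0f}")
```

With only five data points the fit is coarse, so the result is best read as an order-of-magnitude check on the growth rate rather than a forecast.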

Number of published structures per year
Year # published Total
2012 45199 661,121
2011 43882 615,922
2010 41240 572,040
2009 40627 530,800
2008 36802 490,173
2007 36569 453,371
2006 34713 416,802
2005 31733 382,089
2004 27988 350,356
2003 26287 322,368
2002 24306 296,081
2001 21781 271,775
2000 19998 249,994
1999 18780 229,996
1998 17289 211,216
1997 15896 193,927
1996 15487 178,031
1995 13001 162,544
1994 12290 149,543
1993 12032 137,253
1992 10691 125,221
1991 9941 114,530
1990 8935 104,589
1989 7750 95,654
1988 7644 87,904
1987 7472 80,260
1986 6873 72,788
1985 6911 65,915
1984 6511 59,004
1983 5250 52,493
1982 5233 47,243
1981 4666 42,010
1980 4252 37,344
1979 3876 33,092
1978 3415 29,216
1977 3092 25,801
1976 2735 22,709
1975 2171 19,974
1974 2142 17,803
1973 1991 15,661
1972 1969 13,670
1971 1548 11,701
1970 1261 10,153
1969 1130 8,892
1968 975 7,762
1967 936 6,787
1966 683 5,851
1965 656 5,168
1923-1964 4512 4,512

Note: data for 1923-1964 are aggregated together in the last line of the table.

File format

The primary file format for CSD structure deposition, adopted around 1991, is the Crystallographic Information File (CIF) format.[7]

The deposited CSD files can be downloaded in the CIF format. The validated and curated CSD files can be exported in a wide range of formats, including CIF, MOL, Mol2, PDB, SHELX and XMol, using tools in the CSD System.
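A CIF is a plain-text file in which simple data items appear as a tag beginning with an underscore followed by a value. The sketch below reads such single-line items. It is a minimal illustration, not a full CIF parser: the sample text is invented for this example and is not a CSD entry, the function names are hypothetical, and loops and multi-line text fields are deliberately ignored.

```python
import re

# Hypothetical CIF fragment, made up for illustration; it is not a CSD entry.
SAMPLE_CIF = """\
data_example
_chemical_formula_sum 'C6 H6'
_cell_length_a 7.390(5)
_cell_length_b 9.420(6)
_cell_length_c 6.810(4)
_cell_angle_alpha 90
"""


def read_simple_items(cif_text):
    """Collect single-line `_tag value` pairs; loops and text fields are ignored."""
    items = {}
    for line in cif_text.splitlines():
        line = line.strip()
        if not line.startswith("_"):
            continue
        parts = line.split(None, 1)
        if len(parts) == 2:
            tag, value = parts
            # Remove surrounding whitespace and quote characters.
            items[tag] = value.strip().strip("'\"")
    return items


def numeric_value(text):
    """Drop a trailing standard uncertainty such as '(5)' and return a float."""
    return float(re.sub(r"\(\d+\)$", "", text))


items = read_simple_items(SAMPLE_CIF)
a = numeric_value(items["_cell_length_a"])
print(items["_chemical_formula_sum"], a)
```

For real work, an established CIF library is a safer choice than a hand-written reader, since the full format includes loops, multi-line values and quoting rules that this sketch does not handle.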

The CCDC uses two different codes to distinguish between the deposited CSD entry and the curated CSD entry. For example, one specific ‘Personal Communication’ of an organic molecule was deposited with the CCDC and assigned the deposition number 'CCDC-991327'. This number allows free public access to the data as deposited. From the deposited data, selected information was extracted to prepare the validated and curated CSD entry, which was assigned the refcode 'MITGUT'. The validated and curated entry is included in the CSD System and WebCSD distributions, with availability restricted to those making appropriate contributions.
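The two kinds of identifier look different enough to tell apart mechanically. The sketch below does so with regular expressions. The patterns are assumptions made for illustration (a deposition number written as "CCDC" followed by digits, and a refcode of six letters optionally followed by two digits); they are not an official CCDC specification, and the function name is hypothetical.

```python
import re

# Assumed patterns, for illustration only:
#   deposition number: "CCDC", an optional hyphen or space, then digits
#   refcode: six letters, optionally followed by two digits
DEPOSITION_RE = re.compile(r"^CCDC[- ]?(\d+)$")
REFCODE_RE = re.compile(r"^[A-Z]{6}(\d{2})?$")


def classify_identifier(text):
    """Return 'deposition', 'refcode', or 'unknown' for an identifier string."""
    candidate = text.strip().upper()
    if DEPOSITION_RE.match(candidate):
        return "deposition"
    if REFCODE_RE.match(candidate):
        return "refcode"
    return "unknown"


print(classify_identifier("CCDC-991327"))
print(classify_identifier("MITGUT"))
```

Applied to the example above, 'CCDC-991327' is classified as a deposition number and 'MITGUT' as a refcode.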

Viewing the data

The structure files may be viewed using one of several open source computer programs such as Jmol. Other free programs that are not open source include MDL Chime, PyMOL, UCSF Chimera, RasMol and WinGX.[8] The CCDC also provides a free version of its molecule visualization program, Mercury.

References

  1. ^ "CCDC CIF Depository Request Form". Cambridge Crystallographic Data Centre. Retrieved 2014-09-16. 
  2. ^ "CCDC Homepage". Cambridge Crystallographic Data Centre. Retrieved 2014-09-16. 
  3. ^ Bruno, I. J.; Groom, C. R. (2014). "A crystallographic perspective on sharing data and knowledge". Journal of Computer-Aided Molecular Design. doi:10.1007/s10822-014-9780-9.
  4. ^ "CSD Entries: Summary Statistics". Cambridge Crystallographic Data Centre. Retrieved 2014-09-16. 
  5. ^ Groom, C. R.; Allen, F. H. (2014). "The Cambridge Structural Database in Retrospect and Prospect". Angewandte Chemie International Edition 53 (3): 662. doi:10.1002/anie.201306438. 
  6. ^ "Growth of the Cambridge Structural Database (CSD) since 1970.". CCDC. Retrieved 2014-09-16. 
  7. ^ Hall SR, Allen FH, Brown ID (1991). "The Crystallographic Information File (CIF): a new standard archive file for crystallography". Acta Crystallographica A47 (6): 655–685. doi:10.1107/S010876739101067X. 
  8. ^ Farrugia, Louis J. (1 August 1999). "WinGX suite for small-molecule single-crystal crystallography". Journal of Applied Crystallography 32 (4): 837–838. doi:10.1107/S0021889899006020. 

External links