Chemical similarity: Difference between revisions

Revision as of 19:33, 29 July 2010

Chemical similarity (or molecular similarity) refers to the similarity of chemical elements, molecules or chemical compounds with respect to either structural or functional qualities, i.e. the effect that the chemical compound has on reaction partners in inorganic or biological settings. Biological effects, and thus also the similarity of effects, are usually quantified using the biological activity of a compound. In general terms, function can be related to the chemical activity of compounds (among others).

The notion of chemical similarity (or molecular similarity) is one of the most important concepts in chemoinformatics^[1]^[2]. It plays an important role in modern approaches to predicting the properties of chemical compounds, designing chemicals with a predefined set of properties and, especially, conducting drug design studies by screening large databases containing structures of available (or potentially available) chemicals. These studies are based on the similar property principle of Johnson and Maggiora, which states that similar compounds have similar properties^[1].

Similarity Measures

Chemical similarity is often described as an inverse of a measure of distance in descriptor space. Distance measures can be classified into Euclidean measures and non-Euclidean measures depending on whether the triangle inequality holds.
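The relationship between distance and similarity can be illustrated with a minimal sketch. It assumes that molecules are already encoded as numeric descriptor vectors and uses the mapping s = 1 / (1 + d), which is only one common convention for turning a distance into a similarity and is not prescribed by the article. The function names and descriptor values are hypothetical.

```python
import math

def euclidean_distance(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    if len(a) != len(b):
        raise ValueError("descriptor vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def similarity_from_distance(d):
    """Map a distance in [0, inf) to a similarity in (0, 1].

    Assumed convention: s = 1 / (1 + d); other mappings are possible.
    """
    return 1.0 / (1.0 + d)

# Hypothetical three-component descriptor vectors, for illustration only.
mol_a = [1.0, 2.0, 2.0]
mol_b = [1.0, 5.0, 6.0]

d = euclidean_distance(mol_a, mol_b)   # sqrt(0 + 9 + 16)
s = similarity_from_distance(d)        # 1 / (1 + d)
```

With this mapping, identical descriptor vectors (distance 0) have similarity 1, and the similarity decreases towards 0 as the distance grows.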

Similarity Search and Virtual Screening

Similarity-based^[3] virtual screening (a kind of ligand-based virtual screening) assumes that all compounds in a database that are similar to a query compound have similar biological activity. Although this hypothesis is not always valid^[4], the set of retrieved compounds is quite often considerably enriched with actives^[5]. To screen databases containing millions of compounds efficiently, molecular structures are usually represented by molecular screens (structural keys) or by fixed-size or variable-size molecular fingerprints. Molecular screens and fingerprints can contain both 2D and 3D information. However, 2D fingerprints, which are a kind of binary fragment descriptor, dominate in this area. Fragment-based structural keys, like MDL keys^[6], are sufficiently good for handling small and medium-sized chemical databases, whereas large databases are processed with fingerprints having much higher information density. Fragment-based Daylight^[7], BCI^[8], and UNITY 2D (Tripos^[9]) fingerprints are the best-known examples. The most popular similarity measure for comparing chemical structures represented by fingerprints is the Tanimoto (or Jaccard) coefficient T. Two structures are usually considered similar if $T>0.85$ (for Daylight fingerprints).
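For binary fingerprints, the Tanimoto coefficient is T = c / (a + b − c), where a and b are the numbers of bits set in each fingerprint and c is the number of bits set in both. The following minimal sketch computes T and uses it for a threshold-based similarity search. It assumes that fingerprints are given as sets of on-bit positions; the function names, compound names and bit positions are hypothetical, and the code does not reproduce any particular vendor's fingerprint format.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) coefficient of two fingerprints given as sets of on-bit indices."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    if union == 0:
        return 0.0  # assumed convention when both fingerprints are empty
    return len(a & b) / union

def similarity_search(query_fp, database, threshold=0.85):
    """Return (name, T) pairs with T > threshold, most similar first."""
    scored = [(name, tanimoto(query_fp, fp)) for name, fp in database.items()]
    hits = [(name, t) for name, t in scored if t > threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)

# Hypothetical fingerprints (on-bit positions), for illustration only.
query = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
database = {
    "cmpd_A": {1, 2, 3, 4, 5, 6, 7, 8, 9, 10},  # identical to the query
    "cmpd_B": {1, 2, 3, 4, 5, 6, 7, 8, 9},      # 9 shared bits, 10 in the union
    "cmpd_C": {1, 2, 3, 20, 21},                # 3 shared bits, 12 in the union
}

hits = similarity_search(query, database)
```

Here only cmpd_A and cmpd_B exceed the 0.85 threshold. Production systems typically store fingerprints as packed bit vectors rather than Python sets, but the arithmetic is the same.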

References

^ ^a ^b A. M. Johnson, G. M. Maggiora (1990). Concepts and Applications of Molecular Similarity. New York: John Wiley & Sons.
^ N. Nikolova, J. Jaworska (2003). "Approaches to Measure Chemical Similarity - a Review". QSAR & Combinatorial Science. 22 (9–10): 1006–1026. doi:10.1002/qsar.200330831.
^ S. A. Rahman, M. Bashton, G. L. Holliday, R. Schrader, J. M. Thornton (2009). "Small Molecule Subgraph Detector (SMSD) toolkit". Journal of Cheminformatics. 1: 12. doi:10.1186/1758-2946-1-12.
^ H. Kubinyi (1998). "Similarity and Dissimilarity: A Medicinal Chemist's View". Persp. Drug Discov. Design. 9–11: 225–252. doi:10.1023/A:1027221424359.
^ Y. C. Martin, J. L. Kofron, L. M. Traphagen (2002). "Do structurally similar molecules have similar biological activity?". J. Med. Chem. 45 (19): 4350–4358. doi:10.1021/jm020155c. PMID 12213076.
^ J. L. Durant, B. A. Leland, D. R. Henry, J. G. Nourse (2002). "Reoptimization of MDL Keys for Use in Drug Discovery". J. Chem. Inf. Comput. Sci. 42 (6): 1273–1280. PMID 12444722.
^ "Daylight Chemical Information Systems Inc".
^ "Barnard Chemical Information Ltd".
^ "Tripos Inc".

External links

Small Molecule Subgraph Detector (SMSD) is a Java-based software library for calculating the Maximum Common Subgraph (MCS) between small molecules, which can be used to measure the similarity or distance between two molecules. MCS is also used for screening drug-like compounds by retrieving molecules that share a common subgraph (substructure).
Chemical Similarity (QSAR World)
Similarity Principle
Fingerprint-based Similarity used in QSAR Modeling
Brutus is a similarity analysis tool based on molecular interaction fields.

@@ Line 1: / Line 1: @@
 '''Chemical similarity''' (or '''molecular similarity''') refers to the similarity of [[chemical element]]s, [[molecule]]s or [[chemical compound]]s with respect to either [[chemical structure|structural]] or functional qualities, i.e. the effect that the chemical compound has on [[chemical reaction|reaction]] partners in anorganic or biological settings. Biological effects and thus also similarity of effects are usually quantified using the [[biological activity]] of a compound. In general terms, function can be related to the [[chemical activity]] of compounds (among others).
-The notion of ''chemical similarity'' (or ''molecular similarity'') is one of the most important concepts in [[chemoinformatics]] <ref name="johnson_1990">{{cite book | author = A. M. Johnson, G. M. Maggiora | title = Concepts and Applications of Molecular Similarity | publisher = John Willey & Sons | location = New York | date = 1990}}</ref><ref name="nikolova_2003">{{cite journal | author = N. Nikolova, J. Jaworska | title = Approaches to Measure Chemical Similarity - a Review | journal = QSAR & Combinatorial Science | year = 2003 | volume = 22 | issue = 9-10 | pages = 1006–1026}}</ref>. It plays an important role in modern approaches to predicting the properties of chemical compounds, designing chemicals with a predefined set of properties and, especially, in conducting drug design studies by screening large databases containing structures of available (or potentially available) chemicals. These studies are based on the similar property principle of Johnson and Maggiora, which states: ''similar compounds have similar properties''<ref name="johnson_1990"/>.
+The notion of ''chemical similarity'' (or ''molecular similarity'') is one of the most important concepts in [[chemoinformatics]] <ref name="johnson_1990">{{cite book | author = A. M. Johnson, G. M. Maggiora | title = Concepts and Applications of Molecular Similarity | publisher = John Willey & Sons | location = New York | date = 1990}}</ref><ref name="nikolova_2003">{{cite journal | doi = 10.1002/qsar.200330831 | author = N. Nikolova, J. Jaworska | title = Approaches to Measure Chemical Similarity - a Review | journal = QSAR & Combinatorial Science | year = 2003 | volume = 22 | issue = 9-10 | pages = 1006–1026}}</ref>. It plays an important role in modern approaches to predicting the properties of chemical compounds, designing chemicals with a predefined set of properties and, especially, in conducting drug design studies by screening large databases containing structures of available (or potentially available) chemicals. These studies are based on the similar property principle of Johnson and Maggiora, which states: ''similar compounds have similar properties''<ref name="johnson_1990"/>.
 == Similarity Measures ==
@@ Line 7: / Line 7: @@
 == Similarity Search and Virtual Screening ==
-The similarity-based <ref name="SMSD09">S. A. Rahman, M. Bashton, G. L. Holliday, R. Schrader and J. M. Thornton, Small Molecule Subgraph Detector (SMSD) toolkit, Journal of Cheminformatics 2009, 1:12.  DOI:10.1186/1758-2946-1-12</ref> [[virtual screening]] (a kind of ligand-based virtual screening) assumes that all compounds in a database that are similar to a query compound have similar biological activity. Although this hypothesis is not always valid<ref name="kubinyi_1998">{{cite journal | author = H. Kubinyi | title = Similarity and Dissimilarity: A Medicinal Chemist’s View | journal = Persp. Drug Discov. Design | year = 1998 | volume = 9-11 | pages = 225–252}}</ref>, quite often the set of retrieved compounds is considerably enriched with actives<ref = "martin_2002">{{cite journal | author = Y. C. Martin, J. L. Kofron, L. M. Traphagen | title = Do structurally similar molecules have similar biological activity? | journal = J. Med. Chem. | volume = 45 | issue = 19 | pages = 4350–4358 | pmid = 12213076 | year = 2002}}</ref>. To  achieve high efficacy of similarity-based screening of databases containing millions of compounds, molecular structures are usually represented by ''molecular screens'' (structural keys) or by fixed-size or variable-size ''molecular fingerprints''. Molecular screens and fingerprints can contain both 2D- and 3D-information. However, the 2D-fingerprints, which are a kind of binary fragment descriptors, dominate in this area. Fragment-based structural keys, like MDL keys<ref name="durant_2002">{{cite journal | author = J. L. Durant, B. A. Leland, D. R. Henry, J. G. Nourse | title = Reoptimization of MDL Keys for Use in Drug Discovery | journal = J. Chem. Inf. Comput. Sci. 
| year = 2002 | volume = 42 | issue = 6 | pages = 1273–1280 | pmid = 12444722}}</ref>, are sufficiently good for handling small and medium-sized chemical databases, whereas processing of large databases is performed with fingerprints having much higher information density. Fragment-based Daylight<ref name="daylight">{{cite web | title = Daylight Chemical Information Systems Inc. | url = http://www.daylight.com}}</ref>, BCI<ref name="bci">{{cite web | title = Barnard Chemical Information Ltd. | url = http://www.bci.gb.com/}}</ref>, and UNITY 2D (Tripos<ref name="tripos">{{cite web | title = Tripos Inc. | url = http://www.tripos.com}}</ref>) fingerprints are the best known examples. The most popular similarity measure for comparing chemical structures represented by means of fingerprints is the [[Jaccard index|Tanimoto (or Jaccard) coefficient]] ''T''. Two structures are usually considered similar if <math>T > 0.85</math> (for Daylight fingerprints).
+The similarity-based <ref name="SMSD09">S. A. Rahman, M. Bashton, G. L. Holliday, R. Schrader and J. M. Thornton, Small Molecule Subgraph Detector (SMSD) toolkit, Journal of Cheminformatics 2009, 1:12.  DOI:10.1186/1758-2946-1-12</ref> [[virtual screening]] (a kind of ligand-based virtual screening) assumes that all compounds in a database that are similar to a query compound have similar biological activity. Although this hypothesis is not always valid<ref name="kubinyi_1998">{{cite journal | doi = 10.1023/A:1027221424359 | author = H. Kubinyi | title = Similarity and Dissimilarity: A Medicinal Chemist’s View | journal = Persp. Drug Discov. Design | year = 1998 | volume = 9-11 | pages = 225–252}}</ref>, quite often the set of retrieved compounds is considerably enriched with actives<ref = "martin_2002">{{cite journal | doi = 10.1021/jm020155c | author = Y. C. Martin, J. L. Kofron, L. M. Traphagen | title = Do structurally similar molecules have similar biological activity? | journal = J. Med. Chem. | volume = 45 | issue = 19 | pages = 4350–4358 | pmid = 12213076 | year = 2002}}</ref>. To  achieve high efficacy of similarity-based screening of databases containing millions of compounds, molecular structures are usually represented by ''molecular screens'' (structural keys) or by fixed-size or variable-size ''molecular fingerprints''. Molecular screens and fingerprints can contain both 2D- and 3D-information. However, the 2D-fingerprints, which are a kind of binary fragment descriptors, dominate in this area. Fragment-based structural keys, like MDL keys<ref name="durant_2002">{{cite journal | author = J. L. Durant, B. A. Leland, D. R. Henry, J. G. Nourse | title = Reoptimization of MDL Keys for Use in Drug Discovery | journal = J. Chem. Inf. Comput. Sci. 
| year = 2002 | volume = 42 | issue = 6 | pages = 1273–1280 | pmid = 12444722}}</ref>, are sufficiently good for handling small and medium-sized chemical databases, whereas processing of large databases is performed with fingerprints having much higher information density. Fragment-based Daylight<ref name="daylight">{{cite web | title = Daylight Chemical Information Systems Inc. | url = http://www.daylight.com}}</ref>, BCI<ref name="bci">{{cite web | title = Barnard Chemical Information Ltd. | url = http://www.bci.gb.com/}}</ref>, and UNITY 2D (Tripos<ref name="tripos">{{cite web | title = Tripos Inc. | url = http://www.tripos.com}}</ref>) fingerprints are the best known examples. The most popular similarity measure for comparing chemical structures represented by means of fingerprints is the [[Jaccard index|Tanimoto (or Jaccard) coefficient]] ''T''. Two structures are usually considered similar if <math>T > 0.85</math> (for Daylight fingerprints).
 ==References==