Chemical similarity (or molecular similarity) refers to the similarity of chemical elements, molecules or chemical compounds with respect to either structural or functional qualities, i.e. the effect that the chemical compound has on reaction partners in inorganic or biological settings. Biological effects, and thus the similarity of those effects, are usually quantified using the biological activity of a compound. More generally, function can be related to, among other things, the chemical activity of compounds.
The notion of chemical similarity is one of the most important concepts in chemoinformatics. It plays an important role in modern approaches to predicting the properties of chemical compounds, designing chemicals with a predefined set of properties and, especially, conducting drug design studies by screening large databases containing structures of available (or potentially available) chemicals. These studies are based on the similar property principle of Johnson and Maggiora, which states that similar compounds have similar properties.
Chemical similarity is often described as the inverse of a distance measure in descriptor space. Examples of inverse distance measures are molecule kernels, which measure the structural similarity of chemical compounds.
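As a minimal illustration of similarity as an inverse of distance, the sketch below computes a Euclidean distance between two descriptor vectors and maps it to a similarity score in (0, 1]. The descriptor values are made up for illustration, and the mapping s = 1 / (1 + d) is only one of several possible conventions, chosen here as an assumption; it is not a molecule kernel.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    if len(a) != len(b):
        raise ValueError("descriptor vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def similarity_from_distance(d):
    """Map a distance d >= 0 to a similarity in (0, 1]; d = 0 gives 1.0.

    The form 1 / (1 + d) is an assumed convention, not a standard one.
    """
    return 1.0 / (1.0 + d)


# Hypothetical descriptor vectors (values are invented for this example).
mol_a = [1.0, 2.0, 0.5]
mol_b = [1.0, 2.0, 0.5]   # identical to mol_a
mol_c = [4.0, 6.0, 0.5]   # distance 5.0 from mol_a

sim_ab = similarity_from_distance(euclidean_distance(mol_a, mol_b))
sim_ac = similarity_from_distance(euclidean_distance(mol_a, mol_c))
print(sim_ab, sim_ac)
```

Identical descriptor vectors give a similarity of 1, and the score decreases monotonically as the distance grows.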
Similarity search and virtual screening
Similarity-based virtual screening (a kind of ligand-based virtual screening) assumes that all compounds in a database that are similar to a query compound have similar biological activity. Although this hypothesis is not always valid, the set of retrieved compounds is quite often considerably enriched with actives. To screen databases containing millions of compounds efficiently, molecular structures are usually represented by molecular screens (structural keys) or by fixed-size or variable-size molecular fingerprints. Molecular screens and fingerprints can contain both 2D and 3D information. However, 2D fingerprints, which are a kind of binary fragment descriptor, dominate in this area. Fragment-based structural keys, such as MDL keys, are sufficiently good for handling small and medium-sized chemical databases, whereas large databases are processed with fingerprints that have a much higher information density. Fragment-based Daylight, BCI, and UNITY 2D (Tripos) fingerprints are the best-known examples.
The most popular similarity measure for comparing chemical structures represented by fingerprints is the Tanimoto (or Jaccard) coefficient T. Two structures are usually considered similar if T > 0.85 (for Daylight fingerprints). However, it is a common misunderstanding that a similarity of T > 0.85 reflects similar bioactivities in general ("the 0.85 myth").
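The Tanimoto coefficient for two binary fingerprints is the number of bits set in both divided by the number of bits set in either. The sketch below is a minimal illustration of this, assuming each fingerprint is stored as a set of on-bit positions. The fingerprints and compound names are invented, the function names are hypothetical, and returning 0.0 for two empty fingerprints is an assumed convention.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) coefficient of two fingerprints given as on-bit sets."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    if union == 0:
        return 0.0  # assumed convention when neither fingerprint has any bit set
    return len(a & b) / union


def similarity_search(query_fp, database, threshold=0.85):
    """Return (name, T) pairs with T > threshold, most similar first."""
    scored = ((name, tanimoto(query_fp, fp)) for name, fp in database.items())
    hits = [(name, t) for name, t in scored if t > threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Toy data: bit positions are arbitrary and do not come from a real fingerprint.
query = {1, 2, 3, 4, 5, 6, 7}
database = {
    "cpd_1": {1, 2, 3, 4, 5, 6, 7},      # T = 7/7
    "cpd_2": {1, 2, 3, 4, 5, 6, 7, 8},   # T = 7/8
    "cpd_3": {1, 2, 3, 9, 10},           # T = 3/9
}

for name, t in similarity_search(query, database):
    print(name, round(t, 3))
```

With the default threshold, only the first two toy compounds are retrieved. As the text notes, passing such a threshold does not by itself guarantee similar bioactivity.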
Chemical similarity network
The concept of chemical similarity can be extended to chemical similarity network theory, in which descriptive network properties and graph theory are applied to analyze large chemical spaces, estimate chemical diversity and predict drug targets. More recently, 3D chemical similarity networks based on 3D ligand conformations have also been developed, which can be used to identify scaffold-hopping ligands.
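One simple way to picture a chemical similarity network is to treat compounds as nodes and connect two compounds whenever their similarity reaches a chosen threshold. The sketch below is a minimal illustration of that idea, not the method of any particular publication. It assumes Tanimoto similarity on toy on-bit sets, an arbitrary threshold of 0.5, and connected components as a rough stand-in for clusters in chemical space.

```python
from collections import deque
from itertools import combinations


def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as on-bit sets."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0


def build_similarity_network(fingerprints, threshold):
    """Adjacency sets with an edge wherever Tanimoto >= threshold."""
    graph = {name: set() for name in fingerprints}
    for u, v in combinations(fingerprints, 2):
        if tanimoto(fingerprints[u], fingerprints[v]) >= threshold:
            graph[u].add(v)
            graph[v].add(u)
    return graph


def connected_components(graph):
    """Breadth-first search over the network; returns a list of node sets."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbour in graph[node] - seen:
                seen.add(neighbour)
                queue.append(neighbour)
        components.append(component)
    return components


# Toy fingerprints (invented bit positions) forming two separate groups.
fingerprints = {
    "A": {1, 2, 3, 4},
    "B": {1, 2, 3, 5},
    "C": {1, 2, 3, 4, 5},
    "D": {10, 11, 12},
    "E": {10, 11, 13},
}

network = build_similarity_network(fingerprints, threshold=0.5)
degrees = {name: len(neighbours) for name, neighbours in network.items()}
components = connected_components(network)
print(degrees)
print(components)
```

Node degree and the number and size of components are examples of the descriptive network properties mentioned above. The result depends strongly on the threshold chosen.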
Chemical semantic similarity
Semantic similarity compares chemical compounds based on their semantic characterizations. Compounds are compared by their role in nature, and not only by their structure. Semantic similarity collects these characterizations from ontologies, such as ChEBI. Chemical semantic similarity measures have been shown to be a feasible and effective approach to improving existing chemical compound classification systems.
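To make the idea concrete, the sketch below compares two terms by the overlap of their ancestor sets in an is-a hierarchy. This is only one simple family of ontology-based measures, used here as an assumption; it is not necessarily the measure used in the cited work. The hierarchy is a hand-written toy and is not taken from ChEBI.

```python
# Toy is-a hierarchy: each term maps to its direct parents (hypothetical data).
TOY_IS_A = {
    "ethanol": ["alcohol"],
    "methanol": ["alcohol"],
    "alcohol": ["organic compound"],
    "benzene": ["organic compound"],
    "organic compound": ["chemical entity"],
    "sodium chloride": ["inorganic compound"],
    "inorganic compound": ["chemical entity"],
    "chemical entity": [],
}


def ancestors(term, is_a):
    """Return the term itself plus every class reachable through is-a links."""
    result, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in result:
            result.add(current)
            stack.extend(is_a.get(current, []))
    return result


def semantic_similarity(term_1, term_2, is_a):
    """Jaccard overlap of the two ancestor sets (shared / combined classes)."""
    a1, a2 = ancestors(term_1, is_a), ancestors(term_2, is_a)
    return len(a1 & a2) / len(a1 | a2)


print(semantic_similarity("ethanol", "methanol", TOY_IS_A))
print(semantic_similarity("ethanol", "benzene", TOY_IS_A))
print(semantic_similarity("ethanol", "sodium chloride", TOY_IS_A))
```

In this toy hierarchy, the two alcohols score higher than ethanol and benzene, which in turn score higher than ethanol and sodium chloride. The ranking reflects shared classification rather than structural overlap.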
- Johnson, M. A.; Maggiora, G. M. (1990). Concepts and Applications of Molecular Similarity. New York: John Wiley & Sons. ISBN 0-471-62175-7.
- Nikolova, N.; Jaworska, J. (2003). "Approaches to Measure Chemical Similarity - a Review". QSAR & Combinatorial Science. 22 (9–10): 1006–1026. doi:10.1002/qsar.200330831.
- Ralaivola, Liva; Swamidass, Sanjay J.; Saigo, Hiroto; Baldi, Pierre (2005). "Graph kernels for chemical informatics". Neural Networks. 18: 1093–1110. doi:10.1016/j.neunet.2005.07.009.
- Rahman, S. A.; Bashton, M.; Holliday, G. L.; Schrader, R.; Thornton, J. M. (2009). "Small Molecule Subgraph Detector (SMSD) toolkit". Journal of Cheminformatics. 1 (12). doi:10.1186/1758-2946-1-12.
- Kubinyi, H. (1998). "Similarity and Dissimilarity: A Medicinal Chemist's View". Perspectives in Drug Discovery and Design. 9–11: 225–252. doi:10.1023/A:1027221424359.
- Martin, Y. C.; Kofron, J. L.; Traphagen, L. M. (2002). "Do structurally similar molecules have similar biological activity?". J. Med. Chem. 45 (19): 4350–4358. doi:10.1021/jm020155c. PMID 12213076.
- Durant, J. L.; Leland, B. A.; Henry, D. R.; Nourse, J. G. (2002). "Reoptimization of MDL Keys for Use in Drug Discovery". J. Chem. Inf. Comput. Sci. 42 (6): 1273–1280. doi:10.1021/ci010132r. PMID 12444722.
- "Daylight Chemical Information Systems Inc".
- "Barnard Chemical Information Ltd". Archived from the original on 2008-10-11.
- "Tripos Inc".
- Maggiora, G.; Vogt, M.; Stumpfe, D.; Bajorath, J. (2014). "Molecular Similarity in Medicinal Chemistry". J. Med. Chem. 57 (8): 3186–3204. doi:10.1021/jm401411z. PMID 24151987.
- Ferreira, João D.; Couto, Francisco M. (2010-09-23). "Semantic Similarity for Automatic Classification of Chemical Compounds". PLOS Computational Biology. 6 (9): e1000937. Bibcode:2010PLSCB...6E0937F. doi:10.1371/journal.pcbi.1000937. ISSN 1553-7358. PMID 20885779.
- Ferreira, João D.; Hastings, Janna; Couto, Francisco M. (2013-11-01). "Exploiting disjointness axioms to improve semantic similarity measures". Bioinformatics. 29 (21): 2781–2787. doi:10.1093/bioinformatics/btt491. ISSN 1367-4811. PMID 24002110.
- Small Molecule Subgraph Detector (SMSD): a Java-based software library for calculating the Maximum Common Subgraph (MCS) between small molecules, which can be used to compute the similarity or distance between molecules. MCS is also used to screen drug-like compounds by finding molecules that share a common subgraph (substructure).
- Kernel-based Similarity for Clustering, regression and QSAR Modeling
- Brutus: a similarity analysis tool based on molecular interaction fields.