= Aromaticity (cheminformatics) =

Aromaticity detection in cheminformatics refers to computational algorithms and models used to identify aromatic ring systems in molecular graphs. Unlike the chemical concept of aromaticity, which describes the special stability of certain cyclic conjugated systems, computational aromaticity is primarily a nomenclature and data representation concern. There is no single universally accepted aromaticity model in cheminformatics, and different software toolkits implement different algorithms, leading to inconsistent results for the same molecular structure.

== Background ==

=== Purpose in cheminformatics ===
In cheminformatics, aromaticity perception serves several practical purposes:

1. Canonical representation: Aromatic notation allows a single representation for molecules that could otherwise be drawn with different Kekulé forms. For example, benzene could be drawn with alternating single and double bonds starting from different positions, yielding different connection tables despite representing the same molecule.
2. Compact notation: In SMILES notation, aromatic atoms are written with lowercase letters (e.g., c1ccccc1 for benzene versus C1=CC=CC=C1 for the Kekulé form), which gives a shorter string.
3. Substructure searching: Aromaticity flags facilitate pattern matching in chemical databases, though inconsistent aromaticity perception between toolkits can lead to missed or incorrect matches.
4. Force field typing: Molecular mechanics force fields such as MMFF94 have their own aromaticity models for atom typing purposes.
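
The canonical-representation point can be illustrated with a toy sketch. The function name `perceive_aromatic` and the bond-order tuples are hypothetical and not taken from any toolkit. The check only looks for alternating single and double bonds and does not count electrons, so it is not an aromaticity model in its own right.

```python
def perceive_aromatic(bond_orders):
    """Relabel a ring's bonds as aromatic if single and double bonds alternate.

    bond_orders[i] is the order of the bond between ring atom i and
    ring atom (i + 1) % n. Toy assumption: only strictly alternating
    1/2 patterns are recognised; no electron counting is done.
    """
    n = len(bond_orders)
    alternating = set(bond_orders) == {1, 2} and all(
        bond_orders[i] != bond_orders[(i + 1) % n] for i in range(n)
    )
    return ("aromatic",) * n if alternating else tuple(bond_orders)


# Two Kekulé drawings of benzene that differ only in where the double bonds start.
kekule_a = (2, 1, 2, 1, 2, 1)
kekule_b = (1, 2, 1, 2, 1, 2)

print(kekule_a == kekule_b)                                        # False
print(perceive_aromatic(kekule_a) == perceive_aromatic(kekule_b))  # True
```

The two connection tables differ, but both collapse to the same aromatic form.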

=== Relationship to chemical aromaticity ===
The computational definition of aromaticity differs substantially from the chemical concept. As David Weininger, the creator of SMILES, noted: "There is no single rigorous definition of aromaticity in chemistry." To a synthetic chemist, aromaticity implies something about reactivity; to a thermodynamicist, about heat of formation; to a spectroscopist, about NMR ring current; to a molecular modeler, about geometrical planarity.

Computational aromaticity models are designed to be unambiguous and computable, not to capture all aspects of chemical aromaticity. Most are based on Hückel's rule (the 4n+2 rule), which states that planar cyclic conjugated systems with 4n+2 π electrons exhibit special stability.

== Algorithm components ==
Aromaticity detection typically involves two main components:

=== Cycle perception ===
Algorithms must first identify the rings (cycles) in a molecular graph to evaluate for aromaticity. Several approaches exist:

- Smallest Set of Smallest Rings (SSSR): The earliest widely used cycle set, defined as a minimum cycle basis. However, the SSSR is not unique for many structures (different valid SSSRs can be found for the same molecule), so results can depend on atom ordering or on the implementation. OpenEye's documentation describes SSSR as "considered harmful", and the company deliberately excludes it from its toolkit.
- Relevant cycles: The union of all minimum cycle bases, which is unique but can become exponentially large for complex structures.
- Essential cycles: The intersection of all minimum cycle bases, which is unique but may not form a basis.
- All cycles: The complete set of elementary cycles, which can be exponential in size for fused ring systems like fullerenes.
- Unique Ring Families (URF): A more recent approach developed by researchers at Universität Hamburg, providing a unique, polynomial-time, chemically meaningful description of ring topology.
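
As a rough illustration of why a minimum cycle basis need not be unique, the sketch below computes the number of rings in any such basis, E - V + C (bonds minus atoms plus connected components). The function name and the atom numberings are assumptions made for this example.

```python
def cycle_rank(num_atoms, bonds):
    """Return the size of any minimum cycle basis: E - V + C."""
    parent = list(range(num_atoms))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    components = num_atoms
    for a, b in bonds:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b
            components -= 1
    return len(bonds) - num_atoms + components


# Naphthalene skeleton: a 10-atom perimeter plus the fusion bond 0-5.
naphthalene = [(i, (i + 1) % 10) for i in range(10)] + [(0, 5)]

# Cubane skeleton: bottom face 0-3, top face 4-7, and four vertical bonds.
cubane = [(0, 1), (1, 2), (2, 3), (3, 0),
          (4, 5), (5, 6), (6, 7), (7, 4),
          (0, 4), (1, 5), (2, 6), (3, 7)]

print(cycle_rank(10, naphthalene))  # 2
print(cycle_rank(8, cubane))        # 5
```

Cubane has six equivalent four-membered faces but a cycle rank of five, so any SSSR must leave out one face, and which face is omitted is arbitrary.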

=== Electron donation models ===
After identifying cycles, algorithms determine how many π electrons each atom contributes. Common rules include:

- sp² carbon: Contributes 1 electron
- Heteroatoms with lone pairs (O, S, N in pyrrole-type position): Contribute 2 electrons
- Pyridine-type nitrogen: Contributes 1 electron
- Exocyclic double bonds: May "consume" electrons from the ring (model-dependent)

If the total π electron count for a cycle equals 4n+2 (where n is a non-negative integer), the cycle is considered aromatic.
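
A minimal sketch of this counting step follows. The atom-type labels and the lookup table are hypothetical simplifications of the rules listed above. Exocyclic bonds and charges are ignored, and real toolkits use considerably more detailed rules.

```python
# Hypothetical atom-type labels mapped to the number of pi electrons donated.
PI_ELECTRONS = {
    "C_sp2": 1,
    "N_pyridine": 1,
    "N_pyrrole": 2,
    "O_furan": 2,
    "S_thiophene": 2,
}


def pi_electron_count(ring_atom_types):
    """Sum the pi electrons donated by every atom in one ring."""
    return sum(PI_ELECTRONS[atom_type] for atom_type in ring_atom_types)


def satisfies_huckel(count):
    """True if count equals 4n + 2 for some non-negative integer n."""
    return count >= 2 and (count - 2) % 4 == 0


rings = {
    "benzene": ["C_sp2"] * 6,
    "pyridine": ["C_sp2"] * 5 + ["N_pyridine"],
    "pyrrole": ["C_sp2"] * 4 + ["N_pyrrole"],
    "cyclobutadiene": ["C_sp2"] * 4,
    "cyclooctatetraene": ["C_sp2"] * 8,
}

for name, atom_types in rings.items():
    count = pi_electron_count(atom_types)
    print(name, count, satisfies_huckel(count))
```

Benzene, pyridine, and pyrrole each reach six π electrons and pass the test, while cyclobutadiene (4) and cyclooctatetraene (8) do not.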

== Aromaticity models by toolkit ==
Different cheminformatics toolkits implement different aromaticity models, often providing multiple options:

=== Chemistry Development Kit (CDK) ===
The Chemistry Development Kit provides a highly configurable aromaticity system combining electron donation models with cycle finders:

Electron donation models:
- CDK model: Requires atom type perception; exocyclic π-bonds not allowed
- CDK allowing exocyclic: Same as CDK model but permits exocyclic double bonds
- Daylight model: Follows the Daylight/OpenSMILES specification
- Pi bonds model: Only atoms adjacent to cyclic π-bonds contribute (MDL-like)

Cycle finders:
- MCB/SSSR
- Relevant cycles
- Essential cycles
- All cycles
- CDK aromatic set (MCB plus envelope rings for small fused systems)

=== RDKit ===
RDKit provides multiple aromaticity models:
- AROMATICITY_RDKIT (default): Rule-based, follows 4n+2 with consideration of fused systems
- AROMATICITY_SIMPLE: Restricts perception to 5- and 6-membered rings
- AROMATICITY_MDL: Follows the MDL/BIOVIA aromaticity definition
- AROMATICITY_MMFF94: Uses MMFF force field aromaticity rules
- AROMATICITY_CUSTOM: Allows user-defined aromaticity functions

For computational efficiency, aromaticity perception is performed only for fused-ring systems in which every ring has at most 24 atoms.

=== OpenEye OEChem ===
OpenEye OEChem TK supports five aromaticity models:
- OpenEye (default)
- Daylight
- Tripos
- MDL
- MMFF

These models differ significantly in their treatment of heteroatoms, exocyclic bonds, and unusual ring systems. OpenEye uses Kekulization verification rather than strict Hückel evaluation, allowing preservation of user-specified aromaticity from input files.

=== Open Babel ===
Open Babel implements a single aromaticity model close to the Daylight definition. Aromaticity perception is performed via the OBAromaticTyper class using pattern-based rules. The toolkit re-perceives aromaticity when writing SMILES to ensure consistent output regardless of input aromaticity annotations.

=== Indigo ===
Indigo supports two aromaticity models:
- Basic model: External double bonds for aromatic rings are not allowed
- Generic model: External double bonds are allowed

== Challenges and limitations ==

=== Planarity ===
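Most computational aromaticity models operate on the molecular graph alone and do not check planarity, even though Hückel's rule assumes a planar ring. Non-planar rings with a 4n+2 electron count, such as [10]annulene, may therefore be flagged as aromatic by some implementations. Cyclooctatetraene is also non-planar, but with 8 π electrons it is already rejected by the electron count.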

=== Fused ring systems ===
Hückel's rule was derived for monocyclic systems. Polycyclic systems like azulene (which has a 10-membered aromatic envelope) or naphthalene present special challenges. Different toolkits handle these differently:
- Some check only individual rings
- Some check envelope rings (the outer perimeter formed when fused rings are merged)
- Some check all possible ring combinations
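
The difference between checking individual rings and checking the envelope can be sketched as follows. The atom numbering and function names are assumptions, and each ring atom is taken to be an sp² carbon donating one π electron. This is a simplification and does not reproduce any particular toolkit's counting.

```python
def ring_bonds(atoms):
    """Return the bonds of a ring whose atoms are listed in cyclic order."""
    n = len(atoms)
    return {frozenset((atoms[i], atoms[(i + 1) % n])) for i in range(n)}


def envelope_atoms(ring_a, ring_b):
    """Atoms on the outer perimeter of two fused rings.

    The perimeter consists of the bonds that belong to exactly one ring.
    """
    outer_bonds = ring_bonds(ring_a) ^ ring_bonds(ring_b)
    return set().union(*outer_bonds)


def is_4n_plus_2(count):
    return count >= 2 and (count - 2) % 4 == 0


# Azulene: a five-membered ring fused to a seven-membered ring via bond 0-4.
azulene_five = [0, 1, 2, 3, 4]
azulene_seven = [0, 4, 5, 6, 7, 8, 9]

# Assumption: one pi electron per ring atom, so the count equals the ring size.
print(is_4n_plus_2(len(azulene_five)))                                 # False
print(is_4n_plus_2(len(azulene_seven)))                                # False
print(is_4n_plus_2(len(envelope_atoms(azulene_five, azulene_seven))))  # True
```

Under this simplified count, neither azulene ring passes on its own, but the 10-atom envelope does, which is why a ring-by-ring check and an envelope check can disagree.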

=== Tautomerism ===
Aromaticity perception typically does not account for tautomeric forms, which may affect electron donation patterns.

=== Antiaromaticity ===
Systems with 4n π electrons (e.g., cyclobutadiene) are antiaromatic and destabilized. Most cheminformatics aromaticity models have no separate antiaromatic label. They simply leave such systems flagged as non-aromatic.

== Standards efforts ==

=== OpenSMILES ===
The OpenSMILES specification (2007) attempted to standardize aromaticity handling in SMILES. However, the specification acknowledges ambiguities and leaves implementation details to individual toolkits.

=== IUPAC SMILES+ ===
IUPAC has undertaken an effort to develop SMILES+ as a more formal specification. The working draft largely follows OpenSMILES but aims to resolve remaining ambiguities.

== See also ==

- Aromaticity
- Hückel's rule
- SMILES
- Cheminformatics
- Molecular graph
- Ring (chemistry)
