Aromaticity in cheminformatics

One of the frustrating things about being both an academically trained bench chemist and a creator of cheminformatics software is knowing how many chemical concepts were mistranslated when they were turned into software products. This is principally because being a research scientist and being a software engineer are two separate professions, each with its own “10,000 hours” proficiency requirement. For this reason, most of the robustly engineered cheminformatics software was built by professional programmers, after professional chemists had conveyed the scientific concepts to them.

Problem is, some of the concepts didn’t get explained properly, and one of the important ones is aromaticity.

Like with a lot of organic chemistry, this particular concept seems to be very amenable to a fairly simple rule: the famous Hückel 4n+2 rule. Combine this with the very oversimplified notion that an atom/bond/ring is either aromatic or it isn’t, and we have a great teaching tool, which serves incredibly well for all manner of cases. The problem is, it’s not true. Professional chemists know this, but they don’t necessarily realise that they need to explain it.
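To show just how little the counting rule (Hückel's rule, usually stated as 4n+2 π electrons) asks of software, here is a minimal Python sketch. It assumes the π electron count for the ring has already been worked out by some other means, which is where the real difficulty hides. The function name `satisfies_huckel` is made up for illustration.

```python
def satisfies_huckel(pi_electrons: int) -> bool:
    """Return True if the count has the form 4n + 2 for some integer n >= 0."""
    return pi_electrons >= 2 and (pi_electrons - 2) % 4 == 0


# Illustrative electron counts, supplied by hand rather than perceived
# from a structure (an assumption of this sketch).
for name, count in [("benzene", 6), ("cyclobutadiene", 4), ("cyclooctatetraene", 8)]:
    print(name, satisfies_huckel(count))
```

The whole rule fits in one line of arithmetic, and the yes/no answer it returns says nothing about how aromatic the ring actually is.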

Aromaticity is a physical observable, and it can be measured or calculated in various ways, as a continuously variable property, e.g. on a scale of 0% to 100% relative to some reference. In that sense, it is analogous to any other measurable continuous physical property, e.g. water solubility. It is often quite useful to state that some chemical is either water soluble, or it isn’t. This can be tested in the laboratory easily enough: throw it into water, and if it disappears then it’s soluble; if it does nothing then it’s not. Sometimes it sort of does a bit of both, but for a large domain, this binary classification is quite reasonable. There are even some simple rules that can predict it based on structure, e.g. if there are a certain number of hydroxyl groups per carbon, then it’s pretty sure to be soluble. That’s not true either, but in many cases it’s a helpful rule. Chemistry is all about helpful rules, but they need to be paired with the wisdom to know when they apply.

Describing aromaticity as a boolean property, predicted by simple numeric counting rules and a bit of graph navigation, is the same thing as describing water solubility that way. And to make things more interesting, there are different definitions of aromaticity, which vary depending on the need.

Four of them spring to mind, and no doubt there are more:

  1. Symmetry. Certain kinds of aromaticity (particularly the six-membered ring kind) cause the conjugated single-double bonds to blur together, so that they are effectively of order 1.5. This means that the two resonance forms of 1-bromo-2-chlorobenzene are indistinguishable. Being able to mark the bonds as aromatic is essential for naming purposes, because for most practical purposes, the two forms are literally the same thing. If two chemists drew the two different forms, it would be a problem if other chemists (or software) thought they were two different molecules. For other kinds of aromatic rings, like furan or thiophene, this is not an issue, since there is one resonance form that everyone agrees is the most sensible, and so there is no need for naming disambiguation.
  2. Geometry. Aromatic ring systems have a very strong desire to be flat, which is not so much the case for ring systems which are merely conjugated.
  3. Ring current. Aromatic rings generate a ring current which exerts a field perpendicular to the plane of the ring. This field is known to be important for nonbonding interactions, which affects crystallisation and ligand-protein interactions.
  4. Reactivity. An aromatic ring has significantly different reaction properties; for example, electrophilic aromatic substitution reactions tend to replace a hydrogen atom, whereas for a merely conjugated ring system, the properties are typically more alkene-like, and tend to undergo addition reactions across the double bond.

Each of these properties can be measured or calculated in different ways, but what they all have in common is that describing their aromaticity as yes or no is a judgment call. It is not a fundamental property of the molecular representation, which is where certain cheminformatics file formats and algorithms go wrong.

It is quite valid to include aromaticity observations, calculations, expert opinions and judgment calls, but they need to be understood as just that. And ideally the provenance should be recorded. If a molecule has certain bonds marked as aromatic, the reason should be indicated: did it satisfy some 4n+2 electron counting rule? Did a scientist examine the molecule and render their expert opinion that this is so? Was a measurement of the aromatic ring current or reactivity made, such that it passed a certain threshold?
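As a rough illustration of what recording provenance could look like, the following Python sketch attaches the reason for an aromaticity claim to the bonds it covers. Every name here (`Source`, `AromaticityAnnotation` and its fields) is hypothetical and not taken from any existing file format or toolkit.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class Source(Enum):
    """Where an aromaticity claim came from (hypothetical categories)."""
    ELECTRON_COUNT = "electron counting rule"
    EXPERT_OPINION = "expert opinion"
    MEASUREMENT = "measurement"
    CALCULATION = "calculation"


@dataclass(frozen=True)
class AromaticityAnnotation:
    """A yes/no aromaticity claim layered on top of a structure, with its reason."""
    bonds: Tuple[int, ...]             # indices of the bonds the claim covers
    verdict: bool                      # the judgment call itself
    source: Source                     # how the verdict was reached
    detail: str                        # free-text explanation
    value: Optional[float] = None      # measured or calculated quantity, if any
    threshold: Optional[float] = None  # cutoff applied to that quantity, if any
    assumptions: Tuple[str, ...] = ()  # e.g. solvent, tautomer, protonation state


# Example: benzene ring bonds marked aromatic by a counting rule.
annotation = AromaticityAnnotation(
    bonds=(0, 1, 2, 3, 4, 5),
    verdict=True,
    source=Source.ELECTRON_COUNT,
    detail="6 pi electrons, satisfies 4n+2 with n=1",
    assumptions=("atom connectivity exactly as drawn",),
)
print(annotation.source.value, annotation.verdict)
```

The point of the design is that the verdict never travels without its source, so a downstream reader can tell a counting-rule flag from a measured ring current.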

As it pertains to cheminformatics, the most commonly needed use for aromaticity is naming and disambiguation. Exact structure matches, substructures, canonical representations, descriptor calculations and many other algorithms need a way to ensure that two equivalent resonance forms of the same species provide the same answer, which is desirable. But the concept that is needed is not necessarily aromaticity, and it’s unfortunate that the term gets used. What is needed is conjugational equivalence. The concept is required for even-sized aromatic rings like benzene derivatives; it is not required for odd-sized rings with lone-pair contributors like thiophene, even though it does have significant aromaticity; it is needed for odd-sized cationic rings like imidazolium ions. If there is no obvious way for scientists to localise the bonds consistently, then conjugational equivalence is a service that is required from cheminformatics software. Furthermore, it must be understood that this determination is made under certain assumptions, e.g. dissolved in aqueous solution at room temperature with most stable tautomer and lowest energy protonation state; or perhaps in the solid form, under the assumption that the atom connectivity is exactly as drawn. These assumptions must be specified, and recorded, if they are being stored within the structure representation. If they are being calculated on an as-needed basis, then they are documented implicitly.
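A toy Python sketch of conjugational equivalence for a single isolated ring follows. It assumes the ring is given as a list of substituent labels plus a list of bond orders, and it treats an even-sized ring with strictly alternating single and double bonds as equivalent to its shifted form. The functions `is_alternating` and `ring_key` are invented for this example, and fused rings, charged rings such as imidazolium, and everything outside the ring are deliberately ignored.

```python
def is_alternating(bonds):
    """True for an even-sized ring whose bonds strictly alternate single/double."""
    n = len(bonds)
    return (
        n % 2 == 0
        and set(bonds) == {1, 2}
        and all(bonds[i] != bonds[(i + 1) % n] for i in range(n))
    )


def ring_key(labels, bonds, delocalise=True):
    """Canonical key for one ring; bonds[i] joins atom i and atom (i + 1) % n."""
    n = len(labels)
    if delocalise and is_alternating(bonds):
        marks = ["d"] * n  # every ring bond gets the same marker
    else:
        marks = [str(b) for b in bonds]
    candidates = []
    for k in range(n):
        # Rotation: start numbering at atom k, same direction.
        candidates.append((
            tuple(labels[(i + k) % n] for i in range(n)),
            tuple(marks[(i + k) % n] for i in range(n)),
        ))
        # Reflection: start at atom k and walk the ring the other way.
        candidates.append((
            tuple(labels[(k - i) % n] for i in range(n)),
            tuple(marks[(k - i - 1) % n] for i in range(n)),
        ))
    return min(candidates)


# The two resonance forms of a benzene ring carrying adjacent Cl and Br.
labels = ["Cl", "Br", "H", "H", "H", "H"]
form_a = [2, 1, 2, 1, 2, 1]  # double bond between the Cl and Br carbons
form_b = [1, 2, 1, 2, 1, 2]  # single bond between the Cl and Br carbons

same_with = ring_key(labels, form_a) == ring_key(labels, form_b)
same_without = (ring_key(labels, form_a, delocalise=False)
                == ring_key(labels, form_b, delocalise=False))
print(same_with, same_without)
```

With the equivalence step the two drawings collapse to one key; without it they are treated as different molecules. An odd-sized ring such as thiophene is never flagged as alternating, so its single agreed resonance form passes through unchanged.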

For algorithms that are intended to calculate the experimental observations of aromaticity, such as geometry, ring current and nonbonding interactions, a method that is more closely aligned to the real chemistry is needed. But such a method should probably not return yes or no, but rather a degree of aromaticity. It should be clearly labelled as a prediction, like a weather forecast: e.g. it will or will not snow tomorrow, with the understanding that if it snows just a little bit, that’s close enough to not snowing at all. And there is no reason why such a calculation would be strongly correlated with the conjugational equivalence that is needed for disambiguation, except in the obvious cases (e.g. benzene = definitely yes, cyclohexane = definitely no).
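One example of a method that returns a degree rather than a yes or no is a geometry-based index such as HOMA (harmonic oscillator model of aromaticity), which scores a ring by how far its bond lengths deviate from a reference value. The post does not name any particular index, so this is only an illustration. The default parameters below are the values commonly quoted for carbon-carbon bonds and should be treated as assumptions to check against the literature before use.

```python
def homa(bond_lengths, r_opt=1.388, alpha=257.7):
    """HOMA index: 1 - (alpha / n) * sum((r_opt - r_i)^2), lengths in angstroms.

    r_opt and alpha are the commonly quoted carbon-carbon parameters
    (an assumption of this sketch); other bond types need other values.
    """
    n = len(bond_lengths)
    return 1.0 - (alpha / n) * sum((r_opt - r) ** 2 for r in bond_lengths)


# Idealised ring with every bond at the reference length: scores 1.
ideal_ring = [1.388] * 6

# Hypothetical fully localised ring using butadiene-like bond lengths:
# scores close to 0.
localised_ring = [1.349, 1.467] * 3

print(round(homa(ideal_ring), 3), round(homa(localised_ring), 3))
```

Any yes/no answer derived from such a number needs a threshold, and that threshold is exactly the kind of judgment call whose provenance ought to be recorded.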

At the end of the day, the whole aromaticity thing is a complex and nuanced topic. It should be left out of the fundamental representations of chemical structures, which should at their core collect only the facts that the originating scientist is attempting to convey to the world. Upon this fundamental core should be layered whatever is needed to achieve the task at hand. But a trivial definition of aromaticity has no place in the first-tier properties of cheminformatics.

2 thoughts on “Aromaticity in cheminformatics”

  1. Very good post! I especially like the statement “Chemistry is all about helpful rules, but they need to be paired with the wisdom to know when they apply.” Indeed, the human mind strives to find these helpful rules for all aspects of life. It’s a survival mechanism. If one needed to compute an answer from first principles every time, the species would never have survived, unless we were all like Cmdr. Data.

    The Woodward-Hoffmann rules for cycloaddition reactions are another good example. Here, the activation energy of the “disallowed” mechanism is so fantastically high (computed by molecular orbital methods) that the probability of the molecular system taking that route is effectively nil. So, here is a case where a degree of the property is reasonable, but effectively not needed.

    I would wager that nearly all of our chemical rules with discrete outcomes are really examples that are founded on a property with a continuous value where the observed examples lie at the extremes of the spectrum.

    So, where does this leave our impressionable students and our computational chemists? In my own experience, reducing our simple rules to formal algorithms has often led to the discovery that the rules need amendment, so we bring greater understanding to the field. Students, and teachers, need to understand that chemistry, like physics, is grounded in energetics, and in many situations the difference in energy is so large that a simple rule of thumb suffices. Hopefully, they will ask why, and so initiate a new line of research.

    1. Students are doing just fine: if they continue in the field, they learn how it really works and adjust their views accordingly. It’s computational chemists that are the problem. For this issue in particular, practitioners need to know that there is a first-tier of fundamental “handy rules”, i.e. drawing out elements and linking them together with well defined bond types to describe a single resonance form, stereoisomer and tautomer. That part is relatively solid and reliable, because it is known that what has been drawn represents a molecule at a moment in time, in some environment. Then layered on top of this can be any kind of interpretation you want, but if it’s part of the permanent record of the molecular species, you need to record its provenance. But when people want to make yes/no aromaticity part of the fundamental representation and work as if that represented the molecular species in all situations, then the rot starts to creep in.
