New paper: Accurate Specification of Molecular Structures: The Case for Zero-Order Bonds and Explicit Hydrogen Counting

For anyone interested in the representation of small molecules, I would implore you to take a look at my latest paper in the Journal of Chemical Information and Modeling:

This article addresses a long-standing shortcoming in the cheminformatics cottage industry: the inadequacy of common file formats to reliably represent structures of non-organic compounds.

Not wanting to spoil the punchline, the synopsis is that current formats like MDL Molfile (which has been obsolete for a long time, see Why Not to Use MDL MOL/SDF) were designed to represent drug-like molecules, and just coincidentally happen to work adequately for a few others. Because of their limitations, they keep the study of cheminformatics confined to a subset of organic molecules. The problem can be solved quite easily, simply by allowing an additional bond type (zero order) and adding an additional atom property to control the automatic addition of hydrogen atoms. These two simple enhancements open the door to representing pretty much any molecular species that makes sense to compose out of a graph of atoms and bonds.
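To make the two enhancements concrete, here is a minimal Python sketch of a molecule graph that permits bonds of order zero and lets each atom override its automatic hydrogen count. The class names, the `explicit_h` field and the small valence table are assumptions made up for this illustration; they are not the SketchEl datastructure or the scheme from the paper, and charge handling is left out entirely.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Simplified default valences for a few neutral elements (illustrative only).
DEFAULT_VALENCE = {"H": 1, "B": 3, "C": 4, "N": 3, "O": 2, "F": 1, "Cl": 1}


@dataclass
class Atom:
    element: str
    # None means "work out the hydrogens automatically";
    # an integer overrides the automatic count.
    explicit_h: Optional[int] = None


@dataclass
class Bond:
    a: int      # index of the first atom
    b: int      # index of the second atom
    order: int  # 0 is allowed: the atoms are connected, but no valence is used


@dataclass
class Molecule:
    atoms: List[Atom] = field(default_factory=list)
    bonds: List[Bond] = field(default_factory=list)

    def hydrogens(self, idx: int) -> int:
        """Hydrogen count for one atom: the explicit override if present,
        otherwise default valence minus the sum of attached bond orders."""
        atom = self.atoms[idx]
        if atom.explicit_h is not None:
            return atom.explicit_h
        valence = DEFAULT_VALENCE.get(atom.element)
        if valence is None:
            return 0  # elements without a table entry get no automatic hydrogens
        used = sum(bond.order for bond in self.bonds if idx in (bond.a, bond.b))
        return max(0, valence - used)


# Ammonia borane: a zero-order N-B bond keeps both atoms neutral and
# leaves the automatic hydrogen count at three for each.
ammonia_borane = Molecule(atoms=[Atom("N"), Atom("B")], bonds=[Bond(0, 1, 0)])

# Methylene: with no bonds the automatic rule would give carbon four
# hydrogens, so the count is stated explicitly.
methylene = Molecule(atoms=[Atom("C", explicit_h=2)])
```

In this sketch the zero-order bond records that nitrogen and boron are connected without consuming any valence, so neither atom needs a formal charge to make the hydrogen count come out right.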

It should be noted that there are a few file formats that support these properties, though for the most part they are not advocated for this purpose. The SketchEl open source project, which I started some years ago, actively supports zero-order bonds and hydrogen atom counting, and all of the mobile apps from Molecular Materials Informatics, starting with MMDS, use the SketchEl molecule format as their native datatype.

The paper describes some simple additions to the MDL Molfile format so that it can support these extra fields. These are essentially trivial to implement, but backward compatibility brings some subtleties: in the absence of a zero-order bond, non-organic compounds are often represented by pushing charges around to fix up the broken valences. Sometimes this sort-of works, sometimes it doesn’t, but it does mean that an extended MDL Molfile is maximally compatible with both legacy and modern software when it stores both styles. The paper describes an algorithm to convert zero-order bonds into charge-separated notation, in cases where this is plausible, so that the two forms can be stored in parallel.
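The general idea of trading a zero-order bond for a charge-separated pair can be sketched in a few lines of Python. This is not the algorithm from the paper: the donor and acceptor element sets below are hypothetical stand-ins for its plausibility rules, and the tuple-based atom and bond representation is invented for the example.

```python
# Hypothetical element sets standing in for a real plausibility test.
DONORS = {"N", "O", "P", "S"}
ACCEPTORS = {"B", "Al"}


def charge_separate(atoms, bonds):
    """atoms: list of (element, charge); bonds: list of (a, b, order).

    Returns new (atoms, bonds) in which each zero-order bond between a
    donor and an acceptor becomes a single bond, with +1 on the donor
    and -1 on the acceptor. Any other zero-order bond is left as it is.
    """
    new_atoms = [list(atom) for atom in atoms]
    new_bonds = []
    for a, b, order in bonds:
        elem_a, elem_b = atoms[a][0], atoms[b][0]
        if order == 0 and elem_a in DONORS and elem_b in ACCEPTORS:
            donor, acceptor = a, b
        elif order == 0 and elem_b in DONORS and elem_a in ACCEPTORS:
            donor, acceptor = b, a
        else:
            new_bonds.append((a, b, order))  # nothing plausible to do here
            continue
        new_atoms[donor][1] += 1      # the donor gives up an electron pair
        new_atoms[acceptor][1] -= 1   # the acceptor gains the formal charge
        new_bonds.append((a, b, 1))
    return [tuple(atom) for atom in new_atoms], new_bonds


# Ammonia borane with a zero-order N-B bond becomes N(+)-B(-).
separated_atoms, separated_bonds = charge_separate(
    [("N", 0), ("B", 0)], [(0, 1, 0)]
)
```

The total charge is unchanged by the conversion, and a zero-order bond that matches neither pattern (a metal to carbon contact, say) passes through untouched, which corresponds to the cases where no plausible charge-separated form exists.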

This feature is currently implemented and available from MMDS, and comes up when you initiate an outgoing email:

Selecting the extended MDL MOL option makes it calculate and store the charge-separated form, as well as encoding the overridden fields, within the V2000 Molfile, which is included as an email attachment.

The subject of inadequate chemical file formats is also closely related to the recently reinvigorated subject of junk chemical data, about which I put in my 2 cents worth earlier this year. A huge portion of the chemical data that is available from various sources is flawed for one reason or another, and a significant portion of these problems arise from the simple fact that the molecular species being represented cannot be described using the legacy formats that have been adopted as industry standards.


2 thoughts on “New paper: Accurate Specification of Molecular Structures: The Case for Zero-Order Bonds and Explicit Hydrogen Counting”

  1. Interesting paper. I don’t think zero-order bonds are quite sufficient to express three-centre two-electron bonds due to the fact the two 1/2 order bonds end up being not identical. Nonetheless in the other examples they are a lot better than the alternative representations!

    1. While zero-order bonds are not sufficient to fully describe the properties of these bonds, what they can do is allow the book-keeping to add up (charge, valence, oxidation state), and still express in a meaningful way the portions of the molecule that do follow the normal octet rules. It is also incredibly simple to implement, and seriously, nobody has any excuse not to build this into their software.
