Bond Artifacts in SketchEl2, and round-trip MDL Molfile

A while back I described the idea of bond artifacts, which are layered on top of a core cheminformatics representation to give the rendering engine the hints it needs to make the visual diagram look like what chemists want to see (without breaking the underlying machine readability). Now this enhancement has been added to the open source WebMolKit framework and the derived SketchEl2 drawing app. Furthermore, the artifacts can survive a round trip encoding with the industry standard Molfile CTAB format.

The previous article on this subject has a detailed explanation of why bond artifacts are necessary to have your cake & eat it too. They allow us to have machine readable molecular structures that also look just the way scientists expect them to, and they remove the need for two separate descriptions (i.e. one that looks right, and another one that is right).

There are 3 kinds of artifacts that can cover a huge fraction of the gap between representations and diagrams: resonance paths, resonance rings, and arenes (the last being a path or ring that is collectively bonded to a specific atom, typically a transition metal). This information tells the rendering engine to draw the bonding arrangement differently, which is important for aesthetics, but it can also be a clue regarding symmetry.

These 3 bond artifact types are natively encoded in the SketchEl format in a way that is admittedly awkward: the format consists exclusively of atoms & bonds, so defining a property for a group of atoms requires each of those atoms to opt in individually. The benefit of this approach is that performing active surgery on a molecule (adding/deleting/editing/reordering atoms & bonds) generally does the right thing gracefully. If you’re curious about how it’s implemented in the native data format, the place to start is BondArtifact.ts on GitHub. It’s written in TypeScript, which is like a hybrid of Java & JavaScript.
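To make the opt-in idea a little more concrete, here is a minimal sketch. Everything in it (the `Membership` and `Atom` types, the `collectGroups` function) is hypothetical and invented for illustration; it is not the SketchEl format and not the WebMolKit API. It only shows why storing group membership on each atom tends to survive editing: the groups are rebuilt by scanning the atoms, so there is no separate index list to keep in sync.

```typescript
// Hypothetical model for illustration only: NOT the SketchEl format or WebMolKit API.
interface Membership {
    kind: 'path' | 'ring' | 'arene';
    group: number; // distinguishes artifacts of the same kind
}

interface Atom {
    element: string;
    memberships: Membership[]; // each atom opts in to its own groups
}

// Rebuilds the groups by scanning the atoms; keys look like "path:1",
// values are 1-based atom numbers in their current order.
function collectGroups(atoms: Atom[]): {[key: string]: number[]} {
    const groups: {[key: string]: number[]} = {};
    atoms.forEach((atom, i) => {
        atom.memberships.forEach((m) => {
            const key = m.kind + ':' + m.group;
            if (!groups[key]) groups[key] = [];
            groups[key].push(i + 1);
        });
    });
    return groups;
}

// Acetate-like example: a methyl carbon followed by the C, O, O of a resonance path.
const acetate: Atom[] = [
    {element: 'C', memberships: []},
    {element: 'C', memberships: [{kind: 'path', group: 1}]},
    {element: 'O', memberships: [{kind: 'path', group: 1}]},
    {element: 'O', memberships: [{kind: 'path', group: 1}]},
];

const groupsBefore = collectGroups(acetate); // path:1 covers atoms 2, 3, 4
acetate.splice(0, 1);                        // delete the methyl carbon
const groupsAfter = collectGroups(acetate);  // path:1 now covers atoms 1, 2, 3
```

Deleting the first atom renumbers everything after it, yet the path still refers to the same three atoms, because nothing stored an atom number in the first place.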

The round trip through the Molfile CTAB format is done by adding additional M-blocks at the end of the atom/bond section (see MDLWriter.ts and MDLReader.ts). The extra custom blocks are ZPA, ZRI and ZAR (for paths, rings and arenes, respectively).
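For anyone who wants to pick these blocks out of a Molfile without WebMolKit, here is a rough sketch of a line parser. The layout it assumes (atom count, then a block index, then the atom numbers) is only inferred from the examples in this post, not taken from MDLReader.ts, and the names `ParsedArtifact` and `parseArtifactLine` are made up for the sketch. It splits on whitespace rather than reading fixed-width columns, which is a simplification.

```typescript
// Hypothetical types for illustration; these are not WebMolKit's own classes.
type ArtifactKind = 'path' | 'ring' | 'arene';

interface ParsedArtifact {
    kind: ArtifactKind;
    index: number;   // block index, distinguishing artifacts of the same kind
    atoms: number[]; // 1-based atom numbers, in the order listed on the line
}

function kindForTag(tag: string): ArtifactKind | null {
    switch (tag) {
        case 'ZPA': return 'path';
        case 'ZRI': return 'ring';
        case 'ZAR': return 'arene';
        default: return null;
    }
}

// Assumed layout, inferred from the examples: "M  ZXX <count> <index> <atom>..."
// Returns null for anything that is not a well-formed artifact line.
function parseArtifactLine(line: string): ParsedArtifact | null {
    const tokens = line.trim().split(/\s+/);
    if (tokens.length < 4 || tokens[0] !== 'M') return null;
    const kind = kindForTag(tokens[1]);
    if (kind === null) return null;
    const nums = tokens.slice(2).map((t) => parseInt(t, 10));
    if (nums.some((n) => isNaN(n))) return null;
    const count = nums[0];
    const index = nums[1];
    const atoms = nums.slice(2);
    if (atoms.length !== count) return null; // count must match the atoms listed
    return {kind, index, atoms};
}
```

Feeding it `M  ZPA  3   1   2   1   3` gives a path with index 1 over atoms 2, 1 and 3, while an ordinary line such as `M  CHG  1   3  -1` comes back as `null`.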

Examples:

carboxylate

Generated by WebMolKit

  4  3  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.2990    0.7500    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
   -1.2990    0.7500    0.0000 O   0  5  0  0  0  0  0  0  0  0  0  0
    0.0000   -1.5000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  2  0  0  0  0
  1  3  1  0  0  0  0
  1  4  1  0  0  0  0
M  CHG  1   3  -1
M  ZPA  3   1   2   1   3
M  END

imidazole

Generated by WebMolKit

  7  7  0  0  0  0  0  0  0  0999 V2000
    0.4934    5.1927    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.7201    4.3110    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
   -0.2566    2.8844    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.2434    2.8844    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.7070    4.3110    0.0000 N   0  3  0  0  0  0  0  0  0  0  0  0
    3.1335    4.7745    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -2.1467    4.7745    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
  2  3  1  0  0  0  0
  3  4  2  0  0  0  0
  4  5  1  0  0  0  0
  5  1  2  0  0  0  0
  5  6  1  0  0  0  0
  2  7  1  0  0  0  0
M  CHG  1   5   1
M  ZRI  5   1   1   2   3   4   5
M  END

ferrocene

Generated by WebMolKit

 11 20  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 Fe  0  0  0  0  0  0  0  0  0  0  0  0
   -0.0154    1.0746    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.0352    2.4083    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.5164    1.4890    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.5028    1.4200    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.0642    2.3603    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0051   -1.0747    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.0121   -2.4181    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.5020   -1.5035    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.5164   -1.4055    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.0868   -2.3500    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  3  4  2  0  0  0  0
  4  2  1  0  0  0  0
  2  5  1  0  0  0  0
  5  6  2  0  0  0  0
  6  3  1  0  0  0  0
  1  2  1  0  0  0  0
  4  1  0  0  0  0  0
  5  1  0  0  0  0  0
  6  1  0  0  0  0  0
  3  1  0  0  0  0  0
  8  9  2  0  0  0  0
  9  7  1  0  0  0  0
  7 10  1  0  0  0  0
 10 11  2  0  0  0  0
 11  8  1  0  0  0  0
  1  7  1  0  0  0  0
  9  1  0  0  0  0  0
 10  1  0  0  0  0  0
 11  1  0  0  0  0  0
  8  1  0  0  0  0  0
M  ZBO  8   7   0   8   0   9   0  10   0  17   0  18   0  19   0  20   0
M  ZAR  6   1   1   2   4   3   6   5
M  ZAR  6   2   1   7   9   8  11  10
M  END

It should be noted of course that custom extensions to the Molfile format are usually purged whenever a file goes through a write/read cycle involving software that isn’t aware of the extensions, and since this is hot off the press, the number of such programs that I didn’t write is zero. Nonetheless, there was a reason for prioritising this now, and some of the benefits should start to appear in other places (hint: mixtures).
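As a rough illustration of that purging, the sketch below imitates a program that keeps only the property lines it recognises, then checks whether any artifact blocks are left. The list of recognised tags and both function names are assumptions made for the example; real software will differ in what it keeps and how it rewrites the file.

```typescript
// Tags an extension-unaware program might recognise; this list is an assumption.
const STANDARD_TAGS = ['CHG', 'RAD', 'ISO', 'END'];

// Imitates a write/read cycle through software that drops "M  XXX" lines
// it does not understand. All other lines are passed through untouched.
function dropUnknownProperties(molfile: string): string {
    return molfile
        .split('\n')
        .filter((line) => {
            if (line.indexOf('M  ') !== 0) return true; // not a property line
            return STANDARD_TAGS.indexOf(line.substring(3, 6)) >= 0;
        })
        .join('\n');
}

// Lists which of the artifact tags (ZPA, ZRI, ZAR) are still present.
function artifactTags(molfile: string): string[] {
    const found: string[] = [];
    molfile.split('\n').forEach((line) => {
        const match = /^M  (ZPA|ZRI|ZAR)\b/.exec(line);
        if (match && found.indexOf(match[1]) < 0) found.push(match[1]);
    });
    return found;
}
```

Running `artifactTags` before and after `dropUnknownProperties` on the carboxylate example shows the ZPA block present in the original and gone afterwards, while the CHG line and the atoms and bonds are untouched. A check like this is a cheap way to find out whether a given toolchain preserves the extensions.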
