MMDS now supports Chemical Markup Language (CML)

Support for the Chemical Markup Language (CML) has been added to the Mobile Molecular DataSheet (MMDS). CML is an XML dialect for describing chemical entities in general, and structures in particular. This functionality is provided to the mobile app via remote procedure calls, and will most likely first be encountered when sending data via email.

If the Chemical Markup Language option is checked, then the outgoing message will contain a CML file representing the current subject, whether it be a molecule, reaction or datasheet.

The format parsing goes both ways, so if you open a CML email attachment, or try to download a CML file using the Safari mobile browser, you will be offered the option of opening it with MMDS.

Encoding chemical data using CML is a more interesting experience than one might expect from an open standard about which there is much documentation. While the basic ideas involved with encoding an ordinary molecule with CML are simple enough, the format is not very rigid about some core concepts (e.g. valid bond types, stereochemistry, hydrogen counting, sub-fragments, among others). It is also seamlessly easy to extend, which is a key property of XML, and so extended it typically is. This means that parsing a CML file from an unknown origin is a bit of a mystery grab bag, and the same applies when creating a CML file with the intent of having it be parseable by as many other software packages as possible.
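To give a feel for the "simple enough" part, here is a minimal sketch in Python that writes an ordinary molecule as CML using only the standard library. It is not the code MMDS uses: the `build_cml` helper and its argument layout are made up for illustration, and only the most common elements and attributes (`molecule`, `atomArray`, `atom`, `bondArray`, `bond`, `atomRefs2`, `order`) are emitted.

```python
import xml.etree.ElementTree as ET

# Namespace commonly declared by CML documents.
CML_NS = "http://www.xml-cml.org/schema"

def build_cml(atoms, bonds):
    """Hypothetical helper: serialize one molecule as a CML string.

    atoms: list of (id, element symbol, x, y)
    bonds: list of (atom id 1, atom id 2, order as a string)
    """
    # Declaring xmlns as a plain attribute is enough for serialization.
    molecule = ET.Element("molecule", {"xmlns": CML_NS})

    atom_array = ET.SubElement(molecule, "atomArray")
    for atom_id, element, x, y in atoms:
        ET.SubElement(atom_array, "atom", {
            "id": atom_id,
            "elementType": element,
            "x2": str(x),  # 2D coordinates
            "y2": str(y),
        })

    bond_array = ET.SubElement(molecule, "bondArray")
    for ref1, ref2, order in bonds:
        ET.SubElement(bond_array, "bond", {
            "atomRefs2": f"{ref1} {ref2}",
            # Some files spell a single bond "1", others "S";
            # this sketch simply writes whatever it is given.
            "order": order,
        })

    return ET.tostring(molecule, encoding="unicode")

# Ethanol heavy atoms only; hydrogens are left implicit, which is
# exactly the kind of thing different readers treat differently.
ethanol = build_cml(
    [("a1", "C", 0.0, 0.0), ("a2", "C", 1.5, 0.0), ("a3", "O", 2.25, 1.3)],
    [("a1", "a2", "1"), ("a2", "a3", "1")],
)
print(ethanol)
```

Everything beyond this skeleton (stereochemistry, explicit hydrogen counts, sub-fragments) is where files from different sources start to diverge.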

The parser used by MMDS tries to be as liberal as possible when it comes to parsing incoming content, and is capable of extracting a single molecule, collections of multiple molecules, and even sets of reactions. For creating CML output, some of the extensions defined by ChemAxon are used, including the encoding style for reactions and S-groups (roughly equivalent to inline abbreviations). Some custom attributes are thrown in for good measure, to ensure that molecular structures and reactions can survive a round trip, i.e. MMDS to CML to MMDS returns the same original content.
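As a rough illustration of what "liberal" can mean in practice, the sketch below reads CML while ignoring namespaces, accepting molecules at any depth, and tolerating more than one spelling of bond order. It is an assumption-laden example rather than the MMDS parser: the `read_molecules` function, the `BOND_ORDERS` table and the fallback choices are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Bond order spellings to accept; this table is an illustrative assumption.
BOND_ORDERS = {"1": 1, "S": 1, "2": 2, "D": 2, "3": 3, "T": 3}

def local_name(tag):
    """Drop any '{namespace}' prefix so namespaced and bare files read alike."""
    return tag.rsplit("}", 1)[-1]

def read_molecules(cml_text):
    """Return a list of (atoms, bonds), one entry per <molecule> at any depth.

    atoms maps atom id -> element symbol; bonds is a list of
    (atom id 1, atom id 2, numeric order). Nested molecules are not
    treated specially in this sketch.
    """
    root = ET.fromstring(cml_text)
    molecules = []
    for elem in root.iter():
        if local_name(elem.tag) != "molecule":
            continue
        atoms, bonds = {}, []
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "atom" and "id" in child.attrib:
                # Unknown attributes are ignored; a missing element becomes "?".
                atoms[child.get("id")] = child.get("elementType", "?")
            elif name == "bond":
                refs = child.get("atomRefs2", "").split()
                if len(refs) == 2:  # silently skip malformed bonds
                    raw = child.get("order", "1").upper()
                    # Unrecognized orders fall back to a single bond.
                    bonds.append((refs[0], refs[1], BOND_ORDERS.get(raw, 1)))
        molecules.append((atoms, bonds))
    return molecules

# A bare fragment with no namespace and a letter-style bond order.
bare = (
    '<molecule><atomArray>'
    '<atom id="a1" elementType="C"/><atom id="a2" elementType="O"/>'
    '</atomArray><bondArray>'
    '<bond atomRefs2="a1 a2" order="D"/>'
    '</bondArray></molecule>'
)
print(read_molecules(bare))
```

Falling back to defaults instead of rejecting the file trades strictness for coverage, which suits a reader that has to cope with content from unknown origins.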

The CML reader/writer is a new piece of code, and will undoubtedly go through a series of evolutionary steps in order to interoperate with as much other software as possible. Any comments about the quality of CML generated, or its ability to parse incoming CML, are welcome.
