Creating Microsoft Word documents with vector drawings of structures

The latest piece of technology from Molecular Materials Informatics makes it possible to convert a datasheet into a Microsoft Word document in which all the structures and reactions are rendered as vector graphics. This capability will be made available to some of the mobile apps, such as MMDS, but there's no need to wait for App Store approval, because you can see it at work right now via the web demo software hosted on molsync.com:


The screenshot above shows part of a browser window after the Download button has been clicked. The list of formats includes an entry called Microsoft Word. Selecting this format and then pressing the Prepare button creates a downloadable file with the extension .docx, which is the XML-based format that Microsoft calls Office Open XML (OOXML for short).
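For anyone curious about what "XML-based" means in practice: a .docx file is a ZIP archive containing a handful of XML parts. The following is a minimal sketch in Python (standard library only) of a package whose body is a single table built from a grid of strings. It is not the generator described in this post; the function names `table_xml` and `build_minimal_docx` are hypothetical, and it assumes that only three parts are needed (content types, package relationships and the main document), with no styles, borders or drawings.

```python
import io
import zipfile
from xml.sax.saxutils import escape

# [Content_Types].xml: maps the parts in the package to their content types.
CONTENT_TYPES = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="rels" '
    'ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Default Extension="xml" ContentType="application/xml"/>'
    '<Override PartName="/word/document.xml" ContentType="application/'
    'vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>'
    "</Types>"
)

# _rels/.rels: tells the consumer where the main document part lives.
PACKAGE_RELS = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/'
    'officeDocument/2006/relationships/officeDocument" Target="word/document.xml"/>'
    "</Relationships>"
)

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def table_xml(rows):
    """Render a grid of strings as a WordprocessingML table (w:tbl).

    Short rows are padded with empty cells; no borders or styles are applied.
    """
    ncols = max(len(row) for row in rows)
    parts = ['<w:tbl><w:tblPr><w:tblW w:w="0" w:type="auto"/></w:tblPr><w:tblGrid>']
    parts += ['<w:gridCol w:w="2000"/>'] * ncols  # column widths in twips
    parts.append("</w:tblGrid>")
    for row in rows:
        parts.append("<w:tr>")
        for cell in list(row) + [""] * (ncols - len(row)):
            # Every table cell must contain at least one paragraph (w:p).
            parts.append(
                '<w:tc><w:p><w:r><w:t xml:space="preserve">'
                f"{escape(cell)}</w:t></w:r></w:p></w:tc>"
            )
        parts.append("</w:tr>")
    parts.append("</w:tbl>")
    return "".join(parts)


def build_minimal_docx(rows):
    """Return the bytes of a .docx package whose body is a single table."""
    # Word keeps a paragraph after a final table, so an empty one is appended.
    document = (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
        f'<w:document xmlns:w="{W_NS}"><w:body>'
        f"{table_xml(rows)}<w:p/></w:body></w:document>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as z:
        z.writestr("[Content_Types].xml", CONTENT_TYPES)
        z.writestr("_rels/.rels", PACKAGE_RELS)
        z.writestr("word/document.xml", document)
    return buf.getvalue()
```

Writing the result to disk is just a matter of `open("grid.docx", "wb").write(build_minimal_docx(rows))`. Everything a full office package adds on top of this (styles, settings, fonts, themes) is optional, which is part of why the format can be stripped down so far.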

If you have the software installed on your computer, the new document can be opened directly with Microsoft Word:

This example uses the Green Solvents demo dataset (the same raw data used by the Green Solvents app). The generated Word document features a table with the same grid layout as the datasheet itself. Each of the structures is rendered using the DrawingML subset of OOXML, and is composed of lines, curves, circles, etc. This means that the graphics can be zoomed to any resolution without losing any of their sharpness, and that they work equally well for screen display (e.g. PowerPoint slides), for printing, or for conversion to print-ready formats such as the Portable Document Format (PDF).
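The post does not show how a structure is turned into DrawingML, so the following is only a rough illustration, not the actual generator. It converts a list of straight line segments (for example bonds, measured in points) into the custom-geometry fragment (`a:custGeom` with an `a:pathLst`) that DrawingML uses for freeform shapes, with coordinates in English Metric Units (12700 EMU per point). The input format and the function names are assumptions; curves and circles would need `a:cubicBezTo` or `a:arcTo` elements, and a complete drawing also needs the enclosing shape, transform and line style, all of which are omitted here.

```python
# DrawingML main namespace (declared locally here; in a real document it
# would normally be declared on an enclosing element).
A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"
EMU_PER_POINT = 12700  # English Metric Units per typographic point


def to_emu(points):
    """Convert a length in points to whole EMUs."""
    return int(round(points * EMU_PER_POINT))


def segments_to_custgeom(segments):
    """Build an a:custGeom fragment that strokes each line segment.

    `segments` is a list of ((x1, y1), (x2, y2)) pairs in points, with
    non-negative coordinates measured from the top-left corner (assumed
    input format). The path coordinate space (w, h) is the bounding box in
    EMUs, so path units match EMUs if the enclosing shape is also w by h.
    """
    xs = [to_emu(x) for seg in segments for (x, _) in seg]
    ys = [to_emu(y) for seg in segments for (_, y) in seg]
    w, h = max(max(xs), 1), max(max(ys), 1)  # avoid a zero-sized path space
    ops = []
    for (x1, y1), (x2, y2) in segments:
        # Each bond is drawn as a separate subpath: move to its start, line to its end.
        ops.append(f'<a:moveTo><a:pt x="{to_emu(x1)}" y="{to_emu(y1)}"/></a:moveTo>')
        ops.append(f'<a:lnTo><a:pt x="{to_emu(x2)}" y="{to_emu(y2)}"/></a:lnTo>')
    path_body = "".join(ops)
    return (
        f'<a:custGeom xmlns:a="{A_NS}">'
        "<a:avLst/><a:gdLst/><a:ahLst/><a:cxnLst/>"
        '<a:rect l="l" t="t" r="r" b="b"/>'
        f'<a:pathLst><a:path w="{w}" h="{h}" fill="none">{path_body}</a:path></a:pathLst>'
        "</a:custGeom>"
    )
```

For example, `segments_to_custgeom([((0, 0), (10, 0)), ((10, 0), (15, 8))])` describes two connected bonds. Because every coordinate is a number in a path rather than a pixel, the result scales without loss, which is the property described above.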

Because the MS Word format is designed for editing, using it as a medium for chemical data makes it an intermediate step rather than a final destination. If you are preparing a document for publication, chances are you will want to tweak the table: adjust font sizes, change the borders, reorganise columns, etc., or just cut'n'paste parts of it into a preexisting document. Obvious as this may be, it is an important distinction from going directly to a PDF file, since the PDF format is intended to be the last stop prior to printing, and is not designed to be modified.

The other two MS Office formats – Excel and PowerPoint – are coming soon, since they share most of the same underlying technology.

At the time of writing, this feature is only available via the aforementioned web-based tools, which include publicly shared/tweeted data from MolSync. It will soon be added to select mobile apps via a remote procedure call, similar to the way presentation content such as SVG is currently generated.

The development work for creating the MS Word document generator was remarkably painless, which came as something of a surprise. I have previously worked with conceptually related formats, including EMF+ (Enhanced Metafile+), which is Microsoft's binary encoding for high quality vector graphics, and for which the company reluctantly published documentation in order to get out of antitrust penalties. The documentation was so misleading that it was almost worse than useless, and the reverse engineering process was a nightmare. Several years ago, I implemented a similar output system for the OpenDocument formats, which are used by OpenOffice/LibreOffice. The format is an international standard, but the vector graphics subsystem is a dysfunctional mess. The inadequacy of the format, and the inadequacy of the single software stack that makes use of it, make one wonder how these projects even get considered as standards, let alone actually ratified.

Implementing OOXML was almost a pleasure by comparison. The documentation, while a little on the long side (5000 pages), is helpful, thorough, loaded with relevant examples, and perhaps most importantly, the claims it makes all seem to be actually true. The format is reasonably straightforward and sensible, and it is relatively easy to strip out all of the complex auxiliary data that a full blown office package needs in order to keep track of all its formatting data. And not only that, but when you make a mistake, Microsoft Office 2010 actually makes some effort to tell you what’s wrong with the file. Well, actually more like just a line number, but under the circumstances I’m going to call that a win. As opposed to, say, just showing a blank document, or core dumping.
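Since Office's error report amounts to little more than a location in the offending XML part, it can pay to catch the most basic class of mistake before the file ever reaches Word. The sketch below, using only Python's standard library, checks every XML part in a package for well-formedness and reports the line and column of each failure. It checks well-formedness only, not conformance to the OOXML schemas, so a package that passes can still be rejected by Word; the function name `check_package` is hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def check_package(data):
    """Return (part_name, line, column, message) for each malformed XML part."""
    problems = []
    with zipfile.ZipFile(io.BytesIO(data)) as z:
        for name in z.namelist():
            if not (name.endswith(".xml") or name.endswith(".rels")):
                continue  # skip images and other non-XML parts
            try:
                ET.fromstring(z.read(name))
            except ET.ParseError as err:
                # ParseError.position is a (line, column) tuple.
                line, column = err.position
                problems.append((name, line, column, str(err)))
    return problems
```

Calling `check_package(open("output.docx", "rb").read())` returns an empty list when every part parses; otherwise each entry names the part and position, which is roughly the information Word itself offers.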

So try it out. And expect more output options to be appearing in mobile apps real soon.
