Molecules in WordPress: preliminary experiment

Something I started thinking about recently is the idea of mixing high-end cheminformatics tools with generic blogging software, to achieve a reasonable compromise in the electronic lab notebook area. It turns out that writing a WordPress plugin really is as easy as the documentation claims, and hooking it up to my work-in-progress web toolkit is also rather straightforward.

It is hardly a revelation that off-the-shelf publishing software like WordPress and many other open source blogging platforms makes quite a good tool for documenting laboratory experiments, at least compared with a word processor: the quality of the manuscript is about the same, and the format lends itself naturally to being published on the open web, or within a private intranet.

One of the problems, though, is that these general purpose authoring tools have no support for scientific diagrams with metadata, which means that if you want to add chemical structures, reactions, or any kind of arrangement thereof, you will likely need to use a dead format (usually PNG) which is completely intractable to software. So after all that effort of making your work digital and open and available to the entire planet at a split second’s notice… it will likely vanish into obscurity, because machine algorithms can’t tell what’s in it, which means that it can only be discovered and used by an actual person. And if you’ve ever tried to read the entire internet, or even get up to speed on the most niche specialty scientific topic, you’ll know that this is a lost cause.

In an ideal environment, every scientific experiment would be documented exclusively in a purely machine readable format and every nuance would be captured, and it would be translated into a human language on demand. But we’re not there yet, so the best compromise is to shoehorn as much data as possible into formats that can capture the information in detail, and explain the rest in prose.

Molecules are as good a place as any to start, since they are a fairly fundamental diagram unit in most branches of chemistry. A large proportion of molecular entities can be described very well using a diagram that uses just a handful of primitives representing various different atoms and bonds. More complicated molecules can put a strain on the representation (e.g. isotopes, abbreviations, weird bond types, polymers, etc.) but there is a huge amount of science that can be expressed well with a handful of well constructed concepts.

The way to solve the machines vs. humans problem (machines like data, humans like pictures) is to encode the chemical structures within the document, in data form. That way when a robot downloads your lab notebook page, it finds a bunch of content that it can work with, e.g.:

This is a snapshot of editing an example blog post using a freshly downloaded copy of WordPress. Note the scattering of fragments such as this one:

[molecule name="benzene"]SketchEl!(6,6)
C=-2.3039,5.5512;0,0,i1
C=-3.6029,4.8012;0,0,i1
C=-3.6029,3.3012;0,0,i1
C=-2.3039,2.5512;0,0,i1
C=-1.0049,3.3012;0,0,i1
C=-1.0049,4.8012;0,0,i1
1-2=2,0
2-3=1,0
3-4=2,0
4-5=1,0
5-6=2,0
6-1=1,0
!End[/molecule]

The special tag [molecule] is recognised by a plugin that I just started working on (working name: MolPress). The content between the opening and closing tags is a SketchEl-formatted representation of benzene (but it could just as easily be an MDL Molfile).
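To give a feel for what a robot could do with such a page, here is a minimal Python sketch that pulls [molecule] fragments out of the raw post text and splits the SketchEl payload into atoms and bonds. It is not the MolPress plugin code: the names extract_molecules and parse_sketchel are made up for illustration, and the field meanings (charge, unpaired count, bond order, bond type) follow a common reading of the SketchEl format and should be treated as assumptions. Trailing atom fields such as i1 are kept as opaque strings rather than interpreted.

```python
import re

# Matches [molecule name="..."]...[/molecule]; the name attribute is optional,
# and curly quotes are accepted because blog editors often substitute them.
SHORTCODE = re.compile(
    r'\[molecule(?:\s+name=["\u201c\u201d]([^"\u201c\u201d]*)["\u201c\u201d])?\]'
    r'(.*?)\[/molecule\]',
    re.DOTALL,
)


def parse_sketchel(text):
    """Split a SketchEl block into (atoms, bonds). Field names are assumptions."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    header = re.fullmatch(r'SketchEl!\((\d+),(\d+)\)', lines[0]) if lines else None
    if header is None or lines[-1] != '!End':
        raise ValueError('not a SketchEl block')
    natoms, nbonds = int(header.group(1)), int(header.group(2))
    body = lines[1:-1]
    if len(body) != natoms + nbonds:
        raise ValueError('atom/bond count does not match the header')

    atoms = []
    for ln in body[:natoms]:
        # e.g. C=-2.3039,5.5512;0,0,i1
        element, rest = ln.split('=', 1)
        coords, props = rest.split(';', 1)
        x, y = (float(v) for v in coords.split(','))
        fields = props.split(',')
        atoms.append({
            'element': element, 'x': x, 'y': y,
            'charge': int(fields[0]), 'unpaired': int(fields[1]),
            'extra': fields[2:],  # left uninterpreted
        })

    bonds = []
    for ln in body[natoms:]:
        # e.g. 1-2=2,0 (atom indices are 1-based)
        pair, props = ln.split('=', 1)
        a, b = (int(v) for v in pair.split('-'))
        order, btype = (int(v) for v in props.split(',')[:2])
        bonds.append({'from': a, 'to': b, 'order': order, 'type': btype})
    return atoms, bonds


def extract_molecules(page_text):
    """Return a list of (name, atoms, bonds) for every [molecule] fragment."""
    found = []
    for match in SHORTCODE.finditer(page_text):
        atoms, bonds = parse_sketchel(match.group(2))
        found.append((match.group(1), atoms, bonds))
    return found


# Sample text for illustration only; the structure is the benzene fragment above.
SAMPLE_POST = '''Some prose before the structure.

[molecule name="benzene"]SketchEl!(6,6)
C=-2.3039,5.5512;0,0,i1
C=-3.6029,4.8012;0,0,i1
C=-3.6029,3.3012;0,0,i1
C=-2.3039,2.5512;0,0,i1
C=-1.0049,3.3012;0,0,i1
C=-1.0049,4.8012;0,0,i1
1-2=2,0
2-3=1,0
3-4=2,0
4-5=1,0
5-6=2,0
6-1=1,0
!End[/molecule]

Some prose after it.
'''

if __name__ == '__main__':
    for name, atoms, bonds in extract_molecules(SAMPLE_POST):
        print(name, len(atoms), 'atoms,', len(bonds), 'bonds')
```

The point of the exercise is that nothing here needs image recognition or guesswork: the structure is sitting in the page as plain text, so a few lines of string handling are enough to recover it.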

So far this is only good for machines – but when it gets shown to the viewer, it looks like this:

[Image: molpress3]

This example page briefly summarises the early weeks of my grad school research, which I remember fondly. Each of the 4 molecular structures is shown as a picture: the plugin renders them as SVG diagrams, which means that if you zoom in or print or turn the page into a PDF, it will look as crisp as the device can deliver. But to a robot, the first thing it sees is the machine readable format.
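The rendering step can be pictured with a toy version. The sketch below is not how WebMolKit draws structures; it is a deliberately naive Python illustration, with hypothetical names (render_svg, scale, margin, gap), that turns the benzene coordinates from the [molecule] fragment into SVG line elements. It assumes that the molecule's y axis points up while SVG's points down, draws a double bond as two parallel strokes, and leaves carbon atoms unlabelled, as in a skeletal diagram.

```python
import math

# Coordinates and bonds copied from the benzene fragment in the post.
# Bonds are (from, to, order) with 1-based atom indices.
BENZENE_ATOMS = [
    (-2.3039, 5.5512), (-3.6029, 4.8012), (-3.6029, 3.3012),
    (-2.3039, 2.5512), (-1.0049, 3.3012), (-1.0049, 4.8012),
]
BENZENE_BONDS = [(1, 2, 2), (2, 3, 1), (3, 4, 2), (4, 5, 1), (5, 6, 2), (6, 1, 1)]


def render_svg(atoms, bonds, scale=30.0, margin=10.0, gap=0.2):
    """Return an SVG string with one <line> per bond stroke (illustrative only)."""
    xs = [x for x, _ in atoms]
    ys = [y for _, y in atoms]
    min_x, max_y = min(xs), max(ys)
    width = (max(xs) - min_x) * scale + 2 * margin
    height = (max_y - min(ys)) * scale + 2 * margin

    def to_px(x, y):
        # Shift into positive territory and flip y (assumed: molecule y points up).
        return (x - min_x) * scale + margin, (max_y - y) * scale + margin

    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{width:.1f}" height="{height:.1f}">'
    ]
    for a, b, order in bonds:
        x1, y1 = atoms[a - 1]
        x2, y2 = atoms[b - 1]
        length = math.hypot(x2 - x1, y2 - y1)
        if length == 0:
            continue  # overlapping atoms: nothing sensible to draw
        # Unit vector perpendicular to the bond, used to offset parallel strokes.
        nx, ny = -(y2 - y1) / length, (x2 - x1) / length
        for i in range(order):
            off = (i - (order - 1) / 2) * gap  # centred on the bond axis
            px1, py1 = to_px(x1 + nx * off, y1 + ny * off)
            px2, py2 = to_px(x2 + nx * off, y2 + ny * off)
            parts.append(
                f'<line x1="{px1:.1f}" y1="{py1:.1f}" '
                f'x2="{px2:.1f}" y2="{py2:.1f}" stroke="black"/>'
            )
    parts.append('</svg>')
    return '\n'.join(parts)


if __name__ == '__main__':
    print(render_svg(BENZENE_ATOMS, BENZENE_BONDS))
```

Because the output is vector data rather than pixels, the same handful of lines scales to any zoom level or print resolution, which is the property that makes SVG attractive here.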

The work that goes into rendering the structures is done by the WebMolKit library, which is derived from the MolSync codebase, but is being reengineered to no longer require a server. It is actually open source, but not released yet: too many pieces are in the air, but it should hopefully settle down reasonably soon. The MolPress project will be made openly available, too: it’s a good fit for the whole open data philosophy.

The very simple functionality that is operational right now requires the cutting & pasting of plain text as part of editing, and it doesn’t have the WYSIWYG capabilities that we would like to see in an editor. Those are the next logical steps: having the inline editor show the graphical version (except when in raw text mode); making it easier to drag structures into the editor from various sketchers; connecting the library’s own sketcher to the plugin; and adding higher order datatypes, such as reactions and collections.

Cheminformatics 2.0

A blog about chemical information software for next generation computing environments.
