Things have been a bit quiet in these parts lately, but not due to inactivity: far from it. In between working on some exciting projects with Collaborative Drug Discovery, I have been quietly making rapid progress on several key technologies. These include the OS X Molecular DataSheet (XMDS), a growing collection of reaction data, and, most recently, a complete overhaul of the MolSync website, which provides various kinds of cheminformatics support services.
As mentioned in earlier posts, the XMDS desktop Mac app has been a high-priority pet project for data entry of chemical reactions. Designing an editor for chemical reactions seems like it should be quite easy (just a handful of extra features layered on top of the core functionality of a structure editor), but the reality is nothing of the sort, especially if the editor aspires to capture the data in its full informatic glory and to make data entry as ergonomic as possible. One of the handy things about writing software that one is personally qualified to use, and has a real need for, is that a lot of refinement can be done without having to persuade anyone else to use it. At this point I have drawn out literally hundreds of chemical reactions, in detail. They are taken from various sources, none of which are machine readable. For example, here are some of the reactions from ChemSpider Synthetic Pages:
… and an editing session:
Curating this data is an interesting exercise in and of itself, since the source content – despite being online – is not really machine readable. The reactions that I have carefully recreated are interpretable as properly typed molecular species in balanced reactions with atom-to-atom mapping and material quantities.
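To make "balanced reactions with atom-to-atom mapping" a little more concrete, here is a minimal Python sketch of one possible in-memory representation, with two sanity checks: element balance and mapping consistency. All names (`Atom`, `Component`, `Reaction`, `is_balanced`, `mapping_is_consistent`, `mol`) are hypothetical illustrations, not the XMDS data model or file format. The sketch assumes explicit hydrogens, uses map number 0 for unmapped atoms, and only provides a placeholder field for material quantities.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Atom:
    element: str
    map_num: int = 0  # 0 = unmapped (e.g. hydrogens); hypothetical convention


@dataclass
class Component:
    name: str
    atoms: List[Atom]
    stoich: int = 1               # stoichiometric coefficient
    mmol: Optional[float] = None  # material quantity (placeholder field, unused here)


@dataclass
class Reaction:
    reactants: List[Component]
    products: List[Component]


def element_totals(side: List[Component]) -> Counter:
    """Count atoms of each element on one side, weighted by stoichiometry."""
    totals: Counter = Counter()
    for comp in side:
        for atom in comp.atoms:
            totals[atom.element] += comp.stoich
    return totals


def is_balanced(rxn: Reaction) -> bool:
    return element_totals(rxn.reactants) == element_totals(rxn.products)


def _mapped_atoms(side: List[Component]) -> Optional[Dict[int, str]]:
    """Map number -> element for one side; None if a map number repeats.
    Mapping is per molecule instance, so stoichiometry is ignored here."""
    seen: Dict[int, str] = {}
    for comp in side:
        for atom in comp.atoms:
            if atom.map_num:
                if atom.map_num in seen:
                    return None
                seen[atom.map_num] = atom.element
    return seen


def mapping_is_consistent(rxn: Reaction) -> bool:
    """Every mapped atom appears exactly once on each side, as the same element."""
    left, right = _mapped_atoms(rxn.reactants), _mapped_atoms(rxn.products)
    return left is not None and right is not None and left == right


def mol(name: str, heavy: List[tuple], n_h: int) -> Component:
    """Hypothetical helper: (element, map_num) heavy atoms plus n_h unmapped H."""
    atoms = [Atom(el, m) for el, m in heavy] + [Atom("H") for _ in range(n_h)]
    return Component(name, atoms)


# Fischer esterification: acetic acid + methanol -> methyl acetate + water.
# The water oxygen (map 4) comes from the acid's hydroxyl group.
esterification = Reaction(
    reactants=[
        mol("acetic acid", [("C", 1), ("C", 2), ("O", 3), ("O", 4)], 4),
        mol("methanol", [("C", 5), ("O", 6)], 4),
    ],
    products=[
        mol("methyl acetate", [("C", 1), ("C", 2), ("O", 3), ("O", 6), ("C", 5)], 6),
        mol("water", [("O", 4)], 2),
    ],
)

print(is_balanced(esterification), mapping_is_consistent(esterification))  # True True
```

Dropping the water from the products would fail both checks: the element totals no longer match (C3H6O2 versus C3H8O3), and mapped oxygen 4 has nowhere to go. A real editor would also need bond orders, charges, and quantities in consistent units, which this sketch leaves out.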
This exercise has an obvious double purpose: improving the data entry software, and creating a body of reaction data that is very highly specified (accurate and precise) compared with how reaction data is usually recorded. One of the nice characteristics of chemical reactions is that although there are millions of published experiments, the majority of organic chemistry can be captured with tens to hundreds of reaction types, depending on thoroughness. These reaction types can be illustrated quite well with perhaps an order of magnitude more specific examples, i.e. somewhere in the hundreds to thousands. That means the amount of data entry needed to create a database that provides useful reference information and guidance is well within the reach of one person working singlehandedly (with some appropriately excellent software tools, of course).
Nothing especially useful has been built on top of this data just yet, but as a step toward that goal, the MolSync service has had a major overhaul. On the surface it still performs the same functions as it has for years, e.g. sharing molecules:
Right now this interface does nothing but display content, but that will change. You may notice that the top of the page includes a couple of blank rectangles that look like placeholders for a chemical reaction, which indeed they are. These are to be linked to a work in progress: a port of the MMDS/XMDS sketcher to the web platform. The web-based sketcher was actually started a long time ago, but fell victim to prioritisation. Now, with an invigorated core platform, web sketching is back on the menu:
The sketcher concept is analogous to the desktop implementation found in XMDS, which combines the “low input bandwidth” sketching technology designed for touchscreen mobile devices (toolbars on bottom and right) with a more conventional array of tools on the left, which are familiar to every chemist. At the moment it is just a few features away from being able to do some basic drawing, although the finishing touches will take a bit longer.
Having a web-based sketcher is a priority and a rate-limiting step, because the goal is to make the collected reactions useful, starting with ways to search them.