MolSync web structure searching

molsync_search1

As mentioned in the previous post, the MolSync (.com) website and the technology behind it have been moving forward rapidly. The public-facing deployment now shows a proof-of-concept page for performing molecule searches: molsync.com/search/molecule.php.

At the time of writing, this feature is very new, so expectations about production-ready polish and freedom from bugs should be tempered accordingly. It does, however, demonstrate a number of key features. When you first open the page, or when you click on the round button with a picture of a 6-sided die (or 1d6, if you want to out yourself as an alpha nerd), the search results are populated with some randomly selected molecules, which is more welcoming than a mostly blank page.

The panel of buttons along the top includes a helpfully blank rectangular grey box. Clicking on this brings up the sketcher, allowing the search molecule to be drawn:

molsync_search2

In the above example, the molecule was sketched out using the tools on the left hand side, which will be familiar to anyone who has ever drawn a chemical structure. The conventional mouse-centric paradigm is used, i.e. select a tool, then drag on the canvas to make something happen. The overall sketcher design is a hybrid of this conventional approach and the command-based approach that was designed for mobile devices and recently adapted for the Mac desktop with XMDS. The bottom toolbar (command bank) and right toolbar (template bank) will therefore be familiar to users of the various mobile apps.

In terms of integration with the desktop, the sketcher implements drag’n’drop and pasting, although the web platform has not yet made these crucial features as straightforward as they should be, thanks in part to security issues. Dragging text from an application such as SketchEl or XMDS works intermittently, while dragging a SketchEl file (.el extension) works reliably. Pasting also works, using the Ctrl-V/Command-V keyboard shortcut.

Once the sketch is saved, the search can be initiated. The three classic types of structure search are supported: exact, substructure and similarity.

Exact and substructure searches either match or they don’t, so the search results accumulate in the order in which they were found:

molsync_search3

Similarity searches seem a bit more dynamic: the results are ordered by the similarity metric rather than by the order in which they were found, so the list can appear to bounce around a bit:

molsync_search4
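The reordering described above can be illustrated with a minimal Python sketch. It is not the MolSync implementation: fingerprints are represented as plain sets of integer feature hashes (real ECFP6 generation is not shown), and the `tanimoto` function and `RankedResults` class are hypothetical names introduced only for this example. Each new hit is inserted at its ranked position, which is why earlier results can be pushed down as the search progresses.

```python
import bisect


def tanimoto(fp_a: set, fp_b: set) -> float:
    """Tanimoto coefficient: shared features divided by total distinct features."""
    if not fp_a and not fp_b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


class RankedResults:
    """Keeps hits in descending order of similarity as they arrive."""

    def __init__(self):
        self._keys = []  # negated scores, kept ascending for bisect
        self.hits = []   # (name, score) pairs, best first

    def add(self, name, score):
        """Insert a hit at its ranked position and return that position."""
        pos = bisect.bisect_right(self._keys, -score)
        self._keys.insert(pos, -score)
        self.hits.insert(pos, (name, score))
        return pos


# Toy feature sets standing in for fingerprints (assumed values).
query = {1, 2, 3, 4}
candidates = [("b", {1, 2, 5, 6}), ("c", {1, 2, 3, 7}), ("a", {1, 2, 3, 4})]

results = RankedResults()
for name, fingerprint in candidates:
    results.add(name, tanimoto(query, fingerprint))

print([name for name, _ in results.hits])  # ['a', 'c', 'b']
```

In this toy run every new hit scores higher than the ones before it, so each lands at the top of the list and displaces the earlier results, which is the "bouncing" effect in miniature. An exact or substructure search would simply append each hit instead.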

One of the things that stands out about the snapshot above, besides the inclusion of the similarity metric (Tanimoto, ECFP6), is that the second result shows two molecules, both of which represent the same compound. The reason is that the data incorporated into the MolSync collection is document-centric rather than molecule-centric. Two submitted molecules get their own distinct database entries if their sketches differ in any way at all (basically a literal string comparison of the serialised SketchEl representation). When it comes to presenting search results, two molecules are considered to be part of the same hit if they are chemically equivalent (i.e. their molecular graphs are isomorphic). Within a hit, two sketches are considered to be visually identical if the rendering would come out the same. Hence the second result shows two sketches of m-bromotoluene, because they are drawn with different rotations. There may be more than 2 distinct records behind the scenes, e.g. another sketch that has the same orientation but is translated in space, which is sketch-equivalent because translation does not affect the presentation.
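The three levels of identity described above (distinct record, same hit, same sketch) can be sketched in Python as follows. This is an illustration under stated assumptions rather than the actual MolSync code: the `Record` class, the `canonical_key` field (a stand-in for whatever test establishes chemical equivalence) and the `layout_key` function are all hypothetical. Bonds are ignored, and the atoms are a two-atom toy, not a real molecule.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    record_id: int
    serialised: str     # stand-in for the serialised sketch text
    canonical_key: str  # assumed: equal for chemically equivalent structures
    coords: tuple       # ((element, x, y), ...)


def layout_key(record, digits=3):
    """Translation-invariant layout: shift the sketch so its minimum x/y is the origin."""
    min_x = min(x for _, x, _ in record.coords)
    min_y = min(y for _, _, y in record.coords)
    return tuple(sorted(
        (el, round(x - min_x, digits), round(y - min_y, digits))
        for el, x, y in record.coords
    ))


def group_hits(records):
    """Group records into hits, then into visually distinct sketches within each hit."""
    seen = set()
    hits = defaultdict(lambda: defaultdict(list))
    for rec in records:
        if rec.serialised in seen:  # literal string match: not a new record
            continue
        seen.add(rec.serialised)
        hits[rec.canonical_key][layout_key(rec)].append(rec.record_id)
    return hits


records = [
    Record(1, "sketch-A", "toy", (("C", 0.0, 0.0), ("Br", 1.5, 0.0))),
    Record(2, "sketch-B", "toy", (("C", 10.0, 5.0), ("Br", 11.5, 5.0))),  # translated copy of 1
    Record(3, "sketch-C", "toy", (("C", 0.0, 0.0), ("Br", 0.0, 1.5))),    # rotated copy of 1
    Record(4, "sketch-A", "toy", (("C", 0.0, 0.0), ("Br", 1.5, 0.0))),    # same text as 1
]

for key, layouts in group_hits(records).items():
    print(key, [ids for ids in layouts.values()])
```

Here the translated sketch collapses into the same displayed sketch as the original, the rotated one is shown separately, and the literal duplicate never becomes a record of its own.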

This seems like a minor distinction, but it is important, because the objective is not to compile molecules into a registration database, but rather to collect documents together, and provide molecular structure searching (among other things) as a means for finding these documents. The best way to sketch a molecule depends on context, and this is particularly true for chemical reactions, which is the case for much of the source material. The links that are shown to the right of the molecules launch a new page for perusing the individual documents that the molecular search result located.
