Honeycomb clustering in Approved Drugs app: sneak preview

Work is currently underway on a new feature that will first be exposed within the Approved Drugs app: honeycomb clustering, a greedy visualisation technique that is remarkably effective for examining how a particular chemical structure relates to a collection of compounds.

Right now the visualisation interface is only advanced enough to allow display and panning around on the screen, but that’s enough for a couple of preliminary screenshots. The feature is meant to answer a question that ought to be asked all the time in real world cheminformatics: “How does my compound-of-interest relate to a bunch of existing structures?”, and to provide that answer visually, in a way that makes sense to chemists. One such scenario is postulating a new potential drug-like compound when there is a known collection of molecules that have already been tested for activity. Another is a more general purpose activity, such as performing a similarity search.

Displaying the inter-molecular similarity between a collection of molecules is a tricky proposition, because most visualisation techniques work best in 2D. Collapsing multidimensional data onto a grid with only 2 degrees of freedom is an inherently lossy process, and requiring it to actually look nice makes the problem harder still. The difficulty of presenting such information is reduced by making the clustering arrangement greedy, by selecting a starting structure and arranging everything around that in a certain priority order.
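To make the idea of a priority order concrete, here is a minimal sketch. It assumes that fingerprints are stored as sets of on-bit indices, that similarity is the Tanimoto coefficient, and that the priority is simply similarity to the starting structure. None of these details are spelled out here, and the names `tanimoto` and `priority_order` are made up for illustration.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints stored as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def priority_order(reference, fingerprints):
    """Names of all other molecules, most similar to the reference first."""
    ref = fingerprints[reference]
    others = [name for name in fingerprints if name != reference]
    return sorted(others, key=lambda n: tanimoto(ref, fingerprints[n]), reverse=True)


# Toy fingerprints: made-up bit indices, not real fingerprint output.
fingerprints = {
    "query": {1, 2, 3, 4},
    "close": {1, 2, 3, 5},
    "far": {7, 8, 9},
    "middle": {1, 2, 8, 9},
}
order = priority_order("query", fingerprints)
```

With real data the sets would come from a fingerprinting toolkit rather than being typed in by hand.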

In the current version of Approved Drugs, and also in TB Mobile, there is a 2D clustering feature that places each molecule within a circle and arranges the circles on a page so they can be viewed conveniently. The behaviour has a ball-and-spring style relaxation mode, so that individual structures can be pushed around, which is fun in its own way. The latest experiment in cluster visualisation uses a tessellated layout that is appealing to anyone who happens to be a chemist, a fan of hexagonal closest packing, or a yellow-and-black coloured social insect:


Note the hexagon in the centre: this is the selected drug, picked from the Approved Drugs list. Around this central entry are arranged the 6 compounds most similar to it (as determined by ECFP6 fingerprints), in a circular order that maximises the inter-neighbour similarity. Once these 7 compounds have been placed in the initial flower-petal arrangement, each of the remaining drug structures is placed, in a greedy order, by slotting it into the position where it fits best with its neighbours.
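The placement steps described above can be sketched roughly as follows. This is a reading of the description, not the app's actual code: the axial hex coordinates, the brute-force search over petal orderings, and the "sum of similarities to occupied neighbours" fit score are all assumptions, and `honeycomb_layout`, `ring_order` and `sim` are hypothetical names. `sim` is any symmetric similarity function supplied by the caller.

```python
from itertools import permutations

# The six neighbours of a hexagonal cell in axial (q, r) coordinates,
# listed in circular order so that consecutive entries are adjacent cells.
HEX_DIRS = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]


def neighbours(cell):
    q, r = cell
    return [(q + dq, r + dr) for dq, dr in HEX_DIRS]


def ring_order(petals, sim):
    """Circular order of the petals that maximises summed adjacent-pair similarity."""
    if len(petals) < 3:
        return list(petals)
    first, rest = petals[0], petals[1:]

    def score(order):
        return sum(sim(order[i], order[(i + 1) % len(order)])
                   for i in range(len(order)))

    # Fixing the first petal removes rotations: 5! = 120 candidates for six petals.
    return max(([first, *p] for p in permutations(rest)), key=score)


def honeycomb_layout(centre, others, sim):
    """Return a dict mapping axial (q, r) cells to molecules."""
    ranked = sorted(others, key=lambda m: sim(centre, m), reverse=True)
    layout = {(0, 0): centre}
    # Flower-petal arrangement: the six most similar molecules surround the centre.
    for cell, mol in zip(HEX_DIRS, ring_order(ranked[:6], sim)):
        layout[cell] = mol
    # Greedy fill: each remaining molecule takes the vacant cell bordering the
    # honeycomb where its summed similarity to occupied neighbours is highest.
    for mol in ranked[6:]:
        vacant = sorted({n for c in layout for n in neighbours(c)} - set(layout))

        def fit(cell):
            return sum(sim(mol, layout[n]) for n in neighbours(cell) if n in layout)

        layout[max(vacant, key=fit)] = mol
    return layout
```

Because the fit score is a sum, cells with more occupied neighbours are favoured, which tends to keep the honeycomb compact. Averaging instead would be an equally plausible choice under the same description.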

In the middle of the layout, it is fairly clear that similar drug structures are shown, i.e. the 7-membered/2-nitrogen ring motif, and chloro substituents, are common. As the honeycomb builds outward, clusters of related compounds can emerge into clumps, and some of the surrounding edges can wander off into their own distinctively unusual territory:


There are a few more things to do before this feature will be released as part of the app, but once it’s ready, it will be possible to view any given drug structure in the context of all FDA-approved drugs, or to use a custom-drawn structure, and see where it fits into the landscape.
