Things have been a bit quiet around here lately, but there is a good reason: at the Gordon conference on Computer Aided Drug Design a couple of weeks ago, a number of ideas came together. The conference was an opportunity to observe a series of talks and posters about what practitioners in the industry are interested in right now, and a chance to discuss my recent projects and gauge the level of interest. It all happened to coincide with a number of partially complete product explorations that have been sitting on the workshop floor, waiting for the right time and place to be assembled and released to the outside world.
One of the brutal realities that I have been struggling with for a long time, in trying to promote mobile apps for chemistry into a genuinely valuable workflow component, is a paradox: appification of science is almost an oxymoron, because most kinds of science are extremely niche, very hard, and highly unique. This is the opposite of the best case scenario for “there’s an app for that”, which works best when there are millions of people with common behaviour, for whom 80% is good enough. That in a nutshell is why mobile apps for science are lagging so far behind even other vertically integrated niche markets.
For the first few years of the existence of Molecular Materials Informatics, the technical challenges that I focused on involved making specific activities work well on a palm-sized touchscreen device, and making the onboarding process as simple as possible. Starting with the drawing of chemical structures, and moving on to advanced database searching, sophisticated calculations performed on the device itself, and database cluster visualisation… these are all well and good, but there’s one important thing missing: your data. Taking the strategy of appifying specific unit tasks in a chemist’s workflow is generally difficult but not impossible; however, any task that requires some specific collection of structure-activity data to be available hits a roadblock. For all my claims that mobile devices are good for a very large proportion of scientific tasks normally done on desktop computers, one thing that they are currently not good for is assembling custom collections of structures & data. This kind of content is typically put together as a single file on a regular computer: grabbing some content from online databases, appending some from other files, sketching/typing in SAR content from recent papers, and then merging it all together, with fields renamed and condensed, duplicates merged, weird structures standardised and fixed, junk data removed, and the whole thing given a once-over with a fullscreen viewer. This particular task is notoriously difficult to appify, because each instance is a unique process: the tools and user freedom that need to be brought to bear on the task are very diverse, and the task is unforgiving of the very constrained operational parameters that are typical of mobile apps (and webapps, for that matter).
Perhaps sometime in the future it will be easy to do this with web interfaces and mobile apps on a tablet, but today it is difficult. Even though apps like the Mobile Molecular DataSheet (MMDS) do offer much of the necessary functionality, the reality is that a mobile app is not a good way to get the job done for moderately demanding cases.
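To make the datasheet preparation task a little more concrete, here is a minimal Python sketch of just the merge, rename and deduplicate steps. It is an illustration only, not how MMDS or XMDS work internally: the function name `merge_datasheets`, the field names and the alias table are all hypothetical, and duplicates are detected by exact match on a structure string, which is a deliberate oversimplification (real structures need to be standardised and canonicalised before they can be compared).

```python
def merge_datasheets(sources, field_aliases):
    """Merge several lists of row dicts into one list of rows.

    Hypothetical helper: fields are renamed via field_aliases, rows without
    a structure are dropped, and rows sharing the same "smiles" string are
    merged (exact string match only, which is an assumption made for brevity).
    """
    merged = {}
    for rows in sources:
        for row in rows:
            # Rename fields to a common vocabulary, e.g. "ic50_nM" -> "IC50".
            clean = {field_aliases.get(name, name): value
                     for name, value in row.items()}
            key = clean.get("smiles")
            if not key:
                continue  # junk row: no structure to hang the data on
            # First occurrence wins; later duplicates only fill missing fields.
            existing = merged.setdefault(key, {})
            for name, value in clean.items():
                existing.setdefault(name, value)
    return list(merged.values())


# Made-up content standing in for a database export and a hand-typed paper.
from_database = [{"SMILES": "CCO", "ic50_nM": 120.0},
                 {"SMILES": "c1ccccc1", "ic50_nM": 45.0}]
from_paper = [{"structure": "CCO", "source": "paper"},
              {"structure": "", "IC50": 3.0}]
aliases = {"SMILES": "smiles", "structure": "smiles", "ic50_nM": "IC50"}

sheet = merge_datasheets([from_database, from_paper], aliases)
```

Even this toy version shows why the job resists appification: every one of the choices above (which fields to rename, which duplicate wins, what counts as junk) changes from one dataset to the next.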
This is one of the primary motivators for revisiting the desktop with the OS X Molecular DataSheet (XMDS) Mac app, which is currently in beta: Apple’s opening up of better file sharing APIs (motivated by iCloud, but really great for services like Dropbox) means that desktop/laptop + tablet/phone are much better at complementing each other. Certain hard-to-appify tasks, like structure-activity datasheet preparation, are just a lot better on a desktop computer, but for visualisation, reference, searching, and small-bandwidth content creation, mobile apps are very competitive options. Mixing and matching the best possible software means that an ideal future is soon to be at hand, whereby you can do your work with whatever device is most conveniently within reach (iPhone while waiting in line, iPad sitting on a bus, MacBook in a café, iMac at your office desk), and your data follows you around wherever you go, thanks to the internet.
That’s one strategic vision. But another that I’ve been simultaneously working on takes a subversively different approach: what if your data wasn’t actually something you had to create yourself, out of materials that you scrounged from wherever you could get them and fixed up as best you could, but instead was just prepared for you, and happened to be sitting on a shelf, all ready to use?
This strategy has its roots in a project that is described in an article this year (JCIM 2015) that is primarily about validating open source Bayesian models. In order to test the models on a large scale, I decided it was time to sit down and see what the big deal is about this ChEMBL thing that everyone keeps talking about (and still is, given the number of times it was mentioned at the Gordon last month). By slicing up the ChEMBL data collection into a couple of thousand groups of molecules with the same target and activity measurement, it became possible to use each of these sub-groups as a separate validation test, each of which consists of real molecules and real activity data. With that kind of diversity, a lot of realistic edge cases can be exposed, and hence confidence in a method is much higher than it would be if only a handful of arbitrarily selected case studies were fed into it. Using this data for validation turned out to be a good call, and the more I worked with the data, the more convinced I became that open data for bioactivity has passed some kind of rite of passage: no longer are we forced to choose between quality and quantity (or all too frequently, neither). ChEMBL is quite large, has a remarkably low proportion of junk, and covers literally thousands of biological targets, including most of the classics.
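The slicing step can be pictured with a short Python sketch. This is not the code used for the article, and the record layout is an assumption: the keys `target`, `activity_type`, `smiles` and `value` are hypothetical stand-ins rather than the real ChEMBL schema, and the `min_size` cutoff is an invented parameter for discarding groups too small to be a meaningful test.

```python
from collections import defaultdict


def slice_by_target_and_measurement(records, min_size=2):
    """Group activity records by (target, activity type).

    Each record is a dict with hypothetical keys "target" and
    "activity_type". Groups with fewer than min_size records are dropped;
    the cutoff is an assumption, not something stated in the article.
    """
    groups = defaultdict(list)
    for record in records:
        groups[(record["target"], record["activity_type"])].append(record)
    return {key: rows for key, rows in groups.items() if len(rows) >= min_size}


# Made-up records, just to show the shape of the result.
records = [
    {"target": "T1", "activity_type": "IC50", "smiles": "CCO", "value": 6.1},
    {"target": "T1", "activity_type": "IC50", "smiles": "CCN", "value": 7.3},
    {"target": "T1", "activity_type": "Ki", "smiles": "CCO", "value": 5.0},
    {"target": "T2", "activity_type": "IC50", "smiles": "CCC", "value": 4.2},
    {"target": "T2", "activity_type": "IC50", "smiles": "CCCl", "value": 8.8},
]
validation_sets = slice_by_target_and_measurement(records)
```

Each entry in `validation_sets` is then an independent test case: build a model on part of the group, predict the rest, and repeat for every group.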
What this means is that there is a baseline reference collection, free for anyone to use, that requires a little bit of information technology wizardry to reformulate, but is otherwise ready to go as a source for predictive models against most kinds of well known drug targets. Relating to the previous point: this means that it may not be necessary for you to have your own data collection at all, if the targets you are interested in have already been curated, assembled, filtered, cleaned, modelled and deployed, all available from the convenience of your iThing; and of course regularly maintained and re-trained every time more open data becomes available. It may turn out to be better than what you would have assembled manually, and there’s also a high likelihood that there are some other targets that are interesting as serendipitous discoveries, or off-targets to avoid, that you may not have considered worth the effort of modelling. But because they’re right there, you might as well take a look.
Which brings the narrative back to the main subject of this article: the PolyPharma app. The fundamental objective of this product is to bring structure-activity prediction and visualisation to discovery chemists right out of the box: no learning, no importing, nothing more complicated than tapping on a couple of objects to get started.
This is what the app currently looks like when it is launched for the first time:
There are three bars: Profiles, Predictions and Molecules. The profiles are helpfully named after diseases, and the molecules are each represented by organic structures, with varying degrees of druglikeness. The middle bar – predictions – is inviting us to select one of each.
Tapping on the Tuberculosis profile and the first molecule gets the action happening:
As you can see, the predictions section is being filled out. In this case the Tuberculosis profile indicates that the main target of interest is Mycobacterium tuberculosis, as one might expect. Water solubility is also included as a secondary “target”, and 3 off-targets are listed as well. Each of these corresponds to a Bayesian model, built from either custom data or data extracted from ChEMBL, and the predictions for the structure are colour coded. At the top there is a somewhat familiar “flower petal” diagram; the petals use the red-yellow-green (traffic lights) colour scheme for targets that are desirable to hit, and the blue-white-red (doppler) scheme for off-targets that should be avoided. Underneath the flower petals are a series of repeat renditions of the chemical structure, each of which is coloured by atom, to give some idea of the structural contribution to the goodness/badness of the prediction from the Bayesian model, according to the ECFP6 fingerprints that were originally fed into it.
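For readers who have not met fingerprint-based Bayesian models before, the following Python sketch shows one common formulation, the Laplacian-corrected naive Bayes estimator. Whether the app uses exactly this variant is an assumption on my part for the sake of illustration. The fingerprints here are plain sets of integers standing in for ECFP6 feature hashes (in practice a cheminformatics toolkit would generate them), and the function names are hypothetical.

```python
import math
from collections import Counter


def train_bayesian(fingerprints, actives):
    """Return a dict mapping each feature to its log contribution.

    fingerprints: list of sets of ints (stand-ins for ECFP6 hashes).
    actives: list of bools, one per fingerprint.
    Uses the Laplacian-corrected estimate ln((A + 1) / (T * P + 1)), where
    T is the number of molecules containing the feature, A is how many of
    those are active, and P is the overall fraction of actives.
    """
    base_rate = sum(actives) / len(fingerprints)
    seen, hits = Counter(), Counter()
    for fingerprint, is_active in zip(fingerprints, actives):
        for feature in fingerprint:
            seen[feature] += 1
            if is_active:
                hits[feature] += 1
    return {feature: math.log((hits[feature] + 1) / (seen[feature] * base_rate + 1))
            for feature in seen}


def predict(model, fingerprint):
    """Sum the contributions of known features; unseen features add zero."""
    return sum(model.get(feature, 0.0) for feature in fingerprint)


# Toy training set: feature 1 is seen only in actives, feature 4 only in inactives.
model = train_bayesian([{1, 2}, {1, 3}, {4}, {4, 5}], [True, True, False, False])
score_like_actives = predict(model, {1, 2})    # positive
score_like_inactives = predict(model, {4, 5})  # negative
```

Because the score is a simple sum over features, each feature's contribution can be traced back to the atoms that gave rise to it, which is the kind of bookkeeping that makes per-atom colouring possible.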
The calculations keep on running in the background: the app continues on to generate honeycomb clustering representations, which show the selected molecule in context:
Although it’s hard to tell in the thumbnails shown above, these clustering schemes feature the current molecule as a blackened hexagon, typically near the middle; around it are arranged all of the other molecules in the custom list (grey) and a diverse selection of the molecules that went into the original models, i.e. tuberculosis and water solubility, as shown above.
Tapping on the little arrow brings up the fullscreen versions:
These are much easier to pan & zoom than the thumbnails. Right now they show just their own single preference for colour coding, but before this goes live to the AppStore, it will be possible to redecorate the clustered hexagons to show a variety of properties, such as various predictions from various models, to visualise putative activity relative to 2D proximity based on structural similarity metrics.
There are more calculations on the way – this is just a sneak preview – but already it seems like there is an awful lot being presented here. And keep in mind that all of this was triggered using just two taps on the screen, after launching the app… pick a profile, pick a molecule; swipe up and down to look at visualisations as they are calculated.
The actual profiles themselves, which consist of the “targets of interest”, can be created and edited within the app. The actual list of targets (each with a model & a selection of structures with corresponding activity) is hard-coded, and currently consists of a small subset of content from ChEMBL, and disease models created with data from Sean Ekins. It is just the tip of the iceberg, but it is plenty enough for a demo.
The molecules that come with the app by default are cherry-picked to be interesting to look at in context of the models, and they can be supplemented in various ways. Right at this very moment the only way to add a new molecule is to import it by typing in a ChemSpider ID, which makes the app head off to the ChemSpider site to grab the structure. Note that this is the only time the app uses the Internet, which is another really important detail, because you don’t have to worry about security any more than you would already: any sensitive data is stored locally on the device, and all calculations are done locally. Information is only sent or received when explicitly requested as part of an import/export operation. Password-lock your device, and remote wipe it if it gets lost: if it’s good enough for your bank account, it’s good enough for your molecules.
Once the app is finished, all the usual ways of getting data in & out of the app will be available whenever you need them, i.e. sketching, structure searching, clipboard pasting, email, inter-app communication and cloud file hosting services.
The release date for the AppStore is yet to be determined, but I will be bringing it with me to the American Chemical Society meeting in Boston in a little more than a week, so if you want to see it in action, you know where to find me: I’ll be giving presentations on Sunday & Monday, and will be happy to show it off in person. Otherwise, stay tuned: before long, you’ll be able to have it on your iPhone or iPad.