The talk around town in drug discovery right now seems to be focused on BIA 10-2474, which my frequent collaborator Sean Ekins has weighed in on at the Collaborative Chemistry blog. In a spur-of-the-moment effort to see whether some of our work-in-progress technologies could tell us something about what’s going on, we ran the compound through a series of roughly 1800 Bayesian models that we extracted from ChEMBL. For a detailed view, check out this link on molmatinf.com. The file is close to 20MB, so be patient if you’re on a slow connection.
The background of this work involves a data mining exercise, which starts by reorganising the hierarchical fields within ChEMBL to make the data suitable for feeding into a model (see corresponding literature reference). As a followup to the automated extraction and model building, we put together a script that takes a list of molecules (using a selection of “discontinued drugs” to start with) and generates a lengthy report for each one, which involves running it through all 1829 Bayesian models. These correspond to a diverse variety of targets, some desirable, others not, with some redundancy with regard to different organisms (human/mouse/rat/etc.) and different kinds of measurements. The report starts with a list of the models, ranked by how strongly each one predicts the molecule to be active. For the more promising cases, it then gives a more detailed view, which shows the model’s ROC plot, the atom-coloured Bayesian prediction, and a Honeycomb cluster of similar compounds from the dataset. The cluster view is intended to provide a reality check: nobody should ever place high trust in a number that came out of a model without first digging into the details.
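To make the scoring-and-ranking step a little more concrete, here is a minimal Python sketch. It is not the script we used: the function names are hypothetical, molecules are represented as plain sets of integers standing in for fingerprint hashes, and the Laplacian-corrected naive Bayesian estimator shown here is one common formulation that I am assuming purely for illustration.

```python
from math import log


def train_laplacian_bayes(samples):
    """Build a toy model from (feature_set, is_active) pairs.

    Returns {feature: contribution}, using the Laplacian-corrected
    estimate log((A_f + 1) / (T_f * P + 1)), where T_f is the number of
    molecules containing feature f, A_f is how many of those are active,
    and P is the overall fraction of actives. (Assumed formulation.)
    """
    total = len(samples)
    base_rate = sum(1 for _, active in samples if active) / total
    counts = {}  # feature -> [times seen, times seen in an active]
    for features, active in samples:
        for f in features:
            entry = counts.setdefault(f, [0, 0])
            entry[0] += 1
            if active:
                entry[1] += 1
    return {f: log((a + 1) / (n * base_rate + 1)) for f, (n, a) in counts.items()}


def score(model, features):
    """Sum the contributions of the features the model has seen before."""
    return sum(model.get(f, 0.0) for f in features)


def rank_models(models, features):
    """Return (model name, score) pairs, highest-scoring model first."""
    scored = [(name, score(model, features)) for name, model in models.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Two made-up "targets" with four made-up training molecules each.
    models = {
        "target_A": train_laplacian_bayes(
            [({1, 2}, True), ({1, 3}, True), ({4}, False), ({4, 5}, False)]
        ),
        "target_B": train_laplacian_bayes(
            [({4, 5}, True), ({4}, True), ({1, 2}, False), ({1}, False)]
        ),
    }
    for name, value in rank_models(models, {1, 2}):
        print(f"{name}: {value:.3f}")
```

The raw scores from different models are not directly comparable without some form of calibration, which is one more reason to look at the supporting detail behind each prediction rather than trusting the ranking on its own.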
These reports (including the one we just made available) are a bit unwieldy, but there’s a lot of information in there, some of which may be useful. In fact, the static web page with embedded graphics is the direct precursor to a friendlier implementation: the PolyPharma app, which you can check out on iTunes. It’s free, and nicely interactive, though we didn’t quite manage to squeeze all of the ChEMBL models in there.