The PolyPharma app is currently getting some major new content: a batch of new models for toxicity. The models are derived from measurements made on the EPA’s Tox21 collection, recently published in Nature and released via PubChem.
This nice little development lent itself immediately to the treatment that was recently applied to the ChEMBL dataset, which involved chopping up all of the different target groups and selecting categories for merging into model-ready source collections. The total number of targets and off-targets there was more than 1800, of which a selection was picked for inclusion in the PolyPharma app. The EPA Tox21 measurements, on the other hand, make up a much smaller number of targets – 29 to be exact – and these can be pulled down out of PubChem with a relatively simple script. The measurements have already been classed as either active (i.e. bad) or inactive, which makes life easier: there is no need to figure out an activity threshold before feeding them into a Bayesian model.
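To illustrate why pre-classified active/inactive labels are so convenient, here is a minimal sketch of a Laplacian-corrected naive Bayesian model of the general kind used with fingerprints. It is not the code behind the app: the function names are made up for this example, and the fingerprints are plain sets of integers standing in for the hash codes that a real fingerprint generator would produce.

```python
import math
from collections import Counter

def train_bayesian(fingerprints, labels):
    """Return a dict of per-bit contributions (Laplacian-corrected).

    fingerprints: list of sets of integer hash codes (toy stand-ins)
    labels: list of booleans, True = active
    """
    total = len(labels)
    base_rate = sum(labels) / total  # overall fraction of actives
    seen = Counter()         # molecules containing each bit
    seen_active = Counter()  # active molecules containing each bit
    for bits, active in zip(fingerprints, labels):
        for bit in bits:
            seen[bit] += 1
            if active:
                seen_active[bit] += 1
    # Bits enriched in actives get a positive contribution, bits
    # depleted in actives get a negative one.
    return {
        bit: math.log((seen_active[bit] + 1) / (seen[bit] * base_rate + 1))
        for bit in seen
    }

def predict(model, bits):
    """Sum the contributions of the known bits; unseen bits add nothing."""
    return sum(model.get(bit, 0.0) for bit in bits)

# Toy data: bits 1-3 occur only in actives, bits 4-6 only in inactives.
toy_fingerprints = [{1, 2}, {1, 3}, {4, 5}, {4, 6}]
toy_labels = [True, True, False, False]
toy_model = train_bayesian(toy_fingerprints, toy_labels)
```

The raw score is uncalibrated: higher means more active-like, but turning it into a yes/no call or a probability still needs a cutoff chosen from the training data, for example by ROC analysis.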
Generally speaking, the structures that went into the dataset were of high quality, though a number of corrections were still necessary: washing out salts and adducts, fixing a small number of egregiously impossible molecules, and sorting out quite a lot of inorganics/organometallics that take an inconsistent approach to non-boring bond types, some of which cannot be represented using normal Molfiles anyway. These had to be fixed manually, which was quite labour intensive.
Long story short, using ECFP6 fingerprints to generate Bayesian models resulted in quite agreeable statistics:
So far so good. The next step is to actually do something with these models, in the same way as we have been doing for the ChEMBL extracts. Currently this consists of two pathways: one is creating detailed reports that can be scrolled through and perused when the models are applied to individual molecules, such as discontinued drugs; the other is adding the functionality to PolyPharma, so that anyone with an iThing can try it out.
All of the targets have been added to the mix, and there is now a default Profile called “Toxicity” which focuses entirely on these newly added models:
The updated version of the app hasn’t been submitted to the App Store just yet, but it will be shortly, after a bit more testing. Stay tuned for the next version!
One of the items that has been on my to-do list for quite some time is to dig back into the KNIME universe, and interface it with the growing collection of cheminformatics algorithms that I have been assembling over the last few years. Given that my company’s software stack has its own workflow/pipelining infrastructure, but no user interface, it makes a certain amount of sense to look into connecting them together.
The talk around town in drug discovery right now seems to be focused on BIA 10-2474, which my frequent collaborator Sean Ekins has weighed in on at the Collaborative Chemistry blog. In a spur-of-the-moment effort to see if we could use some of our work-in-progress technologies to learn something about what’s going on, we ran it through a series of 1800 Bayesian models that we extracted from ChEMBL. For a detailed view, check out this link on molmatinf.com. The file is close to 20MB, so be patient if you’re on a slow connection.
My latest publication has just come out as early access in Journal of Chemical Information & Modeling, entitled “Open Source Bayesian Models: 3. Composite Models for Prediction of Binned Responses“. This is an extension of previous work on the Bayesian/fingerprint theme, and in the interim while waiting for peer review, we have some additional developments to share.
Progress is coming along nicely with the Experiment aspect editor for XMDS, which is essentially the desktop platform playing catch-up with mobile and matching the functionality of the Green Lab Notebook (GLN) app. There is now an editor in place for the overall reaction scheme, which allows the components of a multistep reaction to be composed. This is a development milestone because it means that it’s time to switch to using the editor for real data entry, to find out what needs to be added or fixed most urgently, rather than implementing essential features that are obviously missing (cf. dogfooding = eating one’s own dog food, in case anyone is unfamiliar with the term).
After much procrastination, chemical reactions have started to make their way into the OS X Molecular DataSheet (XMDS) beta. The screenshot shows several multistep reactions from my Ph.D. research, which were originally entered using the Green Lab Notebook (GLN) app.
The Mac app and the iOS app both use the same data format, which is based on the extensible aspect protocol for molecular datasheets (see description) and seeks to capture a reaction description that is highly machine readable. As well as representing a multistep scheme, which can handle the non-organic species that are so commonly used as reactants, it also handles stoichiometry and quantities in a way that can be used to balance the reaction and generate metrics. This has been available on mobile for a while now, but the desktop analog is catching up.
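As a rough illustration of the kind of metrics that fall out once stoichiometry and quantities are recorded in machine-readable form, here is a small sketch that computes percent yield and process mass intensity (total mass of inputs divided by mass of product). The `Component` class and its field names are hypothetical and are not the actual datasheet aspect format.

```python
from dataclasses import dataclass

@dataclass
class Component:
    """Hypothetical record for one species in a reaction step."""
    name: str
    mol_weight: float   # g/mol
    mass: float         # g
    stoich: float = 1.0  # stoichiometric coefficient

def moles(component):
    return component.mass / component.mol_weight

def percent_yield(reactants, product):
    """Yield relative to the limiting reactant, honouring stoichiometry."""
    limiting = min(moles(r) / r.stoich for r in reactants)
    theoretical = limiting * product.stoich
    return 100.0 * moles(product) / theoretical

def process_mass_intensity(inputs, product):
    """Mass of everything that went in, per unit mass of product."""
    return sum(c.mass for c in inputs) / product.mass

# Made-up quantities for a generic A + 2 B -> P step run in a solvent.
a = Component("A", mol_weight=100.0, mass=10.0, stoich=1.0)
b = Component("B", mol_weight=50.0, mass=10.0, stoich=2.0)
solvent = Component("solvent", mol_weight=72.0, mass=60.0)
p = Component("P", mol_weight=200.0, mass=16.0)

step_yield = percent_yield([a, b], p)              # about 80 %
step_pmi = process_mass_intensity([a, b, solvent], p)  # 80 g in / 16 g out
```

The solvent counts toward mass intensity but not toward yield, which is exactly why the roles and quantities of each component need to be captured explicitly.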
Editing has yet to be implemented, but a lot of background work goes into just displaying reaction schemes, so the effort as a whole is well underway.
This site hasn’t had a post for a couple of months, and there’s a reason for that: I’ve been out of town, doing some collaborative work on-site, but now things are getting back to normal.