Tox21 toxicity measurements and PolyPharma

The PolyPharma app is currently getting some major new content, namely a batch of new models for toxicity. The models are derived from measurements made on the EPA’s Tox21 collection, recently published in Nature, and released via PubChem.

This nice little development lent itself immediately to the treatment that was recently applied to the ChEMBL dataset, which involved chopping up all of the different target groups and selecting categories for merging into model-ready source collections. That dataset covered more than 1800 targets and off-targets, a selection of which were picked for inclusion in the PolyPharma app. The EPA Tox21 measurements, on the other hand, cover a much smaller number of targets (29, to be exact), and these can be pulled down from PubChem with a relatively simple script. The measurements have already been classed as either active (i.e. bad) or inactive, which makes life easier: there is no need to pick an activity threshold before feeding the data into a Bayesian model.
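
Because the labels arrive pre-classified, they can go straight into a Bayesian model. As a minimal sketch (not the code actually used for PolyPharma), the following shows a Laplacian-corrected naive Bayesian, a formulation commonly paired with circular fingerprints. It assumes each molecule’s ECFP6 fingerprint has already been computed as a set of integer hash codes; the function names `train_bayesian` and `score` are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Set


def train_bayesian(fingerprints: List[Set[int]], actives: List[bool]) -> Dict[int, float]:
    """Build a Laplacian-corrected naive Bayesian contribution table.

    Each fingerprint is a set of integer hash codes (assumed precomputed,
    e.g. ECFP6); each label is True for active, False for inactive.
    """
    if len(fingerprints) != len(actives):
        raise ValueError("fingerprints and labels must have the same length")
    n_active = sum(actives)
    if n_active == 0 or n_active == len(actives):
        raise ValueError("training set needs both actives and inactives")
    base_rate = n_active / len(actives)  # P(active) over the whole set

    total_with_bit: Counter = Counter()   # T_i: molecules containing bit i
    active_with_bit: Counter = Counter()  # A_i: actives containing bit i
    for fp, is_active in zip(fingerprints, actives):
        for bit in fp:
            total_with_bit[bit] += 1
            if is_active:
                active_with_bit[bit] += 1

    # Contribution of bit i: log((A_i + 1) / (T_i * P + 1)).
    # Rarely seen bits are pulled towards log(1) = 0 by the +1 terms.
    return {
        bit: math.log((active_with_bit[bit] + 1) / (t * base_rate + 1))
        for bit, t in total_with_bit.items()
    }


def score(model: Dict[int, float], fingerprint: Set[int]) -> float:
    """Sum the contributions of the bits present; unseen bits contribute 0."""
    return sum(model.get(bit, 0.0) for bit in fingerprint)


# Toy example with made-up bit codes (not real ECFP6 hashes).
fps = [{1, 2}, {1, 3}, {4}, {4, 5}]
labels = [True, True, False, False]
model = train_bayesian(fps, labels)
print(score(model, {1, 2}) > score(model, {4, 5}))  # True
```

The Laplacian correction matters here because many fingerprint bits occur in only a handful of molecules; without it, a bit seen once in a single active would receive an outsized positive weight.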

Generally speaking, the structures that went into the dataset were of high quality, though a number of corrections were still necessary: washing out salts and adducts, removing a small number of egregiously impossible molecules, and dealing with quite a lot of inorganics/organometallics that took an inconsistent approach to non-boring bond types, some of which cannot be represented in normal Molfiles anyway. These had to be fixed manually, which was quite labour intensive.
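
For the salt-washing step alone, one crude approach is to keep the largest disconnected component of each structure. The sketch below is an illustration under stated assumptions, not the procedure used here: it assumes structures are available as SMILES strings, uses plain Python with no cheminformatics toolkit, and approximates fragment size by counting atom tokens with a regular expression. The helper names are hypothetical, and it does nothing for the organometallic and bond-type problems, which still need manual attention.

```python
import re

# Atom tokens in a SMILES string: bracket atoms such as [Na+], [nH] or [Fe+2],
# then the unbracketed organic subset (two-letter Br/Cl tried before B/C).
_ATOM = re.compile(r"\[[^\]]*\]|Br|Cl|[BCNOPSFI]|[bcnops]")
# Bracketed explicit hydrogens such as [H], [2H] or [H+] (but not [Hg] or [nH]).
_HYDROGEN = re.compile(r"\[\d*H[^a-z\]]*\]")


def heavy_atom_count(smiles: str) -> int:
    """Approximate the number of non-hydrogen atoms in one SMILES fragment."""
    return sum(1 for tok in _ATOM.findall(smiles) if not _HYDROGEN.fullmatch(tok))


def strip_to_largest_fragment(smiles: str) -> str:
    """Keep only the dot-separated component with the most heavy atoms.

    A crude salt/adduct wash: it assumes the largest component is the parent
    compound, which is not always true (e.g. mixtures or bulky counterions).
    """
    fragments = [frag for frag in smiles.split(".") if frag]
    if not fragments:
        return smiles
    return max(fragments, key=heavy_atom_count)


print(strip_to_largest_fragment("C[NH+](C)C.[Cl-]"))  # C[NH+](C)C
```

A production pipeline would normally use a cheminformatics toolkit’s fragment and salt-handling functions rather than regular expressions, since those work on the parsed molecular graph.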

Long story short, using ECFP6 fingerprints to generate Bayesian models resulted in quite agreeable statistics:

Target                     ROC AUC             Actives/Total
ATAD5                      0.8955282143671504  372/9819
DT40 Rev3                  0.8358695790539805  2470/8564
DT40 WT                    0.8481449024919616  2489/8964
P53-bla                    0.8671812374075394  659/9403
ARE-bla                    0.8199006118936651  1131/7322
HSE-bla                    0.7807963830271731  497/8376
Aromatase                  0.8564348950930594  378/7846
Mitochondrial toxicity     0.8975806303770444  1229/7958
AhR-luc                    0.897262258206716   1058/8878
AR-bla agonist             0.8614046773463251  332/9299
AR-bla antagonist          0.8637624381527989  647/8316
AR-MDA-luc agonist         0.7773459658086779  414/10133
AR-MDA-luc antagonist      0.8322296191717147  468/8269
ER-BG1-luc agonist         0.7402058726456726  1017/8340
ER-BG1-luc antagonist      0.868486076379126   472/8666
ER-bla agonist             0.8224802258757133  560/9545
ER-bla antagonist          0.8269233837880261  432/8297
FXR-bla agonist            0.7711240470264435  118/8674
FXR-bla antagonist         0.8486710970456315  255/7833
GR-bla agonist             0.8291952546091927  211/9256
GR-bla antagonist          0.8502523415129595  452/8094
PPAR-delta-bla agonist     0.8070517362777032  112/8274
PPAR-delta-bla antagonist  0.7698014607724666  92/7937
PPAR-gamma-bla agonist     0.8513509469037785  277/8898
PPAR-gamma-bla antagonist  0.8354619508727978  428/7610
TR-beta-luc agonist        0.7290186160244048  64/9422
TR-beta-luc antagonist     0.826791127196751   419/7391
VDR-bla agonist            0.6806460193156809  22/8437
VDR-bla antagonist         0.811523182164013   83/7699
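
The post does not say how these ROC values were obtained (for example, which cross-validation scheme was used). As a hedged illustration of what the ROC column measures, the sketch below computes the area under the ROC curve from model scores and active/inactive labels using the rank-sum (Mann-Whitney) formulation, in which tied scores count as one half. The helper name `roc_auc` is hypothetical.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], actives: Sequence[bool]) -> float:
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) formula.

    Equals the probability that a randomly chosen active scores higher than a
    randomly chosen inactive, with tied scores counting as one half.
    """
    if len(scores) != len(actives):
        raise ValueError("scores and labels must have the same length")
    pairs = sorted(zip(scores, actives), key=lambda p: p[0])
    n_pos = sum(1 for _, is_active in pairs if is_active)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one active and one inactive")

    # Walk through blocks of tied scores, giving each block its average rank
    # (ranks are 1-based), and accumulate the rank sum of the actives.
    rank_sum = 0.0
    i = 0
    while i < len(pairs):
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        average_rank = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        rank_sum += average_rank * sum(1 for _, is_active in pairs[i:j] if is_active)
        i = j

    u_statistic = rank_sum - n_pos * (n_pos + 1) / 2
    return u_statistic / (n_pos * n_neg)


print(roc_auc([0.9, 0.4, 0.6, 0.1], [True, True, False, False]))  # 0.75
```

Because the AUC depends only on how actives are ranked relative to inactives, it is not directly distorted by the low active rates in the Actives/Total column, although a value estimated from very few actives (22 for VDR-bla agonist) carries more uncertainty.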

So far so good. The next step is to actually do something with these models, in the same way as we have been doing for the ChEMBL extracts. Currently this follows two pathways: one is creating detailed reports that can be perused and scrolled through when the models are applied to individual molecules, such as discontinued drugs; the other is adding the functionality to PolyPharma, so that anyone with an iThing can try it out.

All of the targets have been added to the mix, and there is now a default Profile called “Toxicity” which focuses entirely on these newly added models:





The updated version of the app hasn’t been submitted to the App Store just yet, but it will be shortly, after a bit more testing. Stay tuned for the next version!


Adventures with KNIME: pipeline wrapping

One of the items that has been on my to-do list for quite some time is to dig back into the KNIME universe and interface it with the growing collection of cheminformatics algorithms that I have been assembling over the last few years. Given that my company’s software stack has its own workflow/pipelining infrastructure, but no user interface, it makes a certain amount of sense to look into connecting them together.


Panel of Bayesian screening for BIA 10-2474

The talk around town in drug discovery right now seems to be focused on BIA 10-2474, which my frequent collaborator Sean Ekins has weighed in on at the Collaborative Chemistry blog. In a spur-of-the-moment effort to see whether some of our work-in-progress technologies could tell us anything about what’s going on, we ran it through a panel of 1800 Bayesian models that we extracted from ChEMBL. For a detailed view, check out this link on The file is close to 20MB, so be patient if you’re on a slow connection.


Composite Bayesian models: latest open source project

My latest publication has just come out as early access in the Journal of Chemical Information & Modeling, entitled “Open Source Bayesian Models: 3. Composite Models for Prediction of Binned Responses”. This extends previous work on the Bayesian/fingerprint theme, and in the interim, while the paper was waiting for peer review, we have made some additional developments that are worth sharing.


Experiment editing in XMDS: dogfooding time

Progress is coming along nicely with the Experiment aspect editor for XMDS, which is essentially the desktop platform playing catch-up with mobile by matching the functionality of the Green Lab Notebook (GLN) app. There is now an editor in place for the overall reaction scheme, which allows the components of a multistep reaction to be composed. This is a development milestone, because it means it’s time to switch to using the editor for real data entry to find out what most urgently needs to be added or fixed, rather than implementing essential features that are obviously missing (this is known as dogfooding, i.e. eating one’s own dogfood, in case anyone is unfamiliar with the term).


Reactions in XMDS

After much procrastination, chemical reactions have started to make their way into the OS X Molecular DataSheet (XMDS) beta. The screenshot shows several multistep reactions from my Ph.D. research, which were originally entered using the Green Lab Notebook (GLN) app.

The Mac app and the iOS app both use the same data format, which is based on the extensible aspect protocol for molecular datasheets (see description) and seeks to capture reactions in a highly machine-readable form. As well as representing multistep schemes, including the non-organic species that are so commonly used as reactants, it also handles stoichiometry and quantities in a way that can be used to balance the reaction and generate metrics. This has been available on mobile for a while now, and the desktop analog is catching up.
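
The post doesn’t describe how the aspect format stores stoichiometry, so the following uses a hypothetical data model rather than the XMDS/GLN format: each side of a reaction is a list of (coefficient, molecular formula) pairs. The sketch checks that every element balances and computes percent atom economy, one common green chemistry metric; which metrics the apps actually generate is not stated here, and the atomic weight table covers only a few elements.

```python
import re
from collections import Counter
from typing import List, Tuple

# Standard atomic weights for a few common elements (illustrative subset only).
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999,
               "S": 32.06, "Cl": 35.45, "Br": 79.904, "Na": 22.990}

_ELEMENT = re.compile(r"([A-Z][a-z]?)(\d*)")

# Hypothetical data model: (stoichiometric coefficient, flat molecular formula).
Component = Tuple[int, str]


def parse_formula(formula: str) -> Counter:
    """Parse a flat formula such as 'C2H6O' (no parentheses, charges or hydrates)."""
    counts: Counter = Counter()
    pos = 0
    for m in _ELEMENT.finditer(formula):
        if m.start() != pos:
            raise ValueError(f"cannot parse formula {formula!r}")
        counts[m.group(1)] += int(m.group(2) or 1)
        pos = m.end()
    if pos != len(formula):
        raise ValueError(f"cannot parse formula {formula!r}")
    return counts


def element_totals(side: List[Component]) -> Counter:
    """Total atoms of each element on one side of the reaction."""
    totals: Counter = Counter()
    for coeff, formula in side:
        for element, n in parse_formula(formula).items():
            totals[element] += coeff * n
    return totals


def is_balanced(reactants: List[Component], products: List[Component]) -> bool:
    return element_totals(reactants) == element_totals(products)


def molar_mass(formula: str) -> float:
    return sum(ATOMIC_MASS[el] * n for el, n in parse_formula(formula).items())


def atom_economy(reactants: List[Component], product: Component) -> float:
    """Percent of total reactant mass that ends up in the desired product."""
    coeff, formula = product
    reactant_mass = sum(c * molar_mass(f) for c, f in reactants)
    return 100.0 * coeff * molar_mass(formula) / reactant_mass


# Fischer esterification: acetic acid + ethanol -> ethyl acetate + water.
reactants = [(1, "C2H4O2"), (1, "C2H6O")]
products = [(1, "C4H8O2"), (1, "H2O")]
print(is_balanced(reactants, products))                  # True
print(round(atom_economy(reactants, (1, "C4H8O2")), 1))  # 83.0
```

Keeping coefficients explicit alongside formulas is what makes both the balance check and mass-based metrics possible; a real reaction format would also need to carry quantities, reagents and solvents, which this sketch ignores.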

Editing has yet to be implemented, but just displaying reaction schemes requires a lot of background work, so the groundwork for editing is well underway.


Breaking radio silence

This site hasn’t had a post for a couple of months, and there’s a reason for that: I’ve been out of town doing some collaborative work on-site, but things are now getting back to normal.


