New Molecular Materials Informatics website

As part of spring cleaning for the new decade of 2020, the website for Molecular Materials Informatics has been redone. The original plan for the site in 2010 was to make it information-rich, but a lot of it had gotten stale over the years: products appear, disappear and change. Now it’s more of a placeholder with links to projects that have their own sites wherever possible, e.g. documentation and open source projects are hosted on GitHub.

FAIR Data Hackathon / BioAssay Express

It’s been a while since I’ve posted anything, but not for lack of activity in the world of sciencey-informatics. Next week I’ll be at the BioIT World FAIR Data Hackathon in Boston, along with several other members of the Research Informatics team of Collaborative Drug Discovery. Right now we’re tooling up a customised instance of the BioAssay Express (for which the most up-to-date standard version can be found here) so that we can deploy several different proposed templates for evaluating whether a published article abides by FAIR principles. The plan is to evaluate as many articles as we can, and produce a scoresheet at the end of the day. I don’t know what the answer will be, and it will be interesting to find out!

Bond Artifacts in SketchEl2, and round-trip MDL Molfile

A while back I described the idea of bond artifacts, which are layered on top of a core cheminformatics representation to give the rendering engine the hints it needs to make the visual diagram look like what chemists want to see (without breaking the underlying machine readability). Now this enhancement has been added to the open source WebMolKit framework and the derived SketchEl2 drawing app. Furthermore, the artifacts can survive a round-trip encoding with the industry standard Molfile CTAB format.

Adventures with combining PubMed and ChEMBL

One of the things I’ve been investigating lately is the open access segment of PubMed, which is a rather massive collection of open access medicine-relevant publications, with accompanying full text. I’ve also been looking at the ChEMBL database, which is focused on structure-activity data traceable back to the original literature document from which each datapoint was curated. This is all for the purpose of advancing the BioAssay Express mission of making the world’s bioassay protocols machine readable (aka FAIR).