As part of spring cleaning for the new decade of 2020, the website for Molecular Materials Informatics has been re-done. The original plan for the site in 2010 was to make it information rich, but a lot of it had gotten stale over the years: products appear, disappear and change. Now it’s more of a placeholder with links to projects that have their own sites wherever possible, e.g. documentation and open source projects are hosted on GitHub.
Author: Dr. Alex M. Clark
InChI for inorganics
Lately I’ve been working on a new extension to the InChI identifier which is intended to broaden its domain to include the universe of non-organic compounds and all of the insane diversity of exotic bonding types. Preliminary results of the first stage are up on GitHub.
FAIR Data Hackathon / BioAssay Express
It’s been a while since I’ve posted anything, but not for lack of activity in the world of sciencey-informatics. Next week I’ll be at the BioIT World FAIR Data Hackathon in Boston, along with several other members of the Research Informatics team of Collaborative Drug Discovery. Right now we’re tooling up a customised instance of the BioAssay Express (for which the most up-to-date standard version can be found here) so that we can deploy several different proposed templates for evaluating whether a published article abides by FAIR principles. The plan is to evaluate as many articles as we can, and produce a scoresheet at the end of the day. I don’t know what the answer will be, and it will be interesting to find out!
Overlapping biology: Cell Line Ontology and BRENDA

One of the pitfalls of using multiple public ontologies is that sometimes there are two teams doing great work that overlaps, but neither is a superset of the other. This has come up for the BioAssay Express project, which uses both the Cell Line Ontology and BRENDA cells & tissues.
Mixtures: extracting from text
Recently I described an open source tool for editing chemical mixtures, using a machine-readable format. Now we have a proof-of-concept tool that starts with the kinds of text descriptions people use to describe mixtures, and recreates the actual components in their full glory.
Molecular Notebook with “dark mode”
The Molecular Notebook desktop app for chemical structure & data content creation has gotten a refresh on the iTunes AppStore: it now responds to the dark-mode preference.
Bond Artifacts in SketchEl2, and round-trip MDL Molfile
A while back I described the idea of bond artifacts, which are layered on top of a core cheminformatics representation to give the rendering engine the hints it needs to make the visual diagram look like what chemists want to see (without breaking the underlying machine readability). Now this enhancement has been added to the open source WebMolKit framework and the derived SketchEl2 drawing app. Furthermore, the artifacts can survive a round-trip encoding with the industry-standard Molfile CTAB format.
Mixtures & cheminformatics
At CDD we’ve recently begun a new project to define a common format for mixtures of chemicals, along with an open source editor, and impending tools for generating data from text content. The work-in-progress editor is now openly available on GitHub.
Adventures with combining PubMed and ChEMBL
One of the things I’ve been investigating lately is the open access segment of PubMed, which is a rather massive collection of medicine-relevant publications with accompanying full text. I’ve also been looking at the ChEMBL database, which focuses on structure-activity data that is traceable back to the original literature document from which each datapoint was curated. This is all for the purpose of advancing the BioAssay Express mission of making the world’s bioassay protocols machine readable (aka FAIR).
KNIME integration with BioAssay Express
As of now, there’s a KNIME plugin that can be used to access data from the BioAssay Express. The plugin uses the existing API functionality that can grab all of the available bioassay protocols, or a subset as defined by a query, and bring them into the KNIME ecosystem as a table which can be processed using the multitude of other node types.