There is now a KNIME plugin that can be used to access data from the BioAssay Express. The plugin uses the existing API functionality to fetch all of the available bioassay protocols, or a subset defined by a query, and brings them into the KNIME ecosystem as a table that can be processed using the multitude of other node types.
The BioAssay Express is a product from Collaborative Drug Discovery whose primary purpose is adding semantic web annotations to bioassay protocols. These protocols are normally described using formats that are not machine-readable, such as plain text, opaque diagrams, and occasionally a semi-structured summary field or two. This means that millions upon millions of experiments (which are of incalculable value to the drug discovery industry) are rotting on the data compost heap as PDF files and the like.
There are several deployments of the BioAssay Express, some of which are public: both www.bioassayexpress.com and beta.bioassayexpress.com contain thousands of carefully curated protocols and are open to anyone who wants to use the data. The data can be browsed with the interactive web interface, downloaded in RDF/TTL semantic web format, or fetched directly using the API. Using an API requires a bit of programming, though, so it is important to provide an easier entry point.
This is where KNIME comes in: the Konstanz Information Miner is an open source product that lets users construct workflows by interactively building a graph of nodes. Each node represents a data processing operation that is applied to its input connections, and passes the results forward to its outputs. A number of general-purpose nodes ship with KNIME by default, and many more have been created by third parties, often for very specific purposes. KNIME has become popular in the pharmaceutical industry because it provides an open framework that integrates many disparate tools which would otherwise be much less convenient to use.
Collaborative Drug Discovery has been providing a molecule-centric KNIME plugin for Vault for quite a while. Now there is also a protocol-centric plugin for accessing the BioAssay Express data, which is available as a public GitHub repository.
Once the plugin is installed, a simple workflow will suffice.
The BioAssayExpress node has several configuration parameters:
The Site URL should point to either of the two public deployments; for those of us with internal or development installations, those will work too.
The Query parameter is a bit more interesting. By default (blank), the node grabs all of the assays in the system, which is reasonable in many cases, especially if you intend to post-filter them by your own criteria. The format of the query string is not very intuitive, but queries can be composed using the Explore page of the web interface.
For example, the public dataset can be filtered for all curated assays with bioassay type = ADMET, of which there are currently 42. The Explore page then shows the corresponding Filter Query value (on the right-hand side, partway down the page), which can be copied and pasted into the KNIME configuration field to select only the ADMET-type assays.
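To show how the two settings might combine into a single request, here is a minimal Python sketch. The endpoint path `/REST/hypothetical-download` and the parameter name `query` are hypothetical placeholders, since the post does not name the API entrypoint the node calls; the point is only that a blank Query means "all assays", and that a pasted Filter Query string should be URL-encoded before it is sent.

```python
from urllib.parse import urlencode

# HYPOTHETICAL endpoint path and parameter name: the real entrypoint used by
# the BioAssayExpress node is not named here.
ENDPOINT_PATH = "/REST/hypothetical-download"


def build_request_url(site_url: str, query: str = "") -> str:
    """Combine the Site URL and Query settings into one request URL."""
    url = site_url.rstrip("/") + ENDPOINT_PATH
    query = query.strip()
    if not query:
        return url  # blank query: request every assay on the server
    # A pasted Filter Query may contain spaces and symbols, so encode it.
    return url + "?" + urlencode({"query": query})


if __name__ == "__main__":
    print(build_request_url("https://www.bioassayexpress.com"))
    print(build_request_url("https://beta.bioassayexpress.com", "a b=c"))
```

Treating the query as an opaque string and only encoding it keeps the sketch independent of the query syntax itself, which is easiest to compose on the Explore page anyway.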
Running the node executes an API query on the given server. This particular entry point returns a ZIP file, which the node unpacks into a list of assays, each in its own JSON document. KNIME works with tables of typed columns, which differ somewhat from this internal data structure, but the assays can still be represented faithfully.
The output table starts with the assay identifiers, followed by columns whose definitions and contents are made up of URIs, plus the corresponding labels if requested.
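The two steps described above (unpacking the ZIP into per-assay JSON documents, then flattening them into identifier columns followed by URI columns and optional label columns) can be sketched in Python as follows. This is an illustration, not the plugin's own code: the JSON field names (`assayID`, `uniqueID`, `annotations`, `propURI`, `valueURI`, `valueLabel`) and the " (label)" column suffix are assumptions made for the example, and the example.org URIs are placeholders.

```python
import io
import json
import zipfile


def unpack_assays(zip_bytes: bytes) -> list:
    """Unpack a ZIP archive in which each .json member holds one assay."""
    assays = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in sorted(archive.namelist()):
            if not name.endswith(".json"):
                continue  # ignore anything that is not an assay document
            with archive.open(name) as member:
                assays.append(json.load(member))
    return assays


def assays_to_table(assays: list, include_labels: bool = False):
    """Flatten assay documents into (column names, rows).

    Identifier columns come first, then one column per property URI
    (assumed field names). A property can carry several values, so
    each cell holds a list.
    """
    properties = sorted({a["propURI"]
                         for assay in assays
                         for a in assay.get("annotations", [])})
    columns = ["assayID", "uniqueID"] + properties
    if include_labels:
        columns += [p + " (label)" for p in properties]

    rows = []
    for assay in assays:
        values = {p: [] for p in properties}
        labels = {p: [] for p in properties}
        for a in assay.get("annotations", []):
            values[a["propURI"]].append(a["valueURI"])
            labels[a["propURI"]].append(a.get("valueLabel", ""))
        row = [assay.get("assayID"), assay.get("uniqueID")]
        row += [values[p] for p in properties]
        if include_labels:
            row += [labels[p] for p in properties]
        rows.append(row)
    return columns, rows


if __name__ == "__main__":
    # A tiny in-memory archive stands in for the server's ZIP response.
    doc = {
        "assayID": 1,
        "uniqueID": "example:1",
        "annotations": [{
            "propURI": "http://example.org/prop/assayType",
            "valueURI": "http://example.org/value/ADMET",
            "valueLabel": "ADMET",
        }],
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("assay1.json", json.dumps(doc))
    columns, rows = assays_to_table(unpack_assays(buffer.getvalue()),
                                    include_labels=True)
    print(columns)
    print(rows)
```

Keeping multi-valued properties as list cells is one way to stay faithful to the source data; joining the values into a delimited string would be a simpler but lossier alternative.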
We expect the KNIME plugin to evolve rapidly so that it can be useful for a wider variety of tasks, but for now, version 1.0 is functional, available, and ready to use with the public dataset.