Last week I had the pleasure of representing Collaborative Drug Discovery at the BioIT World Hackathon, themed around FAIR data principles.
The FAIR acronym stands for Findable, Accessible, Interoperable and Reusable, which is a bunch of concepts that are obviously good things to have. My own personal interest in this is as an extension to the open access movement: getting scientific research out from behind paywalls is finally starting to gather momentum, but it’s only the first step. Being able to have the PDF file for any research paper is a good start, but when you need to analyze thousands of them, after having narrowed them down from a complete set of millions, it becomes clear that having access is really just table stakes. If we’re going to put science to work, we need to appreciate that what really matters is whether or not machines can use it.
And that’s what this hackathon was about: the second part of FAIR, where we require that there’s enough metadata and well-documented API endpoints that software can grab & process it effectively. This machines-first emphasis was made very clear during the introductory remarks, which was music to my ears (especially since I published a paper several years back with a title along those lines).
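To make the machines-first idea concrete, here's a minimal sketch of what "enough metadata" might look like from software's point of view: a check that a dataset record carries the fields a program would need to find, retrieve, and interpret it without a human in the loop. The field names and the record are purely illustrative (loosely inspired by DCAT/schema.org conventions), not any official FAIR metric.

```python
# Hypothetical sketch: how much of a metadata record is machine-usable?
# Field names are illustrative assumptions, not a standard FAIR checklist.
REQUIRED_FIELDS = ["identifier", "title", "license", "accessURL", "format"]

def metadata_coverage(record: dict) -> float:
    """Fraction of required fields that are present and non-empty."""
    present = sum(1 for field in REQUIRED_FIELDS if record.get(field))
    return present / len(REQUIRED_FIELDS)

# An example record with one gap: no access URL means software
# can read *about* the data but has no way to fetch it.
record = {
    "identifier": "https://example.org/assay/123",
    "title": "Kinase inhibition assay",
    "license": "CC-BY-4.0",
    "accessURL": "",
    "format": "application/json",
}

print(f"coverage: {metadata_coverage(record):.0%}")  # prints "coverage: 80%"
```

A real evaluator would of course go much further (resolvable identifiers, working endpoints, vocabulary checks), but the shape of the problem is the same: each missing or empty field is a place where a pipeline stalls and a human has to step in.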
In this scenario, human expertise kicks in after a whole lot of preprocessing steps where the machines filter down the relevant content and get it ready for presentation… as opposed to the way things usually work now, which involves a person reading papers one at a time as the starting point.
The topic that we at Collaborative Drug Discovery nominated was the annotation of bioassay protocols. See our event GitHub page for more details. This is a major interest of ours, because of the BioAssay Express project, which is, at its core, an attempt to make this particular data category FAIR by any means necessary.
One of the things we discovered at the beginning of the event is that a “FAIR data evaluator” created by Purple Polar Bear gave us a score of 48%. That seems a bit low for a project that’s designed to have FAIRness baked into the cake from day zero, but fortunately most of the failure categories were caused either by low-hanging fruit that was easy to fix (mainly API documentation) or by difficulties understanding the questions. With a few improvements, our score went up to 75% or 91%, depending on how strictly the requirements are interpreted. Much discussion ensued about all of these things, and one of the major deliverables from this event is that a room of a hundred or so people have a much better idea of what needs to be done, and how to go about it.
As you can see from the picture below, everybody was intently focused the whole time: the concentration level is entirely representative of the event!