The unassailable value of cheminformatics

The overall field of computational chemistry/chemical information/computer assisted drug design can be divided up and categorised in any number of different ways, but one partitioning that seems particularly helpful to me is:

  1. using software to attempt to do real science
  2. using software as a glorified book-keeping toolset

The first category – real science – includes various kinds of efforts to model real systems, identify leads, predict properties, calculate binding affinity, etc. The second category – book-keeping – includes graphical presentation, data warehousing and relatively mundane operations, such as searching, performed on a large scale.

The real science category includes many techniques that make use of 3D structures, such as forcefield energy minimisations, molecular dynamics, quantum chemistry (at various levels of theory, pure or empirical), docking, pharmacophores, and many others. So-called 2D methods, which focus on atom connectivity rather than position, are amenable to techniques such as QSAR/QSPR, and predictions and classifications based on fingerprints, graph-based similarity or group contributions. These are all areas of intense interest, since the promised payoff is high: computing time is incredibly cheap compared to every other facet of drug discovery or materials research, so even a very small incremental improvement provided by software can easily pay for itself. That is, if it works.

Nonetheless, there is a prominent meme going around the industry that the practitioners of many of these real science techniques have been promising results for a long time now, and that the perennial excuse, that the breakthrough is being held up by insufficient computational power, is wearing a little thin after a few decades of Moore’s law. Indeed, a recent issue of the Journal of Computer-Aided Molecular Design is dedicated to discussing this. The opinions from experts are widely dispersed over the [opti/pessi]mism spectrum, and there are plenty of views about where to lay the blame for the disconnect between promises made and results delivered.

Chemical book-keeping, on the other hand – who really doubts the value of that? Regardless of whether we can program a Beowulf cluster to grind away for months and eventually spit out a promising lead for curing a disease, chemical information still needs to be entered, stored, retrieved, and formatted in a presentable manner for browsing or disseminating.

Which is the theme of this post: I choose to liken half of the field to book-keeping, because the software is essentially performing or helping with a task that individual chemists used to do manually. With a properly designed algorithm and a decent user interface, it becomes possible to do that task much faster, more efficiently, and potentially with fewer opportunities for mistakes and transcription errors. Software can also be used to re-present existing data in ways that make insights easier to grasp.

Any chemist who has been on the planet for more than a quarter century is no doubt familiar with attempting to organise chemical structures with some equivalent of hand-drawn diagrams on Post-it notes, flipping through the last postdoc’s lab book trying to find one piece of data, manually plotting data points on graph paper in the hope of identifying a trend, or the laborious process of preparing any kind of manuscript or oral presentation containing chemical structure graphics.

The point is that software can make all of these tasks much less painful for small scale use, and in many cases it improves efficiency by so many orders of magnitude that entirely new kinds of research become possible. Compound registration databases, searchable online chemical databases, lab notebooks and inventory management systems are all standard tools for the pharmaceutical industry, and the ability to assemble targeted datasets in minutes, interactively view structure and data relationships, and export presentation quality graphics with a few mouse clicks, is a profound change. Tasks that could have been done by people, but simply never were because the time commitment was unreasonable, are now routine. And unlike certain types of calculations, e.g. docking, there is no decades-long running argument about whether or not the value has been negligible.

The book-keeping arm of cheminformatics may not be as sexy as trying to build a chemical oracle, but the value is real. And there is a lot of room for improvement when it comes to the available software.

