WebMolKit 2.0 on GitHub and NPM

A new branch of the WebMolKit open source library for cheminformatics on JavaScript platforms is now available. The most noticeable changes are a major source code refactor to use the modern import framework (ES6 modules), publication via NPM, and some new functionality such as resolving bond line crossings using a pseudo-embedding algorithm.

The refactoring addresses technical debt that has been present since the beginning: when I ported the nascent library to TypeScript many years ago, the most promising option for bundling all of the functionality into something that could be run on a web browser was to use the namespaces feature that was invented for the TypeScript compiler. In a nutshell, this solves all of the bundling problems in a simple and effective way that is compatible with many different development workflows and runtime targets. But it has plenty of shortcomings, not least of which is that it has been deprecated for a long time. The preferred method for library management is to use ES6 modules, which involves using the export/import keywords to reference resources explicitly as needed. It is a lot more similar to how platforms like Java manage their library archives and follows similar software engineering principles. The problem is that it is very fussy and very fragile, and it’s also completely incompatible with the older namespace method: you either convert all of your projects to the newer way in one go and never look back, or you hit a wall somewhere along the line and have to roll it back.
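To make the difference concrete, here is a minimal sketch of the two styles side by side. The names (LegacyKit, describeBond) are invented for illustration and are not taken from WebMolKit itself.

```typescript
// Older style: a TypeScript namespace. Separately compiled files can contribute to
// the same namespace, and the output can be concatenated into a single script.
namespace LegacyKit {
    export function describeBond(order: number): string {
        return ['zero', 'single', 'double', 'triple'][order] ?? 'unknown';
    }
}

// Any other file in the same compilation can refer to it by its qualified name.
console.log(LegacyKit.describeBond(2));

// Newer style: an ES6 module. The file that defines the function exports it explicitly...
export function describeBond(order: number): string {
    return ['zero', 'single', 'double', 'triple'][order] ?? 'unknown';
}

// ...and every file that uses it has to ask for it by name, e.g. in another file:
//     import {describeBond} from './describeBond';
console.log(describeBond(3));
```

The namespace version needs no import bookkeeping at all, which is why it was so convenient, while the module version makes every dependency between files explicit.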

WebMolKit v2.0 is the third-time-lucky attempt to do this. As it happens I’ve been learning about stitching together diverse and complex JavaScript frameworks from the folks at CDD, and one of the lessons is that there is not really any way to get around adding yet another step in the build process, namely webpack. This should bother anyone who cares about details, because TypeScript is already doing a comprehensive transpilation process, and has the ability to bundle the modified code. But the reality is that deployment on the various different JavaScript engines is complex: web browser, desktop Electron, console NodeJS and web workers make four different platforms, each with a feature set that is incompatible enough with the others that code written for one can break on another. Add to that the options for including files into other libraries and the degrees of freedom get higher.

At the time of writing, WebMolKit v2.0 is on a separate branch on GitHub, but it will be folded into the master branch soon. The README file has details about how to try it out with code, starting with embedding a molecule sketcher onto a regular web page by importing the precompiled bundle, and another example showing how to create a new Electron project and include WebMolKit using the NPM package.

The combination of ES6 modules and regular updates via NPM should make the library useful for a lot more projects. It was perfectly possible to incorporate the older library into any other kind of JavaScript project, but it’s now a lot more natural and convenient.

The other noteworthy feature that I introduced is resolving line-crossings for pseudo 3D diagram blocks:

When bond lines cross each other, under certain circumstances the rendering algorithm will make a best guess effort to figure out which of them belongs on top, and disconnect the one that’s below. The algorithm uses several tricks to make this determination (see the file PseudoEmbedding.ts if you want to know more). The first step is to seed the atom layout with values for the Z-coordinate (which are honoured if available, but for most diagrams are zero for all atoms). A starting value for Z-coordinates is manufactured using up/down wedges, if any are available. If not, the algorithm looks for longer/shorter bonds being used to impart a sense of perspective, which you can see in the snapshot above. Failing that, it uses a relatively arbitrary method. The values for Z-coordinates are then smeared across each connected component, and then used to decide which bonds get chopped into two segments. The algorithm is quite simple and surprisingly effective.
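For a rough idea of how these steps fit together, here is a simplified sketch. It is not the code from PseudoEmbedding.ts: the Atom and Bond types and all function names are made up for illustration, the smearing is done by plain neighbour averaging (an assumption on my part), and the bond-length perspective fallback is left out entirely.

```typescript
// Hypothetical types for illustration; these are not WebMolKit's own classes.
interface Atom { x: number; y: number; z: number; }
interface Bond { from: number; to: number; wedge: 'none' | 'up' | 'down'; }

// Step 1: seed Z values. Existing nonzero Z is honoured; otherwise a wedge pushes
// the far end of its bond toward (+1) or away from (-1) the viewer.
function seedZ(atoms: Atom[], bonds: Bond[]): (number | null)[] {
    if (atoms.some((a) => a.z != 0)) return atoms.map((a) => a.z);
    const seed: (number | null)[] = atoms.map(() => null);
    for (const b of bonds) {
        if (b.wedge == 'none') continue;
        seed[b.from] = seed[b.from] ?? 0;
        seed[b.to] = b.wedge == 'up' ? 1 : -1;
    }
    return seed;
}

// Step 2: smear the seeds. Unseeded atoms repeatedly take the mean Z of their
// neighbours, so values only spread within a connected component.
function smearZ(seed: (number | null)[], bonds: Bond[], rounds = 50): number[] {
    const z = seed.map((v) => v ?? 0);
    for (let n = 0; n < rounds; n++) {
        for (let i = 0; i < z.length; i++) {
            if (seed[i] != null) continue;
            const nbrs = bonds.filter((b) => b.from == i || b.to == i).map((b) => (b.from == i ? b.to : b.from));
            if (nbrs.length > 0) z[i] = nbrs.reduce((sum, j) => sum + z[j], 0) / nbrs.length;
        }
    }
    return z;
}

// Fractional positions (t along a1-a2, u along b1-b2) where two segments cross, or null.
function crossing(a1: Atom, a2: Atom, b1: Atom, b2: Atom): [number, number] | null {
    const dax = a2.x - a1.x, day = a2.y - a1.y, dbx = b2.x - b1.x, dby = b2.y - b1.y;
    const denom = dax * dby - day * dbx;
    if (denom == 0) return null; // parallel lines never cross
    const t = ((b1.x - a1.x) * dby - (b1.y - a1.y) * dbx) / denom;
    const u = ((b1.x - a1.x) * day - (b1.y - a1.y) * dax) / denom;
    return t > 0 && t < 1 && u > 0 && u < 1 ? [t, u] : null;
}

// Step 3: for each crossing pair, the bond with the lower interpolated Z at the
// crossing point is the one to be drawn with a gap (ties go to the second bond).
function bondsToSplit(atoms: Atom[], bonds: Bond[]): number[] {
    const z = smearZ(seedZ(atoms, bonds), bonds);
    const under = new Set<number>();
    for (let i = 0; i < bonds.length; i++) {
        for (let j = i + 1; j < bonds.length; j++) {
            const p = bonds[i], q = bonds[j];
            if (p.from == q.from || p.from == q.to || p.to == q.from || p.to == q.to) continue;
            const hit = crossing(atoms[p.from], atoms[p.to], atoms[q.from], atoms[q.to]);
            if (!hit) continue;
            const zp = z[p.from] + hit[0] * (z[p.to] - z[p.from]);
            const zq = z[q.from] + hit[1] * (z[q.to] - z[q.from]);
            under.add(zp < zq ? i : j);
        }
    }
    return Array.from(under).sort((a, b) => a - b);
}
```

Calling bondsToSplit with a list of atoms and bonds returns the indices of the bonds that pass underneath something else. A renderer could then draw each of those as two segments with a small gap around the crossing point.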

Besides modules and rendering improvements, the new release also has a renewed emphasis on regression tests, which are still sparse, but have been extended. Functionality like rendering and reading/writing MDL formats is perpetually at risk of being broken by some improvement. One of the philosophical changes needed to operate in the NPM (node package manager) world is that it’s not very convenient to develop multiple libraries concurrently, and so it becomes much more important to have unit tests at the first point of contact. So that will become more prominent as the project matures.
