Green Lab Notebook: more on balancing & green metrics

gln_greensteps1

Recent blog posts (here & here) have chronicled the experiment editor for the Green Lab Notebook (GLN) app-in-progress. Along the way I have been grappling with the green chemistry metrics and with some interesting stoichiometry/equivalents/balancing issues for multistep reactions.

Representing a multistep chemical reaction so that it carries all the information an algorithm needs, while also serving as the precursor to a chemist-friendly visual layout, is not quite as simple as adding another arrow.

One of the very nice housekeeping properties of a single step reaction is that it is possible to lay down the law about stoichiometry: molecules that are part of the formal balancing equation must be classified as reactants or products and their stoichiometry (1, 2, ½, etc.) must be correct; non-stoichiometric ingredients such as catalysts and solvents, or reagents used in excess when the reaction balance is not clear, are classified as reagents, which are drawn above & below the reaction arrow.

This rule stops being reasonable when considering stepwise reactions. For example, dibromination of toluene:

gln_greensteps3

The primary ingredient (toluene) is represented as a reactant. Two stoichiometric equivalents of bromine are used in the reaction. One way to draw this would be to include all of the stoichiometric reagents in the first step, i.e. C6H5CH3 + 2 Br2, and then draw the subsequent brominations as they happen. This might be OK for an algorithm that concerns itself with the reaction balance, but to a chemist it’s ugly, and not what we want to see. Perhaps a physical chemist studying mechanism would find this agreeable, but speaking as a former synthetic chemist building a product for current synthetic chemists, what we want to see is a starting material being operated on in a sequence of transforms.

First of all, note the additional symbols that are used in the diagram above: the starting material has a circled asterisk to the top right, which indicates that it is the primary reactant (which is redundant in this case because there’s just one). In each of the two steps the byproduct is hydrogen bromide, and both of these are further annotated as waste, using a symbol borrowed from electrical circuit iconography for grounding. Waste is a user-provided flag which indicates that the compound is not one of the intended products. This is a key part of the balancing and classification system, especially for multistep reactions, because any product that is not waste is considered to be a reactant for the next step.
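The carry-forward rule can be sketched in a few lines of Python. This is only an illustration, not the app's actual data model: the `Component` and `Step` classes, the `effective_reactants` helper and the generic compound names are all assumptions made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    name: str
    stoich: float = 1.0
    waste: bool = False  # user-provided flag: not an intended product


@dataclass
class Step:
    reactants: list = field(default_factory=list)
    reagents: list = field(default_factory=list)
    products: list = field(default_factory=list)


def effective_reactants(steps, index):
    """Reactants drawn for this step, plus non-waste products of the previous step."""
    carried = []
    if index > 0:
        carried = [p for p in steps[index - 1].products if not p.waste]
    return carried + steps[index].reactants


# Hypothetical encoding of the two-step bromination (names are placeholders).
steps = [
    Step(reactants=[Component("toluene")],
         reagents=[Component("Br2")],
         products=[Component("bromotoluene"), Component("HBr", waste=True)]),
    Step(reagents=[Component("Br2")],
         products=[Component("dibromotoluene"), Component("HBr", waste=True)]),
]

step2_inputs = [c.name for c in effective_reactants(steps, 1)]
print(step2_inputs)
```

The second step has no explicitly drawn reactants, yet it still ends up with exactly one: the intermediate, because the hydrogen bromide was flagged as waste and is dropped.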

Observe the quantity table for the two-step bromination, where the quantities and ratios all work out:

gln_greensteps4

So far so good. Both of the quantities for the bromine reagents are entered manually. Note that the bromine reagents have a field called Equiv (for number of equivalents, i.e. a ratio), just like the reactants and products do. More on this in a moment.

For the first step, bromine can be represented as a stoichiometric reactant indicated alongside the starting material, or it can be drawn as a reagent above the arrow; for the second step and second equivalent of bromine, though, drawing it as a reagent is the least ambiguous choice: if it were drawn alongside the first product, it would be hard to tell whether it was a byproduct or an extra reactant. This is all well and good, except that bromine is formally stoichiometric, and so it needs to be incorporated into the balancing equation. Reagents are supposed to be nonstoichiometric, and although this rule may be considered arbitrary, I’d like to push it as far as it can go. Taken literally, though, it means the reaction can’t be balanced. The handy little reminder pops up down the bottom, specifying which atoms are “missing” from the left hand side:

gln_greensteps6
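A rough sketch of how such a "missing atoms" reminder could be computed, assuming each component is reduced to a dictionary of element counts. The `atom_balance` function and the constants are hypothetical names for illustration, not the app's own code.

```python
from collections import Counter


def atom_balance(left, right):
    """Element counts on the right minus those on the left.

    Each side is a list of (stoichiometry, {element: count}) pairs.
    Positive values are atoms missing from the left hand side.
    """
    total = Counter()
    for stoich, atoms in right:
        for element, count in atoms.items():
            total[element] += stoich * count
    for stoich, atoms in left:
        for element, count in atoms.items():
            total[element] -= stoich * count
    return {element: n for element, n in total.items() if n != 0}


TOLUENE = {"C": 7, "H": 8}
BROMOTOLUENE = {"C": 7, "H": 7, "Br": 1}
HBR = {"H": 1, "Br": 1}
BR2 = {"Br": 2}

# Bromine drawn as a reagent and so left out of the balance: Br is missing.
missing = atom_balance([(1, TOLUENE)], [(1, BROMOTOLUENE), (1, HBR)])

# Bromine counted as stoichiometric: everything is accounted for.
balanced = atom_balance([(1, TOLUENE), (1, BR2)], [(1, BROMOTOLUENE), (1, HBR)])

print(missing, balanced)
```

With the bromine excluded, the first step comes up two bromine atoms short on the left; once it is counted, the difference is empty.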

One solution would be to add a stoichiometry property for reagents, same as for reactants and products, and simply default it to 0 (for nonstoichiometric). I seriously considered this option, and then stayed my hand because of another feature: for nonstoichiometric reagents such as catalysts, it is useful to be able to explicitly enter a ratio (“equivalents”), e.g. 5% by moles. Armed with this information, when the user specifies a quantity for the primary reactant, the software is able to calculate how much of the reagent is required to satisfy this; then later if the user clones the experiment and changes the amount of primary reactant, the appropriate quantity of catalyst can be recalculated automatically, since it’s based on a ratio.
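The ratio-based recalculation amounts to very little arithmetic. Here is a minimal sketch, assuming quantities are held as masses and molecular weights; the function name `reagent_quantity_g` and the catalyst molecular weight of 200 g/mol are invented for the example.

```python
def reagent_quantity_g(primary_mass_g, primary_mw, equiv, reagent_mw):
    """Mass of reagent implied by its molar ratio to the primary reactant."""
    primary_moles = primary_mass_g / primary_mw
    return primary_moles * equiv * reagent_mw


TOLUENE_MW = 92.14   # g/mol
CATALYST_MW = 200.0  # hypothetical catalyst, picked for round numbers

# Original experiment: 0.1 mol of toluene with 5 mol% catalyst.
original = reagent_quantity_g(9.214, TOLUENE_MW, 0.05, CATALYST_MW)

# Cloned experiment at double the scale: only the primary mass changes.
cloned = reagent_quantity_g(18.428, TOLUENE_MW, 0.05, CATALYST_MW)

print(f"{original:.2f} g, then {cloned:.2f} g")
```

Because the catalyst is stored as a ratio rather than an absolute amount, doubling the primary reactant doubles the catalyst without the user touching it.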

For reactants and products, stoichiometry and equivalents are defined to be the same thing; but for reagents they are not: a 5% molar ratio of catalyst does not factor into the reaction balancing equation; if the reagent has a stoichiometry of 1, though, then it does. Since stoichiometry and equivalents are different properties, one solution would be to allow the user to specify both of them, or add a flag to indicate which meaning is intended. This may end up being the solution, but it is a bit confusing from the user experience perspective, so for the moment I’m pursuing a different approach: using implied stoichiometry for reagents.

Rather than having the user explicitly provide stoichiometry for reagents, it is determined by looking at atom-to-atom mapping information, if available. In the bromination example used here, the intermediate product has its bromo substituent labelled as atom#1, and for the final product, the ortho substituent is labelled as atom#2. For each of the reagent structures, a bromine atom is labelled #1 and #2, respectively. This allows the reaction balancing algorithm to examine the mappings and realise that the bromo atoms are being incorporated into the final product: therefore, the reagent is not a catalyst, or a solvent, or some weird thing, it is a first class stoichiometric equivalent, even if the actual quantity is used in excess. And hence it should be used in the balancing equation, and hence the conclusion reached by the balancing formula is that all atoms are accounted for, nothing is left over.
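As a sketch of the idea, assume each structure is reduced to a list of (element, mapping number) pairs, with 0 meaning unmapped. The function names, the unmapped iron tribromide catalyst, and the choice to leave the other bromine of the final product unmapped are assumptions for illustration only; the real algorithm works on full structures.

```python
def mapped_numbers(atoms):
    """Set of nonzero atom-mapping numbers present in a structure."""
    return {number for _, number in atoms if number > 0}


def is_implied_stoichiometric(reagent, products):
    """True if any mapped reagent atom turns up in a non-waste product."""
    product_numbers = set()
    for product in products:
        product_numbers |= mapped_numbers(product)
    return bool(mapped_numbers(reagent) & product_numbers)


# Only the bromine atoms of the products are listed, for brevity.
BR2_STEP1 = [("Br", 1), ("Br", 0)]
BR2_STEP2 = [("Br", 2), ("Br", 0)]
INTERMEDIATE = [("Br", 1)]
FINAL_PRODUCT = [("Br", 0), ("Br", 2)]
CATALYST = [("Fe", 0), ("Br", 0), ("Br", 0), ("Br", 0)]  # hypothetical, unmapped

step1_bromine = is_implied_stoichiometric(BR2_STEP1, [INTERMEDIATE])
step2_bromine = is_implied_stoichiometric(BR2_STEP2, [FINAL_PRODUCT])
step2_catalyst = is_implied_stoichiometric(CATALYST, [FINAL_PRODUCT])

print(step1_bromine, step2_bromine, step2_catalyst)
```

Both bromine reagents are picked up as stoichiometric because one of their mapped atoms lands in the product of their step, while the unmapped catalyst stays out of the balance.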

At the present time, there is no visual indication of this mapping; the user interface for providing atom mapping numbers is quite well buried inside the sketcher, where I would not expect many people to even find it, let alone use it regularly. But that’s temporary, because the bigger picture is that having atom-to-atom mapping for reactions is incredibly useful. In fact I would go so far as to say it’s actually essential: once you have the correct mapping between atoms of a perfectly balanced reaction, an algorithm can do almost anything with it. It is not possible for a computerised method to determine reaction mapping flawlessly, and even the best attempts have a high error rate: like many problems in cheminformatics, 80% of the time it’s easy, 15% of the time it’s hard, and 5% of the time it’s impossible. But if you have the privilege of asking the user “is this right?” and “if not, give me a clue”, then suddenly those numbers become very favourable. Often all it takes is for a user to provide one atom-to-atom reference point, and everything else crystallises out perfectly, and so the solution is to combine automatic calculation with a user interface that makes it easy. This is the plan for the Green Lab Notebook app: providing atom-to-atom mapping is going to be a routine part of the data entry process. In order to avoid scaring away almost all users, this is going to need to be done in a near frictionless way, while also demonstrating the advantages. Implied stoichiometry of reagents, which avoids an additional confusing data entry field, is a small advantage, but later on when reaction transforms come up for implementation, this will be a really big deal.

Last but not least, being able to understand the classification, stoichiometry, balance and quantities of the steps allows the green chemistry metrics to be worked out for individual steps:

gln_greensteps5

In this case, each of the non-waste products gets its own ratings for process mass intensity (PMI), E-factor and atom-efficiency: this is all made possible because there is enough information in the reaction encoding for the algorithm to figure out what is what, how much and where.
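For reference, here is a sketch of the three metrics using their common textbook definitions, which may differ in detail from what the app computes: PMI as total input mass over product mass, E-factor as waste mass over product mass (taking everything that is not product as waste), and atom economy as product molecular weight over the stoichiometry-weighted molecular weights of the reactants. The masses below are invented purely to exercise the functions.

```python
def pmi(input_masses_g, product_mass_g):
    """Process mass intensity: total mass put in per mass of product."""
    return sum(input_masses_g) / product_mass_g


def e_factor(input_masses_g, product_mass_g):
    """E-factor: mass of waste per mass of product (all non-product is waste)."""
    return (sum(input_masses_g) - product_mass_g) / product_mass_g


def atom_economy(product_mw, reactants):
    """Product MW over the sum of stoichiometry * MW for the reactants."""
    return product_mw / sum(stoich * mw for stoich, mw in reactants)


# First bromination step; the masses are hypothetical.
inputs_g = [9.21, 16.0, 50.0]  # toluene, bromine, solvent
product_g = 15.0               # isolated bromotoluene

step_pmi = pmi(inputs_g, product_g)
step_e = e_factor(inputs_g, product_g)
# Toluene (92.14) + Br2 (159.81) -> bromotoluene (171.04) + HBr
step_ae = atom_economy(171.04, [(1, 92.14), (1, 159.81)])

print(f"PMI {step_pmi:.2f}, E-factor {step_e:.2f}, atom economy {step_ae:.1%}")
```

Under these definitions the E-factor is always PMI minus one, and the atom economy depends only on the balanced equation, which is why the classification and stoichiometry have to be right before any of the numbers mean anything.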

Try doing that with a freeform canvas reaction sketch.
