A new feature is in the works that will make searching for compounds more convenient. The internal name is “MetaSearch”, because it’s a webservice that layers itself on top of existing search engines, pulls together the best content from each of them, and does a variety of additional processing.
As can be seen from the collage to the right, the new feature will first be made available in the Mobile Molecular DataSheet app for iOS, and will appear in the version 1.4 release.
The integrated “MetaSearch” feature complements the webservices feature that has long allowed MMDS to search databases such as ChEBI and PubChem. The difference is that the new search feature does not use the general purpose webservices protocol; instead it uses a single specific service, built on top of the com.mmi software stack (a Java-based cheminformatics library) and hosted on molsync.com.
There are two workflows illustrated in the sequences shown above:
- Hit the Run Search button, type in the name of the structure, then pick the result and bring it into the scratch sheet, or copy the structure onto the clipboard. It’s an as-convenient-as-possible way to avoid having to draw a structure if you happen to know a textual identifier to locate it in a database.
- Touch-and-hold on an existing structure to bring up the menu, and select Search. Execute the search to find exact/substructure/similar structures. Once they are located, either import or copy individual structures, or check off the structures of interest, and import those as a new datasheet.
There are other workflow combinations, but these illustrate the basic objectives: setting up a search should be really easy, and so should doing something with the results. The sequence follows the usual pattern: set up search, browse list, show details. The list and detail views give credit where credit is due: currently the PubChem and ChEBI search engines are used to provide the underlying muscle (because they are both open, free, popular, and really useful), and the icon glyphs show where each result was sourced from (multiple sources, in some cases). Also, the compound identifiers can be used to formulate a link to the web page that hosts the source data, which can be opened from within the detail view by tapping on the hyperlink.
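As a rough illustration of turning a compound identifier into a source link, here is a minimal sketch. The function name `source_url` is hypothetical, and the two URL patterns are assumptions based on the public PubChem and ChEBI web sites rather than anything taken from the MetaSearch service itself, so they should be checked against each site before being relied upon.

```python
def source_url(source, identifier):
    """Build a web link for a compound record (hypothetical helper).

    The URL patterns below are assumptions about the public sites and
    may not match what the app actually uses.
    """
    number = int(identifier)  # both databases use integer identifiers
    if source == "PubChem":
        return f"https://pubchem.ncbi.nlm.nih.gov/compound/{number}"
    if source == "ChEBI":
        return f"https://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI:{number}"
    raise ValueError(f"unknown source: {source}")
```

A result that was found by both engines would simply carry one such link per source.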
From a technology standpoint, the service that brings together these search engines under its own wrapper is of modest accomplishment right now, but that will evolve. Its most interesting feature is that it issues searches to each of the source engines separately, using a separate thread for each. As results come in, they are spliced into the main feed, and normalised. For example, if both ChEBI and PubChem find the same compound, the record is merged. The merging/sorting/prioritising code is all generalised, so adding a new source engine is quite straightforward: the implementation just has to intermediate by reformulating the inputs and outputs into a data structure that can be processed by the framework.
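The thread-per-engine pattern with merging can be sketched roughly as follows. This is not the actual service code (which is built on a Java stack); it is a small Python illustration in which `Hit`, `MergedFeed`, `meta_search`, the `key` field and the two stand-in engines are all hypothetical names with canned data. It assumes that each engine adapter can produce some normalised key that is equal whenever two records describe the same compound.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class Hit:
    key: str                      # hypothetical normalised structure key
    name: str
    sources: set = field(default_factory=set)


class MergedFeed:
    """Collects hits from several engines, merging records that share a key."""

    def __init__(self):
        self._lock = threading.Lock()
        self._hits = {}           # key -> Hit, in order of first arrival

    def splice(self, hit):
        with self._lock:
            existing = self._hits.get(hit.key)
            if existing is None:
                self._hits[hit.key] = Hit(hit.key, hit.name, set(hit.sources))
            else:
                existing.sources |= hit.sources   # same compound: merge sources

    def results(self):
        with self._lock:
            return list(self._hits.values())


def meta_search(query, engines):
    """Run every engine on its own thread and splice results into one feed."""
    feed = MergedFeed()

    def worker(engine):
        for hit in engine(query):
            feed.splice(hit)

    threads = [threading.Thread(target=worker, args=(e,)) for e in engines]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return feed.results()


# Stand-in engines with canned data; a real adapter would call the remote
# service and reformulate its output into Hit records.
def fake_pubchem(query):
    return [Hit("K1", "aspirin", {"PubChem"}),
            Hit("K2", "salicylic acid", {"PubChem"})]


def fake_chebi(query):
    return [Hit("K1", "acetylsalicylic acid", {"ChEBI"})]


merged = meta_search("aspirin", [fake_pubchem, fake_chebi])
```

In this sketch, adding another source engine means writing one more function that returns `Hit` records; the feed itself does not change. Which name or sketch survives a merge depends on arrival order here, which mirrors the "pick the first one" behaviour rather than anything smarter.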
The benefits of simultaneously accessing multiple search engines may or may not be self-evident, but they are definitely real. For example, PubChem and ChEBI are both excellent in their own way. PubChem has a lot more content, and makes an attempt to include chemistry that is outside of the drug-like molecule scope. ChEBI has less content, but tends to be somewhat well curated, and is a great resource for organic compounds. Both of these search engines have similar searching features (text, exact structure, substructure, structural similarity), but they work differently. Other engines that will be added later may have different feature sets. The “MetaSearch” wrapper makes an effort to harmonise the results that come back. You might be surprised at how much difference the implementation details make to the kinds of compound lists that are generated.
And there are also issues regarding representations of structures. For example, PubChem results expand out the hydrogen atoms, which is a smart choice from a cheminformatics perspective, but rather annoying for the end user who usually doesn’t want that in a sketch. Since MMDS uses the SketchEl format to represent structures, it is safe to subsume hydrogen atoms without breaking the chemistry, and so this is done. The service also normalises the bond lengths at the same time, so the sketches are compatible with the rest of the app. These kinds of details are rather important when it comes to getting real work done.
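To make the clean-up step concrete, here is a deliberately naive sketch of the two operations. The function names, the tuple-based data layout and the target bond length of 1.5 are all assumptions for illustration. In particular, the hydrogen removal ignores cases such as isotopes and stereo-bearing hydrogens that a real implementation would have to preserve.

```python
import math
from statistics import median


def strip_hydrogens(atoms, bonds):
    """Drop explicit H atoms and re-index the bonds (naive: ignores
    isotopes, stereochemistry and other cases where H must be kept).

    atoms: list of (element, x, y); bonds: list of (i, j) index pairs.
    """
    keep = [i for i, (el, _, _) in enumerate(atoms) if el != "H"]
    remap = {old: new for new, old in enumerate(keep)}
    new_atoms = [atoms[i] for i in keep]
    new_bonds = [(remap[a], remap[b]) for a, b in bonds
                 if a in remap and b in remap]
    return new_atoms, new_bonds


def normalise_bond_lengths(atoms, bonds, target=1.5):
    """Scale coordinates so that the median bond length equals `target`.

    The default of 1.5 is an assumed value, not taken from the app.
    """
    lengths = [math.dist(atoms[a][1:], atoms[b][1:]) for a, b in bonds]
    current = median(lengths) if lengths else 0.0
    if current == 0.0:
        return list(atoms)        # nothing sensible to scale by
    scale = target / current
    return [(el, x * scale, y * scale) for el, x, y in atoms]


# Example: a C-O fragment drawn with unit bond length and two explicit H.
atoms = [("C", 0.0, 0.0), ("O", 1.0, 0.0), ("H", -0.5, 0.8), ("H", -0.5, -0.8)]
bonds = [(0, 1), (0, 2), (0, 3)]
heavy_atoms, heavy_bonds = strip_hydrogens(atoms, bonds)
clean_atoms = normalise_bond_lengths(heavy_atoms, heavy_bonds)
```

Using the median rather than the mean keeps one unusually long or short bond from distorting the scale factor.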
The new searching feature should be released for MMDS quite soon, with the searching capability being somewhat of a minimum viable product: there are some technical to-do items that are not showstoppers. For example, when two search engines return the same structure, the sketches are usually not identical, so which to use? I have some ideas on how to do this, but I know from previous work that it’s moderately complex, and not to be taken on lightly. For now, it just picks the first one. When it comes to substructures, it will also be upgraded to highlight the substructure match, which sounds easy enough, but actually it’s not: there are often multiple matches, and when that happens, which one to use? Probably should pick the one that most resembles the input structure. Then the result should be oriented the same way, too, so that it’s much easier to perceive the difference in chemistry. And what if the substructure matches a different resonance form/tautomer/protonation state? That’s also pretty interesting, and useful, but needless to say non-trivial. And for similarity searching… it would be great to have the user not need to specify a Tanimoto coefficient cutoff, and just have the search service do the right thing. I have some ideas about that, too.
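For readers unfamiliar with the Tanimoto cutoff mentioned above: the coefficient for two fingerprints A and B is the number of features they share divided by the number of features present in either, |A ∩ B| / |A ∪ B|. The sketch below shows the calculation and a cutoff filter. The fingerprints are represented as plain sets of integers, which is an assumption for illustration; the function names and sample data are hypothetical and say nothing about how the search engines compute their fingerprints.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of feature ids."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    if union == 0:
        return 0.0                # convention chosen here for two empty sets
    return len(a & b) / union


def filter_similar(query_fp, candidates, cutoff):
    """Return (score, name) pairs at or above the cutoff, best first."""
    scored = [(tanimoto(query_fp, fp), name) for name, fp in candidates.items()]
    return sorted((pair for pair in scored if pair[0] >= cutoff), reverse=True)


# Made-up fingerprints, purely for illustration.
query = {1, 2, 3}
library = {"close": {2, 3, 4}, "identical": {1, 2, 3}, "distant": {7, 8, 9}}
hits = filter_similar(query, library, cutoff=0.5)
```

The cutoff is the awkward part for the user: too high and near matches vanish, too low and the list fills with noise, which is why having the service choose it would be attractive.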
The next app to get this feature will likely be MolPrime+ for iOS, possibly followed by the Android version. After that, work will begin on adapting a special purpose use case for the SAR Table app. This will allow decorated scaffolds to be searched, and the results to be deconvoluted and adapted to the assignments (e.g. R1=methyl, R2=phenyl, etc.), similar to the way the scaffold matching service currently works. So the workflow will be: draw or select scaffold, with a couple of taps search the source databases, select the matches you want, and import new rows into the table… all with neatly assigned scaffold/R1/R2/etc.
Stay tuned for MMDS v1.4: it all starts there.