ODDT: web page crawling in search of images

The latest alpha testing version of ODDT (Open Drug Discovery Teams) has been made considerably more graphical. The back-end server regularly polls Twitter for a predefined set of hashtags and assimilates new content into its own stream. For several revisions, links to chemical data (structures, reactions, datasheets) have been recognised explicitly and handled by the app, so that the content can be previewed and used in conjunction with other apps. Now images are handled as well: tweets with links that go directly to images are recognised as such, and links that lead to HTML pages are downloaded and crawled in search of references to embedded images.
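The two-way handling of links can be illustrated with a small sketch. This is not ODDT's actual server code. It assumes that a link counts as a direct image when the server reports an `image/*` Content-Type, falling back to the file extension when no header is available. It also assumes that "embedded images" means the `src` attributes of `<img>` tags, resolved against the page URL. The function names and the extension list are hypothetical and exist only for illustration. Only the Python standard library (`html.parser`, `urllib.parse`) is used, and fetching the page is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Hypothetical list of extensions treated as direct image links (an assumption).
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".gif", ".svg", ".webp")


def is_direct_image_link(url):
    """Guess whether a URL points straight at an image, judging by its path extension."""
    path = urlparse(url).path.lower()
    return path.endswith(IMAGE_EXTENSIONS)


def classify_link(url, content_type=None):
    """Return 'image' or 'page'; a Content-Type header, when supplied, takes precedence."""
    if content_type is not None:
        mime = content_type.split(";")[0].strip().lower()
        return "image" if mime.startswith("image/") else "page"
    return "image" if is_direct_image_link(url) else "page"


class ImageRefCollector(HTMLParser):
    """Collects the src attribute of every <img> tag, resolved against the page URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.images = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; self-closing <img .../>
        # tags are routed here as well by the default handle_startendtag.
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.images.append(urljoin(self.base_url, value))


def find_embedded_images(page_url, html_text):
    """Return absolute URLs of the images an HTML page references, in document order."""
    collector = ImageRefCollector(page_url)
    collector.feed(html_text)
    collector.close()
    return collector.images


if __name__ == "__main__":
    page = '<html><body><img src="/figs/a.png"><p>text</p><img src="b.jpg"/></body></html>'
    print(classify_link("http://example.com/mol.PNG"))
    print(find_embedded_images("http://example.com/blog/post.html", page))
```

Relative `src` values are resolved with `urljoin`, so `/figs/a.png` on `http://example.com/blog/post.html` becomes `http://example.com/figs/a.png`. A production crawler would also need to fetch the page, respect timeouts and size limits, and probably skip tiny decorative images. Those details are outside this sketch.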
