ODDT: web page crawling in search of images

The latest alpha version of ODDT (Open Drug Discovery Teams) is considerably more graphical. The back-end server regularly polls for predefined Twitter hashtags and assimilates new content into its own stream. For several revisions, links to chemical data (structures, reactions, datasheets) have been recognised explicitly and handled by the app, so that the content can be previewed and used in conjunction with other apps. Now images are handled as well: tweets whose links go directly to images are recognised as such, and links that lead to HTML pages are downloaded and crawled in search of references to embedded images.
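To give a rough idea of what that crawling step involves, here is a minimal Python sketch. It is not the ODDT server code: the names `looks_like_image`, `ImageCollector` and `find_images` are hypothetical, the extension list is an assumption, and the HTML is passed in as a string rather than downloaded. A link is treated as a direct image if its path ends in a known image extension; otherwise the page is scanned for `img` tags and their `src` attributes are resolved against the page URL.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Assumed list of extensions; the real server may use other criteria.
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".gif")


def looks_like_image(url):
    """Guess from the URL path alone whether a link goes directly to an image."""
    return urlparse(url).path.lower().endswith(IMAGE_EXTENSIONS)


class ImageCollector(HTMLParser):
    """Collects absolute URLs of <img> tags found in an HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if src:
            # Resolve relative references against the page URL.
            url = urljoin(self.base_url, src)
            if url not in self.images:
                self.images.append(url)


def find_images(url, html):
    """Return image URLs for a link: the link itself, or images embedded in its page."""
    if looks_like_image(url):
        return [url]
    collector = ImageCollector(url)
    collector.feed(html)
    return collector.images
```

A production crawler would also want to check the HTTP content type, since plenty of image links carry no file extension, and would probably filter out icons and other tiny decorations before building thumbnails.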

As shown in the screen capture to the right, some of the entries on the page have images associated with them. These were fetched by the server, and the links incorporated into the associated factoid. The client app recognises these links and uses them to build a set of thumbnails, which are displayed on the summary page.

Tapping on any of the thumbnails opens up a detail view that shows a list of individual pieces that make up the factoid:

Parsing and display of extracted images brings the ODDT project much closer to minimum viable product status, and the current feature set will likely be quite recognisable once the app moves into the beta phase. The current plan is to make the first beta version available to everyone by releasing it on the iTunes AppStore. If you are interested in evaluating the project before that happens, though, don’t hesitate to get in contact with us to join the alpha team.
