BioAssay Express: converting annotations into prose

The BioAssay Express project is about describing bioassay protocols with machine-readable annotations, which are URIs borrowed from semantic web ontologies. Because almost all existing bioassay protocols are written as text, much of our focus has been on streamlining the annotation process. Looking ahead, however, we anticipate that once this technology is widely deployed, scientists will find it easier to annotate new protocols using our templates and web-based interface than to write up many pages of prose in a word processor.
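To make the idea of "annotations as URIs" concrete, here is a minimal Python sketch of how a single annotation might be held in memory: a template field (property) and an assigned ontology term (value), both as full URIs, plus a human-readable label. The `Annotation` class and `expand` helper are hypothetical and are not the BioAssay Express data model. The `bas:` namespace is taken from the recipe's `schemaURI` later in this post; the `bat:` namespace is an assumption.

```python
from dataclasses import dataclass

# Compact prefixes used in the recipe, mapped to full namespaces.
# "bas:" comes from the recipe's schemaURI; "bat:" is an assumed namespace.
PREFIXES = {
    "bao:": "http://www.bioassayontology.org/bao#",
    "bas:": "http://www.bioassayontology.org/bas#",
    "bat:": "http://www.bioassayontology.org/bat#",
}


def expand(uri: str) -> str:
    """Expand an abbreviated URI such as 'bao:BAO_0002921' to its full form."""
    for prefix, namespace in PREFIXES.items():
        if uri.startswith(prefix):
            return namespace + uri[len(prefix):]
    return uri  # already a full URI, or a prefix this sketch does not know


@dataclass(frozen=True)
class Annotation:
    """One annotation: an ontology term (value) assigned to a template field (property)."""
    prop: str   # full URI of the template field
    value: str  # full URI of the assigned term
    label: str  # human-readable label, which is what a text generator prints


# Illustrative only: Homo sapiens from the NCBI Taxonomy, assigned to the
# field that the recipe in this post treats as the organism.
organism = Annotation(
    prop=expand("bao:BAO_0002921"),
    value="http://purl.obolibrary.org/obo/NCBITaxon_9606",
    label="Homo sapiens",
)
print(organism)
```

Keeping full URIs makes each annotation unambiguous for software, while the label is the only part that needs to appear in generated prose.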

For this reason, we are experimenting with running the process in reverse: converting the structured semantic web annotations into scientific English, so that the biologist doesn’t have to.

Consider the following example on the beta server, corresponding to PubChem Assay ID 743096:

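[Screenshot: the record for PubChem AID 743096, with the original text description on the left and the Common Assay Template annotations on the right]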

The record was imported from PubChem with essentially nothing but the text description (shown on the left). On the right are the fields of the Common Assay Template. These were initially all blank until our biology team assigned them, drawing on their own expertise with some help from natural language processing and machine learning, using terms from publicly available ontologies such as the BioAssay Ontology, Drug Target Ontology, Cell Line Ontology, and others.

So far so good, but what if this were a freshly conducted experiment, where the scientist had filled out all of the fields but had not gotten around to writing up the text? Now there’s a feature that can help:

[Screenshot: the same assay, with text generated from its annotations]

The text shown above is generated by clicking a button, which asks the server to apply a boilerplate set of instructions that turns annotations into text. The English would not win any literary prizes, but a few relatively simple grammatical rules are enough to turn the annotations into readable content.
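Two of the simplest grammatical rules such a generator needs are choosing between "a" and "an", and forming plurals. The Python sketch below uses deliberately naive heuristics (the first letter only, and a handful of English suffix rules); the function names and rules are assumptions for illustration, not the rules the BioAssay Express server applies.

```python
VOWELS = "aeiou"


def with_indefinite_article(phrase: str) -> str:
    """Prefix a phrase with "a" or "an", judging only by its first letter."""
    phrase = phrase.strip()
    if not phrase:
        return phrase
    article = "an" if phrase[0].lower() in VOWELS else "a"
    return f"{article} {phrase}"


def pluralize(noun: str) -> str:
    """Naive English plural of a single noun."""
    if noun.endswith(("s", "x", "z", "ch", "sh")):
        return noun + "es"                      # box -> boxes
    if len(noun) > 1 and noun.endswith("y") and noun[-2].lower() not in VOWELS:
        return noun[:-1] + "ies"                # antibody -> antibodies
    return noun + "s"                           # cell -> cells, assay -> assays


def pluralize_phrase(phrase: str) -> str:
    """Pluralize only the head noun, assumed here to be the last word."""
    head, _, last = phrase.rpartition(" ")
    return f"{head} {pluralize(last)}" if head else pluralize(last)


print(with_indefinite_article("enzyme activity assay"))
print(pluralize_phrase("mammalian cell"))
```

The first-letter test gets words such as "hour" or "unit" wrong; a production generator would need an exception list or a pronunciation-based rule.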

In fact, here is the recipe for generating the text content from the assay annotations:

{
    "schemaURI": ["http://www.bioassayontology.org/bas#"],
    "boilerplate":
    [
        "<p>",
        "Auto Annotation to Text for PubChem Assay ",
        {"field": "uniqueID"},
        "</p>",
        "<p>",
        "This is ",
        {
            "ifany": "bao:BAO_0000210",
            "then": {"term": "bao:BAO_0000210", "article": "indefinite", "style": "italic"},
            "else": "an <i>unknown stage assay</i>"
        },
        " to identify potential treatments for ",
        {"term": "bao:BAO_0002848", "style": "bold", "empty": "an unknown disease"},
        ", by investigating the biological process of ",
        {"term": "bao:BAO_0002009", "style": "bold", "empty": "unknown"},
        ", specifically targeting ",
        {"term": "bao:BAO_0000211", "style": "bold", "empty": "unknown target"},
        " from ",
        {"term": "bao:BAO_0002921", "style": "bold", "empty": "unknown organism"},
        ".</p>",
        "<p>",
        "This is a ",
        {"terms": [["bao:BAO_0002854"], ["bao:BAO_0002855"]], "sep": "/", "style": "bold", "empty": "?"},
        " in ",
        {"term": "bao:BAO_0000205", "style": "italic", "article": "indefinite"},
        ", using ",
        {"term": "bao:BAO_0095009", "style": "italic", "article": "indefinite"},
        {
            "ifany": "bao:BAO_0002663",
            "then":
            {
                "ifbranch": ["bat:Absence", "bao:BAO_0002663"],
                "else":
                [
                    ", with the assay kit ",
                    {"term": "bao:BAO_0002663", "style": "italic"}
                ]
            },
            "else": ", without an assay kit"
        },
        ". ",
        {
            "ifany": "bao:BAO_0002800",
            "then":
            [
                "The cell line ",
                {"term": "bao:BAO_0002800", "style": "italic"},
                " was used. "
            ]
        },
        "It was conducted in ",
        {"term": "bao:BAO_0002867", "style": "italic", "plural": true},
        ", with ",
        {
            "ifany": "bao:BAO_0000207",
            "then":
            [
                "the detection method of ",
                {"term": "bao:BAO_0000207", "style": "italic"}
            ],
            "else": "an unknown physical detection method"
        },
        ", using ",
        {
            "ifany": "bao:BAO_0002865",
            "then":
            [
                {"term": "bao:BAO_0002865", "style": "italic", "article": "indefinite"}
            ],
            "else": "an unknown detection instrument"
        },
        ". Results are reported as ",
        {"term": "bao:BAO_0000208", "style": "italic", "empty": "unknown"},
        ", in units of ",
        {"term": "bao:BAO_0002874", "style": "italic", "empty": "unknown"},
        ".",
        " This assay tested the mode of action of ",
        {"term": "bao:BAO_0000196", "style": "italic", "empty": "unknown"},
        " by ",
        {"term": "bao:BAO_0000185", "style": "italic", "empty": "unknown"},
        " perturbagens.",
        "</p>"
    ]
}
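The recipe is essentially a small template language: plain strings are copied through, `term` pulls in the labels of the values assigned to a property, and `ifany`/`ifbranch` choose between alternatives. The Python sketch below interprets that subset. The meaning of each key is inferred from the recipe itself, so several details are assumptions rather than the server's actual behavior: the default separator, reading the first URI of each inner list in `terms` as a property, the naive pluralization, and the hypothetical `ancestors` list used to evaluate `ifbranch`.

```python
from typing import Any, Optional

STYLE_TAGS = {"bold": "b", "italic": "i"}  # recipe "style" -> HTML tag


def article_for(text: str) -> str:
    """Naive a/an choice based on the first letter of the (unstyled) label."""
    return "an" if text and text[0].lower() in "aeiou" else "a"


def styled(text: str, style: Optional[str]) -> str:
    """Wrap text in <b> or <i> according to the recipe's "style" key."""
    tag = STYLE_TAGS.get(style or "")
    return f"<{tag}>{text}</{tag}>" if tag else text


def values_for(assay: dict, prop: str) -> list:
    """All value records assigned to a property (an empty list if none)."""
    return assay.get("annotations", {}).get(prop, [])


def render(node: Any, assay: dict) -> str:
    """Recursively turn one recipe node into HTML text."""
    if isinstance(node, str):   # literal text or HTML is copied verbatim
        return node
    if isinstance(node, list):  # a list is rendered item by item, in order
        return "".join(render(item, assay) for item in node)
    if "field" in node:         # a property of the assay record itself
        return str(assay.get(node["field"], ""))
    if "ifany" in node:         # use "then" if the property has any value
        branch = "then" if values_for(assay, node["ifany"]) else "else"
        return render(node.get(branch, ""), assay)
    if "ifbranch" in node:      # use "then" if a value lies under the given branch
        parent, prop = node["ifbranch"]
        hit = any(parent in v.get("ancestors", []) for v in values_for(assay, prop))
        return render(node.get("then" if hit else "else", ""), assay)
    if "term" in node or "terms" in node:
        if "term" in node:
            props = [node["term"]]
        else:                   # assumed: first URI of each inner list names a property
            props = [path[0] for path in node["terms"]]
        labels = [v["label"] for p in props for v in values_for(assay, p)]
        if not labels:
            return node.get("empty", "")
        if node.get("plural"):
            labels = [lab + "s" for lab in labels]  # naive pluralization
        text = node.get("sep", " and ").join(styled(lab, node.get("style")) for lab in labels)
        if node.get("article") == "indefinite":
            text = f"{article_for(labels[0])} {text}"
        return text
    raise ValueError(f"unsupported recipe node: {node!r}")


# Hypothetical usage with a cut-down recipe and a made-up annotation.
recipe = [
    "This is ",
    {
        "ifany": "bao:BAO_0000210",
        "then": {"term": "bao:BAO_0000210", "article": "indefinite", "style": "italic"},
        "else": "an <i>unknown stage assay</i>",
    },
    " targeting ",
    {"term": "bao:BAO_0000211", "style": "bold", "empty": "unknown target"},
    ".",
]
assay = {"uniqueID": "743096", "annotations": {"bao:BAO_0000210": [{"label": "primary assay"}]}}
print(render(recipe, assay))
```

To render the full recipe in the same way, one would pass its `boilerplate` array, together with an assay record holding `uniqueID` and its annotations, to `render`.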

This is a very early view of the feature, which will no doubt evolve rapidly in the coming months, but so far it is looking quite promising.
