Unlocking the Value of Research Data
We’ve all been there. We are reading a paper with interesting results, and a question comes up about some detail of the experiment. We turn to the methods section for a quick check, only to be referred to a decade-old paper in an obscure journal. All too often, we give up without an answer and end up simply disregarding the paper's results. Time wasted, nothing learned.
Now imagine that integrating the results from that paper with other findings could have led to a breakthrough discovery. Perhaps it would have revealed the role of a new target in disease biology, uncovered a mechanism of drug toxicity, or helped develop a machine learning model for predicting a clinical outcome. Opportunity lost.
Despite the hype around big data in biomedicine, most biomedical research data remains locked away, inaccessible to other researchers. Beyond inadequate method descriptions, results are often available only as final figures buried in PDFs or in hard-to-access supplementary tables. We’ve discussed previously how researchers can make their data more accessible: using data-science-friendly table formats and including metadata in companion files (see Getting Started with FAIR). But how do we address the need for better descriptions of assay methods? Imagine if we could easily integrate data sets from multiple publications and analyze results by target, pathway, cell type or detection technology. Wouldn’t that be nice?
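As a minimal sketch of that earlier advice, assuming Python and only its standard library, the snippet below writes results as a tidy CSV table and describes the columns, units and assay conditions in a companion JSON file. The function name `write_with_metadata`, the file layout and the example values are hypothetical illustrations, not a prescribed standard.

```python
import csv
import json
import tempfile
from pathlib import Path


def write_with_metadata(rows, metadata, out_dir, stem):
    """Write results to <stem>.csv and describe them in <stem>.json.

    rows: list of dicts, one dict per observation (hypothetical layout).
    metadata: dict describing columns, units and the assay protocol.
    """
    out_dir = Path(out_dir)
    csv_path = out_dir / f"{stem}.csv"
    json_path = out_dir / f"{stem}.json"

    # Write one header row followed by one row per observation.
    fieldnames = list(rows[0].keys())
    with csv_path.open("w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)

    # Companion file: a machine-readable description of the table.
    with json_path.open("w") as fh:
        json.dump(metadata, fh, indent=2)

    return csv_path, json_path


if __name__ == "__main__":
    # Hypothetical example values, for illustration only.
    rows = [
        {"compound": "cmpd_A", "concentration_uM": 1.0, "viability_pct": 95.2},
        {"compound": "cmpd_A", "concentration_uM": 10.0, "viability_pct": 61.7},
    ]
    metadata = {
        "assay": "cell viability (hypothetical)",
        "cell_type": "HepG2",
        "detection": "luminescence",
        "columns": {
            "concentration_uM": {"unit": "micromolar"},
            "viability_pct": {"unit": "percent of vehicle control"},
        },
    }
    with tempfile.TemporaryDirectory() as tmp:
        csv_path, json_path = write_with_metadata(rows, metadata, tmp, "viability")
        print(csv_path.read_text())
        print(json_path.read_text())
```

Keeping the metadata in a separate file leaves the CSV loadable by any analysis tool, while the JSON carries the context (units, cell type, detection method) that a bare table would lose.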
One group taking on the challenge of improving the annotation of assay descriptions is the Pistoia Alliance, through its whimsically named Data FAIRy project. The project recognizes that consistent and accurate assay descriptions ensure that experimental results are understood and correctly interpreted. Assay descriptions that are standardized and machine readable can facilitate data reuse, unlocking the value of past results.
The Pistoia Alliance is a non-profit organization working with member companies on non-competitive aspects of drug discovery. It focuses on facilitating the development and adoption of open, accessible data standards, taxonomies, ontologies and web-service descriptions. The collaborative project, led by Data FAIRy team members from AstraZeneca, BMS, Novartis, Roche and CDD, is working to make bioassay information available in a FAIR format. The team has been running a feasibility study and recently gave an update at the Pistoia Alliance Conference. In the study, CDD’s BioAssay Express, a web-based tool for annotating bioassay protocols using semantic web terms, was applied to 496 assays collected from PubChem, ThermoFisher, and the literature.
The automated annotation tool worked quite well, although some manual curation was still needed. Annotation issues differed depending on the source of the assay descriptions. In the published literature, information was often difficult to find, and annotators had to review supplementary materials or look up cited publications. Errors were found to propagate between papers, and many articles were not open access. While commercial assay panels were often more straightforward to annotate, in many cases the panels did not have persistent links to their protocols.
What’s next?
The Data FAIRy team is planning to optimize the process and work with publishers on community standards for assay reporting and publishing. With so many interested stakeholders, one of the challenges will be to manage the scope of this effort and prioritize the most important use cases. Interested parties include the scientists developing and running the assays, analysts using the data, project funders, journal publishers and their readership, and of course the wider scientific community.
As advocates for improving access to research data, we support efforts to facilitate data analysis across assays and publications. We are particularly interested in large-scale perturbation studies that use chemical probes and drugs to gain insights into disease and toxicity mechanisms. For this use case, assay annotations that make it possible to connect assays to each other (e.g., measurement information, cell types, targets, pathways) would be highly valuable.
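To illustrate why shared annotations matter here, the minimal Python sketch below indexes assays by selected annotation fields (such as target and cell type) and reports which annotation values link two or more assays. The assay IDs, field names and the functions `index_by_annotation` and `linked_assays` are hypothetical; in practice the values would ideally be standardized ontology terms, such as those a tool like BioAssay Express assigns, rather than free-text labels.

```python
from collections import defaultdict


def index_by_annotation(assays, fields):
    """Map (field, value) -> sorted list of assay IDs carrying that annotation.

    assays: dict of assay ID -> dict of annotation field -> value (hypothetical).
    fields: annotation fields to index, e.g. ["target", "cell_type"].
    """
    index = defaultdict(set)
    for assay_id, annotations in assays.items():
        for field in fields:
            value = annotations.get(field)
            if value is not None:  # skip assays missing this annotation
                index[(field, value)].add(assay_id)
    return {key: sorted(ids) for key, ids in index.items()}


def linked_assays(index):
    """Keep only annotation values shared by two or more assays."""
    return {key: ids for key, ids in index.items() if len(ids) > 1}


if __name__ == "__main__":
    # Hypothetical annotated assays, for illustration only.
    assays = {
        "assay_1": {"target": "EGFR", "cell_type": "A549"},
        "assay_2": {"target": "EGFR", "cell_type": "HeLa"},
        "assay_3": {"target": "BRAF", "cell_type": "A549"},
    }
    index = index_by_annotation(assays, ["target", "cell_type"])
    for (field, value), ids in linked_assays(index).items():
        print(f"{field}={value}: {ids}")
```

Matching on exact values only works when every source uses the same term for the same thing, which is exactly what standardized assay annotation is meant to provide.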
Another high-value use case involves regulatory groups managing new drug and product registrations. These groups are tasked with reviewing data from a growing and diverse array of in vitro and phenotypic assay technologies. Standardized assay descriptions and machine-readable formats will make it easier for regulators to review these data and understand the different assay platforms. Better understanding of these data will help encourage the adoption of human-based alternatives to animal testing for drugs and other products.
What can you do?
Participate if you can, and follow these standards as they develop. Examine your own habits and processes, and work to better describe your materials and methods. Cell Press journals have adopted STAR Methods, and the Nature family of journals has also started issuing recommendations for reporting methods and identifying resources.
Photo by Dimitry Anikin on Unsplash.