Open data: concerns and opportunities

According to The JAMA Network, data from clinical trials are shared for two “principle purposes”: first, to verify original analyses, and second, to generate hypotheses.[1] Access to data is thus very important to researchers, yet a recent study by Elsevier and Leiden University has suggested that one third of researchers have yet to share data, and 11% are entirely unwilling to do so.[2]

This article will discuss the opportunities that open data provides and the concerns surrounding it, including:

1. Real-world impact and usage

2. Impact on further research

3. Current practice

1. Real-world impact and usage

Access to data can have very positive implications; for example, access to data regarding rare medical conditions can save patients' lives, as reported by Nature.[3] Relying on the data of others can save precious time when treating illnesses and, similarly, collaborative efforts may expedite advances in other fields where quick and decisive action is sometimes needed, such as environmental and wildlife conservation.

On the other hand, open access to data from clinical trials raises concerns over patient privacy.[4] Although open access to medical research is potentially beneficial for treating patients all over the world, individuals may face a loss of privacy as a result.

2. Impact on further research

With particular regard to data from clinical trials, data sharing “has the potential to advance scientific discovery, improve clinical care, and increase knowledge gained from data collected in these trials”.[5] Existing datasets can inspire a great deal of further research and may form an important basis for new work. In a study published in MEDLINE and described in The JAMA Network, it was found that raw data are often reanalyzed and alternative conclusions are drawn.[6] If the data are not openly shared, new and important findings could be considerably delayed, and errors could also be perpetuated.

However, there are concerns over how shared data can be used. In many cases, a follow-up study may be planned by the original authors, but they are pre-empted by others. In a New England Journal of Medicine (NEJM) article, concerns were raised over a new generation of “research parasites”, who seek to use or simply undermine others’ data without conducting their own research experiments or clinical trials to create their own datasets.[7] Furthermore, the data could be misinterpreted by other researchers: “someone not involved in the generation and collection of the data may not understand the choices made in defining the parameters”, leading to the data being incorrectly branded as irreproducible.[8]

3. Current practice

In the study described in the Times Higher Education (THE), only 13% of researchers currently publish their data in data repositories, which are more accessible than appendices to articles.[9] Again, reasons for not sharing data included “privacy concerns, ethical issues, and intellectual property rights”, while others said that they “do not like the idea that others might abuse or misinterpret their data, let alone take credit for it”.[10]

The same source reported that 37% of researchers do not feel incentivized to publish their data, while 41% do not feel that they have been sufficiently trained to do so.[11]


In NEJM, a system of collaboration with authors of datasets was proposed to avoid the “parasitic” practice of using others’ data without setting up original trials.[12] In cases of medical emergencies, another solution may be to make data available to medical practitioners for diagnoses but not for publishing purposes.

As quoted in Forbes, Isaac Newton’s famous words, “If I have seen further, it is by standing on the shoulders of giants”, seem particularly relevant to the current debate over open data.[13] In a world where incredible and fast-paced advances are being made, there must be a system in place to reanalyze data to ensure that they are reliable. Furthermore, access to data can lead to further important scientific discoveries and, just as Newton pays tribute to the “giants”, credit must be given to those whose data were shared and used.

To combat researchers’ reported “confusion” over how to make their data openly accessible, THE proposes training for young researchers as we transition to a new era in scientific research.[14]

