This last week, two stories about data sharing caught my eye. And even though they emerged just about 350 miles apart, the attitudes involved could not be further from each other.
The first was an editorial in which the editors of the New England Journal of Medicine summarise what they think about data sharing. They talk about it the way self-assured, conscientious capitalists describe their idea of communism:
“The aerial view of the concept of data sharing is beautiful. […] The moral imperative to honor their collective sacrifice is the trump card that takes this trick. However, many of us who have actually conducted clinical research […] have concerns about the details.”
It’s a nice idea, but it will never work. And they even go a step further, identifying a new breed of prolific “research parasites” that may take over the research world, freeloading on good, hard-working citizen scientists who just want to do their work.
Obviously, Twitter saw the satirical potential and ran with it (it really didn’t take much effort to turn the phrase into comedy gold).
The immediate and violent (as is common in the Twitter age) backlash was noted by the editors, and a quick update followed, basically just clarifying that “the Journal is committed to data sharing in the setting of clinical trials”. Because, you know, “there is a moral obligation to the people who volunteer to participate in these trials”.
Much could be said on this awkward move by “The Journal” in its own right: If you’re acknowledging a moral obligation to participants in trials, why not to participants in basic science research? Why does the same obligation not extend to ‘the public’ who in a very real sense invest in this research? Have we not learned enough from the bad scientific practices plaguing closed-data clinical trials to extend those considerations to basic science research?
But the story really comes alive for me because, the very same week, news broke that the Montreal Neurological Institute at McGill University had decided to ‘go “open” to accelerate science’. The reason? As Guy Rouleau eloquently puts it: “We’re doing a really shitty job” at translating neuroscience research into treatments. And he is right. There’s no way around it.
But doesn’t the New England crew have a point? Scientists (clinical and non-clinical) work hard and are often stuck in cycles of grant proposals that limit the kind of long-term planning that would facilitate ‘playing the long game for science’. A lot of individual labour goes into designing and conducting experiments, only a small proportion of which will ever result in exciting scientific breakthroughs. And if you don’t claim those breakthroughs for yourself, i.e. if you do not publish them before someone else does from the very data you collected, you’ll undoubtedly perish.
Clearly these are understandable concerns, and they weigh more or less heavily on researchers at different stages of their careers. Yet there are whole fields of science that are unimaginable without pretty much all data being shared (think of astronomy, or particle physics). These fields still exist, and a reward system has adapted to make the work people do worthwhile – the ‘research parasites’ haven’t taken over completely.
Insightful and thorough as ever, Dorothy Bishop has considered the topic a couple of times on her blog (which is worth checking out), including an excellent post on “Who is afraid of Open Data” last year. As part of the post she summarises (and responds to) six main concerns that opponents of data sharing cite:
- Lack of time to curate data; Data are only useful if they are understandable, and documenting a dataset adequately is a non-trivial task;
- Personal investment – sense of not wanting to give away data that had taken time and trouble to collect to other researchers who are perceived as freeloaders;
- Concerns about being scooped before the analysis is complete;
- Fear of errors being found in the data;
- Ethical concerns about confidentiality of personal data, especially in the context of clinical research;
- Possibility that others with a different agenda may misuse the data, e.g. perform selective analysis that misrepresented the findings;
I can think of rational responses to why none of these concerns should stand in the way of data sharing. Curating your data is beneficial not just for others, but also for yourself and your future use of the same dataset. Personal investment does not justify holding on to ‘your’ research, which was ultimately funded to “advance science”, not your own career. Being scooped is rare, and would be eliminated if the data themselves were published and citable. Errors in the data are bound to happen, but surely it can only be better to have them found and corrected. Confidentiality requirements can be met if appropriate steps are taken. Misrepresentation is arguably a pathway into discussing your results and analysis within the scientific community.
But it feels as though the argument between believers in open science and those who are opposed to it runs deeper than pure rational thinking. The argument brought by the NEJM editors smells of ideology, and their wording appears overly emotive if they were really just addressing the ‘details’ of data sharing.
A clue as to what might be going on comes from a frank comment to Dorothy Bishop’s blogpost mentioned above: Jim Grange (Psychology Lecturer at Keele University, who incidentally blogs very usefully about Bayesian statistics) comments:
“I’ve made several excuses over recent years not to put my data freely online, but Number 4 [errors being found in the data] is my biggest fear.”
This fear is real. Opening up your science means opening yourself up to public criticism and to being challenged. ‘Misrepresenting’ your data might just mean unmasking a fatal flaw that you have not noticed yourself. Reanalysing your data, which you have so carefully analysed in a specific, well-thought-out manner, may just lead to your high-impact publication being retracted. Overcoming this fear and committing yourself to Open Research (like Jim has), even though you are worried about the possible problems that might arise, takes guts.
Really changing people’s practices will need to address this fear. We need role models who use the Open Science movement to push their science forward, improve their analysis, correct their errors and end up with better results because of it. The half-hearted suggestion from the NEJM editors to engage in carefully set up collaborations won’t cut it – the best input may come from people you didn’t even know were out there, particularly in fields that address complex interlinked problems, such as neuroscience.
Instead we need more institutions like the MNI, paving the way and supporting their researchers in it. We need more initiatives like the Digital Curation Centre; more large-scale research institutes committed to data sharing, such as the Allen Brain Institute; more projects centred around making widely usable datasets, such as the Human Connectome Project; more initiatives such as NeuroSynth (which played a lead role in cingulategate).
It feels like things are changing (but I might just be sampling biased opinions from the folks on Twitter). Editorials challenging data sharing and Open Science signify that this is ‘a thing’ now. Feeling threatened is understandable, but should not warrant occupying indefensible positions – clearly this is work in progress, and I myself still have far to go in this regard.
But here’s to the brave new world ahead.
Longo DL & Drazen JM (2016) Data Sharing. NEJM 374:276-7. DOI: 10.1056/NEJMe1516564
Drazen JM (2016) Data Sharing and the Journal. NEJM DOI: 10.1056/NEJMe1601087
Owens B (2016) Montreal institute going ‘open’ to accelerate science. Science DOI: 10.1126/science.aae0265
Bishop D (2015) Who is afraid of Open Data. deevybee.blogspot.co.uk/2015/11/whos-afraid-of-open-data.html, accessed 28/01/2016