Can we, and should we, free up more research data as open data? An Open Data Camp 6 panel addressed this head-on.
One attendee has been working with data about rocks rolling down rivers. There are platforms like FigShare that people use, but they are more like document management. There’s also a reluctance to publish raw data rather than processed data, which is much less useful. There’s a huge amount of opportunity here, as open research is something people are just not doing.
Why not publish? Many people felt it wasn’t their job — which could be a cover for something else. Is it their IP? Do they have an institutional policy? Would they rather someone else went first? There can be fears about data quality – “they’ll see our mistakes and we’ll burn in data hell forever”.
There is an open data stream within research and academia. People have produced this data themselves, and don’t want to share it, because research publishing is extremely competitive. One attendee suggested that was exactly right – they are terrified of people stealing their results and doing something better with them. Academia does not encourage collaboration.
This is counter-productive, and certainly impacts reproducibility, which is fundamental to science.
There’s a whole community developing in Open Access Research Data, though, whose members don’t necessarily think of themselves as Open Data people, but effectively are. A Royal Society grant has been made for modularising the publishing process, so you can publish at each stage – hypothesis, raw data, etc.
However, Open Access Research Data often may not meet the conditions to be proper Open Data. There’s a funding aspect as well, around the costs of Open Access publishing. Government grants tend to have funding built in for such publishing. There’s less incentive if you’ve funded it yourself.
There’s a section on the University of Aberdeen website about this.
Data beyond STEM
Remember – STEM subjects aren’t the only ones relevant in these conversations. The content of interviews, for example, can be a kind of data. And there are some interesting privacy issues around that sort of information.
Storytelling is one critical way of getting people interested – and graphs and visualisations are a form of storytelling.
There are some complicated access issues around some research data. The data that Facebook is opening up for researchers is behind multiple layers of access control, for example. How about images of research material that is, in and of itself, copyrighted? There’s more to think about here than just access: there’s also what you can do with what you get access to.
Is this a political problem? Are there contractual issues here? Yes. Clearly, there are vested interests in keeping research private. Funders could be doing a lot more to make sure the data they pay to have created is disseminated – but that could turn the existing academic funding model on its head.
Do we, perhaps, need more nuanced labelling (and licensing) of data? Almost like health labelling of food?
The UK Data Service might have some useful resources for researchers – including things around intellectual property rights that might be applicable to other domains.
How about the consumer perspective? How do you turn research into impact? Some people have to rely on “cousin” access to get hold of research from journals.
SciHub is trying to make more research accessible. But a lack of awareness of what is out there already is going to limit what you can actually achieve. Once a paper is peer-reviewed, the copyright moves to the journal. But people can publish pre-print versions, which lack the improvements that come from peer review.
In some cases, the idea that their data would be opened might discourage people from joining research studies, especially if they’re sceptical of anonymisation. But that might be a very small segment of people.
Driving the change
In Government, people with power and money have driven the releasing of data, and backed it with legislation. Unless there’s a push from the Russell Group, or a group of vice-chancellors, or even funders, we won’t see the same from academia.
Most UK RCs have data policies that require open data publishing in RC data centres as condition of funding. Policing to ensure this is done does not really exist though. But lots of great practice out there. @CEH_EIDC a great example. Maybe build on, not up end?
— Matt Fry (@mattfry_ceh) November 3, 2018
There is a tension between privacy and openness, though. It’s not intrinsic, and is constantly evolving, and it’s not a given that it will always be the case. For example, how do you square the original GDPR consent if people do unexpected things with open data? Can people really knowingly consent to something they can’t imagine? It’s also worth noting consent is not the only justification under GDPR.
These are complex issues, and aren’t to be ignored, however unlikely a prosecution is. Public authorities can’t really be seen to be ignoring the law… We might need some case law around this before it’s resolved, but nobody wants to be the one who gets sued first.
The public sector has managed to publish some anonymised data as open data.