What can open data and data science bring to government?

“Open data is for government. Open data is for activists. Data science is more serious.”

This is a false dichotomy, and we need to deal with it. There’s no point in having open data if you can’t analyse it. We all want to get answers to people, and we need to connect the skills better. It’s no good delivering the best analysis in the world if it arrives two days after the decision was made.


Data science has to prove that it can deliver timely analysis, based on open data, that affects decision making.

What types of data are needed for policy making? Some people suggested that neither community is doing a good job of providing that yet. And there’s a reluctance among people to move into spheres like this, which are seen as outside their remit.

However, we can analyse historic patterns and put them in front of policy makers. Beyond that, we could ask new questions of the data – that’s the promise of data science, a way of working around our own preconceptions about the data.

But that could be too late. We need the policymakers involved in framing the questions – because most policy is framed in a few weeks, in response to an event, not over years. Perhaps there are two uses here: responding to policy crises, and triggering new policy discussions from the data.

Data science: more than statistics

There’s an existing tendency to boil data science down to statistics in policy making – and it’s far more than that. What the open data community can offer is that we exist outside the silos of government. We can merge and mix data to find new insights. Data science brings the possibility of working from the problem back to solutions, rather than fitting the tools to the problem.

Generally, levels of data literacy are pretty low. There’s a useful model of data maturity that helps people understand where they are. It starts with people studying Excel spreadsheets of historic data, passes through real-time data analysis, and ends up with predictive analysis modelling. So, there’s a learning curve before you can get involved in the best decisions.

Finding simple answers in complexity

The answers people are seeking can be incredibly complex to actually deliver – but sometimes we need to strive towards a “yes” or “no” answer. The more we strive towards providing answers, the easier it is for external fact checking agencies to check that politicians aren’t misusing or distorting the data in policy discussions. The more we can see of how you worked, the more transparent it is – and the more others can learn from your methodology.

The FSA search Twitter for people mentioning being sick, and they can model from that the likelihood of norovirus outbreaks. That’s an example of data science at work, one which can feed into planning in the NHS – and possibly policy.
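To make the idea concrete, here is a minimal sketch of how mentions of sickness might be counted per day and unusual days flagged. It is not the FSA's actual method: the keyword list, the function names and the simple "mean plus a multiple of the standard deviation" threshold are all assumptions made for illustration, and real outbreak modelling would be considerably more involved.

```python
from collections import Counter
from datetime import date
from statistics import mean, stdev

# Hypothetical keyword list; not the FSA's actual search terms.
SICKNESS_TERMS = ("vomiting", "norovirus", "stomach bug", "being sick")


def daily_mentions(posts):
    """Count posts per day that mention any sickness term (case-insensitive).

    `posts` is an iterable of (date, text) pairs.
    """
    counts = Counter()
    for day, text in posts:
        lowered = text.lower()
        if any(term in lowered for term in SICKNESS_TERMS):
            counts[day] += 1
    return counts


def flag_spikes(counts, baseline_days, threshold=2.0):
    """Return the non-baseline days whose count exceeds
    baseline mean + threshold * baseline standard deviation.

    `baseline_days` needs at least two days, because stdev() requires
    at least two data points.
    """
    baseline = [counts.get(d, 0) for d in baseline_days]
    limit = mean(baseline) + threshold * stdev(baseline)
    return sorted(
        d for d, n in counts.items() if d not in baseline_days and n > limit
    )
```

Calling `flag_spikes(daily_mentions(posts), baseline_days)` with a few quiet days as the baseline returns the later days on which mentions were unusually high. The threshold is deliberately crude, which keeps the working visible to anyone who wants to check it.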

Perhaps some of the most promising areas of collaboration are around sharing data cleaning techniques and sharing the source code of tools built; anything that stops everyone having to start from scratch every time they want to analyse a data set. However, this could be limited by differing views: one man’s “clean data” is another man’s “dirty data”. Confusion can arise if things are cleaned in the wrong way – so perhaps we need to move cleaning up the data supply stream.
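One way a shared cleaning step could avoid the "clean versus dirty" confusion is to record every change it makes, so that downstream users can see what was altered and disagree with it if they need to. The sketch below is an illustration only: the function name `clean_rows`, the list of missing-value markers and the shape of the log are assumptions, not a tool discussed in the session.

```python
def clean_rows(rows, missing_markers=("", "n/a", "-")):
    """Return (cleaned_rows, log).

    Each row is a dict. String values are trimmed, values matching a
    missing marker (case-insensitive) become None, and column names are
    trimmed and lower-cased. The log lists every changed value as
    (row_index, original_column_name, old_value, new_value).
    """
    cleaned, log = [], []
    for i, row in enumerate(rows):
        new_row = {}
        for key, value in row.items():
            new_value = value.strip() if isinstance(value, str) else value
            if isinstance(new_value, str) and new_value.lower() in missing_markers:
                new_value = None
            if new_value != value:
                # Keep an audit trail so others can see how the data was cleaned.
                log.append((i, key, value, new_value))
            new_row[key.strip().lower()] = new_value
        cleaned.append(new_row)
    return cleaned, log


if __name__ == "__main__":
    rows = [
        {"Region ": " North ", "Cases": "n/a"},
        {"Region ": "South", "Cases": "12"},
    ]
    cleaned, log = clean_rows(rows)
    for entry in log:
        print(entry)
```

Publishing the log alongside the cleaned file is one possible design choice; it lets someone whose definition of "clean" differs undo or adjust individual changes rather than starting again from the raw data.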

Skills sharing would be a useful approach. We need something between the low volume, high intensity of apprenticeships and the high volume, low intensity of massive online courses. A halfway house feels like a more cost-effective method. But literacy is probably more important. Not everyone needs to run a Python script, but everyone needs to understand what can be done with data. We need a David Attenborough of data.

Session notes