Andy Dudfield, who works for “a small charity” called Full Fact, which checks out the claims made by politicians and others, said he had pitched this session because “fact checking is hard to do” and “we are always looking for ways to make it go faster.”
So, he said, “one of the things that we would like to do is to look at using open data for fact checking”, and he wanted his audience to tell him about any good open data sources for the job, and what the caveats were likely to be.
For example, he said, Full Fact might be asked to check a claim that crime had risen: were there good open data crime statistics, and how would the organisation know where they came from, and whether any change was ‘real’ or a statistical artefact?
Participants at this session at Open Data Camp 7 in London said checking the provenance of data was hard. As the very first session of the day had discussed, data should have metadata attached, but this doesn’t always happen. “One of the things that we can do as a community is to open up our journals.”
There is a data model and data store called Provenance that tries to help, but it may not capture or, critically, explain changes in data collection methodologies. Some of the data publishers in the room said they recognised the importance of doing this, and that they tried to provide guidance to data consumers when they published data sets.
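If the model in question works along the lines of the W3C PROV data model, the core idea is a record linking a published data set back to its source and to the activity that produced it, with room for a human-readable caveat. A minimal sketch in plain Python (the dataset names, activity, and methodology note here are invented for illustration):

```python
# Minimal PROV-style provenance record, built as a plain dictionary.
# All names and the methodology note are invented examples.

def derive(entity, source, activity, note):
    """Record that `entity` was derived from `source` via `activity`."""
    return {
        "entity": entity,              # the published data set
        "wasDerivedFrom": source,      # what it was built from
        "wasGeneratedBy": activity,    # the process that produced it
        "note": note,                  # the caveat: what changed and why
    }

record = derive(
    entity="crime-stats-quarterly",
    source="police-recorded-crime-raw",
    activity="quarterly-aggregation",
    note="From 2017 Q2 fraud offences are counted separately, "
         "so totals are not directly comparable with earlier years.",
)
```

The `note` field is the part the session said is usually missing: the lineage may be captured, but the explanation of a methodology change rarely travels with it.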
Other participants named examples of good practice. The Office for National Statistics, for example, has stringent quality controls on the data it puts out. However, others said that, at the end of the day, there is no substitute for “human knowledge” when it comes to contextualising data.
“It takes years to get really familiar with these data sets and their wrinkles and to spot when things change,” one said, adding that the open data community had missed a trick in not finding a way to add this kind of “social” information to its data sets and products. Someone else suggested that just adding a phone number for queries would be a start.
Log the change you want to see
The issue that emerged from the session was “stewardship.” Organisations like Full Fact need to know that the open data sources they are looking at are well stewarded. Otherwise, they have to spend time checking out data before they can use it. This, another participant pointed out, would limit the use cases for a data set: it might be ok for academic research, where time isn’t an issue, but it wouldn’t be useful for fact checking.
The same participant suggested that Full Fact could help to change things by keeping a log, or a timesheet, of its own experience of trying to use data sets, so it could feed back on the issues it had come across. Andy said this idea was “interesting”, although it was asking a lot of an already busy organisation.
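Such a log could be very lightweight. As a sketch in plain Python (the field names and the example entry are assumptions, not anything Full Fact actually keeps):

```python
from dataclasses import dataclass, asdict
from datetime import date

@dataclass
class DatasetIssue:
    """One entry in a fact checker's log of problems hit while using open data."""
    dataset: str                   # which data set was being used
    found: date                    # when the issue was encountered
    issue: str                     # what went wrong or was unclear
    publisher_told: bool = False   # has the publisher been notified yet?

# Example log with one invented entry.
log = [
    DatasetIssue(
        dataset="recorded-crime-quarterly",
        found=date(2019, 11, 2),
        issue="No metadata explaining the change in fraud counting",
    ),
]

# Feeding back to publishers: everything not yet reported.
to_report = [asdict(i) for i in log if not i.publisher_told]
```

Even a list this simple would give publishers the interventions mentioned below: concrete, dated evidence of where their data sets fell short in real use.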
Conversely, data publishers pointed out that they might not know about issues unless they were told about them. “Those interventions could be really, really useful to people working in organisations who are constantly making the case for investing in open data,” one pointed out.
As another response, a camper suggested that Full Fact could at least record and publicise issues with large, well-known data sets that are used frequently, so that if it commented on them, or on a particular use of them, it could point back to this commentary. For example, Andy mentioned that different crime statistics defined ‘London’ in different ways, so it should be fairly straightforward to point that out if a London politician made reference to one or more of them.
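The point about definitions can be made concrete. In this sketch (the offence counts are invented and the boundary lists are simplified illustrations, not real statistical geographies), the same claim about “crime in London” yields different totals depending on which definition a data set uses:

```python
# Invented offence counts per (simplified) area.
offences = {
    "City of London": 100,
    "Inner London": 5000,
    "Outer London": 4000,
}

# Two simplified "definitions" of London: the Metropolitan Police area,
# for instance, excludes the City of London, which has its own force.
greater_london = ["City of London", "Inner London", "Outer London"]
met_police_area = ["Inner London", "Outer London"]

total_greater = sum(offences[a] for a in greater_london)
total_met = sum(offences[a] for a in met_police_area)

# Same claim, different 'London', different number.
assert total_greater != total_met
```

A published note explaining which boundary each data set uses would let a fact checker point back to it whenever the discrepancy comes up, rather than re-deriving it each time.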
Meantime, there is, of course, an election on. And Andy said that if anyone at Open Data Camp 7 came across an interesting claim or way to check it, they could get in touch. Andy can also be followed on Twitter.