Data sources – how to find them

Alex McCutcheon – Valtech.
Gozde Karahan – PhD student, Turkey

Gozde explained that as part of her PhD she has been looking for data. In Turkey, and some other countries, it is easy. In the UK, not so much.

Alex explained that he had worked on a project for Glasgow Chamber of Commerce, which, after the Covid-19 pandemic, wanted to know how well and how fast the city was recovering.

He said the project was able to pull together a lot of data on traffic, the number of buses running, and footfall in the city centre. But it wasn’t easy. “It was search the web using Google to find data sources and work out what was reliable,” he said. “It was very time-consuming, and I think it should be easier to find out what there is to use, and where it is, and what format it is in.”

LG Inform Plus – a good starting point?

Martin Howitt said the Local Government Association has a data portal – LG Inform Plus – which is semi-open, in that it can be searched and a certain amount of data can be downloaded for free via an API.

However, he acknowledged there are limitations to it. It doesn’t go down to address level, for example. So it’s not possible to enter an address and find out information like: what air quality is like, or how many trees there are, or what the index of multiple deprivation looks like at that level.

Gozde asked where all the data comes from. Martin said different places: the census, central government, DEFRA, the Department of Health, Public Health England. “There are 20,000 data sources, and there are issues, but it seems like a pretty good starting point,” he said. However, Alex asked how researchers like Gozde would find it. And Martin admitted that it was set up for people working in local authorities, who are LGA members, so it’s not super-easy, although it is accessible through web search.

The only constant is change (and that’s a problem)

Another researcher pointed out that even on portals, data formats can be very different and data sets can be out of date. He said he is working on a portal for Oxford, which also wants to give residents information about things like traffic and footfall, but “some [government] data sets are from 2022 or 2023” – and the way they are coded can change over time.

The project has been exploring other sources, such as Strava information, although it has limitations, in that it’s used by a self-selecting group. Alex said Glasgow was lucky that it had already invested in sensors around the city to capture movement information. However, Martin said this wasn’t a complete solution – “there’s a surprisingly high attrition rate, as people drive bicycles into them, and all kinds of things.”

And the Oxford researcher said there can be issues with using this kind of information on a real-time portal, because some sensors will report constantly, and some will only update information occasionally.
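One way a portal might cope with sensors that report at very different rates is to check how old each reading is before showing it as “live”. The following is only a rough sketch of that idea, not something described in the session: the `Reading` fields, the sensor names and the 15-minute threshold are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Reading:
    sensor_id: str         # hypothetical sensor identifier
    value: float           # e.g. a footfall count
    reported_at: datetime  # when the sensor last reported


def split_by_freshness(readings, now, max_age):
    """Separate readings into (fresh, stale) lists based on their age."""
    fresh, stale = [], []
    for reading in readings:
        if now - reading.reported_at <= max_age:
            fresh.append(reading)
        else:
            stale.append(reading)
    return fresh, stale


# Illustrative data only: one sensor reporting constantly, one occasionally.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
readings = [
    Reading("footfall-a", 120.0, now - timedelta(minutes=2)),
    Reading("footfall-b", 95.0, now - timedelta(hours=6)),
]

# The 15-minute cut-off is an assumed threshold, not a recommendation.
fresh, stale = split_by_freshness(readings, now, max_age=timedelta(minutes=15))
print("fresh:", [r.sensor_id for r in fresh])
print("stale:", [r.sensor_id for r in stale])
```

A portal could then show the stale readings with a “last updated” note rather than mixing them in with the live figures.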

Alex and other speakers said changes to data are also a significant problem. “Often,” one speaker said, “there are good reasons for the changes” but it can still be a major piece of engineering work to accommodate them. At the very least, she suggested, data publishers could inform people properly that a change is coming. “Often, something just goes out on Twitter.”

Martin said LG Inform Plus requires people to register, which makes it easier to communicate when something significant, “like boundary changes, which happen every time there’s a change of government”, comes along.

Is there a solution?

Towards the end of the session, Alex asked campers where they went for data. And like the people who had already spoken, most said they spent a lot of time on searches, going to individual departments, and finding that “some are good” and some aren’t. The ONS has tried to create an integrated data platform, but given the disaggregated nature of data collection and publication in the UK, it’s been a long, tough job.

Plus: it’s still hard to make it work for everyone from data scientists to occasional users, such as journalists or people who just want to know something about their area. The government has tried running “bootcamps” to train more people as data scientists, but not everyone wants to be a data scientist, and some people might not need to be if the data owners thought more carefully about their data and the costs and benefits of publishing it in accessible formats.

Alex joked that a data scientist is like the driver of an F1 car: they rely on a huge team of managers and mechanics. And in some ways, it’s the managers and mechanics who are key to open data success.
