How do you prove the value of open data?
Here’s how.
The Food Standards Agency’s Food Hygiene Rating Scheme (FHRS) data is released as open data in near real-time, and the Department for the Economy in Northern Ireland found a use for it.
Like every authority in the country, Belfast has a rates shortfall – there are business rates that should be collected but, for various reasons, aren’t. A group of smart people across various parts of the government and city council had a feeling that they could use datasets to improve the collection rate within the city.
To test the theory, the Small Business Research Initiative invested in four early-stage proof-of-concept projects that could solve the problem of identifying formerly empty premises that were back in use. One of those people, Eoin McFadden of the Department for the Economy, describes the initiative as “public procurement of R&D to fix wicked issues”.
One project used footfall data and another used wifi and Bluetooth signals, while the other two applied machine learning techniques to a range of public datasets, both open and closed. The work, which lasted from July 2016 to March 2017, ran in two stages – and the two machine learning projects made it through to the second phase.
Massive return on investment
Over the two-week test period, the projects identified around £350,000 of uncollected rates – against a total project cost of £130,000. If you want to prove the benefit of open data, money talks…
So, how did the FHRS open data contribute to this?
Well, the normal method of checking for missing rates was to target high value empty properties, and manually inspect them. That had a success rate of around 20%. But what if you could work out which properties were likely to be occupied, and target your inspections on them?
The machine learning projects used a mix of closed public datasets, including the ratings data and water rates, and two open data sets:
- FHRS
- Companies House data
All those data sets are good indicators of a property in use, but matching them is hard. Hence the application of machine-learning-derived fuzzy logic to identify properties that are likely to be back in use.
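To give a feel for why matching is the hard part, here is a minimal Python sketch of fuzzy address matching. It is not the method either project used (their techniques are not described here); it swaps in plain string similarity from the standard library's `difflib`, and every address, function name and the 0.85 threshold is a made-up assumption for illustration.

```python
import re
from difflib import SequenceMatcher


def normalise(address: str) -> str:
    """Lower-case, replace punctuation with spaces and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", address.lower())
    return " ".join(cleaned.split())


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity score between two normalised addresses."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def likely_occupied(empty_properties, trading_records, threshold=0.85):
    """Pair each property listed as empty with its closest trading record.

    Only pairs scoring at or above `threshold` (an arbitrary, illustrative
    cut-off) are returned, highest score first.
    """
    matches = []
    for prop in empty_properties:
        best = max(trading_records, key=lambda rec: similarity(prop, rec), default=None)
        if best is None:
            continue
        score = similarity(prop, best)
        if score >= threshold:
            matches.append((prop, best, score))
    return sorted(matches, key=lambda m: m[2], reverse=True)


# Hypothetical data: addresses listed as empty in the rates records ...
empty = [
    "12 High Street, Belfast BT1 1AA",
    "45 Castle Lane, Belfast BT1 5DB",
]
# ... and addresses of businesses seen trading in an open dataset.
trading = [
    "12 High St., Belfast, BT1 1AA",
    "7 Harbour Road, Belfast BT3 9XX",
]

candidates = likely_occupied(empty, trading)
for prop, record, score in candidates:
    print(f"{prop!r} looks like {record!r} (score {score:.2f})")
```

The same premises appear as "High Street" in one dataset and "High St." in another, so an exact join would miss it. A similarity score catches it, at the cost of having to choose a threshold and then send someone to check the result.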
Making it work
There was a fair amount of persuasion needed to make this work. Public bodies needed to be persuaded to provide closed data sets to private companies, under tight non-disclosure agreements, but the effect was remarkable.
Once the systems were up and running, inspection visits based on a list of probable occupation had a 51% success rate – about 2.5 times the average before. Given that rates make up 75% of the councils’ revenue, the economic benefit from these systems has only just begun.
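For anyone who wants to check the arithmetic, the figures quoted in this post work out as below. This is only a back-of-the-envelope sketch in Python; the variable names and the batch of 100 visits are hypothetical, chosen to make the comparison concrete.

```python
# Figures quoted in this post.
baseline_success = 0.20      # manual inspection of high value empty properties
targeted_success = 0.51      # inspections guided by the probable-occupation list
identified_rates = 350_000   # pounds of uncollected rates found in the test period
project_cost = 130_000       # pounds, total project cost

# Hypothetical batch size, purely for illustration.
visits = 100

uplift = targeted_success / baseline_success
hits_before = round(visits * baseline_success)
hits_after = round(visits * targeted_success)
return_multiple = identified_rates / project_cost

print(f"Success-rate uplift: {uplift:.2f}x")
print(f"Occupied premises found per {visits} visits: {hits_before} before, {hits_after} after")
print(f"Rates identified per pound spent: {return_multiple:.2f}")
```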
The Food Hygiene data is valuable in this, both because it is updated in near real time (and updates hit daily), and because it indicates businesses that have begun trading associated with an address. A month’s delay in publication would make it significantly less useful to the project.
Of course, it’s not only hospitality and food businesses that get missed – two of the businesses found were a hostel and, to everyone’s amusement, a firm of accountants.
Future revenue potential
The two successful projects – one based in Belfast and one in Southampton – are both continuing to develop their systems, and Belfast is moving ahead with a procurement process to make this a routine part of its ratings work.
That’s a pretty good return on some open data and £130,000 of up-front investment.
Don’t let anyone tell you that open data successes are only small scale and personal…
Hello,
This is a very interesting finding, and a great use of data that is already openly available!
I am interested to know what data tools you used to analyse the information, e.g. Socrata, Qlik, etc.
Thanks