All posts by Zachary Boren

Open Addresses: Multiplying addresses

The team behind Open Addresses addressed Open Data Camp for the second time on Saturday in the day’s final session.

The team, which is developing an enormous database of every address in the UK, designed to be integrated into the National Information Infrastructure, began by introducing a couple of the tricks they use to build upon their address-data skeleton.

They are using existing datasets not otherwise used for this purpose, such as those provided by Companies House. These sets can be extrapolated from, with further addresses hinted at in amongst the data.

Ze remote problem

The ‘Ze’ postcode in Scotland’s Shetland Islands has been called the “black sheep” dataset since its publication in December, as it only offers 117 addresses. It’s an example of how sparse house data can be, and demonstrates how particularly difficult it is to cover remote areas; big cities like London have more readily available address information.

On Fogralea road in ZE1, for instance, there are only two houses: number 5 and number 30.

Open Addresses has figured out a way of making an informed guess at how the rest of the street looks.

They call it ‘inferring’ and it helps place houses 6 through 29.

Above or below those numbers, however, nothing can be done because the risk of getting the postcode wrong is too great; it’s better not to have an address than to send someone to the wrong place.
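As a rough illustration of the idea, here is a minimal Python sketch of that kind of gap-filling. It is not Open Addresses' actual code: the function name infer_between is made up for this example, and it assumes that every whole number between the lowest and highest known house on a street is a plausible address. It deliberately never guesses below the lowest or above the highest known number.

```python
def infer_between(known_numbers):
    """Return house numbers strictly between the lowest and highest known ones.

    Hypothetical helper for illustration only. It interpolates inside the
    known range and never extrapolates beyond it.
    """
    known = set(known_numbers)
    if len(known) < 2:
        # A single known house gives no range to fill in.
        return []
    low, high = min(known), max(known)
    # Skip numbers that are already known; keep only the gaps.
    return [n for n in range(low + 1, high) if n not in known]


# The Fogralea road example: only numbers 5 and 30 are known.
inferred = infer_between([5, 30])
print(inferred[0], inferred[-1])  # 6 29
```

Numbers 4 and 31 are never produced, which matches the rule that nothing is guessed above or below the known houses.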

How can you help?

This inferring process is supported by the organisation’s crowdsourcing efforts. The Open Addresses mobile app asks users to “help us improve the UK’s address data” by either confirming an address location or tipping them off to a potential problem.

User feedback helps assess the trustworthiness of an address, adding or subtracting certainty that the site is where it says it is.

Inferred addresses are automatically given a lower statistical confidence score, and older addresses are marked down because they may be out of date.
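The session did not spell out how the score is calculated, so the following Python sketch only shows the general shape of such a scheme. Every name and number in it (AddressRecord, BASE_SCORE, the penalties and the feedback step) is an assumption invented for this example, not Open Addresses' real model.

```python
from dataclasses import dataclass

# All weights below are made-up placeholders, not Open Addresses' real values.
BASE_SCORE = 0.8            # starting confidence for a sourced address
INFERRED_PENALTY = 0.3      # inferred addresses start lower
AGE_PENALTY_PER_YEAR = 0.02  # older records may be out of date
FEEDBACK_STEP = 0.05        # effect of one user confirmation or problem report


@dataclass
class AddressRecord:
    """Hypothetical record holding only what the score needs."""
    inferred: bool
    age_years: float
    confirmations: int = 0
    problem_reports: int = 0


def confidence(record):
    """Combine the factors into a score clamped to the range 0.0 to 1.0."""
    score = BASE_SCORE
    if record.inferred:
        score -= INFERRED_PENALTY
    score -= AGE_PENALTY_PER_YEAR * record.age_years
    # Confirmations add certainty; problem reports subtract it.
    score += FEEDBACK_STEP * (record.confirmations - record.problem_reports)
    return max(0.0, min(1.0, score))


sourced = AddressRecord(inferred=False, age_years=0)
guessed = AddressRecord(inferred=True, age_years=0)
print(confidence(guessed) < confidence(sourced))  # True
```

A simple additive score is used here only because it is easy to read; a real system might weight sources and feedback quite differently.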

The project invites users to challenge its data, with one member of the audience gleefully noting the absence of a handful of Nottingham back roads on Open Street Map.

Open Addresses uses four sources to reference its data, none of which actually cross-reference one another:

  • Ordnance Survey’s Locator for road names
  • Ordnance Survey’s Strategi for settlements
  • Wikipedia for towns
  • Office for National Statistics for postcodes

What’s the point?

This big data big mission is all mighty impressive, but is there an actual purpose? Open Addresses hopes to create a data commons of all the places in the UK; it’s a platform for technology that has yet to arrive.

With each home given its own URL, the package-delivering drones will know exactly where to go, how to get there, and if there’s anything specific to be done.

But it’s not just a future-tech thing, it’s also a solution to problems right here right now. The organisation estimates that between 50 and 100 million pounds are spent every year on comprehensive address lists.

Not only that, but those lists aren’t even very good; they’re managed by old institutions using old processes.

And there are some people whose lack of address data is messing with their rights. One man, the Open Addresses team claimed, couldn’t register to vote because his house wasn’t on record.

But perhaps this super-database will cause some problems as well – what about spam?

“Er, that probably won’t get any worse.”

And, indeed, they have a longer answer to that.

Data literacy: What is it and how to promote it?


This has been liveblogged. Sorry for mistakes, they’ll be fixed.

The concept of data literacy is touted as the be-all-and-end-all solution to all information issues, but it’s pretty loosely defined, and may not be entirely viable for the wider public.

The ODcamp Data Literacy discussion on Saturday afternoon was challenged to define the term, and figure out all that it entails.

Does that compute?

To attain the sort of data literacy that can decode huge sets – interrogate and interpret them – you require, to an extent, mastery of both a subject and computing. That second skill, the computing one, is where most of the problems emerge. It’s not feasible to skill everyone up in computing, but an understanding of how to use data is pretty important.

Sure, data can be better designed, made more useable for the uninitiated, but literacy really comes into play when the brick wall of bad data is hit. The combination of field and computing expertise enables you to articulate what is bad data and why it’s so bad, and to figure out how to circumvent that wall.

It’s about asking the right questions, the group agreed.

The English-plumbing divide

But the extent to which “problematic” computer skills are required depends on how critically you view the whole thing. Data skills were described alternately as equivalent to both:

  • learning the English language – an absolute necessity
  • learning the trade of plumbing – useful but something you’re likely to outsource.

Perhaps one to file under politics > being more engaged with the world

How to teach it

  1. School
  2. Citizens Advice Bureau for data
  3. Open Data Cookbook
  4. School of Data running data journalism classes
  5. Relationships – finding mentors or other tutors

The Open Data Board Game

The gamification of big data is what Ellen Broad and the Open Data Institute are exploring with The Open Data Board Game Project.

Broad, with her Australian brogue, used her Saturday morning session at ODcamp to crowdsource ideas for what data could be used in the prospective board game, how, and, significantly, the wider benefits of using it.

The room was asked to feed back on what types of open data would:

  1. help to establish some sort of utopia
  2. best fit the board game framework.

She used the example of energy efficiency, with prospective gamers using information to achieve greater savings. The game would highlight to non-data geeks how open data is a really important thing.

First, the group tried deciding what kind of game it would be:

  • Old school (as in actually on a board)
  • Digital
  • Augmented reality.

It was mostly agreed that such a data-driven game would probably be more at home on a device, though they also stressed how they didn’t just want to make another Sim City.

The Complexity Crisis

Next question: complexity. The clever data types in the room have an expertise well beyond that of the gamers they’re pitching to. So how do you take that expertise and translate it? Do you try for a one-size-fits-all? Do you have different versions for different subjects?

Broad recalled her struggles learning to code using the hard-to-understand Ruby Warrior. The board game shouldn’t be like that. After some hemming and hawing, the game turned out like a crazy version of Sim City, but that’s not really a board game.

But what about the central thesis that the wall of ‘data idea’ post-it-notes was supposed to provide?

One audience member closed the session by saying:

It should show that no data is bad. And that you should feel bad.