Tag Archives: APIs

Open Data 101: Open Day for Newbies (2024 edition)

What is open data?

Data that’s not private and closed — it’s published in some way. It has to be accessible to the general public. It’s open to anyone.

One definition:

Open data is data that can be freely used, reused and redistributed by anyone, subject at most only to the requirement to attribute and share alike.

Open data licensing

Open data needs to be published under an open data licence. If you’re able to, you publish it with that licence applied to it. A common one in the UK is the Open Government Licence (OGL). It is very free, requiring little more than that you acknowledge the source. There are others, including Creative Commons.

The “free” when applied to open data is akin to the “free” in “free speech”. Someone may have paid to produce, publish and share the data — but usually, it can be used for no or marginal cost.

Commercial organisations can sell products built around the data — that’s not the same thing as selling the data. For example, they can make it much more searchable for the average user, or make connections between elements of the data. If they add value, they can charge for it — but it is a fine line.

Beware, though: some licences restrict or ban commercial use.

Getting open data

It can be as simple as a file you download from a website, like data.gov.uk. Sometimes it will be a static file in the CSV format — a non-proprietary format for data.
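As a rough illustration of what working with a downloaded CSV file looks like, here is a minimal sketch using only Python's standard library. The sample data, column names and helper functions are made up for the example; a real file from a portal would be opened with `open(path, newline="")` instead of being read from a string.

```python
import csv
import io

# Hypothetical sample standing in for a downloaded CSV file.
SAMPLE_CSV = """council,year,libraries
Northtown,2023,12
Southville,2023,7
Eastham,2023,9
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def total(rows, column):
    """Sum a numeric column, converting each value from text to int."""
    return sum(int(row[column]) for row in rows)


rows = read_rows(SAMPLE_CSV)
print(len(rows), "rows; total libraries:", total(rows, "libraries"))
```

Every value in a CSV file arrives as text, which is why the sketch converts to `int` before summing.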

Some pre-digital data has been digitised and made available, but far from all. Digitising that data allows us to analyse it more easily using software, especially really large data sets. However, occasionally, you will end up with more data than your hardware can handle. There’s a movement now to allow access to data through a tool or API, to get just the data you are looking for. APIs can be quite intimidating, though.

Some forms of data are more available than others: there is lots and lots of geospatial data available, for example. Of course, that’s in file formats more suited to geospatial work.

APIs

An API (Application Programming Interface) is code on a data store, which allows you to request data from it programmatically using an agreed language. It has the advantage of being able to offer data in real time: weather data, for example, is better in real time. General election data is another recent example.

OpenStreetMap is an international community of volunteers who build a constantly updated open map at street level. It’s a very good, mature and robust set of data. The community take it very seriously, and it is offered through an API.
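To make "requesting data programmatically" a little less intimidating, here is a small sketch of the two halves of a typical web API call: building the request URL and reading the structured answer. The endpoint, the parameter names and the shape of the response are all hypothetical, invented for this example; a real API documents its own.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint and parameter names, for illustration only.
BASE_URL = "https://api.example.org/v1/observations"


def build_request_url(station, limit=1):
    """Encode the query parameters and append them to the base URL."""
    query = urlencode({"station": station, "limit": limit})
    return f"{BASE_URL}?{query}"


def latest_temperature(response_text):
    """Pull one value out of a JSON response (assumed shape, see sample)."""
    payload = json.loads(response_text)
    return payload["observations"][0]["temperature_c"]


# A made-up response of the kind such an API might return.
SAMPLE_RESPONSE = (
    '{"observations": [{"time": "2024-07-06T10:00:00Z", "temperature_c": 17.5}]}'
)

url = build_request_url("leeds")
print(url)
print(latest_temperature(SAMPLE_RESPONSE))
```

In real use you would fetch the URL, for instance with `urllib.request.urlopen(url)`, and pass the text of the reply to the parsing function. The sketch keeps the response as a string so that it runs without a network connection.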
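As a sketch of what asking that API for a small area might look like: to the best of my knowledge, the main OpenStreetMap editing API (version 0.6) has a map call that takes a bounding box as `min_lon,min_lat,max_lon,max_lat`. Check the OpenStreetMap API documentation and usage policy before relying on this; the helper name and coordinates below are my own.

```python
# Endpoint as understood from the OpenStreetMap API v0.6 documentation;
# treat it as an assumption and verify before use.
OSM_MAP_ENDPOINT = "https://api.openstreetmap.org/api/0.6/map"


def osm_map_url(min_lon, min_lat, max_lon, max_lat):
    """Build a map request URL for a bounding box (hypothetical helper)."""
    if not (min_lon < max_lon and min_lat < max_lat):
        raise ValueError("bounding box must have min < max on both axes")
    bbox = ",".join(str(v) for v in (min_lon, min_lat, max_lon, max_lat))
    return f"{OSM_MAP_ENDPOINT}?bbox={bbox}"


# A small, arbitrary box of roughly a few streets.
url = osm_map_url(-1.55, 53.79, -1.54, 53.80)
print(url)
```

Keeping the box small matters: this is the "get just the data you are looking for" idea in practice, and large requests are likely to be refused.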

Using Open Data

There are many tools you can use to work with open data. It very much depends on what you want to do:

  • find an answer to a question
  • build a product that allows people to ask questions
  • present it in a way that intrigues people.

Some tools

Coding languages:

  • Python is a coding language that’s great for working with data
  • R is another language. People from a statistics background tend to prefer R.

Learn one or the other, but not both at the same time.

Software:

  • There’s Power BI from Microsoft, which is very powerful.
  • Tableau you can use for free, and it does something very similar to Power BI.

At some point, if you want to make something useful for others, you’ll need to learn JavaScript.

There are plenty of helpful tutorials on the internet to help you, as well as some useful books.

Community support

Sadly, the sorts of open data community leader roles that used to exist in local government are disappearing, because of the financial crisis in local government. Generally, open data now falls within the remit of the GIS (geographical information systems) team, which is still funded because it’s so necessary.

The push for open data originally came from central government, back in 2010. But it’s not just about the government; it’s for any community who might benefit from making data available.

Book: Open Data for Everybody

A useful book by Nathan Coyle.


What makes for a good API?

One of the first questions to come up on day two of Open Data Camp was “what is an API?” One of the last issues to be discussed was “what makes a good API?”

Participants were asked for examples of application programming interfaces that they actually liked. The official postcode release site got a thumbs up: “It was really clear how to use it and what I’d get, and I can trust that the data will come back in the same way each time.”
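The "same way each time" point can be made concrete with a small check on the shape of a response. This is only a sketch: the field names, types and sample records are assumptions for illustration, not the format of any particular postcode service.

```python
# Fields a client might expect in every record (hypothetical).
EXPECTED_FIELDS = {"postcode": str, "latitude": float, "longitude": float}


def shape_problems(record, expected=EXPECTED_FIELDS):
    """Return a list of ways a record departs from the expected shape."""
    problems = []
    for name, kind in expected.items():
        if name not in record:
            problems.append(f"missing field: {name}")
        elif not isinstance(record[name], kind):
            problems.append(
                f"wrong type for {name}: {type(record[name]).__name__}"
            )
    return problems


# Made-up records: one consistent, one not.
good = {"postcode": "LS1 1AA", "latitude": 53.79, "longitude": -1.54}
bad = {"postcode": "LS1 1AA", "latitude": "53.79"}

print(shape_problems(good))
print(shape_problems(bad))
```

An API that never trips a check like this is one you can build on without writing defensive code around every call.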


Building an Open Addresses database – and opening its APIs

Warning: Liveblogging – prone to error, inaccuracy, and howling affronts to grammar. This post will be improved over the course of a few days.

Gianfranco Cecconi & James Smith

Open Addresses are trying to build a huge addressing dataset from scratch, fighting the monsters and competitors that involves. They believe that addresses are a key asset of the national information infrastructure – and we need to liberate those addresses – or that was the pitch to the Cabinet Office.

The problem is huge.

They started with the assumption that they could build their dataset from existing open data sets, that (by chance) have associated address information, without intellectual property issues – and a volunteer workforce would then develop it from there. The Royal Mail suggests that there are 60m addresses in the UK – but that’s delivery places. This project has a wider view of the idea of addresses. Your electricity meter or your drone delivery spot might be an address.

Surviving as a non-profit

They also need to survive financially. They try to be frugal – so they try to not get sued, but they also try to build services that can fund what they do. The early money from the Cabinet Office will not last for ever. They have APIs that you can use in your products and services – for free. But there will be value added services on top of that. For example, “give me a likelihood of how real an address is”. It’s not a trivial problem – but could be very useful for delivery services.

There is no UK master list of addresses – no gold standard. Everyone is working to build their database, and all have errors, but some are further ahead. Confirmation is needed on these addresses, and Open Addresses is built to deal with this doubt and uncertainty as they go.

While they do need money to survive, many of their basic services are free, because they need to be there.

Working with the Open Addresses API

The obvious thing: search the data. And that you can do via the API. Just three lines! But the completeness is limited right now – they only have 1.2m of those 60m addresses. You can submit addresses through an API called Sorting Office. Again, free for now. They’ll normalise the address for you – and you can donate it to them, but you don’t have to.

With informed consent from your clients, you can hand over addresses to them on a day to day basis through Turbot. It’s a platform for managing scrapers, and is descended from ScraperWiki. (It went live last night, 20th February 2015.)

Want to do more sophisticated analysis on a block of text with addresses in it? The address building blocks API allows you to perform detailed analysis and processing on that sort of data. That is likely to be the main source of revenue in the battle to survive. The confidence API will be made available, giving a confidence score on any address.

Building the database

The biggest challenge ahead of them is building the addresses. There’s a privacy issue: persuading people that sharing an address is not the same as sharing personal information about yourself, and doesn’t really tell anyone anything personal. The existence of an address is not personal information, it’s just a fact. You can walk down streets and write them down. But it feels private.

There’s also a corporate approach, working with companies that use addresses, but they need explicit permission from clients to share their addresses.

Further notes and links.