What is open data?
Data that isn’t private or closed: it’s published in some way, and it has to be accessible to the general public. It’s open to anyone.
One definition:
Open data is data that can be freely used, reused and redistributed by anyone, subject only, at most, to the requirement to attribute and share alike.
Open data licensing
It needs to be published under an open data licence. If you have the right to, you publish the data with that licence applied to it. A common one in the UK is the Open Government Licence (OGL). It is very permissive, though it does ask you to acknowledge the source. There are others, including the Creative Commons licences.
The “free” in open data is akin to the “free” in “free speech”. Someone may have paid to produce, publish and share the data, but usually it can be used at no or marginal cost.
Commercial organisations can sell products built around the data — that’s not the same thing as selling the data. For example, they can make it much more searchable for the average user, or make connections between elements of the data. If they add value, they can charge for it — but it is a fine line.
Beware, though: some licences restrict or ban commercial use.
Getting open data
It can be as simple as a file you download from a website such as data.gov.uk. Sometimes it will be a static file in CSV (comma-separated values) format, a non-proprietary format for tabular data.
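As a rough illustration, here is how a downloaded CSV file might be read with Python’s built-in `csv` module. The column names and values are hypothetical placeholders standing in for a real download, and the text is held in a string so the example runs without a file.

```python
import csv
import io

# Hypothetical sample standing in for a CSV file downloaded from a portal such as data.gov.uk.
SAMPLE_CSV = """area,year,population
Area A,2021,1200
Area B,2021,3400
"""

def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))

rows = read_rows(SAMPLE_CSV)
for row in rows:
    # Every CSV value arrives as a string, so convert numbers explicitly.
    print(row["area"], int(row["population"]))
```

With a real download you would open the file instead, for example `open("your-file.csv", newline="", encoding="utf-8")` (a placeholder name, assuming the file is UTF-8), and pass the file object to `csv.DictReader`.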
Some pre-digital data has been digitised and made available, but far from all of it. Digitising data lets us analyse it more easily with software, especially really large data sets. However, you will occasionally end up with more data than your hardware can handle. There’s now a movement to provide access to data through a tool or API, so you can get just the data you are looking for. APIs can be quite intimidating, though.
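One common way to cope with a file that is too large to load in one go is to stream it: read one row at a time and keep only a running summary in memory. The sketch below assumes a CSV with hypothetical `region` and `amount` columns; the sample text stands in for a real file.

```python
import csv
import io
from collections import defaultdict

def total_by_column(lines, group_col, value_col):
    """Stream rows one at a time, keeping only running totals in memory."""
    totals = defaultdict(float)
    for row in csv.DictReader(lines):
        totals[row[group_col]] += float(row[value_col])
    return dict(totals)

# Hypothetical sample; with a real file you would pass an open file object instead,
# and csv.DictReader would still read it row by row rather than all at once.
sample = io.StringIO("region,amount\nNorth,10\nSouth,5\nNorth,2.5\n")
result = total_by_column(sample, "region", "amount")
print(result)
```

Memory use here grows with the number of distinct groups, not the number of rows, which is what makes the approach workable for large files.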
Some forms of data are more available than others: there is a great deal of geospatial data available, for example. Of course, that comes in file formats more suited to geospatial work.
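GeoJSON is one widely used open format for geospatial data, and because it is JSON it can be read with Python’s standard library alone. This is a minimal sketch using a made-up two-point feature collection; real files are usually larger and may contain lines and polygons as well as points, which this code simply skips.

```python
import json

# Hypothetical GeoJSON FeatureCollection; real files follow the same overall structure.
SAMPLE_GEOJSON = """{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature",
     "geometry": {"type": "Point", "coordinates": [-0.1276, 51.5072]},
     "properties": {"name": "Sample point A"}},
    {"type": "Feature",
     "geometry": {"type": "Point", "coordinates": [-2.2426, 53.4808]},
     "properties": {"name": "Sample point B"}}
  ]
}"""

def point_names_and_coords(geojson_text):
    """Return (name, longitude, latitude) for each Point feature, skipping other geometries."""
    data = json.loads(geojson_text)
    results = []
    for feature in data["features"]:
        geometry = feature["geometry"]
        if geometry["type"] == "Point":
            # GeoJSON orders positions as [longitude, latitude], not the other way round.
            lon, lat = geometry["coordinates"]
            results.append((feature["properties"]["name"], lon, lat))
    return results

points = point_names_and_coords(SAMPLE_GEOJSON)
for name, lon, lat in points:
    print(name, lon, lat)
```

The longitude-first ordering is a frequent source of points appearing in the wrong place on a map, so it is worth checking whenever you switch between tools.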
APIs
An API (application programming interface) is code on a data store that allows you to request data from it programmatically, using an agreed language. One advantage is that it can offer data in real time: weather data, for example, is more useful in real time. General election results are another recent example.
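Requesting data from an API often comes down to building a URL with query parameters that describe the slice of data you want, then decoding the response, which is frequently JSON. The endpoint and parameter names below are hypothetical; a real provider documents its own, and some require an API key.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint for illustration only; it is not a real service.
BASE_URL = "https://api.example.org/v1/observations"

def build_query_url(base_url, **params):
    """Append query parameters so the API returns only the data we ask for."""
    return base_url + "?" + urllib.parse.urlencode(params)

def fetch_json(url, timeout=10):
    """Request the URL and decode the JSON body (requires network access)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)

# "station" and "limit" are made-up parameter names.
url = build_query_url(BASE_URL, station="hypothetical-station-1", limit=5)
print(url)
# With a real endpoint: data = fetch_json(url)
```

Asking the server to filter, rather than downloading everything and filtering locally, is what lets an API hand you just the data you are looking for.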
OpenStreetMap is an international community of volunteers who build a constantly updated open map at street level. It’s a very good, mature and robust data set. The community takes it very seriously, and the data is offered through an API.
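For read-only queries, OpenStreetMap data is commonly accessed through the Overpass API, which takes queries written in Overpass QL. The sketch below asks for cafés in a bounding box; the coordinates are arbitrary, the public endpoint shown has its own usage policy, and only the query-building and result-handling parts run without a network connection.

```python
import json
import urllib.parse
import urllib.request

# One public Overpass API instance; heavy use should respect its usage policy.
OVERPASS_URL = "https://overpass-api.de/api/interpreter"

def cafe_query(south, west, north, east):
    """Build an Overpass QL query for nodes tagged amenity=cafe in a bounding box.

    Overpass expects the box as (south, west, north, east) in decimal degrees.
    """
    return f'[out:json][timeout:25];node["amenity"="cafe"]({south},{west},{north},{east});out;'

def run_query(query):
    """POST the query as a form field named "data" and decode the JSON reply (needs network)."""
    body = urllib.parse.urlencode({"data": query}).encode("utf-8")
    with urllib.request.urlopen(OVERPASS_URL, data=body, timeout=30) as response:
        return json.load(response)

def names(result):
    """Pull the name tag, where present, from each returned element."""
    return [element.get("tags", {}).get("name") for element in result.get("elements", [])]

query = cafe_query(51.50, -0.13, 51.52, -0.11)
print(query)
# With network access: print(names(run_query(query)))
```

Elements without a `name` tag come back as `None`, since OpenStreetMap does not require every feature to be named.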
Using Open Data
There are many tools you can use to work with open data, and which one suits you depends very much on what you want to do:
- find an answer to a question
- build a product that allows people to ask questions
- present it in a way that intrigues people.
Some tools
Coding languages:
- Python is a coding language that’s great for working with data
- R is another language. People from a statistics background tend to prefer R.
Learn one or the other, but not both at the same time.
Software:
- There’s Power BI from Microsoft, which is very powerful.
- Tableau does something very similar to Power BI, and its Tableau Public version is free to use.
At some point, if you want to make something useful for others, you’ll need to learn JavaScript.
There are plenty of tutorials on the internet to help you, as well as some useful books.
Community support
Sadly, the open data community leader roles that used to exist in local government are disappearing because of the sector’s financial crisis. Open data now generally falls within the remit of the GIS (geographical information systems) team, which is still funded because its work is so necessary.
The push for open data originally came from central government, back in 2010. But it’s not just about government: it’s for any community that might benefit from making data available.
Book: Open Data for Everybody
A useful book by Nathan Coyle.