Open Data for Newbies (2017 edition)

It’s OK to accept that bright, engaged people might not know what Open Data is. So, here’s a beginner’s guide for them, liveblogged at Open Data Camp 5 in Belfast.

 

What is open data?

It’s a published data set that anyone can access and that carries a licence. Ideally, it’s in a machine-readable format.

Data, in this context, is anything! A photo can be open data. Generally, though, we’re talking about data that can be presented in rows and columns in a CSV (comma-separated values) file. It’s an open format akin to (and compatible with) Excel, but which isn’t dependent on owning Microsoft Office.
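To make the rows-and-columns idea concrete, here is a minimal sketch of reading a CSV file’s contents with Python’s standard library. The column names and values are invented for illustration; they don’t come from a real data set.

```python
import csv
import io

# Hypothetical CSV content: the first line holds the column names,
# and every following line is one row of data.
CSV_TEXT = """station,date,no2
Station A,2017-10-21,41.5
Station B,2017-10-21,38.0
"""

def read_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))

rows = read_rows(CSV_TEXT)
for row in rows:
    # Every value arrives as a string; convert numbers yourself.
    print(row["station"], float(row["no2"]))
```

The same text opens in Excel or any other spreadsheet tool, which is why CSV is such a common choice for publishing.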

Data is the first stage – it’s just data, not yet information.

Where is it published?

On a website – in a way that you can easily access, ideally without limitation or need to register. You shouldn’t have to pay.

There are various platforms and portals that make open data available.

What about the licences?

It should be published with an open data licence attached. The licence tells you what you can do with it, and under what conditions (like attribution, for example). The Open Government Licence (OGL) is one example. Creative Commons is another one.

Data without a licence isn’t really open, because you don’t know how you can use it.

Technically, if you abuse the licence, you can be cut off from using the data, but that’s hard to enforce.

Do you need an ethics licence?

Open data should never be personal.

There’s a data spectrum: closed, private data at one end; then shared data (which is available to a subset of people); then public data (like Twitter’s feed, for example); and finally open data.

Open Data is data that is free for anyone to access or share. Even if it is derived from personally-identifiable data, that data should be anonymised.

What is metadata?

Metadata is data about data. It’s information like the source of the data, or how it was collected. Sometimes the metadata is great, but sometimes it doesn’t exist. Metadata is where you can give your data context.
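As a rough illustration, metadata can be as simple as a small record published alongside the data. The field names below are an assumption made for this sketch, not a formal metadata standard, and the values are invented.

```python
import json

# Hypothetical metadata record; field names and values are illustrative only.
metadata = {
    "title": "Example air quality readings",
    "publisher": "Example City Council",
    "licence": "Open Government Licence",
    "source": "Roadside monitoring stations",
    "collection_method": "Automatic sensors, daily average",
    "last_updated": "2017-10-21",
}

# Fields we assume a reader needs before they can trust and reuse the data.
REQUIRED = ["title", "publisher", "licence", "source"]

def missing_fields(record, required=REQUIRED):
    """Return the required fields that are absent or empty in the record."""
    return [field for field in required if not record.get(field)]

print(json.dumps(metadata, indent=2))
print(missing_fields(metadata))
```

A check like `missing_fields` is one way to spot the “sometimes it doesn’t exist” case before publishing.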

What is the ODI?

The Open Data Institute was founded five years ago as a charity to connect and inspire people around the world to use open data.

ODI Nodes are local groups of open data enthusiasts and advocates. They’re a bit like a franchise. There’s no trickle down funding, so the nodes have to raise their own funds and use volunteers.

What is an API?

An API is an application programming interface. It allows you to automate extraction of data from a data source, via coding. Basically, someone who owns data on a server has written some code that allows you to access that data. APIs are more interesting for realtime data, which is constantly changing. TfL is publishing loads of realtime transport data about London via APIs. The Citymapper app uses those APIs.

In Bristol there are air quality monitors that report every 24 hours via an API.

It’s a way of automating updated data access.
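Here is a minimal sketch of what that automation might look like in Python. The endpoint URL, the query parameters and the shape of the JSON response are all assumptions made for illustration; a real API documents its own.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical endpoint: replace with the URL from the publisher's API docs.
BASE_URL = "https://api.example.org/air-quality/readings"

def build_url(station, limit=10):
    """Build the request URL; 'station' and 'limit' are assumed parameters."""
    return BASE_URL + "?" + urlencode({"station": station, "limit": limit})

def fetch(url):
    """Request the URL and decode the JSON body (needs network access)."""
    with urlopen(url, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))

def latest_reading(payload):
    """Return the most recent reading from a decoded response."""
    return max(payload["readings"], key=lambda r: r["timestamp"])

# Sample response in the assumed shape, so the parsing step runs offline.
SAMPLE = json.loads(
    '{"readings": ['
    '{"timestamp": "2017-10-20T00:00:00Z", "no2": 40.1},'
    '{"timestamp": "2017-10-21T00:00:00Z", "no2": 38.7}]}'
)

print(build_url("station-a"))
print(latest_reading(SAMPLE))
```

Against a real service you would call `fetch(build_url(...))` on a schedule and pass the result to `latest_reading`, which is what saves you from downloading a fresh file by hand.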

What is Linked Data?

Linked data is data in which each data point has an identifier that a computer can read, so that other data can reference it. So, if you’re citing data in a paper, you can provide a hyperlink to the original data so people can check the provenance.
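One common way to picture this is as a set of three-part statements (subject, property, value), where the subjects and properties are web addresses. The sketch below uses plain Python tuples and invented example.org identifiers; real linked data uses identifiers the publisher controls, and usually a dedicated format and library.

```python
# Hypothetical identifiers under example.org, invented for this sketch.
TRIPLES = [
    ("http://example.org/station/1", "http://example.org/def/locatedIn",
     "http://example.org/city/bristol"),
    ("http://example.org/station/1", "http://example.org/def/name", "Station A"),
    ("http://example.org/city/bristol", "http://example.org/def/name", "Bristol"),
]

def objects(subject, predicate, triples=TRIPLES):
    """Return every value stated for the given subject and property."""
    return [o for s, p, o in triples if s == subject and p == predicate]

# Follow the link from the station to its city, then look up the city's name.
city = objects("http://example.org/station/1",
               "http://example.org/def/locatedIn")[0]
print(objects(city, "http://example.org/def/name"))
```

Because the city is referenced by an identifier rather than a typed-in name, a second data set that uses the same identifier can be joined to this one without guesswork.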

What are the five stars?

These were determined by Sir Tim Berners-Lee:

  1. Make it open
  2. Make it machine readable (tabular data in a spreadsheet, for example)
  3. Same as above, but in a non-proprietary format.
  4. Use URIs (uniform resource identifiers) to identify things.
  5. Link your data to other data.

More info on 5 star open data. Generally speaking, three star is good enough.

What are registers?

These are definitive data sets. The Government Digital Service are building these for some key pieces of information such as the definitive list of countries in the world.

Do we have data standards?

A standard is everyone agreeing to do something in the same way. We don’t have a definitive list of standards for open data. They make things much easier, but are hard to agree and enforce. Standards make it much easier for machines to read data and connect different data sets. Humans make this worse by having preferences. There is a growing body of code snippets that allow data to be transformed into a preferred format, if it wasn’t supplied that way to start.
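A snippet of that kind might look like the sketch below, which rewrites dates into the ISO 8601 form (YYYY-MM-DD). The list of incoming formats is an assumption for illustration, and it reads ambiguous dates day-first, as is usual in the UK.

```python
from datetime import datetime

# Formats assumed to appear in the incoming data; extend the list as needed.
# "%d/%m/%Y" treats 03/04/2017 as 3 April, not 4 March.
KNOWN_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%d %B %Y"]

def to_iso_date(text):
    """Convert a date string in any known format to ISO 8601 (YYYY-MM-DD)."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # Not this format; try the next one.
    raise ValueError(f"Unrecognised date format: {text!r}")

for raw in ["21/10/2017", "21 October 2017", "2017-10-21"]:
    print(raw, "->", to_iso_date(raw))
```

Raising an error on anything unrecognised is a deliberate choice here: a silent guess would hide exactly the kind of inconsistency a standard is meant to remove.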

There are standards bodies, such as W3C and ISO, which think very hard about standards, agree them and publicise them. Standards are hard to enforce, but you can persuade.