So… open data. What is it? How do you find it, use it, and get value from it? As ever, Open Data Camp opened with the session that reveals all.
Camp maker Katherine Rooney started with an even more basic question: what is data? Campers gathered at Geovation in London suggested it was an “information set” or “usable information that could be easily shared” while others suggested that to be data, information needed to be “structured” in order to be meaningful.
Katherine then moved on to the ‘open’ bit, and said “open data is data that is open to anyone” and “for any purpose.” However, campers heard, it does not have to be free, although open data enthusiasts of course want it to be available at the lowest cost and as easily as possible.
This raised the question of where open data comes from. Lots of people publish open data:
- government bodies
- public authorities
- private companies.
But how do you know that published data is open data? “What we are looking for is an open data licence,” Katherine explained. “And that licence lives with the data as metadata.”
Properly published and curated open data should also have other information attached: when and how it was created, who created it, and who looks after it. Conversely, there is data that is not open data: with very few exceptions, personal data (as defined by the General Data Protection Regulation, or GDPR) is not open data.
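As a rough illustration of a licence "living with the data as metadata", the sketch below checks a metadata record for an open licence and for the provenance details mentioned above. The field names, the example record, and the short list of licence identifiers are assumptions made for this example, not a standard that any particular publisher follows.

```python
# SPDX-style identifiers for some well-known open licences (illustrative, not exhaustive).
OPEN_LICENCES = {"OGL-UK-3.0", "CC0-1.0", "CC-BY-4.0", "ODbL-1.0", "PDDL-1.0"}

# Hypothetical metadata record; field names and values are invented for illustration.
dataset_metadata = {
    "title": "Example street tree survey",
    "licence": "OGL-UK-3.0",
    "created": "2019-11-02",
    "method": "Field survey",
    "creator": "Example Borough Council",
    "maintainer": "Example Borough Council data team",
}

# Provenance details the paragraph above says should travel with the data.
REQUIRED_FIELDS = ("licence", "created", "method", "creator", "maintainer")


def is_open(metadata):
    """True only if the record names a licence from the known-open list."""
    return metadata.get("licence") in OPEN_LICENCES


def missing_fields(metadata):
    """List the provenance fields that are absent or empty."""
    return [field for field in REQUIRED_FIELDS if not metadata.get(field)]
```

A record with no licence field fails the `is_open` check, which mirrors the point made in the session: without a licence attached, there is no way to know the data is open.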
Five stars: rating open data
Many of the sessions at Open Data Camp deal with what happens when open data is not curated as well as it might be, or when it is published in formats that are not friendly to the people who want to use it.
Camp maker Lucy Knight explained that the Open Data Institute has a five-star system: “One star: here it is, it is open. It might be a PDF, a picture, or whatever. Some people don’t think that is open data, but if it has an open data licence then, for the ODI, it counts. Two stars: it is machine readable; you can put it into a computer program and it will read it.
“Three stars: it is machine readable and non-proprietary. We usually say that for most purposes that is good enough. Four stars is when we start to get into linked data, which is not tabular or structured like a CSV, or comma-separated values, file. Five stars is the shining peak.”
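The star levels described above can be sketched as a small function. This is a minimal illustration, not an official scoring tool: the format groupings are assumptions, and the four- and five-star tests (using URIs to identify things, and linking to other people's data) follow the commonly cited five-star open data scheme rather than anything spelled out in the session.

```python
# Illustrative format groupings; these sets are assumptions, not an official list.
MACHINE_READABLE = {"xls", "xlsx", "csv", "json", "xml", "rdf", "ttl"}
NON_PROPRIETARY = {"csv", "json", "xml", "rdf", "ttl"}


def star_rating(open_licence, file_format, uses_uris=False, links_to_other_data=False):
    """Return 0-5 stars for a dataset; each star requires all the previous ones."""
    fmt = file_format.lower().lstrip(".")
    if not open_licence:
        return 0  # no open licence means it is not open data at all
    if fmt not in MACHINE_READABLE:
        return 1  # open, but a PDF, picture, or similar
    if fmt not in NON_PROPRIETARY:
        return 2  # machine readable, but in a proprietary format
    if not uses_uris:
        return 3  # machine readable and non-proprietary
    if not links_to_other_data:
        return 4  # linked data: things are identified by URIs
    return 5  # also linked to other datasets
```

For example, `star_rating(True, "pdf")` gives one star and `star_rating(True, "csv")` gives three, the level described in the session as good enough for most purposes.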
Some publishers also offer APIs (application programming interfaces) to make it easier for developers (or their systems) to get their hands on data without having to query the underlying database. Without an API, there are other ways of obtaining data, such as screen scraping, but one camper warned that “for a commercial project, that is a pretty dodgy way to proceed” and, of course, if a site publisher says the material is copyright and not available for screen scraping, then “that is a no-no.”
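To give a flavour of what using an API looks like in practice, here is a minimal sketch using only the Python standard library. The endpoint URL and the filter names are hypothetical, and the sketch assumes the API returns JSON; a real publisher's documentation would say what is actually available.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical endpoint, for illustration only; substitute the publisher's documented URL.
BASE_URL = "https://data.example.org/api/records"


def build_query_url(base_url, **filters):
    """Encode filters as a query string, sorted so the URL is predictable."""
    query = urlencode(sorted(filters.items()))
    return f"{base_url}?{query}" if query else base_url


def fetch_records(url, timeout=10):
    """Request the URL and decode a JSON response body (assumes the API returns JSON)."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A developer would then call something like `fetch_records(build_query_url(BASE_URL, area="London", year=2019))` and receive structured records directly, rather than scraping them out of a web page.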
However, Katherine suggested that if information wasn’t clearly published as open data, it might be worth asking if it could be. The open data community often lobbies organisations to publish a data catalogue, so developers at least know what information they hold, and can ask for data that they would find useful to be published as open data.
The session then discussed a wide range of potential uses for open data: everything from public information sites to artworks. In all cases, campers heard, the same basic principles applied: data is either covered by a commercial licence, in which case even a small extract or use will incur a charge, or it is open data. And the whole point of open data is that it can be used for any idea, useful or creative, or both.