A small but select band of Open Data Camp 5 participants gathered in the garden room for a final session devoted to the subject of catalogues. And meta-data. Or both.
Session leader Jen Williams explained: “I pitched a session on catalogues because there doesn’t seem to be much interest in them. The discussion [at #ODCamp] is all about datasets, and publishing datasets, and getting people to engage with them.
“It’s not about telling people what we have got. And I would say that publishing a catalogue goes a long way towards doing that.
“Also, I’d say that a catalogue is not just useful for external users. It can be useful for internal users, because you can’t use your own data if you don’t know what you’ve got.
“It can also be used to generate interest. You don’t have to publish all the information in your catalogue immediately. You can say to people ‘what would you be interested in’ and then see what it would cost to publish that, and discuss the basis on which you might do it.”
Data about data
This links to meta-data: data about data. An ONS speaker said one of its challenges was getting information to people about the information it holds. “We don’t always do a great job of that, particularly if we have some of the data they want, or we need to explain how it can be used.”
Yet: “There seem to be some key things that people need to know before they can decide whether they can use what we have or not. So what I want to know is what information do people need to see before they decide to go further.”
The session used post-it notes to gather what participants would want to know. The first item was date, although Jen noted that it was important to say date of what: date collected, updated or modified. It can also be useful to know what changed, as well as when something changed.
Next was update frequency, and when the next update can be expected. Then: a note on the data owner – perhaps with contact information for more details; method of capture; spatial information – or what area a dataset covers, with some indication of its granularity – and format for release.
Also, field descriptors. As one participant pointed out, many field names may be quite short: “It helps a lot to have a note on what those terms mean.” Also, a note on who holds the same or similar information, and how they use it. And information about how the data can be used.
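To make the wish list above concrete, here is a minimal sketch of how one catalogue entry might be recorded in Python. The class name, field names and the example dataset are all hypothetical, chosen only to mirror the items on the post-it notes; nothing here was proposed in the session.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class CatalogueEntry:
    """Metadata for one dataset. Every field name is illustrative only."""
    title: str
    date_collected: Optional[date] = None
    date_last_modified: Optional[date] = None
    change_note: str = ""            # what changed, not just when
    update_frequency: str = ""
    next_update_due: Optional[date] = None
    owner: str = ""
    owner_contact: str = ""
    capture_method: str = ""
    spatial_coverage: str = ""       # what area the dataset covers
    spatial_granularity: str = ""
    release_format: str = ""
    field_descriptions: dict[str, str] = field(default_factory=dict)
    similar_holders: list[str] = field(default_factory=list)
    usage_terms: str = ""            # how the data can be used

    def missing_fields(self) -> list[str]:
        """Return the names of fields that are still empty."""
        return [name for name, value in vars(self).items() if not value]


# A made-up entry for a one-off dataset.
entry = CatalogueEntry(
    title="Street tree survey",
    date_collected=date(2015, 6, 1),
    update_frequency="never (one-off collection)",
    owner="Parks team",
    field_descriptions={"dbh_cm": "Trunk diameter at breast height, in centimetres"},
)
print(entry.missing_fields())
```

A helper like `missing_fields` is one way to see at a glance which questions an entry still cannot answer for a potential user.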
How to publish?
The session could have gone on adding ideas for some time. But at some point, the catalogue has to be published. How should this be done?
Jen suggested: “You could just put a list, in text, on your website if you wanted to make all this information available. It might not be easy to maintain. But usefulness-wise, getting a list, and starting a conversation with people, it’s great.
“You might have a dataset that was only collected once, ten years ago, and never updated. But somebody might want it. You can’t find that out if you don’t tell them about it.”
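As a rough illustration of the plain text list Jen describes, the sketch below turns a handful of records into lines that could be pasted onto a web page. The record keys (`title`, `last_updated`, `published`), the wording of each line and the two sample datasets are assumptions made for the example, not a recommended format.

```python
def render_catalogue(entries):
    """Return a plain-text list with one line per dataset, sorted by title."""
    lines = []
    for entry in sorted(entries, key=lambda e: e["title"].lower()):
        # Unpublished datasets are still listed, so people can ask about them.
        status = "published" if entry.get("published") else "not published, ask us"
        updated = entry.get("last_updated", "unknown")
        lines.append(f"- {entry['title']} (last updated: {updated}; {status})")
    return "\n".join(lines)


# Made-up records, including a one-off dataset that was never updated.
catalogue = [
    {"title": "Street tree survey", "last_updated": "2015", "published": False},
    {"title": "Air quality readings", "last_updated": "2024-03", "published": True},
]
catalogue_text = render_catalogue(catalogue)
print(catalogue_text)
```

Listing the unpublished dataset alongside the published one is the point: the list tells people what exists, whether or not it has been released.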
However, one participant pointed out that just getting a catalogue together was quite hard – for some big government departments, he suggested, conducting an inventory of their datasets and just publishing them might be easier.
Jen agreed that lots of organisations have a data portal, which is a data catalogue for the data they release. However, she reiterated, there are benefits to having a full catalogue that covers the information an organisation doesn’t release, as well as the information that it does.
A journalist agreed: he said he had FoI’d departments for information they had already released as open data, because nobody in the FoI department had been told this had been done.
Publish, and be damned?
The end of the session coincided with the end of the camp. So there was only time for a very quick debate about the technicalities of publishing.
There are lots of electronic formats in which this can be done. But Jen suggested the most important thing was to adopt the Nike approach – just do it. A text list is fine.
Greetings! I have just come across this site and blog.
One place to start is international standards. ISO 8000-110 deals with the exchange of characteristic data and states that the metadata must come from a data dictionary.
Full disclosure: I sit on the working group for this standard. I have recently penned a series of articles on this subject here: http://www.mroinsyte.com/insyte.