A session on rescuing usable data supplied in PDFs, led by Martin.
A client of one of the session participants needed an automated process to check which PDFs contained changed data and which didn’t. They had been doing the check manually. However, a computational solution isn’t as easy as it looks. For example, software often finds it hard to spot a table. It’s relatively easy to extract data from a table in a PDF if it clearly looks like a table, with borders around the “cells”. However, many tables in PDFs are clear to humans but not to computers. Extracting those sorts of tables is much trickier.
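To make the two problems above concrete, here is a minimal Python sketch, not the approach discussed in the session. It assumes the text of each PDF page has already been extracted into lines by some other tool, and it guesses at borderless columns by splitting on runs of two or more spaces. The function names (`split_row`, `table_fingerprint`, `data_changed`) and the sample rows are hypothetical, invented for illustration.

```python
import hashlib
import re


def split_row(line):
    """Split one line of extracted text into cells.

    Assumption: columns in a borderless table are separated by at
    least two spaces, while a single space belongs inside a cell.
    """
    return [cell for cell in re.split(r"\s{2,}", line.strip()) if cell]


def table_fingerprint(lines):
    """Hash the cell contents so that spacing-only changes are ignored."""
    rows = [split_row(line) for line in lines if line.strip()]
    canonical = "\n".join("\t".join(row) for row in rows)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def data_changed(old_lines, new_lines):
    """Return True when the extracted cell values differ."""
    return table_fingerprint(old_lines) != table_fingerprint(new_lines)


# Made-up sample data: the same table re-spaced, then with one value edited.
old = ["Area        Count", "Area A      120", "Area B      95"]
respaced = ["Area   Count", "Area A   120", "Area B   95"]
updated = ["Area   Count", "Area A   125", "Area B   95"]

print(data_changed(old, respaced))
print(data_changed(old, updated))
```

The two-space rule is exactly the kind of heuristic that breaks on real documents: a table whose columns are separated by a single space, or a paragraph with wide justified gaps, would be misread, which is why tables that are obvious to humans remain hard for software.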
Continue reading Extracting Open Data from PDFs in usable formats
Leader Nick Ananin, a project officer at Aberdeen City Council, explained that he had pitched the session because he was “confused”.
He explained that he was a system designer, and the first question a system designer always asks is: “What is the purpose of the system?” From that, he argued, it is possible to ask further questions: what products and functions will be needed to deliver that purpose, and what controls will be put on them?
“So, in terms of open data, I started to think: ‘How can we make sure that local authorities, when we publish data and add metadata, are publishing the right data and adding the right metadata?’” Get this wrong, he warned, and it would be impossible for potential users to find information, or for publishers to make sure it met their needs.
Continue reading The why question: what is the purpose of open data?
A small but select band of Open Data Camp 5 participants gathered in the garden room for a final session devoted to the subject of catalogues. And metadata. Or both.
Session leader Jen Williams explained: “I pitched a session on catalogues because there doesn’t seem to be much interest in them. The discussion [at #ODCamp] is all about datasets, and publishing datasets, and getting people to engage with them.
“It’s not about telling people what we have got. And I would say that publishing a catalogue goes a long way towards doing that.
Continue reading Catalogues and metadata