Tag Archives: publication

Extracting Open Data from PDFs in usable formats

A session on rescuing usable data supplied in PDFs, led by Martin.

A client of one of the session participants needed an automated process to check which PDFs contained changed data and which didn’t. They had been checking manually. However, a computational solution isn’t as easy as it looks. For example, software often finds it hard to spot a table. It’s relatively easy to extract data from a table in a PDF if it clearly looks like a table, with borders around the “cells”. However, many tables in PDFs are clear to humans but not to computers. Extracting those sorts of tables is much trickier.
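The change-checking step described above can be sketched roughly as follows. This is a minimal illustration, not the process the client used: it assumes the text of each PDF has already been extracted by some separate tool, and the names `data_fingerprint` and `changed_documents` are hypothetical. It compares a hash of the whitespace-normalised text against a previously stored hash, so that layout-only differences are ignored.

```python
import hashlib
import re


def data_fingerprint(extracted_text: str) -> str:
    """Hash the extracted text after collapsing all runs of whitespace.

    Assumption: the text was pulled out of the PDF by a separate tool.
    """
    tokens = re.split(r"\s+", extracted_text.strip())
    normalised = " ".join(tokens)
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def changed_documents(previous: dict[str, str], current_texts: dict[str, str]) -> list[str]:
    """Return the names of documents that are new or whose fingerprint differs.

    `previous` maps a file name to its stored fingerprint;
    `current_texts` maps a file name to its freshly extracted text.
    """
    changed = []
    for name, text in sorted(current_texts.items()):
        if previous.get(name) != data_fingerprint(text):
            changed.append(name)
    return changed


if __name__ == "__main__":
    # Hypothetical example data, not taken from any real PDF.
    stored = {"budget.pdf": data_fingerprint("Year  Spend\n2013  100")}
    latest = {
        "budget.pdf": "Year Spend\n2013   100",  # same data, different spacing
        "roads.pdf": "Road Length\nA90 12",      # not seen before
    }
    print(changed_documents(stored, latest))
```

A sketch like this only detects that something changed. It does not solve the harder problem raised in the session, which is recognising borderless tables in the first place.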

The why question: what is the purpose of open data?

Leader Nick Ananin, a project officer at Aberdeen City Council, explained that he had pitched the session because he was “confused”.

He explained that he was a system designer, and the first question a system designer always asks is: “What is the purpose of the system?” From that, he argued, it is possible to ask questions like: what products and functions will be needed to deliver that; and what controls will be put on them.

“So, in terms of open data, I started to think: ‘How can we make sure that local authorities, when we publish data and add metadata, are publishing the right data and adding the right metadata?’” Get this wrong, he warned, and it would be impossible for potential users to find information, or for publishers to make sure it met their needs.
