A session on rescuing usable data supplied in PDFs, led by Martin.
A client of one of the session participants needed an automated process to check which PDFs had changed data in them – and which didn’t. They had been doing it manually. However, a computational solution isn’t as easy as it looks. For example, software often finds it hard to spot a table. It’s relatively easy to extract data from a table in a PDF if it clearly looks like a table – with borders around its “cells”. However, many tables in PDFs are obvious to humans but not to computers. Extracting those sorts of tables is much trickier.
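A minimal sketch of the change-detection idea, assuming the tables have already been extracted (for example with a tool like pdfplumber or Tabula) into lists of rows. The table data below is purely illustrative, not from the session:

```python
import hashlib

def table_fingerprint(rows):
    """Compute a stable hash of a table's contents.

    `rows` is a list of rows, each a list of cell strings (the shape
    extraction tools such as pdfplumber's extract_tables() return).
    Cells are stripped so whitespace-only layout differences don't
    register as data changes.
    """
    normalised = "\x1f".join(
        "\x1e".join((cell or "").strip() for cell in row) for row in rows
    )
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()

# Illustrative rows standing in for tables extracted from two PDF versions.
old_table = [["Area", "Population"], ["Leeds", "793,139"]]
new_table = [["Area", "Population"], ["Leeds", "798,786"]]

print(table_fingerprint(old_table) != table_fingerprint(new_table))  # → True
```

Comparing stored fingerprints of each month’s PDFs would flag only the files whose data actually changed – though, as the session noted, the hard part remains getting the table out of the PDF reliably in the first place.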
Continue reading Extracting Open Data from PDFs in usable formats
WARNING – liveblogging. Prone to error, inaccuracy and howling affronts to grammar and syntax. Posts will be improved over the next 48 hours
Google Docs notes for this session
(Alistair Rae introduced the ideas for this session in a blog post: How to map everything (but you definitely shouldn’t).)
The age of open data has created a George Mallory approach to geo data: why did you create it? Because it’s there. Alistair has created loads of maps just because he could. But “why?” needs to be asked more often.
Continue reading How to map everything – and how to share it