All posts by Adam Tinworth

Data Visualisation: making it work

An Open Data Camp 7 session on data visualisation, led by Ian Makgill. These are live-blogged notes.

Drawnalism: data visualisation

There is a lot of temptation to use really exciting visualisations. But 90% of the time, you end up with bar or line charts – because they work. If you have more than 20 data points along the x axis, you probably want a line chart, not a bar chart.
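The session's rule of thumb can be written as a tiny helper. This is a minimal sketch: the function name and the hard cut-off at 20 points are illustrative assumptions, not a fixed rule from the session, and real chart choices also depend on whether the x-axis is ordered (for example, time).

```python
def suggest_chart_type(n_points: int, threshold: int = 20) -> str:
    """Suggest a basic chart type for a single data series.

    Hypothetical helper encoding the session's rule of thumb: with more
    than about 20 points along the x-axis, a line chart usually reads
    better than a bar chart.
    """
    if n_points <= 0:
        raise ValueError("need at least one data point")
    return "line" if n_points > threshold else "bar"


print(suggest_chart_type(12))  # bar
print(suggest_chart_type(52))  # line
```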

Continue reading Data Visualisation: making it work

Extracting Open Data from PDFs in usable formats

A session on rescuing usable data supplied in PDFs, led by Martin.

A client of one of the session participants needed an automated process to check which PDFs contained changed data – and which didn't. They had been doing it manually. However, automating this isn't as easy as it looks. For example, software often struggles to recognise a table. It's relatively easy to extract data from a table in a PDF if it is clearly laid out as one, with borders around the "cells". However, many tables in PDFs are obvious to humans, but not to computers. Extracting those sorts of tables is much trickier.
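Once the text has been pulled out of each PDF (by whatever extraction tool you use, which is not shown here), the "has anything changed?" check can be reduced to comparing fingerprints. This is a minimal sketch under that assumption: the function names and sample documents are hypothetical, and it only reports *that* a document changed, not which table cells changed. It does nothing to solve the harder table-detection problem described above.

```python
import hashlib
import re


def fingerprint(extracted_text: str) -> str:
    """Hash text extracted from a PDF, ignoring whitespace differences.

    Collapsing runs of whitespace means a re-flowed layout carrying the
    same data does not register as a change.
    """
    normalised = re.sub(r"\s+", " ", extracted_text).strip()
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def changed_documents(previous: dict[str, str], current_text: dict[str, str]) -> list[str]:
    """Return names of documents whose fingerprint differs from the stored one.

    `previous` maps document name -> stored fingerprint;
    `current_text` maps document name -> newly extracted text.
    Documents with no stored fingerprint count as changed.
    """
    changed = []
    for name, text in sorted(current_text.items()):
        if previous.get(name) != fingerprint(text):
            changed.append(name)
    return changed


# Hypothetical example documents.
stored = {"q1.pdf": fingerprint("Region  Total\nNorth 10\nSouth 12")}
latest = {
    "q1.pdf": "Region Total\nNorth 10\nSouth   12",  # same data, different spacing
    "q2.pdf": "Region Total\nNorth 11\nSouth 9",      # not seen before
}
print(changed_documents(stored, latest))  # ['q2.pdf']
```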

Continue reading Extracting Open Data from PDFs in usable formats

Registers: why they matter and how to save them

An Open Data Camp 7 session on registers, led by Andy Bennet of registers.app.

At the end of 2015, there was a project in the Government Digital Service about the structure of data. There was open.gov.uk, where the data was quite unstructured. The consumer had to wrangle it into the form they needed. In the legislation, there were hundreds of thousands of mentions of registers – datasets that different departments and ministers needed to keep. The idea was to publish these registers of the things government knows about.

One core principle: these are owned and maintained registers. This makes them about governance – about making sure that there are people in positions of power with responsibility for them. You can’t spread the decision-making around – it has to be a named individual. There’s been some work done by the Open Data Institute in the last year about collaborative ownership models.

Continue reading Registers: why they matter and how to save them

Data Art: what are the limits and opportunities in data licensing for artists?

A session on using open data from various sources in artistic works, led by Leela Collins.

Traditionally, we have infographics, where we take data and visualise it so people can understand it. And then there’s conceptual art, which gains some of its meaning from the original data source. Does that create a new work, or does it owe something to the data producer?

Data is becoming a tool, in the same way that brushes are.

And then there’s protest art, where the whole of the data is used to create the art. But if the data is licensed non-commercially, can the artist make money from the work? A full open data licence is free for reuse. However, a non-commercial licence on some data is somewhat ambiguous – is it just restricting resale of the data itself, or does it prevent it being used for anything commercial?

Continue reading Data Art: what are the limits and opportunities in data licensing for artists?

Open Data Camp 7: Day Two pitches

The day has dawned bright and sunny on Open Data Camp 7's final day. There's a great bunch of people present, the coffee is flowing, and it's time to pitch. Here's what's on today's menu:

Continue reading Open Data Camp 7: Day Two pitches

Building a data ecosystem in a low tech environment

An Open Data Camp session on helping charities and other low tech bodies create data ecosystem stories to improve their impact, led by Pauline Roche.

Liveblogging: prone to error, inaccuracy and howling crimes against grammar and syntax. Post will be updated in the coming days.

Over 80% of charities in this country operate on tiny budgets – often under £10,000 per annum. There are some similarities with, say, libraries, or arts bodies. There are resources out there for them – like 360giving – but they may not know about them, or have the confidence to use them.

DataKind offers a number of resources. They recently worked with the GLA to help understand the number of refugees and migrants in London. There isn't good data out there on that. But charities tend to know where they are – so could they provide that information? So they asked – and it would be fair to say that the charities weren't keen on the idea. They said that, if they were going to do this, they needed support in working out what to collect, and how. And the GLA was willing to help take that on.

Many of the charities had no idea of the data already available that they could use, nor how data could help their own work. They paired up data experts with subject experts to figure out what was needed, and how to deliver that data.

Continue reading Building a data ecosystem in a low tech environment

SPARQL 101: how to get started with the linked data search query language

How do you get started with SPARQL, the language for querying linked data? An Open Data Camp 7 session, led by Jen, aimed to help newbies get going.

Liveblogging: prone to error, inaccuracy, and howling crimes against grammar and syntax. Post will be updated in the coming days.

Learning about SPARQL and linked data

More and more open data platforms are either becoming linked data at their core, or they have offshoots that add it. The data underneath linked data is RDF – and SPARQL is the query language for RDF. Most SPARQL endpoints look like a query box with gobbledegook in it – and you are expected to write your own gobbledegook. It's somewhat intimidating.

In most cases, platforms also provide an API so you can query the information programmatically – but somebody needs to develop that API. SPARQL endpoints give you direct access to all the data. The structure of RDF – the triples of subject, predicate and object – creates a very standardised data format that you can query for whatever you like.
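To see why triples are so queryable, here is a toy, in-memory sketch of what a SPARQL basic graph pattern does: match patterns containing `?variables` against (subject, predicate, object) triples and join the results. It is an illustration only, not how a real triple store works (no indexes, no `OPTIONAL` or `FILTER`), and the `ex:` identifiers and data are made up. It corresponds roughly to `SELECT ?name WHERE { ?city ex:type ex:City . ?city ex:name ?name }`.

```python
Triple = tuple[str, str, str]

# Toy graph: every fact is a (subject, predicate, object) triple.
# All identifiers are hypothetical.
GRAPH: list[Triple] = [
    ("ex:cityA", "ex:type", "ex:City"),
    ("ex:cityA", "ex:name", "Alpha"),
    ("ex:cityB", "ex:type", "ex:City"),
    ("ex:cityB", "ex:name", "Beta"),
    ("ex:riverX", "ex:type", "ex:River"),
]


def match(pattern: Triple, graph: list[Triple]) -> list[dict[str, str]]:
    """Return variable bindings for every triple matching one pattern.

    Terms starting with '?' are variables; anything else must match exactly.
    """
    results = []
    for triple in graph:
        binding: dict[str, str] = {}
        ok = True
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable used twice in one pattern must bind to one value.
                if binding.get(term, value) != value:
                    ok = False
                    break
                binding[term] = value
            elif term != value:
                ok = False
                break
        if ok:
            results.append(binding)
    return results


def query(patterns: list[Triple], graph: list[Triple]) -> list[dict[str, str]]:
    """Join several patterns, like the WHERE clause of a SPARQL SELECT."""
    solutions: list[dict[str, str]] = [{}]
    for pattern in patterns:
        next_solutions = []
        for sol in solutions:
            # Substitute variables already bound by earlier patterns.
            bound = tuple(sol.get(term, term) for term in pattern)
            for extra in match(bound, graph):
                next_solutions.append({**sol, **extra})
        solutions = next_solutions
    return solutions


PATTERNS: list[Triple] = [("?city", "ex:type", "ex:City"), ("?city", "ex:name", "?name")]
print(sorted(s["?name"] for s in query(PATTERNS, GRAPH)))  # ['Alpha', 'Beta']
```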

There’s a SPARQL playground where you can experiment with queries. There’s more than one of them, in fact.

You can use the query interface to home in on the data you want, and then download it as a CSV, or reuse the query programmatically. The playgrounds help you figure out how to construct queries by showing you the results on a sample dataset.
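Reusing a query programmatically usually means sending it to the endpoint over HTTP. The SPARQL 1.1 Protocol allows a GET request with the query in a `query` parameter, and asking for `text/csv` requests results in the standard CSV results format. This sketch uses only the Python standard library; the endpoint URL is a placeholder assumption, and whether a given platform supports CSV output is worth checking in its documentation.

```python
import urllib.parse
import urllib.request

# Placeholder endpoint: substitute the SPARQL endpoint of the platform you use.
ENDPOINT = "https://example.org/sparql"

QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?thing ?label
WHERE { ?thing rdfs:label ?label }
LIMIT 10
"""


def build_csv_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build (but do not send) a SPARQL Protocol GET request asking for CSV results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, headers={"Accept": "text/csv"})


req = build_csv_request(ENDPOINT, QUERY)
print(req.get_method(), req.get_header("Accept"))  # GET text/csv

# Sending it for real needs network access and a live endpoint:
# with urllib.request.urlopen(req) as resp:
#     csv_text = resp.read().decode("utf-8")
```

GET keeps the query visible and cacheable; for very long queries the protocol also allows POST.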

Continue reading SPARQL 101: how to get started with the linked data search query language

Dealing with Open Data excuses

An Open Data Camp 7 session on countering excuses for not publishing open data, led by Jenny Broker. Liveblogging: prone to error, inaccuracy and howling crimes against grammar and syntax. Post will be improved in the coming days.

Drawnalism: a discussion on Open Data Excuses

Excuse: It's a safety thing – it's critical and it could be useful to terrorists

Safety is the first thing people will come after you with. For example, in utilities, it's a very real concern, particularly around the location of assets. Is this a genuine concern, or an easy way of shutting down a conversation? Is this information not already accessible via Google Maps, for example? Crashing critical infrastructure is a genuine risk. The most risky data is already heavily controlled – and is often not even shared within government. That comes with its own problems – issues get missed because staff don't have access to the full picture.

So, if Google Maps has the data, and we make it more accessible, is there potential for spotting problems earlier? Well, liability now raises its head. Pretty much all datasets are infested with personal data, so if you publish the data and something happens, you're liable. Some people don't want to take that risk. This is another standard way of hiding from open data. Some organisations have developed organised risk assessments for open data – these create a more structured way to talk about risk.
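One common way to make a risk conversation more structured is a likelihood × impact matrix. The session didn't describe any specific organisation's method, so treat this as a generic sketch: the three-level scales, the red/amber/green cut-offs and the example risks are all hypothetical, and each organisation would set its own.

```python
from dataclasses import dataclass

# Hypothetical three-level scales; organisations define their own.
LEVELS = {"low": 1, "medium": 2, "high": 3}


@dataclass
class Risk:
    description: str
    likelihood: str  # "low", "medium" or "high"
    impact: str      # "low", "medium" or "high"

    def score(self) -> int:
        """Likelihood x impact, giving a score from 1 to 9."""
        return LEVELS[self.likelihood] * LEVELS[self.impact]

    def rating(self) -> str:
        """Map the score to a RAG band (cut-offs are assumptions)."""
        s = self.score()
        if s >= 6:
            return "red"
        if s >= 3:
            return "amber"
        return "green"


# Illustrative entries for a proposed open dataset.
risks = [
    Risk("Asset locations help someone plan an attack", "low", "high"),
    Risk("Personal data left in a free-text column", "medium", "high"),
    Risk("Minor errors in published descriptions", "medium", "low"),
]
for r in sorted(risks, key=Risk.score, reverse=True):
    print(f"{r.rating():<6}{r.score()}  {r.description}")
```

Writing each risk down with an explicit likelihood and impact turns "it's a safety thing" into something that can be discussed and, where appropriate, mitigated rather than used to end the conversation.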

Continue reading Dealing with Open Data excuses

Open Data Strategy Campfire

A session from Open Data Camp 7 on delivering data strategy nationally, led by Anne McCrossan. Liveblogged notes. Prone to error, inaccuracy and howling crimes against grammar and syntax. Post will be updated in the coming days.

Do we really have a national data strategy yet? Where we do have strategies – how well are they being implemented? Can we move things forwards by sharing experiences with each other?

Northern Ireland is on its second data strategy in six years. The first one made all data open by default – but they didn't really have the delivery mechanisms or incentives to get civil servants to deliver. Hence the reason for a new one so quickly. It has a lot more reporting mechanisms in it, to exert pressure on local authorities, and report that upwards to general government.

Over the first three years, the success stories tended to be with startups and external companies. The frustrations were with the civil service.

Community Desires

What are our desires as a community, and how would they be expressed? Open data strategy in particular has tended to be less a strategy and more a commitment to getting data out. This is partly the result of the movement being kicked off by a coalition of interests. That can be challenging for some political ends.

Do we need outcomes? This is still an emergent space, so it’s hard to know what outcomes you might get. For example, going to the moon gave us Velcro, but it wasn’t part of the strategy… It’s very difficult to know what people will do with open data. So, maybe the strategy should just be delivery.

Continue reading Open Data Strategy Campfire

Wardley Maps and Open Data: a discussion

A session on Wardley Maps and value chains, led by Jonathan Kerr. Liveblogged notes – prone to error, inaccuracy and howling crimes against grammar and syntax. Post will be improved in the coming days.

What are Wardley Maps?

A map is a thing that shows a space – where things are in relationship to each other, as well as the overall concept. All models are, by nature, simplifications. You could make an accurate map of France, but it would be unwieldy…

You can map a value chain by mapping all the components that feed into an end result – but you don't need to map every single element, just the ones that are important to you.

As a component evolves across the map (the horizontal axis), it becomes more repeatable:

  • Genesis – the original concept (R&D).
  • Custom – you start making it (bespoke elements).
  • Product – you start making machines to make the thing. It becomes replicable.
  • Utility – available as a service. Think of APIs, charged on a per-use basis.

This isn’t a linear process – things move in and out of categories. By mapping the parts of the value chain through this grid, you can spot where you have elements of your process which are costing you disproportionate amounts of money.
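The stages above can be captured as an ordered scale and combined with cost data to look for expensive, still-bespoke parts of a process. This is a minimal sketch: the `flag_costly_custom_work` function, its 40% cost-share threshold and the example components and figures are all hypothetical, and a real Wardley map also records each component's position in the value chain, which is left out here.

```python
from dataclasses import dataclass
from enum import IntEnum


class Stage(IntEnum):
    """Evolution stages, ordered from least to most repeatable."""
    GENESIS = 1
    CUSTOM = 2
    PRODUCT = 3
    UTILITY = 4


@dataclass
class Component:
    name: str
    stage: Stage
    annual_cost: float


def flag_costly_custom_work(components: list[Component], share: float = 0.4) -> list[str]:
    """Return names of components still at the custom stage (or earlier)
    that take more than `share` of total cost.

    Hypothetical heuristic: these are candidates for checking whether a
    product or utility version already exists.
    """
    total = sum(c.annual_cost for c in components)
    if total <= 0:
        return []
    return [
        c.name
        for c in components
        if c.stage <= Stage.CUSTOM and c.annual_cost / total > share
    ]


# Illustrative figures only.
components = [
    Component("data hosting", Stage.CUSTOM, 50_000),
    Component("API gateway", Stage.UTILITY, 5_000),
    Component("reporting dashboard", Stage.PRODUCT, 20_000),
    Component("new matching algorithm", Stage.GENESIS, 10_000),
]
print(flag_costly_custom_work(components))  # ['data hosting']
```

Using an ordered enum means a comparison such as `stage <= Stage.CUSTOM` reads directly as "not yet a product", which keeps the heuristic easy to adjust.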

Continue reading Wardley Maps and Open Data: a discussion