Monthly Archives: February 2015

Open Data Camp: The Good, The Bad and the Surreal


Liveblogged notes from the feedback session closing Open Data Camp 2015

Acts of a Deity

  • Rain! Bad rain!


  • Too many t-shirts
  • Too much food
  • Maybe stuff with a longer life to make it easier to donate
  • No milk for tea on Sunday
  • Bit more variety in veggie stuff
  • Vegan?
  • Cake was good!


  • More information needed on who attendees are
  • Maybe split registration so we know how many people are attending each day
  • Capture geographical data
  • Capture Twitter handles and site addresses
  • Some people not happy to share that with commercial sites
  • Wordle of attendees’ RSS feeds?


  • WiFi connectivity was a bit dodgy (but better than at most conferences)
  • VPNs are a problem.
  • SSID missing from badge


  • Pub was a bit far away
  • No afterparty

Timing and organisation

  • Clarity on time – the change to the starting time wasn’t widely shared
  • Don’t assume newbies will understand how an unconference will work. Give them more details.
  • Tell people what times the sessions are at the beginning
  • Have the key organisers (or at least some who are free to aid people) introduce themselves, so you know who to go to for help.
  • One colour of organiser t-shirt, not two. Star Trek demands that it’s red…
  • Maybe the decision to coincide with Open Data Day was a mistake
  • We lost interesting people to local events
  • Ride share system? Cars – or travelling together by train
  • The Google Groups were difficult to use and hard to find
  • Hotel rate negotiated?

Session structure

  • 25 to 30 minute slots that can be put together for longer sessions
  • Lightning talks? 5 to 10 minute talks. Brings variety to the day
  • They would need gaps, a breakout area and Twitter handles on screen


  • The Cabinet Office released data sets for us – and they weren’t used
  • That means we don’t really have anything to show now
  • More diversity of datasets – museums? art galleries?


  • We relied on Twitter and blogs
  • Sold out quickly – could we have used other channels?
  • Danger of getting stuck in one bubble of data geeks
  • T-shirts maybe not working?
  • Get sizes in advance
  • Stickers? NOT mugs
  • Publish data about the event on the site – a digital souvenir
  • Snoods? Hats?
  • Water bottle is a popular idea – would reduce waste
  • Room keen on no physical goods and digital data store
  • Lunchboxes for the food with stickers on?


  • Read them out
  • As URIs on the site?
  • Are we tracking why they sponsor us?
  • Sponsor talking slots – 5 mins.
  • Not all sponsors want more than the event to happen


  • Gender diversity an issue on Sunday
  • More designers at the event? We need more data users.

See you next year?

Handy tools for Open Data users

Warning: Liveblogging – prone to error, inaccuracy, and howling affronts to grammar. This post will be improved over the course of a few days.


A session sharing handy open data tools that participants have built or found that might just make your life easier.

Google document for this session

Chris Gutteridge, University of Southampton

  • A lookup service for RDF namespaces
  • Graphite PHP Linked Data Library – most of the RDF tools are written by academics who are clever, and assume that others are clever. Chris just wanted to build something easy – and that’s what Graphite is. It’s an easy way of exploring linked data. It makes it easy to debug the RDF code you create. The development version has a SPARQL interface, making it easy to build SPARQL queries.
  • Alicorn – a tool for generating pages from linked data.
  • RDF Browser – a quick and dirty RDF browser
  • Triple Checker – a tool to check for common errors in RDF Triples.
  • NTCat
  • Hedgehog – an RDF publishing platform

All of the source code for these is available on GitHub.
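As a flavour of the kind of checks a triple-checking tool performs, here’s a stdlib-only Python sketch that flags lines failing basic N-Triples syntax. This is purely illustrative – it is not Triple Checker’s actual logic, and the regex only covers the simplest triple forms:

```python
import re

# Matches: <subject> <predicate> <object> .  where the object is a URI or a
# plain/typed/language-tagged literal. Deliberately simplistic.
TRIPLE = re.compile(
    r'^(<[^>]+>)\s+(<[^>]+>)\s+(<[^>]+>|"[^"]*"(?:@\w+|\^\^<[^>]+>)?)\s*\.$'
)

def check_ntriples(lines):
    """Return (line_number, line) pairs that fail basic N-Triples syntax.

    Blank lines and comments are skipped, as the N-Triples grammar allows.
    """
    errors = []
    for i, line in enumerate(lines, start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if not TRIPLE.match(line):
            errors.append((i, line))
    return errors
```

Feeding it one good triple and one malformed line returns only the malformed one, with its line number – enough to point a non-expert at the exact problem.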

James Smith, ODI

The ODI tends to focus on simpler tools – and formats like CSV. So much data out there is in poor condition.

  • CSVlint – a validator for data in CSV format, which also works with schemas. In alpha currently, and aiming for beta this year.
  • Open Data Certificates – a project to help people make assurances around their data, that gives others the confidence to build from it.
  • Git Data Publisher – a tool to help you publish your open data, guiding you through what you need to do.


  • Gangplank – an open source data publishing platform

Crowdsourcing the perfect FOI request

Warning: Liveblogging – prone to error, inaccuracy, and howling affronts to grammar. This post will be improved over the course of a few days.

Gaia Marcus has been working on a dashboard to show the scale of youth homelessness in the UK. It’s not necessarily just street homelessness – that’s fairly unusual in this country. There are plenty of other, more common, kinds of homelessness.

There are a whole range of problems with getting data via Freedom of Information, including time restrictions, format issues and the dreaded PDF response. To counter that, they’re building a form for an FoI request that seeks to shape the request in a way that deals with those problems.

This is part of a campaign to get bodies to report this data better. They tend to both store and share it in a deeply static format right now – we need to get it in a more open, useful format.

The Human Factor

The discussion focused on dealing with non-expert FOI recipients, who need as much help as possible to produce a quick, useful response. Here are some key points from the discussion:

  • How strongly should we ask for specific formats? Yes, we should probably ask them to return the data in the spreadsheet we sent.
  • Excel versus CSV? Some preference for CSV, but there are good reasons for going Excel – familiarity is one, for example. Google Docs is out due to restricted access. Maybe Excel for those who can’t and CSV for those who can…
  • In extremis you could use SurveyMonkey or Google Docs to allow people to fill in the data directly for you. It does introduce a risk of human corruption of data – but that’s the risk at every stage humans are involved.
  • You should also specify that it should be published as open data on the website – and that saves you the cost of future FOI requests. There’s allegedly some research from Leeds City Council that their FOIs went down since they’ve started publishing Open Data. No-one here’s seen it, though.
  • In case of refusal, is capturing the reason why they can’t fulfil the request useful? The consensus seems to be “yes”.
  • We need to confirm the licence of the data – and ideally it should be Open Government Licence (and you’d need to link to an explanation of that). That way you could publish the data yourself, which you can use as part of a cost argument (fewer FOIs, because we publish this as open data for you).
  • Reference similar requests and highlight why what you’re asking for is significantly different.
  • Beware being classified vexatious by overwhelming authorities with requests.
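One concrete version of “ask them to return it in the spreadsheet we sent” is to generate the empty template programmatically, so every request ships with identical structure. A stdlib-only Python sketch – the column names are invented examples, not the campaign’s actual fields:

```python
import csv
import io

def foi_template(columns):
    """Build an empty CSV template to attach to an FOI request, so the
    authority can return the data in exactly the structure asked for."""
    buf = io.StringIO()
    csv.writer(buf).writerow(columns)
    return buf.getvalue()

# Hypothetical columns for a youth-homelessness request:
template = foi_template(["local_authority", "year", "young_people_housed"])
```

Trivial, but it guarantees that every authority gets the same header row – which is what makes the responses machine-comparable later.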

Here’s the working document used to capture the FOI session input

Linking University Data – the open way

Warning: Liveblogging – prone to error, inaccuracy, and howling affronts to grammar. This post will be improved over the course of a few days.

What should the university-centric site be? It’s still a matter of debate – but in the meantime, Chris Gutteridge from the University of Southampton has it up and running while that’s resolved.

The starting point was a list of Universities – one more loved than the one on the UK government site. He strongly believes that every time you publish data, you should create something that ordinary people can use, not just the files.

The hub is really, really noddy – but it is a hub, and others can link to it. And that enables linked data around universities. They’ve been funded to the tune of £250,000 over two years. So what did they do with the money?

Open university data – so far

Equipment resource

They built it – and they insisted that there was a contact link, so you knew whom to tell if the data was wrong. They’re getting better at finding the equipment data from the webpage – so they’re insisting that after the discovery phase, the equipment data should be auto-discoverable. The bronze/silver/gold ranking helps motivate institutions.

They scan every homepage once a week. If you’re not part of this at the moment, you can just add the data, and they’ll find it.
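The “auto-discoverable” idea is essentially the trick RSS feeds use: a <link> tag in the homepage’s head that a weekly crawl can pick up. A hedged Python sketch – the rel value here is a made-up placeholder, not Southampton’s actual convention:

```python
from html.parser import HTMLParser

class LinkFinder(HTMLParser):
    """Collect href values of <link rel="..."> tags from an HTML page."""
    def __init__(self, rel):
        super().__init__()
        self.rel = rel
        self.found = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "link" and a.get("rel") == self.rel and "href" in a:
            self.found.append(a["href"])

def discover(html, rel="equipment-data"):
    """Scan a homepage for dataset autodiscovery links.
    'equipment-data' is a hypothetical rel value for illustration."""
    parser = LinkFinder(rel)
    parser.feed(html)
    return parser.found
```

A crawler running this weekly over every university homepage needs no central registry at all – publish the tag, and you’re found.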

University Web Observatory

They’ve built a web observatory, analysing how domains use the web.

Searchable university news feed

They’re scraping the RSS feeds of the sites, too, to create a combined, searchable news feed of University information.

Purchasing codes

CPV Codes – could be incredibly useful for university purchasing information.

What next?

They have two-thirds of the Russell Group involved – not because they believe in Open Data, but because they want their equipment advertised, and this is the easiest way for them to do it. But it acts as a Trojan horse for the idea.

Next? Maybe university experts for media appearances. Hospital ward data? Auto-discovery of that from hospital homepages would replace the idea of a central source. In fact, all of these distributed efforts mean that you replace dependence on a central broker whose funding – or interest – may wane.

Lincoln has developed the idea of a semantic site map, built by marking up types of pages, called Linking-You.

“You can’t force people to use standards. You want them to embrace them because they’re better.”

Open Addresses: Multiplying addresses

The team behind Open Addresses addressed Open Data Camp for the second time on Saturday in the day’s final session.

The session on the ongoing development of an enormous database of every address in the UK, designed to be integrated into the National Information Infrastructure, began with a couple of tricks they have built on top of their address-data skeleton.

They are using existing datasets not otherwise used for this purpose, such as those provided by Companies House. These sets can be extrapolated, with further addresses hinted at amongst the data.

Ze remote problem

The ‘ZE’ postcode in Scotland’s Shetland Islands has been called the “black sheep” dataset since its publication in December, as it only offers 117 addresses. It’s an example of how sparse address data can be, and demonstrates how particularly difficult it is to cover remote areas; big cities like London have more readily available address information.

On Fogralea road in ZE1, for instance, there are only two houses: number 5 and number 30.

Open Addresses has figured out a way of making an informed guess at how the rest of the street looks.

They call it ‘inferring’ and it helps place houses 6 through 29.

Above or below those numbers, however, nothing can be done, because the risk of getting the postcode wrong is too great; it’s better not to have an address than to send someone to the wrong place.
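That inference step can be sketched in a few lines of Python. This is a simplified illustration – the real system presumably also handles odd/even numbering conventions and per-address confidence, which this ignores:

```python
def infer_addresses(street, known_numbers):
    """Infer the house numbers between the lowest and highest confirmed
    numbers on a street. Nothing outside that range is guessed, because
    the risk of inventing a non-existent address is too great."""
    lo, hi = min(known_numbers), max(known_numbers)
    return [(street, n, "inferred")
            for n in range(lo + 1, hi)
            if n not in known_numbers]

# With only numbers 5 and 30 confirmed on Fogralea, houses 6-29 are inferred:
inferred = infer_addresses("Fogralea", {5, 30})
```

Two confirmed addresses become twenty-four plausible ones – which is exactly why inferred entries need a lower confidence score until someone confirms them.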

How can you help?

This inferring process is supported by the organisation’s crowdsourcing efforts. The Open Address mobile app asks users to “help us improve the UK’s address data” by either confirming an address location or tipping them off to a potential problem.

User feedback helps assess the trustworthiness of an address, adding or subtracting certainty that the site is where it says it is.

Inferred addresses are automatically given a lower statistical confidence score, and older addresses are marked down for potential outdatedness.
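A minimal sketch of that feedback loop – the step sizes and starting score here are invented placeholders, not Open Addresses’ actual scoring model:

```python
def update_confidence(score, confirmed):
    """Nudge an address's confidence up on a user confirmation, or down on
    a problem report. Scores stay clamped to the range [0.0, 1.0]."""
    delta = 0.1 if confirmed else -0.2
    return max(0.0, min(1.0, score + delta))

# An inferred address might start at 0.5 and rise as users confirm it:
score = update_confidence(0.5, True)
```

Problem reports weigh more than confirmations in this sketch, on the theory that a wrong address is costlier than a missing one.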

The project invites users to challenge its data, with one member of the audience gleefully noting the absence of a handful of Nottingham back roads on OpenStreetMap.

Open Addresses uses four sources to reference their data, none of which actually cross-reference each other:

  • Ordnance Survey’s Locator for road names
  • Ordnance Survey’s Strategi for settlements
  • Wikipedia for towns
  • Office for National Statistics for postcodes

What’s the point?

This big data big mission is all mighty impressive, but is there an actual purpose? Open Addresses hopes to create a data commons of all the places in the UK; it’s a platform for technology that has yet to arrive.

With each home given its own URL, the package-delivering drones will know exactly where to go, how to get there, and if there’s anything specific to be done.

But it’s not just a future-tech thing; it’s also a solution to problems right here, right now. The organisation estimates that between £50 million and £100 million is spent every year on comprehensive address lists.

Not only that, but those lists aren’t even very good; they’re managed by old institutions using old processes.

And there are some people whose lack of address data is messing with their rights. One man, the Open Addresses team claimed, couldn’t register to vote because his house wasn’t on record.

But perhaps this super-database will cause some problems as well – what about spam?

“Er, that probably won’t get any worse.”

And, indeed, they have a longer answer to that.

Data literacy: What is it and how to promote it?


This has been liveblogged. Sorry for mistakes, they’ll be fixed.

The concept of data literacy is touted as the be-all-and-end-all solution to all information issues, but it’s pretty loosely defined, and may not be entirely viable for the wider public.

The ODcamp Data Literacy discussion on Saturday afternoon was challenged to define the term, and figure out all that it entails.

Does that compute?

To attain the sort of data literacy that can decode huge sets – interrogate and interpret them – you require, to an extent, mastery of both a subject and computing. That second skill, computing, is where most of the problems emerge. It’s not feasible to give everyone computing skills, but an understanding of how to use data is pretty important.

Sure, data can be better designed, made more usable for the uninitiated, but literacy really comes into play when the brick wall of bad data is hit. The combination of field and computing expertise enables you to articulate what bad data is, why it’s so bad, and to figure out how to circumvent that wall.

It’s about asking the right questions, the group agreed.

The English-plumbing divide

But the extent to which “problematic” computer skills are required depends on how critically you view the whole thing. Data skills were described alternately as equivalent to both:

  • learning the English language – an absolute necessity
  • learning the trade of plumbing – useful, but something you’re likely to outsource.

Perhaps one to file under politics > being more engaged with the world

How to teach it

  1. School
  2. Citizens Advice Bureau for data
  3. Open Data Cookbook
  4. School of Data running data journalism classes
  5. Relationships – finding mentors or other tutors

The Open Data Board Game

The gamification of big data is what Ellen Broad and the Open Data Institute are exploring with The Open Data Board Game Project.

Broad, with her Australian brogue, used her Saturday morning session at ODcamp to crowdsource ideas for what data could be used in the prospective board game, how, and, significantly, the wider benefits of using it.

The room was asked to feed back on what types of open data would:

  1. help to establish some sort of utopia
  2. best fit the board game framework.

She used the example of energy efficiency, with prospective gamers using information to achieve greater savings. The game would highlight to non-data geeks how open data is a really important thing.

First, the group tried to decide what kind of game it would be:

  • Old school (as in actually on a board)
  • Digital
  • Augmented reality.

It was mostly agreed that such a data-driven game would probably be more at home on a device, though they also stressed how they didn’t just want to make another Sim City.

The Complexity Crisis

Next question: complexity. The clever data types in the room have expertise well beyond the gamers they’re pitching to. So how do you take that expertise and translate it? Do you try for a one-size-fits-all? Do you have different versions for different subjects?

Broad recalled her struggles learning to code using the hard-to-understand Ruby Warrior. The board game shouldn’t be like that. After some hemming and hawing, the game turned out like a crazy version of Sim City – but that’s not really a board game.

But what about the central thesis that the wall of ‘data idea’ post-it-notes was supposed to provide?

One audience member closed the session by saying:

It should show that no data is bad. And that you should feel bad.

11 Horror Stories of Open Data

A cathartic session of data ranting, where Open Data Camp attendees shared their data horrors under the Chatham House rule:

Horror: A PDF full of screenshots

Looking for the location of fire hydrants? If you make FOI requests, you’ll be told they’re national security, or private data or… One council did send the info – but as a PDF. And in the PDF? Screenshots of Excel spreadsheets.

Lesson: Ask for a particular format…

Horror: Paved with good intentions

A government ministry was asked for its spending data, but had to be fought all the way to the Information Commissioner, because they argued that they had intended to publish, and that was enough to give them leeway not to publish. The Information Commissioner disagreed.

Lesson: Just saying “intent” does not let them off the hook

Horror: Customer Disservice

An angry Twitter user asking about his broadband speed was sent a huge dataset of broadband speeds by postcode, as a zipped CSV – and was a bit cross when he realised he couldn’t use it. So a member of the organisation helped out by creating a way of reading it – and got told off by his manager for helping the public.

Lesson: No good deed goes unpunished.

Horror: The art of the inobvious search

One attendee Googled for a list of GP locations and found an NHS search service – with no way to download the data. The ONS? 2006 data. It took her getting angry, walking away from the computer, and coming back to make a ridiculous search to find it. If you don’t make it accessible, why bother?

Lesson: Just creating data isn’t enough.


Design and Data: how do we bring them together?


Warning: Liveblogging – prone to error, inaccuracy, and howling affronts to grammar. This post will be improved over the course of a few days.

Session hosted by Simon Gough, ODI Devon

Global Service Jam is a two-day event exploring the use of service design to build human-centred tools. Data isn’t something that gets mentioned much there. On the other hand, hackathons are often all about data. The two don’t seem to meet.

He’s involved with #dataloop – a tool for exploring data as a designer.

  • Infographics are an interesting one. They’re very much about presenting data in an easily digestible format. What is the design process behind the infographic that explores reader need? Scale of audience? Understanding the data – and what you want it to do.
  • Visualisation tools – there are plenty of them, but how much traction do they get? Service design methods tend to be qualitative, rather than quantitative.
  • Personas are a core point of most service design. Can you consider that persona’s need for data? Of course, and so that should be part of the process. If you’re not used to working with data, though, that can be a fairly abstract process.

Are there other parts of this process that we can bring data into?

Developers can be more comfortable with design (because they use it), while designers tend to be uncomfortable with data. That said, you can see resistance to designers at hack days.

Can we make this process easier by working within a very specific challenge that helps create the right focus by reducing the complexity?

Psychology versus process

One software developer suggested we’re dealing with a conflict between a psychological approach (design) and a process approach (software). Two different dynamics – so can there be a single solution? Well, this is where the defined challenge comes back into play. In more open situations – like hack days and the service jam – you have this problem. Without a context, how can designers explore data?

Open source is a common point: software for data, frameworks (personas, journey maps, blueprints) for designers. Both professions use forms of frameworks to shape their work, but they’re not really aware of each other’s tools.

This is not a tools or methodology problem, suggested one attendee, but a cultural one. And another suggested that this is where a project manager comes into play, and can be vital to bridging this divide. You need to know a reasonable amount about how someone else works to co-operate with them well.

Specialism or democratisation?

We’ve seen journalists emerging as data journalists – but it’s a core group of specialists right now. Will we see data specialist designers? Or is the increasing complexity of data and data formats making it hard for that sort of specialism to emerge?

Design is moving in the opposite direction – democratisation of design through co-creation, for example.

We have exponentially larger amounts of data available, with a geometric rise in connections. That again creates more opportunities, but again makes things harder.

Maybe agile approaches – cross-functional teams working on constrained problems in sprints – might be one approach. Devops was a cultural change that led to a whole bunch of new tools to facilitate that way of working.

Tools can ease the learning curves of taking a designer mindset and applying it to data work, without shackling it to the designer’s initial way of thinking, as early web development software did.

Further notes and links.

Food, hygiene and the open data challenge

Warning: Liveblogging – prone to error, inaccuracy, and howling affronts to grammar. This post will be improved over the course of a few days.


Hosted by Dr Sian Thomas, Food Standards Agency

The Food Standards Agency has a big commitment to open data – but is honest that it’s not always in a useful format. Dr Thomas asked for suggestions for improving that, and the room had plenty of ideas…

The more ways of accessing the data the better, was the message: RSS, CSVs, APIs, etc. Tab-separated data is “old fashioned” – but pretty easy to deal with. However, she’s only got a team of four, and is responsible for a lot more than open data (like data protection, FoI, and so on…). They’re dependent on other data-collecting organisations opening up what they do.

Supply chain open data could be a really interesting perspective, especially for the rural part of the economy. DEFRA has a lot of open data on that. But once it enters the supply chain it becomes commercial data, and no-one releases that. Some supermarkets release some data, but far from all, and in theory you can do more down the packaging chain. By law you need to know one step above and one step below – who you bought it from and who you sell it to. It’s not in a standard format, though. Also, food is traded as a commodity, so it often changes hands without physically moving. That said, DEFRA is right at the top of the list of bodies that release data.

Data quality: how do authorities describe supermarket canteens? As the company they’re in – or as the contract catering company actually running them? There is a standards quality programme – but there are cultural factors that come into play. For example, in a more affluent area the forms of food consumed might be inherently more risky – rare meat and chicken liver paté. They notice the quality issues most in Wales, where displaying the rating – “scores on the doors” – is mandatory, and that’s changed things.

The App Gap

There are lots of apps around some of this data – but they never seem to get past competition wins into existence, or at least into consumers’ hands. Maybe they should approach people like Yelp and TripAdvisor? It’s been mooted before. There’s a strong correlation between their scores and food hygiene ratings. Maybe they could be used as a trigger for reinspection?

Could food hygiene data enrich OpenStreetMap? Sure. Pub data to highlight pubs they don’t have marked right now, or warning signs for dodgy takeaways. But address data is a problem – what do you do about hospital sites with multiple outlets on a single postcode, or a great restaurant next to that dodgy takeaway?

Updates are a problem too – we’re only getting an annual snapshot of more rapidly updated data. Could we get an RSS feed of changes, for example? Parsing the existing XML can be tricky. In Belfast, people use backslashes in range addresses, which breaks a lot of operations.
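One way to turn annual snapshots into a change feed is simply to diff consecutive snapshots. A rough Python sketch – the data shapes here are invented for illustration, not the FSA’s actual schema:

```python
def diff_ratings(old, new):
    """Compare two snapshots mapping business ID -> hygiene rating, and
    return (id, old_rating, new_rating) tuples for anything added or
    changed -- the sort of delta a 'changes' RSS feed could carry."""
    changes = []
    for biz, rating in new.items():
        previous = old.get(biz)  # None if the business is new
        if previous != rating:
            changes.append((biz, previous, rating))
    return changes
```

Run after each data release, this yields only the entries a consumer actually needs to re-fetch, instead of the whole snapshot.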

Accounting for allergies

Food contamination alerts for allergies need more work. They’d really like to take the RSS feed of allergy updates and make it filterable by specific allergen, but they’re not allowed to invest in that kind of service. Could you relate that to barcode scanning? Yup, in theory. That would allow some apps to check for the update.
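The filtering itself is simple enough once the feed entries are in hand – a sketch of what a filterable alerts service could do. This is naive keyword matching on invented example titles; real allergen matching would need to handle synonyms and structured allergen fields:

```python
def filter_alerts(alerts, allergen):
    """Keep only the alert titles mentioning a given allergen
    (case-insensitive substring match -- deliberately naive)."""
    needle = allergen.lower()
    return [a for a in alerts if needle in a.lower()]

# Hypothetical alert titles:
alerts = [
    "Recall: biscuits contain undeclared milk",
    "Recall: undeclared peanuts in curry sauce",
]
milk_alerts = filter_alerts(alerts, "Milk")
```

Pair that with a per-user allergen list and you have the personalised alert feed the session was asking for.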

Allergies are a complex area – we have undiagnosed people, we have inaccurately self-diagnosed people, and no comprehensive picture of which foods are creating the biggest issues. There are some files available in the Food & You section of the FSA site, and generally decent figures on the diagnosed people.

Food poisoning outbreaks are hard to pinpoint quickly – unless they can be identified via social media. For example, an outbreak at a curry festival was identified on social media before the labs managed to do so.