Category Archives: ODcamp 2015

Swirrl: We’re delighted to support ODCamp again

Open Data Camp v2.0 is coming to Manchester in October and we’re happy to say we’ll be sponsoring it once more! Just like the first Open Data Camp, it’s devoted to all things Open Data over two days and is a great style of event, with an unconference setup and lots of enthusiastic people who really know their stuff on a range of topics.

We’re delighted to support ODCamp partly because, if the Winchester event in February was anything to go by, it will be great fun and very interesting, but also because there will be lots of current or potential users of our PublishMyData open data publishing platform there.

It’s a great opportunity for us to hear about what kinds of problems people are trying to solve with data, what kinds of datasets they are trying to connect together, and what approaches they are taking to analysing the data to help them with their decision-making.

We’ll have a chance to show off some of the ways we can already help with that, and go home with a bucketload of ideas for how to make PublishMyData better still.

And it’s being held a 10-minute walk from our office – so if anyone needs tips on where to find good beer or curry, let us know!

List of data themes for the Scottish Government

Deprivation Mapper for Open Data Communities

Area Profiles on the Hampshire Hub

Open Data Camp 2 – It’s on like Donkey Kong

We, The Organisers of Open Data Camp, do hereby announce that we have started to think about ODCamp 2. The success of the first ODCamp meant that it would be wrong to not have another one.

We have had a couple of hangouts, and the main topic of conversation so far has revolved around what to call the next Open Data Camp. Pole position so far is ODCamp 2 – The Wrath of Maude. We are open to suggestions.

Probably more important, though, is where and when to hold ODCamp 2. In Winchester, we all agreed that we would like to move ODCamp around the country and that the North would be a good next destination. In terms of a rough date, it was pointed out to us that holding ODCamp on International Open Data Day was actually a bad move, as it detracted from local ODD events. We also know that we need to keep the momentum we’ve started to build. With that in mind, we are proposing that ODCamp 2 takes place somewhere in the North, some time in Autumn. We are also suggesting that ODCamp slowly makes its way up North, and stops off in the West Midlands, at Blue Light Camp on the 6th and 7th June 2015, for a bit of open data/emergency services mashup action.

Unsurprisingly, given ODCamp’s success, we have already had enquiries from organisations who want to host ODCamp 2. In the spirit of openness and transparency, we would now like to invite expressions of interest to host ODCamp 2 (not a big formal tender process – just trying to make sure we do our best to get it right).

We have come up with a few points that any expression of interest will need to address, to try and ensure we have another successful camp:

  • A venue with a big room that can hold up to 200 people (not necessarily seated)
  • The venue should also have at least 4 breakout rooms
  • Transport to and from the venue should be good
  • Accommodation should be plentiful, with the possibility of securing deals for attendees
  • Entertainment nearby
  • Excellent WiFi capabilities
  • A local person able to work with the organisers
  • Free/cheap and available in Autumn
  • Catering (tea / coffee / sandwiches / cake)

If you know of someone or somewhere in the North who has an appetite for hosting the next full ODCamp, maybe write a blog piece expressing interest and demonstrating how the above points will be met, and link to it in the comments below. Or simply put the details straight into the comments. The more info the better. We will review all the suggestions in the week of the 13th April and then make some sort of announcement.

Also, any suggestions for a name gratefully received – in the comments, please…

Ta

The Organisers

Open Data Camp: The Good, The Bad and the Surreal

 

Liveblogged notes from the feedback session closing Open Data Camp 2015

Acts of ?Deity

  • Rain! Bad rain!

Logistics

  • Too many t-shirts
  • Too much food
  • Maybe stuff with a longer life to make it easier to donate
  • No milk for tea on Sunday
  • Bit more variety in veggie stuff
  • Vegan?
  • Cake was good!

Data

  • More information needed on who attendees are
  • Maybe split registration so we know how many people are attending each day
  • Capture geographical data
  • Capture Twitter handles and site addresses
  • Some people not happy to share that with commercial sites
  • Wordle of attendees’ RSS feeds?

Connectivity

  • WiFi connectivity was a bit dodgy (but better than at most conferences)
  • VPNs were a problem
  • SSID missing from badge

Social

  • Pub was a bit far away
  • No afterparty

Timing and organisation

  • Clarity on time – the change to the starting time wasn’t widely shared
  • Don’t assume newbies will understand how an unconference will work. Give them more details.
  • Tell people what times the sessions are at the beginning
  • Have the key organisers (or at least some who are free to aid people) introduce themselves, so you know who to go to for help.
  • One colour of organiser t-shirt, not two. Star Trek demands that it’s red…
  • Maybe the decision to coincide with Open Data Day was a mistake
  • We lost interesting people to local events
  • Ride share system? Cars – or travelling together by train
  • The Google Groups were difficult to use and hard to find
  • Hotel rate negotiated?

Session structure

  • 25 min to 30 min slots that can be put together for longer sessions
  • Lightning talks? 5 to 10 minute talks. Brings variety to the day
  • They would need gaps, a breakout area and Twitter handles on screen

Activities

  • The Cabinet Office released data sets for us – and they weren’t used
  • That means we don’t really have anything to show now
  • More diversity of datasets – museums? art galleries?

Marketing

  • We relied on Twitter and blogs
  • Sold out quickly – could we have used other channels?
  • Danger of getting stuck in one bubble of data geeks
  • T-shirts maybe not working?
  • Get sizes in advance
  • Stickers? NOT mugs
  • Publish data about the event on the site – a digital souvenir
  • Snoods? Hats?
  • Water bottle is a popular idea – would reduce waste
  • Room keen on no physical goods and digital data store
  • Lunchboxes for the food with stickers on?

Sponsors

  • Read them out
  • As URIs on the site?
  • Are we tracking why they sponsor us?
  • Sponsor talking slots – 5 mins.
  • Not all sponsors want anything more than for the event to happen

Diversity

  • Gender diversity an issue on Sunday
  • More designers at the event? We need more data users.

See you next year?

Handy tools for Open Data users

Warning: Liveblogging – prone to error, inaccuracy, and howling affronts to grammar. This post will be improved over the course of a few days.

 

A session sharing handy open data tools that participants have built or found that might just make your life easier.

Google document for this session

Chris Gutteridge, University of Southampton

  • Prefix.cc – look up namespaces for RDF
  • Graphite PHP Linked Data Library – most RDF tools are written by academics who are clever and assume that others are clever too. Chris just wanted to build something easy – and that’s what Graphite is: an easy way of exploring linked data, which makes it simple to debug the RDF you create. The development version has a SPARQL interface, making it easy to build SPARQL queries. (There’s a rough Python sketch of the same workflow below.)
  • Alicorn – a tool for generating pages from linked data.
  • RDF Browser – a quick and dirty RDF browser
  • Triple Checker – a tool to check for common errors in RDF Triples.
  • NTCat
  • Hedgehog – an RDF publishing platform

All of the source code for these is available on GitHub.
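To make that concrete: Graphite itself is PHP, but the same “point at a linked data URI, poke around, then graduate to SPARQL” workflow looks something like this in Python using rdflib. This is just a minimal sketch for illustration – it isn’t part of Chris’s toolset, and the DBpedia URI is simply an example of a URI that serves RDF.

```python
# Minimal linked-data exploration sketch (illustrative only, not Graphite).
from rdflib import Graph

g = Graph()
# Fetch and parse the RDF published at a linked-data URI (example URI).
g.parse("http://dbpedia.org/resource/University_of_Southampton")

# Quick look at what the data contains: every distinct predicate used.
for predicate in sorted(set(p for _, p, _ in g)):
    print(predicate)

# The same exploration as a small SPARQL query: predicates ranked by use.
results = g.query("""
    SELECT ?p (COUNT(?o) AS ?n)
    WHERE { ?s ?p ?o }
    GROUP BY ?p
    ORDER BY DESC(?n)
""")
for row in results:
    print(row.n, row.p)
```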

James Smith, ODI

The ODI tends to focus on simpler tools – and formats like CSV. So much data out there is in poor condition.

  • CSVlint – a validator for data in CSV format, which also works with schemas. In alpha currently, and aiming for beta this year. (A rough sketch of the kind of checks involved follows below.)
  • Open Data Certificates – a project to help people make assurances around their data, that gives others the confidence to build from it.
  • Git Data Publisher – a tool to help you publish your open data, guiding you through what you need to do.
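To give a feel for what a validator like CSVlint is looking for, here’s a minimal Python sketch of the same sort of structural checks (sane headers, consistent column counts). It is purely illustrative – it is not CSVlint’s implementation – and `data.csv` is a hypothetical input file.

```python
# Illustrative structural checks for a CSV file (not CSVlint itself).
import csv

def basic_csv_checks(path):
    problems = []
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, None)
        if not header:
            return ["file is empty"]
        if any(not col.strip() for col in header):
            problems.append("blank column name in header")
        if len(set(header)) != len(header):
            problems.append("duplicate column names in header")
        for line_no, row in enumerate(reader, start=2):
            if len(row) != len(header):
                problems.append(
                    f"row {line_no}: expected {len(header)} columns, got {len(row)}"
                )
    return problems

if __name__ == "__main__":
    for problem in basic_csv_checks("data.csv") or ["no structural problems found"]:
        print(problem)
```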

Others

  • Gangplank – an open source data publishing platform

Crowdsourcing the perfect FOI request

Warning: Liveblogging – prone to error, inaccuracy, and howling affronts to grammar. This post will be improved over the course of a few days.

Gaia Marcus has been working on a dashboard to show the scale of youth homelessness in the UK. It’s not necessarily just street homelessness – that’s fairly unusual in this country. There are plenty of other, more common, kinds of homelessness.

There is a whole range of problems with getting data via Freedom of Information, including time restrictions, format issues and the dreaded PDF response. To counter that, they’re building a form for an FOI request that seeks to shape the request in a way that deals with those problems.

This is part of a campaign to get bodies to report this data better. They tend to both store and share it in a deeply static format right now – we need to get it in a more open, useful format.

The Human Factor

The discussion focused on dealing with non-expert FOI recipients, who need as much help as possible to produce a quick, useful response. Here are some key points from the discussion:

  • How firmly should we ask for specific formats? Yes, we should probably ask them to return it in the spreadsheet we sent.
  • Excel versus CSV? Some preference for CSV, but there are good reasons for going with Excel – familiarity, for example. Google Docs is out due to restricted access. Maybe Excel for those who can’t and CSV for those who can…
  • In extremis you could use SurveyMonkey or Google Docs to allow people to fill in the data directly for you. It does introduce a risk of human corruption of the data – but that’s a risk at every stage where humans are involved.
  • You should also specify that it should be published as open data on the website – that saves you the cost of future FOI requests. There’s allegedly some research from Leeds City Council showing their FOI requests went down after they started publishing open data. No-one here has seen it, though.
  • In case of refusal, is capturing the reason why they can’t fulfil the request useful? The consensus seems to be “yes”.
  • We need to confirm the licence of the data – and ideally it should be Open Government Licence (and you’d need to link to an explanation of that). That way you could publish the data yourself, which you can use as part of a cost argument (fewer FOIs, because we publish this as open data for you).
  • Reference similar requests and highlight why what you’re asking for is significantly different.
  • Beware being classified vexatious by overwhelming authorities with requests.

Here’s the working document used to capture the FOI session input

Linking University Data – the open way

Warning: Liveblogging – prone to error, inaccuracy, and howling affronts to grammar. This post will be improved over the course of a few days.

What should the university-centric data.ac.uk site be? It’s still a matter of debate, but Chris Gutteridge from the University of Southampton has it up and running in the meantime.

The starting point was a list of universities – one more loved than the one on the UK government site. He strongly believes that every time you publish data, you should create something that ordinary people can use, not just the files.

The hub is really, really noddy – but it is a hub, and others can link to it. And that enables linked data around universities. They’ve been funded to the tune of £250,000 over two years. So what did they do with the money?

Open university data – so far

Equipment resource

They built equipment.data.ac.uk – and they insisted on a contact link so you know whom to tell if the data is wrong. They’re getting better at finding the equipment data from the webpage – so they’re insisting that, after the discovery phase, the equipment data should be auto-discoverable. The bronze/silver/gold ranking helps motivate authorities.

They scan every ac.uk homepage once a week. If you’re not part of this at the moment, you can just add the data and they’ll find it.
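To illustrate the idea only – this is not the actual data.ac.uk crawler – a weekly homepage scan for auto-discoverable data might look roughly like the sketch below. The homepage URL and the “looks like data” heuristic are both assumptions made for the example.

```python
# Sketch of scanning homepages for links to machine-readable data
# (illustrative only; not how data.ac.uk actually does it).
from html.parser import HTMLParser
from urllib.request import urlopen

HOMEPAGES = ["https://www.example.ac.uk/"]  # hypothetical institution homepage

class LinkCollector(HTMLParser):
    """Collects href values from <a> and <link> tags."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag in ("a", "link"):
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

def looks_like_data(href):
    # Crude heuristic, purely for illustration.
    return href.lower().endswith((".csv", ".rdf", ".ttl", ".json"))

for homepage in HOMEPAGES:
    html = urlopen(homepage).read().decode("utf-8", errors="replace")
    collector = LinkCollector()
    collector.feed(html)
    for href in collector.links:
        if looks_like_data(href):
            print(homepage, "->", href)
```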

University Web Observatory

They’ve built a web observatory, analysing how ac.uk domains use the web.

Searchable university news feed

They’re scraping the RSS feeds of the sites, too, to create a combined, searchable news feed of University information.
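As a rough sketch of what “scrape the RSS feeds and combine them” can look like, here’s a minimal Python example using the feedparser library. The feed URLs are hypothetical, and the real data.ac.uk pipeline is certainly more involved.

```python
# Combine several RSS feeds into one searchable list (illustrative sketch).
import feedparser

FEEDS = [
    "https://www.example.ac.uk/news/rss",       # hypothetical
    "https://www.another.ac.uk/news/feed.xml",  # hypothetical
]

def combined_feed(urls):
    items = []
    for url in urls:
        parsed = feedparser.parse(url)
        for entry in parsed.entries:
            items.append({
                "source": url,
                "title": entry.get("title", ""),
                "link": entry.get("link", ""),
                "published": entry.get("published", ""),
            })
    return items

def search(items, term):
    term = term.lower()
    return [item for item in items if term in item["title"].lower()]

if __name__ == "__main__":
    for hit in search(combined_feed(FEEDS), "open data"):
        print(hit["published"], hit["title"], hit["link"])
```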

Purchasing codes

CPV Codes – could be incredibly useful for university purchasing information.

What next?

They have two-thirds of the Russell Group involved – not because they believe in Open Data, but because they want their equipment advertised, and this is the easiest way for them to do it. But it acts as a Trojan horse for the idea.

Next? Maybe university experts for media appearances. Hospital ward data? Auto-discovery of that from hospital homepages would replace the idea of a central source. In fact, all of these distributed efforts mean that you replace dependence on a central broker whose funding – or interest – may wane.

Lincoln has developed the idea of a semantic site map, called Linking-You, which works by marking up types of pages.

“You can’t force people to use standards. You want them to embrace them because they’re better.”

Open Addresses: Multiplying addresses

The team behind Open Addresses addressed Open Data Camp for the second time on Saturday in the day’s final session.

The session – on the ongoing development of an enormous database of every address in the UK, designed to be integrated into the National Information Infrastructure – began by introducing a couple of tricks the team build on top of their address-data skeleton.

They are using existing datasets not otherwise used for this purpose, such as those provided by Companies House. These sets can be extrapolated, with further addresses hinted at in amongst the data.

Ze remote problem

The ‘ZE’ postcode area, covering Scotland’s Shetland Islands, has been called the “black sheep” dataset since its publication in December, as it only offers 117 addresses. It’s an example of how sparse house data can be, and demonstrates how particularly difficult it is to cover remote areas; big cities like London have more readily available address information.

On Fogralea road in ZE1, for instance, there are only two houses: number 5 and number 30.

Open Addresses has figured out a way of making an informed guess at how the rest of the street looks.

They call it ‘inferring’ and it helps place houses 6 through 29.

Above or below those numbers, however, nothing can be done, because the risk of getting the postcode wrong is too great; it’s better not to have an address than to send someone to the wrong place.
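Here’s a toy version of that inferring step for the Fogralea example – not Open Addresses’ actual algorithm, just an illustration of the rule as described: known numbers keep full confidence, the gap between them is filled in at lower confidence, and nothing beyond the known range is guessed.

```python
# Toy illustration of inferring in-between house numbers on a street.
def infer_street(street, postcode, known_numbers, inferred_confidence=0.5):
    known = sorted(set(known_numbers))
    if not known:
        return []
    addresses = [
        {"number": n, "street": street, "postcode": postcode, "confidence": 1.0}
        for n in known
    ]
    # Only fill the gap between the lowest and highest known numbers;
    # never extrapolate above or below them.
    for n in range(known[0] + 1, known[-1]):
        if n not in known:
            addresses.append({"number": n, "street": street, "postcode": postcode,
                              "confidence": inferred_confidence, "inferred": True})
    return sorted(addresses, key=lambda a: a["number"])

# The Fogralea example: numbers 5 and 30 are known, 6 to 29 are inferred.
for address in infer_street("Fogralea", "ZE1", [5, 30]):
    print(address)
```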

How can you help?

This inferring process is supported by the organisation’s crowdsourcing efforts. The Open Addresses mobile app asks users to “help us improve the UK’s address data” by either confirming an address location or tipping them off to a potential problem.

User feedback helps assess the trustworthiness of an address, adding or subtracting certainty that the site is where it says it is.

Inferred addresses are automatically given a lower statistical confidence score, and the potential outdatedness of older addresses sees them marked down too.

The project invites users to challenge its data, with one member of the audience gleefully noting the absence of a handful of Nottingham back roads on OpenStreetMap.

Open Addresses uses four sources to reference their data, none of which actually cross-reference:

  • Ordnance Survey’s Locator for road names
  • Ordnance Survey’s Strategi for settlements
  • Wikipedia for towns
  • Office for National Statistics for postcodes

What’s the point?

This big-data mission is all mighty impressive, but is there an actual purpose? Open Addresses hopes to create a data commons of all the places in the UK; it’s a platform for technology that has yet to arrive.

With each home given its own URL, the package-delivering drones will know exactly where to go, how to get there, and if there’s anything specific to be done.

But it’s not just a future-tech thing, it’s also a solution to problems right here right now. The organisation estimates that between 50 and 100 million pounds are spent every year on comprehensive address lists.

Not only that, but those lists aren’t even very good; they’re managed by old institutions using old processes.

And there are some people whose lack of address data is messing with their rights. One man, the Open Addresses team claimed, couldn’t register to vote because his house wasn’t on record.

But perhaps this super-database will cause some problems as well – what about spam?

“Er, that probably won’t get any worse.”

And, indeed, they have a longer answer to that.

Data literacy: What is it and how to promote it?

 

This has been liveblogged. Sorry for mistakes, they’ll be fixed.

The concept of data literacy is touted as the be-all-and-end-all solution to all information issues, but it’s pretty loosely defined, and may not be entirely viable for the wider public.

The ODcamp Data Literacy discussion on Saturday afternoon was challenged to define the term, and figure out all that it entails.

Does that compute?

To attain the sort of data literacy that can decode huge sets – interrogate and interpret – you require, to an extent, mastery of both a subject and computing. That second skill, computing, is where most of the problems emerge. It’s not feasible to computer-skill up everyone, but an understanding of how to use data is pretty important.

Sure, data can be better designed and made more usable for the uninitiated, but literacy really comes into play when the brick wall of bad data is hit. The combination of field and computing expertise enables you to articulate what bad data is, why it’s so bad, and how to circumvent that wall.

It’s about asking the right questions, the group agreed.

The English-plumbing divide

But the extent to which “problematic” computer skills are required depends on how critically you view the whole thing. Data skills were described alternately as equivalent to both:

  • learning the English language – an absolute necessity
  • learning the trade of plumbing – useful but something you’re likely to outsource.

Perhaps one to file under politics > being more engaged with the world

How to teach it

  1. School
  2. Citizens Advice Bureau for data
  3. Open Data Cookbook
  4. School of Data running data journalism classes
  5. Relationships – finding mentors or other tutors

The Open Data Board Game

The gamification of big data is what Ellen Broad and the Open Data Institute are exploring with The Open Data Board Game Project.

Broad, with her Australian brogue, used her Saturday morning session at ODcamp to crowdsource ideas for what data could be used in the prospective board game, how, and, significantly, the wider benefits of using it.

The room was asked to feed back what types of open data would:

  1. help to establish some sort of utopia
  2. best fit the board game framework.

She used the example of energy efficiency, with prospective gamers using information to achieve greater savings. The game would highlight to non-data geeks how open data is a really important thing.

First, the group tried deciding what kind of game it would be:

  • Old school (as in actually on a board)
  • Digital
  • Augmented reality.

It was mostly agreed that such a data-driven game would probably be more at home on a device, though they also stressed how they didn’t just want to make another SimCity.

The Complexity Crisis

Next question: complexity. The clever data types in the room have expertise well beyond the gamers they’re pitching to. So how do you take that expertise and translate it? Do you try for a one-size-fits-all? Do you have different versions for different subjects?

Broad recalled her struggles learning to code using the hard-to-understand Ruby Warrior. The board game shouldn’t be like that. After some hemming and hawing, the game turned out like a crazy version of SimCity, but that’s not really a board game.

But what about the central thesis that the wall of ‘data idea’ post-it-notes was supposed to provide?

One audience member closed the session by saying:

It should show that no data is bad. And that you should feel bad.

11 Horror Stories of Open Data

A cathartic session of data ranting, where Open Data Camp attendees shared their data horrors under the Chatham House rule:

Horror: A PDF full of screenshots

Looking for the location of fire hydrants? If you make FOI requests, you’ll be told they’re national security, or private data or… One council did send the info – but as a PDF. And in the PDF? Screenshots of Excel spreadsheets.

Lesson: Ask for a particular format…

Horror: Paved with good intentions

A government ministry was asked for its spending data, but had to be fought all the way to the Information Commissioner, because they argued that they had intended to publish, and that was enough to give them leeway not to publish. The Information Commissioner disagreed.

Lesson: Just saying “intent” does not let them off the hook

Horror: Customer Disservice

An angry Twitter user asking about his broadband speed was sent a huge dataset of broadband speeds by postcode, as a zipped CSV. He was a bit cross when he realised he couldn’t use it. So a member of the organisation helped out by creating a way of reading it – and got told off by his manager for helping the public.

Lesson: No good deed goes unpunished.
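For what it’s worth, “a way of reading it” needn’t be heavyweight: pandas will read a zipped CSV directly, along the lines of the sketch below. The filename and column name are hypothetical.

```python
# Reading a zipped CSV of broadband speeds (hypothetical file and column names).
import pandas as pd

# Compression is inferred from the .zip extension; this works when the archive
# contains a single CSV file.
speeds = pd.read_csv("broadband_speeds_by_postcode.zip")

# Look up one postcode - "postcode" is an assumed column name.
print(speeds[speeds["postcode"] == "SO23 8UJ"])
```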

Horror: The art of the inobvious search

Googling for a list of GP locations, she found an NHS search service – with no way to download the data. ONS? 2006 data. It took her getting angry, walking away from the computer, and coming back and making a ridiculous search to find it. If you aren’t making it accessible, why bother?

Lesson: Just creating data isn’t enough.
