Monthly Archives: February 2015

Building an Open Addresses database – and opening its APIs

Warning: Liveblogging – prone to error, inaccuracy, and howling affronts to grammar. This post will be improved over the course of a few days.

Gianfranco Cecconi & James Smith

Open Addresses are trying to build a huge addressing dataset from scratch, fighting the monsters and competitors that involves. They believe that addresses are a key asset of the national information infrastructure – and that those addresses need to be liberated. That, at least, was the pitch to the Cabinet Office.

The problem is huge.

They started with the assumption that they could build their dataset from existing open data sets that (by chance) have associated address information and no intellectual property issues – and that a volunteer workforce would then develop it from there. The Royal Mail suggests that there are 60m addresses in the UK – but that counts only delivery places. This project has a wider view of the idea of addresses. Your electricity meter or your drone delivery spot might be an address.

Surviving as a non-profit

They also need to survive financially. They try to be frugal – which includes trying not to get sued – but they also try to build services that can fund what they do. The early money from the Cabinet Office will not last for ever. They have APIs that you can use in your products and services – for free. But there will be value-added services on top of that. For example, “give me a likelihood of how real an address is”. It’s not a trivial problem – but could be very useful for delivery services.

There is no UK master list of addresses – no gold standard. Everyone is working to build their database, and all have errors, but some are further ahead. Confirmation is needed on these addresses, and Open Addresses is built to deal with this doubt and uncertainty as they go.

While they do need money to survive, many of their basic services are free, because they need to be there.

Working with the Open Addresses API

The obvious thing: search the data. And that you can do via the API. Just three lines! But the completeness is limited right now – they only have 1.2m of those 60m addresses. You can submit addresses through an API called Sorting Office. Again, free for now. They’ll normalise the address for you – and you can donate it to them, but you don’t have to.

With informed consent from your clients, you can hand over addresses to them on a day-to-day basis – through Turbot. It’s a platform for managing scrapers, and is descended from ScraperWiki. (It went live last night – 20th February 2015.)

Want to do more sophisticated analysis on a block of text with addresses in it? The address building blocks API allows you to perform detailed analysis and processing on that sort of data. That is likely to be the main source of revenues in the battle to survive. The confidence API will be made available, giving a confidence score on any address.

Building the database

The biggest challenge ahead of them is building up the addresses. There’s a privacy issue – and the task of persuading people that sharing an address is not the same as sharing personal information about yourself. The existence of an address is not personal information, it’s just a fact. You can walk down streets and write them down. But it feels private.

There’s also a corporate approach, working with companies that use addresses, but they need explicit permission from clients to share their addresses.

Further notes and links.

Open Data for Charities – opportunities and roadblocks

A session about use of open data by charities, inspired by the Data for Good report from Nesta.

Tracey Gyateng from the NPC is helping charities measure the work they do. How do they know, for example, if offenders stop offending after their work? Can government datasets help that? They think so, and are working on systems to help do that.

They also work with charities to open their eyes to the potential of data. Some of the rhetoric is around making money from data – but charities can also use it to improve the well-being of people.

Many charities don’t know much about open data – or lack the understanding needed to release or access it.

360 Giving is an emerging data standard for grants. Policies are beginning to be published on GitHub to make them easier for people to access.

Breaks in the data supply chain

Like the food supply chain, the data supply chain is broken. There’s no opportunity to thank the farmer that grew the supermarket food you bought. The same is true of the data flow in charities. You give to Comic Relief or the like, and there’s little feedback on what your money ends up doing, bar the few projects they film for the following year. We can do more to engage the citizens that volunteer and donate.

80% of charities have less than £100,000 in income – so it’s important to keep focused on that.

Mobile sensor feeds could be useful – combining sensor data and open data could be very useful. There are various projects underway on that.

Even experience with data is not as much of an advantage as you might think – problems with formats, and with understanding the nature of the data, can still get in the way.

Charities: big and small

Are the challenges of open data for small charities and big charities different? One participant thought so, another suggested that if big charities lead, small charities can follow from that. But the University of Southampton research suggests that for small charities it’s much more about delivery than engagement.

Citizens Advice has had a lot of help from DataKind in analysing their resources. They’ve produced some useful models that smaller charities could use.

Local organisations often don’t think they have the time or organisation to collect data other than that required by contracts or law. As organisations, you do have data and information about your area that you could be sharing. The biggest problem is breaking the barrier of the procurement mindset: they are procured for that service and that service alone.

It would be great if the bigger organisations took on this modelling and passed it down the chain. So many of the small organisations are scared of the big funders, and of doing things they weren’t paid for.


Banking on the Open Data Camp

I am bringing Open Data Camp a big fat data problem.

How many young people are homeless every year?

Whilst Centrepoint estimates that the figure is 80,000, the truth is, no-one really knows.

Government and hundreds of organisations are working to improve the situation for young people experiencing homelessness. However there are currently no ways to collect, track and measure the work being done on a national scale. We often operate in a vacuum, not knowing how many young people are homeless, or why; or which interventions are most effective.

Youth Homelessness Databank

We aim to change this. Centrepoint have recently won a grant from the Google Impact Challenge to create the UK’s only Youth Homelessness Databank. The Databank will collect and collate data from multiple sources: the homelessness charity sector, local authorities, central government and other open datasets.

Holistic picture

Analysis and visualisation of these data will give us a holistic picture of the scale and causes of youth homelessness; and of the range and effectiveness of interventions. This will lead to a greater understanding of what works, better services, better funding decisions and ultimately better outcomes for young people experiencing homelessness.

What is an ‘ambitious and pioneering project’ on paper is a festival of moving parts in practice. My big fat data problem –

can we piece together data from the homelessness charity sector, local authorities, central government and open datasets to understand which young people experience homelessness, why this happens, and what works for them?

– breaks into a thousand cuts of questions and poorly aligned data sets.

Help!

So I’m banking on #ODC to help me answer some of the following:

1. Data Flows

I would like to understand how youth homelessness data flows around the country.

Data somehow whizzes from beneficiary to assessment to beneficiary to provider to local authority and/or funder to DCLG to live data-table…. Can we map this journey?

2. FOIs.

Yes, FOIs. We have questions, we may have to FOI them. What works and what doesn’t?

3. Critical friends

Who are the critical friends within local authorities that we can talk to for the inside scoop? FOIs do not a collaborative project make!

4. What Databank can do for you?

I don’t ask what you can do for the Databank, I ask what Databank can do for YOU!

The UK spends up to £3.2 billion a year on youth homelessness. As we approach the impending fiscal cliff, what can the Youth Homelessness Databank do for YOU?

5. Systems talking to each other

Getting client management systems to talk to each other.

Any tips? On a post-it please!

Looking to the future!

Beyond #ODC, the Youth Homelessness Databank wants to hear from you – your contacts, your ideas and expertise on what data we should be collecting, which services/agencies we could be requesting information from and how we can offer young people experiencing homelessness opportunities to be involved.

Be in touch!

Contact me on Twitter: @la_gaia

Open Data and auto-discovery

Hi, my name is Christopher Gutteridge, I work for the innovation and development team of the University of Southampton, created the first version of their open data service data.soton.ac.uk and am one of the founders of data.ac.uk </bragsheet>

For a long time I’ve been interested in open data from organisations. Each organisation owns its own data but there’s lots of value in many organisations publishing similar open data in similar ways. Your organisation isn’t special: it almost certainly has some of:

  • sites, buildings, rooms, desks
  • people, teams, departments, job roles
  • key webpages: contact us, search, freedom-of-information, message from the boss
  • a product catalogue
  • places (physical or online) where you can get a service which may have opening hours and specific offers of a service at a price, from coffee to brain surgery to car parking
  • research outputs or publications
  • social media accounts
  • news and notices
  • events

The exact data you store or publish about these things may vary (this includes the links between things, eg people-in-buildings). However, the basic concepts should be the same for many organisations and we’ve been looking at ideas around how to share this information without the need for Google or Facebook to act as an intermediary. The schema.org route is cool, but it doesn’t solve the problem I want to solve because web crawling embedded data isn’t the best way to get a dataset. Also, there’s no trust that data found by crawling http://www.badgers.ac.uk/jeff/ is really official information, and not just a demo by Jeff the PhD student.

At data.ac.uk we have created a simple mechanism to discover such predictable information sets for an organisation, starting from its web homepage. We are using this to autodiscover lists of research equipment in the UK academic sector and it has proved both effective and cheap (sustainable) while protecting the community from the risks normally associated with a hub that collates data suddenly going away. At the time of writing, 16 organisations, including 5 of the Russell Group, have implemented the OPD (organisation profile document), which is basically an auto-discoverable FOAF profile in Turtle which also describes the information sets an organisation has. While we’ve piloted this technique, it is by design anarchistic: anybody can expand and add to it. I want a web of data which doesn’t require Silicon Valley heavy hitters to let me work with open data.
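To make the homepage-based discovery idea concrete, here is a minimal sketch in Python. It assumes the homepage advertises its profile document with a `<link>` element whose `rel` value is `openorg`. That value, the example.ac.uk URLs and the function names are illustrative assumptions, not a statement of the OPD specification, so check the data.ac.uk documentation for the real details.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ProfileLinkFinder(HTMLParser):
    """Collects href values of <link> elements carrying a given rel value."""

    def __init__(self, rel):
        super().__init__()
        self.rel = rel
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        # rel may hold several space-separated values, or be missing.
        rels = (attr.get("rel") or "").lower().split()
        if self.rel in rels and attr.get("href"):
            self.hrefs.append(attr["href"])


def discover_profile(homepage_url, html, rel="openorg"):
    """Return the absolute URL of the advertised profile, or None.

    The default rel value is an assumption made for this sketch.
    """
    finder = ProfileLinkFinder(rel)
    finder.feed(html)
    if not finder.hrefs:
        return None
    # Resolve relative links against the homepage address.
    return urljoin(homepage_url, finder.hrefs[0])


# Hypothetical homepage markup, inlined so the sketch needs no network access.
SAMPLE_HOMEPAGE = """
<html><head>
<title>Example University</title>
<link rel="stylesheet" href="/style.css">
<link rel="openorg" href="/profile.ttl">
</head><body>Welcome</body></html>
"""

profile_url = discover_profile("http://www.example.ac.uk/", SAMPLE_HOMEPAGE)
print(profile_url)
```

A real client would fetch the homepage first and then parse the Turtle document at the discovered URL. Because the trust comes from the link sitting on the organisation’s own homepage, no central hub is needed.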

Oh, there’s also equipment.data.ac.uk which now has open data from 40 contributing ac.uk institutions. Actually, there’s a whole lot of other datasets: http://www.data.ac.uk/data

I’ll be attending the Open Data Camp on Sunday and I’d love to tell you more about our work, either one-on-one or maybe in a session.

cjg@ecs.soton.ac.uk

@cgutteridge

@dataacuk

OPEN DATA CAMP – FEBRUARY 2015

This post was originally published on the Trafford Innovation and Intelligence Lab web site.

[Note: There are words in this post that I know are silly, but I have to use them. I am highlighting these words in italics to show that I know they are silly, but I’m using them anyway.]

I am currently helping to organise an event called ‘Open Data Camp’, which is to be held in Winchester (it’s near Southampton), on the 21st and 22nd February 2015. We think that it’s definitely the first of its kind in the UK, and possibly the first in the world (or even the universe, depending on which side of the Drake Equation fence you sit on). The 21st of February also happens to be International Open Data Day.

Open Data Camp is a two-day event, consisting of an unconference and maker-space. The focus of the event is entirely open data – the notion of making data available so that it can be reused by anyone, without any restrictions. Though the event is an unconference (which means the content of the day is decided by attendees at the beginning of the day), it is likely that there will be sessions looking at the National Information Infrastructure, technical challenges, and opportunities presented by open data, amongst lots of other things.

Who is doing this?

The campmakers are a ragtag group of open data people:

Mark Braggins (Hampshire Hub Partnership)
James Cattell (Cabinet Office)
Neil Ford (Events)
Hendrik Grothuis (Cambridgeshire County Council and Open Data User Group)
Martin Howitt (Devon County Council)
Lucy Knight (Devon County Council and LocalGov Digital)
Pauline Roche (Birmingham)
Giuseppe Solazzo (Open Data User Group)
Sasha Taylor (British Association of Public-Safety Communications Officials)
Sian Thomas (Food Standards Agency)
Jamie Whyte (Trafford Innovation and Intelligence Lab and LocalGov Digital)

Open Data Camp also has a number of excellent sponsors, without whom it would not be happening:

Hampshire County Council
Open Addresses
Drawnalism
Food Standards Agency
NquiringMinds
OCSI
Office for National Statistics
Ordnance Survey
Swirrl

Why are we doing this?

We are a group of people who are passionate about open data. We really feel that by opening data up, good things happen. There are many events held where open data is a supporting cast member – but at Open Data Camp – it’s the star of the show. To bring together 200 people for a weekend who are into open data is a brilliant opportunity to push open data forward.

Why are Trafford doing this?

Trafford has a history of doing open data well. We worked on setting up DataGM, we were the first Local Authority to be awarded a Pilot Level Open Data Institute Certificate, and we have recently been asked to work with the Cabinet Office as Local Experts in Open Data – working with a handful of other Councils who also do it well.

We use open data, as well as releasing it. We have recently used open data to identify priority sites for positioning defibrillators, apply for funding to support projects to reduce isolation in the elderly, and combined open and closed datasets to analyse cervical cancer screening rates, amongst many others.

Because of this, we have a vested interest in the wider open data picture. The more open data is released, the more we can use it to provide intelligence – through analysis and benchmarking. The better our intelligence is – the more informed our decision-making is.

But apart from the benefits that more data brings, there’s another good thing that’s happening because of the camp. The open data community is exceptionally talented, but is quite thinly distributed across the globe. Open Data Camp is being used as a touch point for some of these groups and organisations – the camp itself is now looking likely to connect with the Open Knowledge Foundation hack in London, Bath:Hacked, Greater Manchester Data Synchronisation Programme Lean Startup weekend, Ebola Open Data Jam, and ODI nodes. The mechanics of these link-ups are yet to be worked out, but the fact that these connections are forming is very good for the open data movement.

How can you get involved?

All the tickets for Open Data Camp have now been sold (or rather allocated – it’s a free event). I will blog about the event once it has happened, with outcomes, outputs, challenges, etc. We (the campmakers) will be tweeting in the run up to the event, and during the event itself, using the hashtag #ODcamp. All campers will also be asked to tweet during the event. We are also looking into ways that we can livestream sessions – more details of that will be available on the website.

Finally – if the camp is a success, we’ll probably look to make it an annual feature. If so I’ll do my best to drag the next one up North. Don’t be afraid to get tickets and come along!

Linking open data

We’re happy to be sponsoring the first Open Data Camp UK and we’re looking forward to hearing, and seeing, what people are doing with Open Data. To us, as data publishers, the best thing about opening up data is the freedom it gives you to create something useful.

But if you link your open data the possibilities really open up. So, in that spirit, this post is about what publishing Linked Open Data really means and some of the practical advantages it has.

Linked Data is:

“a method of publishing structured data [on the web] so that it can be interlinked and become more useful.”

With Linked Data, each data point (i.e. thing or fact) has its very own URL on the Web. This is unique and because it’s readily available on the internet, people can look it up easily. And Linked Open Data can also contain links to other facts, so you can discover more, related data.

The linked data “cloud”

But Linked Data also rocks if you want to make something with the data. This is because when you look up the linked data page, all the metadata about it is embedded in it, so there are no ambiguous column names to slow you down.

And if data is published as linked, as well as being published on a web site, it means that it comes with APIs, including a SPARQL endpoint – so developers can query the data in a variety of formats and use the data in their own programs.
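As a rough illustration of what querying a SPARQL endpoint involves, the sketch below builds (but does not send) an HTTP GET request following the standard SPARQL protocol, in which the query text travels in a `query` parameter. The endpoint URL is a hypothetical placeholder and the query is deliberately generic; a real publisher’s endpoint address and vocabulary will differ.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint: substitute the address your data publisher documents.
ENDPOINT = "http://data.example.org/sparql"

# A generic query that asks for any ten statements (subject, predicate, object).
QUERY = """
SELECT ?s ?p ?o
WHERE { ?s ?p ?o }
LIMIT 10
"""


def build_sparql_request(endpoint, query):
    """Build a GET request carrying the query in the 'query' parameter."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for results in the standard SPARQL JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request(ENDPOINT, QUERY)
print(request.full_url)
```

To run the query for real, pass the request to `urllib.request.urlopen` and decode the JSON response. Changing the `Accept` header is how a client asks for a different result format, assuming the endpoint supports it.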

But it’s not just for the techies – if you’re not technical, linking up your open data has other advantages.

  • It makes it easier to work with open data across organisations and departments because it’s not locked into silos: anyone can access it, making it truly open.
  • Linking open data with other data sources and having specific names for things saves time and effort when problem solving. Take a look at Steve Peters’ post on Joining The Dots across departments.
  • It’s low cost and sustainable – you convert the data once and reuse it – again and again. As part of our PublishMyData service, you can update your data yourself.
  • By linking your open data, it makes it easier to create apps and visualisations which are a friendly, quick way in to the data.

Swirrl’s event space at Manchester’s Museum of Science and Industry

And on 21st April 2015 we’ll be sponsoring an event of our own: Local Open Data: Reaping the Benefits.

This is a one day event at Manchester’s Museum of Science and Industry. Its aim is to bring together people working with, or interested in, data at a local level.

You can check out our awesome speakers here, or register your interest.

Photo credit

The linked data cloud features in the Wikipedia article: http://en.wikipedia.org/wiki/Linked_data and is attributed to Anja Jentzsch