Monthly Archives: July 2024

Open data man


Terence Eden, who runs Open Ideas, publishes data about himself. “I have published some of my medical scans, and from solar panels at my house,” he said.

Who else publishes information about themselves? In some ways, everyone in the room. LinkedIn profiles. Blogs. Pictures on social media. Strava. Sports results. Participation in challenges. More accidentally: location data from phones. And that data will reveal other things. Jobs. Where people were (or are). Where they bought some of the things they are wearing or making.

However, a lot of this data will be curated. People will post the jobs they got, but not the ones they didn’t. They might record good run times, but not bad ones: particularly on apps that use league tables to encourage gamification.

Terence said one of the reasons he had decided to consciously post some data about himself was to provide a more complete picture. Posting his scans prompted “a good response” from people with similar issues. Posting solar panel information counters the argument that they only work when it’s sunny.

That, he said, is a good thing to do, because it shows what’s normal. “It is nice to have non-outlier people posting stuff,” he argued. “If you only get the king of the mountains posting on Strava, it can distort people’s idea of cycling.”

Of course, there is some deeply personal information that most people won’t want monitored, although it may be collected anyway: Terence pointed out that many offices will have sensors that detect whether someone is at a desk. And people’s idea of where the limits lie will vary. Some people who use diabetes monitors post their readings on social media.

Important data sets, from haircuts to school photos

After a quick stretch and move around, the session moved on to discuss what kind of personal data people might like to know. Haircuts, Terence suggested.
It might sound random, but the trivial data in Samuel Pepys’s diary is fascinating. And the number of times people get their hair cut might tell us something about economics or social trends.

At a personal level, apps that record music downloads or beer provide a record of what we listen to and like and how this changes over time. Data collected by apps like Zoe with its Blue Poop Challenge track health over time.

Sometimes, it’s obvious that this information will only be of interest to the individual who collected it. Sometimes, it isn’t. When newspapers publish pictures of children starting school, they want to sell copies to parents and grandparents. But in 50 or 100 years, these might be valuable social records for researchers. Who knows when personal data gains public value?

Next steps for the Open Data Community

There’s a feeling at this camp that we have an opportunity to influence a new government. There’s hope. How do we build on that?

Following Up Open Data Camp 9

Events

  • People want two-way discussions, not talks.
  • Another Open Data Camp next year (but not July, please)
  • Crowdsource a list of events that are open data adjacent
  • There are government conferences in related fields in November. Could we get people on panels there? There would be an opportunity to meet there.

Social networks

A LinkedIn group? People aren’t keen on using X, so maybe LinkedIn is worth trying. We’ve tried Slack in the past, but it hasn’t worked.

Perhaps a forum? It works for some organisations, and you could set up strands for different discussions. There was some enthusiasm for the idea. What could we put on it? Local events organised through the forum. We need community management to make it work. The UK Open Government Network has an existing forum we could use.

Blogging

Owen blogs consistently. What about the rest of us? It can be delivered as a newsletter as well.

Feedback

  • Could we capture ideas and action points on a form linked from the follow-up email? There are data holding issues around this.

Towards Open Data Camp 20

What do we want by Open Data Camp 20, in a decade’s time? Do we require an action plan for that? Or do we want to remain just a series of events?

How healthy is the community in the UK now? Civil servants believe it to be strong — but are they talking about the Open Data Institute? But the community feels that perhaps it isn’t as strong and well-connected as it would like to be.

There are people out there who are part of the open data community who aren’t here: academics and journalists, for example. We are a self-selected group, and the community is wider than us. Part of the problem is the lack of an agreed communication channel. Maybe some people would join a forum who wouldn’t come to an event. The Open Data Café attracted a slightly different crowd.

Who has gone quiet over the years? There are people who have drifted away. There’s been a shift towards expertise and facts-driven governance. Open data is a tough rock to roll up the hill. Some people have stepped away for self-protection.

The community needs to coalesce and kick off the network effect — and then we might see things happen. We need to sustain the energy levels from the camp throughout the year. More events, lunch’n’learns. Who is writing good stuff on blogs? Who’s talking at conferences? Where do people find support or critical friends to bounce ideas off?

Immediate Actions

  • Workshops for getting public and private sector people together better. That’s a discussion that needs to carry on.
  • There has to be a way of sharing anything we learn in the next few weeks about the attitude of the new administration to open data. There’s some sensitivity that people will sometimes need to be non-attributable.
  • The National Data Library in the Labour Manifesto indicates that they know data will be important in the government’s plans. But we don’t know where it will land, or what it will be. But we need to find out where it lands and connect with these people.
  • Prepare for the National Action Plan consultation in 2025
  • Engage with the Smart Data Council
  • Can we capture in one place all the organisations in and around this space? A “Who’s Who of Open Data”.
  • A mini Open Data Camp — ODCampX
  • Make sure that the idea that the moment is now is in the email follow-up.
  • Get some event or meeting in the calendar before September and the return of Parliament

A register of information asset registers. Good idea? Bad idea?

After lunch on day two, delegates at Open Data Camp 9 had the chance to explore more of Manchester on a Joy Diversion walk around the city. Yet Martin Howitt still had a good turnout for a session on asset registers. “Welcome,” he said. “I expected to be on my own, so I’m pleased so many people are here.”

Asset registers at Open Data Camp 9
What is an information asset register, he asked, for anyone new to the issue. Basically, a list of data sources. Every public organisation should have one. However, when Martin “got bored” a few years ago and put out a Freedom of Information Act request for registers, some 70% of the organisations he contacted didn’t respond.

Since then, things should have improved in local government, where there is funding and a structure to publish an information asset register. But what is people’s experience? Are registers being published, and are they useful?

Participants suggested it’s variable. Some organisations publish an information asset register, but some don’t. Where there is a register, it may not be up to date. It may list data sources, but not get into how they can be accessed. Conversely, it may contain too much information. Martin said he contacted one council for its data asset register, and got one that included lots of personally identifiable data, such as people’s names.

What’s an information asset register? What’s a data asset?

Given this variability, what would improve things? One idea is to assess the maturity of registers; but a speaker with experience of working in a government department suggested it would not be worth putting in the time and resources to get to the highest levels of maturity, given other priorities.

Another idea is a register of registers. “I can see a utopia where everybody is submitting an up to date one to a central registry, which would show us what was available,” Martin said. “But we are never going to get there, are we?”

It would certainly be difficult. As things stand, there are no standards for an ‘information asset register’. In fact, there isn’t really an agreed definition of a ‘data asset’, and the EU legislation that underpins the requirement for public bodies to publish data is incredibly broad.

Another question is whether a simple list of information assets is, in fact, useful. “If you had an information asset register from every department, then every department might say they had information on ‘houses’, but the Treasury would know about tax, whereas DEFRA would know about floods,” a speaker pointed out. “That’s another level of detail.” But it is probably what many users will be looking for (in which case, what they really want is not the information asset register but a data catalogue).

Generally, the session felt a better way forward would be to encourage government departments and councils to work with the users of their data. Finding a way to automate the production and updating of information asset registers would help. As would finding a way to automatically inform users of changes. Also, speakers suggested, search is improving and LLMs may make it easier to find what’s out there.
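As a rough illustration of what automatically informing users of changes could look like, here is a minimal sketch that compares two snapshots of a register and lists what was added, removed or changed. The asset ids, field names and the `diff_registers` helper are all hypothetical; there is no agreed register format to base them on.

```python
def diff_registers(old, new):
    """Compare two register snapshots, each a dict keyed by asset id.

    Returns the ids that were added, removed, or whose entry changed.
    """
    old_ids, new_ids = set(old), set(new)
    return {
        "added": sorted(new_ids - old_ids),
        "removed": sorted(old_ids - new_ids),
        "changed": sorted(k for k in old_ids & new_ids if old[k] != new[k]),
    }


# Hypothetical register snapshots, for illustration only
old = {
    "IAR-001": {"title": "Planning applications", "updated": "2023-01-10"},
    "IAR-002": {"title": "Parking permits", "updated": "2022-06-01"},
}
new = {
    "IAR-001": {"title": "Planning applications", "updated": "2024-07-01"},
    "IAR-003": {"title": "Air quality sensors", "updated": "2024-07-01"},
}

changes = diff_registers(old, new)
print(changes)
```

The output of a comparison like this could feed a changelog or an email alert, which is the sort of notification the session had in mind.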

“I am not getting a sense from around the room that this is certainly something everyone wants to see,” Martin concluded. And a speaker agreed that “I think it is going to be a lot less useful than you hope, because of all the context that would need to go around it, so you’re still going to need to talk to the data holders.”

The wasted potential of Geospatial data

Geospatial data is about mapping data points to specific locations, but the sheer power inherent in that statement is being under-exploited.

DEFRA publishes a lot of geospatial data, in near real time, for things like the air quality index. There’s also lots of geospatial data around fisheries: for example, monthly landings.

Geospatial data formats

However, data quality can be a problem — catches allocated against points on land, which is clearly wrong. Because most of the time, people don’t visualise their data on a map, things like this get missed. For people, data often means Excel or CSV formats. But these aren’t really designed in any way for geospatial data.
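An automated check could catch this kind of error even when nobody looks at a map. The sketch below flags any catch record whose coordinates fall inside a land polygon, using a standard ray-casting test. The rectangular "land" outline and the catch records are made up for illustration; a real check would use a proper coastline dataset.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: True if (lon, lat) lies inside the polygon.

    polygon is a list of (lon, lat) vertices; the last vertex is
    implicitly joined back to the first.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that line
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


# Hypothetical land outline (a simple rectangle), not real coastline data
LAND = [(-3.0, 53.0), (-1.0, 53.0), (-1.0, 54.0), (-3.0, 54.0)]

# Hypothetical catch records: (record id, longitude, latitude)
catches = [("A", -2.0, 53.5), ("B", -4.5, 53.5)]

# Records that claim a catch on land and need a second look
suspect = [rec_id for rec_id, lon, lat in catches if point_in_polygon(lon, lat, LAND)]
print(suspect)
```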

So, they should offer data in a format like GeoJSON. It’s one of the principles of open data to offer data in more than one format. People who are using ArcGIS prefer shapefile. But people working with Java will prefer the JSON format.
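Offering a second format need not be much work. As a minimal sketch, the function below turns CSV rows that carry coordinate columns into a GeoJSON FeatureCollection using only the Python standard library. The column names (`longitude`, `latitude`) and the sample row are assumptions for illustration.

```python
import csv
import io
import json


def csv_to_geojson(csv_text, lon_field="longitude", lat_field="latitude"):
    """Convert CSV rows with coordinate columns into a GeoJSON FeatureCollection."""
    features = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        lon = float(row.pop(lon_field))
        lat = float(row.pop(lat_field))
        features.append({
            "type": "Feature",
            # GeoJSON positions are ordered [longitude, latitude]
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            # Every remaining column becomes a property of the feature
            "properties": row,
        })
    return {"type": "FeatureCollection", "features": features}


# Hypothetical sample data
sample = "site,reading,longitude,latitude\nStation 1,3,-2.24,53.48\n"
collection = csv_to_geojson(sample)
print(json.dumps(collection, indent=2))
```

The same rows can still be published as CSV alongside the GeoJSON, so spreadsheet users lose nothing.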

The privacy problem with geospatial data

Think about energy performance certificates. If that data has a geospatial element, it becomes much easier for councils, for example, to visualise where there are energy performance issues in their area: this row of houses, or this group of companies. However, there are challenges here — the performance of an individual household feels like personal data. There’s discomfort there in sharing that information.

Could you average the data from a group of properties to disguise the performance of individual properties? Strava has taken this approach to publish data about people’s activity without making it too personal. You’re reducing the resolution of the data to preserve people’s privacy.
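One way to picture this kind of averaging: snap each property to a grid cell, publish only the cell average, and drop any cell with too few properties to hide an individual. The sketch below assumes a cell size of 0.01 degrees and a minimum of three properties per cell; both thresholds and the sample scores are illustrative, not a recommended standard.

```python
import math
from collections import defaultdict


def aggregate_to_grid(records, cell_size=0.01, min_count=3):
    """Average values per grid cell, suppressing sparsely populated cells.

    records: iterable of (lon, lat, value) tuples.
    Returns {(cell_x, cell_y): mean value}, where the key is the integer
    index of the grid cell containing the points.
    """
    cells = defaultdict(list)
    for lon, lat, value in records:
        key = (math.floor(lon / cell_size), math.floor(lat / cell_size))
        cells[key].append(value)
    # Only publish cells with enough properties to mask any single one
    return {
        key: sum(values) / len(values)
        for key, values in cells.items()
        if len(values) >= min_count
    }


# Hypothetical energy scores for four properties
records = [
    (-2.2451, 53.4812, 60),
    (-2.2455, 53.4818, 70),
    (-2.2459, 53.4815, 80),
    (-2.3051, 53.5051, 35),  # alone in its cell, so it is suppressed
]
published = aggregate_to_grid(records)
print(published)
```

A larger cell size or a higher minimum count gives more privacy at the cost of a coarser map.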

In theory, you could start to use fuel poverty data, and credit reference data, to start identifying areas for intervention.

Using geospatial data for conservation

And fishing? Why publish monthly, when you could publish it weekly or daily? That could allow us to understand the impact of fishing activity in near real-time.

There are other sources of data, like crowdsourced citizen science projects. You can get people to report whale and dolphin sightings, with photos. You can analyse the geo data in the photos to start plotting the presence of these creatures on a map, once you identify the creature from the photo. This can be simplified via an app upload from people’s phones.
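Photo metadata usually stores a GPS position as degrees, minutes and seconds plus a hemisphere letter, whereas mapping tools generally want signed decimal degrees. A minimal sketch of that conversion step follows; it assumes the values have already been read out of the photo, and the sample coordinates are hypothetical.

```python
def dms_to_decimal(degrees, minutes, seconds, ref):
    """Convert degrees/minutes/seconds plus a hemisphere letter (N, S, E, W)
    to signed decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    # Southern and western hemispheres are negative
    return -value if ref in ("S", "W") else value


# Hypothetical values, as they might be read from a sighting photo's GPS tags
lat = dms_to_decimal(50, 30, 0, "N")
lon = dms_to_decimal(4, 15, 0, "W")
print(lat, lon)
```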

There are so many ways we can use proper geospatial data for things like ocean conservation — it’s a big sustainability issue. But unless more people commit to publishing good geospatial data frequently, our knowledge of what is going on will go backwards.

Real time geospatial data is even better. The dolphin/whale example is just one project that would benefit from this. Real-time decibel maps would be great for addressing noise pollution issues.

Mission Control for a mission-driven government

If the new Labour government really wants to be mission-driven, it’s going to need data to tell how well it’s doing on those missions. This could be a Mission Control for the government, and the allusion to the space race is intentional. It’s that sort of delivery-based, inspirational approach to delivering.

This workshop aimed to collectively work out what we need on those Mission Control screens:

  • What data do we have, that we need to open up?
  • What data will we need to be collecting?

Growth

Capture from Mission Control: A mission led Goverment - with Five Priorities - Economic Growth at Open Data Camp 9

Clean Energy

Safer Streets

An NHS fit for the future

 

Opportunity for all

 

Sustainability: how can open data help?

John Spanton from Valtech.

“I was keen to run a session on sustainability and how open data can help. I thought I’d start with a bit of fun.” [Slide: a set of stripes – actually, temperature change in Manchester over time, showing it’s getting hotter!]

Who else was in the room? Researchers and companies looking at sustainability in different sectors, from transport to universities, and local authorities to energy. John asked them about their experiences of working with open data.

A speaker from the transport sector said they used open data to try and give individuals an idea of how much carbon they could save, by going by rail rather than driving. He felt the calculators were quite basic, but another speaker said she loved them and “find them super-useful.”

Another good example is the Surfers Against Sewage website, which tells people about discharges, and has a crowd-sourced element, because people can report being sick. “It allows people to visibly see a problem,” one speaker said. Although whether it enables them to do much about it is another question.

John said this raised a couple of interesting questions. How detailed and accurate does information have to be to be useful? How can we get information from private companies to improve sites and calculators (especially if the data is going to show them in a bad light)? How can information collection be standardised and automated? And how do we fill in gaps?


When it comes to gaps, speakers argued there are some opportunities around. For example, John said he’s keen to install sensors in his house, to show how much energy is being saved by his new heat pump system. While another speaker took this further, by arguing it should be possible to create digital twins for buildings, to show people the impact of, for example, putting on the air conditioning, or using natural lighting.

However, another noted that it’s important to make sure the monitoring itself is sustainable. “I see more and more things being monitored, and lots of these monitors have lithium batteries, and do we really need them in every building?” she said. “Perhaps we just need a cross-section of buildings to give us some insight into what is good enough.”

Is data always useful, or just depressing?

That issue of what, if anything, is done with the data is also key. Calculators, such as Fruggr, use open data to enable individuals and companies to assess the impact of their IT operations – or “digital pollution.”

But a speaker argued they’d only be effective if businesses “see this as core to their operations” and not as something off to one side of them. Perhaps, the session suggested, there might be drivers in a university seeing how much energy it could save by doing things differently, or a landlord having a new tool to market an efficient building?

Another speaker who works in sustainability noted that lots of effort is being put into informing individuals about climate issues. But “this can be quite depressing” if “the seas are still full of poo” and temperatures keep on rising. “I am sure there are good news stories,” she said.

So, perhaps one thing to do is to go and find them: looking to other countries, if necessary. Training courses, such as Carbon Literacy, can also be useful to put different issues in context.

To conclude, John said Valtech has just started a blog series on “trailblazers and disruptors” to address some of the “negativity that is out there.” Because it’s not just finding data, but using it to drive change, that matters.

Refreshing our approach to government transparency

After the efforts of the 2010 government to open up government data, commitment to transparency has steadily declined. Does a new government offer the opportunity to revise this?

The public sector doesn’t always release the data it should, there’s a lack of consistency, and when it does invest time and people, the data may not end up getting used. There’s an opportunity now to decide what data should be shared, what shouldn’t — and to improve transparency in government. It’s a problem with open data across the board that we don’t know how it’s being used. Data isn’t considered a product of central or local government, and there’s no direct monitoring of use.

Is there a burden on local authorities in doing this? Usually, there’s very little extra burden. There’s a balance to strike between the burden of producing it and the usability. It can be difficult to figure out what money has been spent on based on the data available. There are some codes that are penetrable if you know the organisation, and some you have no chance. However, trying to apply standardisation might just lead to a complete shutdown.

Standardised data capture

It might be better to standardise how expenditure is logged across government, and then it becomes easier to make the information available in a more transparent way. Currently, people aim for three star, machine-readable data because it’s not too burdensome to get that out.

How often have we had this conversation? There was a consultation on transparency that didn’t really go anywhere. What action could central government take?

There are more basic things, like skills and data literacy, that need to be tackled in local government first before you could standardise. At the moment local governments procure their own systems, and people can’t see any central effort to standardise systems and vendors. But you could apply some technical standards — your system must be able to output data in this format with this schema attached to it.
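A technical standard of that kind can be checked mechanically. The sketch below validates a CSV export against a small schema of required columns and value types. The schema, column names and the `validate_export` helper are hypothetical; the session did not propose a specific schema.

```python
import csv
import io

# Hypothetical schema: required column name -> type each value must parse as
SCHEMA = {"date": str, "supplier": str, "amount": float}


def validate_export(csv_text, schema):
    """Return a list of problems found in a CSV export; empty if it conforms."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [col for col in schema if col not in (reader.fieldnames or [])]
    if missing:
        return [f"missing columns: {', '.join(missing)}"]
    errors = []
    # Data rows start on line 2, after the header
    for line_no, row in enumerate(reader, start=2):
        for column, cast in schema.items():
            try:
                cast(row[column])
            except (TypeError, ValueError):
                errors.append(
                    f"line {line_no}: {column}={row[column]!r} is not a valid {cast.__name__}"
                )
    return errors


good = "date,supplier,amount\n2024-07-07,Acme,100.50\n"
bad = "date,supplier,amount\n2024-07-07,Acme,abc\n"
print(validate_export(good, SCHEMA))
print(validate_export(bad, SCHEMA))
```

A check like this could sit in a procurement acceptance test, so a system that claims to be standards-compliant has to prove it before the contract is signed.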

Procurement problems

A chief planner won’t understand the database issues involved in choosing a new planning system, or the export formats needed. Setting those technical standards centrally gives them something they can use when buying a new system.

Central government could also send stronger messaging about transparency. This could be a straightforward way for them to rebuild trust in politicians and civil servants. We need to look all the way back to education, looking at how data fits into the landscape, and aim to have people as comfortable using data as they are using Windows.

Digital isn’t working well in government right now. The problem is less that it’s burdensome (machines can do a lot of the work) and more that we need to improve the quality of what’s being published.

Right now, really uncontroversial data standards are just bouncing off the procurement systems. Unless they can prove cost savings in the next few years, they just bounce off. We have a five to 10 year replacement cycle, and everyone’s scared of having a big IT disaster. Labour aren’t going to magically change all that. We need to have a different sort of conversation; open data for open data’s sake is going to win nothing. We’re nice people and they like us, so they’ll give us some time. We have a window of opportunity, but we have to find a way of using it.

Tying open data to the government agenda

We need to tie open data to some of the major agenda items for the new government, like economic growth. Or local government can argue that it will improve delivery of services. The last Labour government dropped the ball on this — we can remind them of that. This government is a bit more open to ideas around this. But we need to use their language and reflect their ideas back to them. There’s nothing wrong with writing lobbying letters, and we really want some of the community brought on board as experts. This is our opportunity to really crack it, and we should be leading with hopefulness and a spring in our step.

The National Action Plan process is worth looking at — the 7th one is coming up, and it will be Labour’s first. We’re in the very early stages of thinking about it now. Can we get it in this new cycle and then hold politicians to account?

There is interest in central government in just surfacing what is going on. Devolved administrations have data that is useful at a national level, like ticket purchases. It would be useful to have some of that data standardised, so that if one area takes over another, the data matches. Right now, data gets sent to central government, cleaned, aggregated and sent back out. And it takes 18 months, so the data is always out of date.

More legal obligation?

What if local authorities were legally obliged to be transparent about their data collecting infrastructure? Then they will be competing against each other to look better. Local government people are trying their best, though, and they already feel like the poor cousins of central government. And it’s difficult when you acquire a new system that claims to be standards-compliant, and then proves not to be. You’re in the contract then.

There’s a danger that if you mandate a complex system that publishes to open standards, some important data just won’t get collected because it’s too hard. For example, there was some really useful data generated about homelessness because governments wanted to measure its impact, and they could only do so by literally having people counting the homeless on the streets.

One area that would be really useful would be money flow information, and not just from local governments, but NHS trusts, and the fire service, and so on. When money comes from the Treasury, where does it go? We also need to reduce the dependence on consultants, and get the skills needed for this data work within government.

Public/private sector marriage guidance

“Do you work in the public sector, and feel that business just doesn’t understand you?” asked session pitcher Jez Nicholson from Pororoca. “Or do you work in business, and feel unloved by the public sector? If so, you need to come to my public/private marriage guidance session.”

It’s not you, it’s me (ok, it’s you)

This was a session about the relationship between the providers and users of data, who tend to sit in the public and private sector respectively. As Jez said: “Sometimes, I wonder if we realise we are in a relationship.

“And I wondered how other people feel. Do we publish data, and they pick it up and run with it to create growth? And we’re all happy? Or do we feel they take things, and abuse us a little bit?”

One speaker argued that in order to answer this question, it might be useful to think about some of the models for publishing data that are out there. Some organisations just publish, and don’t worry too much about what happens next.

For example, TfL publishes its timetable data, without worrying about platforms and apps, and leaves others to come up with ways of using it that benefit travellers. While the ONS publishes “loads and loads of information about what is going on in the country” – from the census to the Labour Force Survey. And companies make money from that, by turning it into intelligence for business, or reports for think-tanks or councils.

But other data providers want to recover their costs. Or at least see a share of the return that others are getting. That might feel “fair” to them, but not to the small businesses, community interest companies, and charities, which have to spend a lot on data and the infrastructure to use it, before they can start doing what they want to do with it.

“It is,” as one speaker put it, “a complicated dynamic.” Or, as another put it, at worst “you get these pathologies” where government departments are told to do things, like publish open data, but not funded to do it, so they don’t do it consistently.

Or there are charges, even when making data freely available might deliver more economic benefits (as discussed earlier at Open Data Camp 9, Royal Mail charging for the post code data file is a good example).

On the up side, another speaker suggested, the partners are at least talking about their problems. “Data is part of the conversation now,” he said. “We have a transparency code, and people expect numbers to be part of that. So I think it is changing a little bit.”

On the down side, that might be hard to sustain, given the ongoing squeeze on public funding. After all, it’s not unusual for marriages to come under pressure when there is very little money around.

Let’s find a way to talk…

Session participants agreed that one way forward is to set up a dialogue. Departments that publish open data need to be clear about what they are collecting it for, and how it is validated.

Businesses need to be able to talk to them about what they need; and, one participant suggested, be more willing to give back by providing feedback, blogs, or case studies on the benefits they have achieved; and, another argued, be willing to publish more of the data they hold.

Sian Thomas argued that now is a good moment for conversation, because there is a new government in place, that will need to set its spending priorities. However, one participant suggested there’s a shortage of suitable forums: established routes for business to talk to government, like Chambers of Commerce, or Local Enterprise Partnerships, don’t tend to work for modern, data driven businesses.

Others noted that Open Data Camp is a great route for people to come together and share ideas. But there’s a “gap” when it comes to taking them forward. As Jez put it: “It feels like we’re leaving notes for each other in the kitchen.

“There needs to be a way to systematise it. Because just bumping into people on an ad-hoc basis, doesn’t seem to be a route to life-long success.” Perhaps, a speaker suggested, the final session of the day, looking at how open data can make use of the next 98 days of Labour’s first 100 days, will come up with some ideas…

Blog post of note

Shaping the Data Marketplace through User Research – Central Digital and Data Office

Open Data Camp 9: The Pitches (Day Two)

Day two of Open Data Camp 9, and the crowd are a little subdued after a lively night last night. But pitching has to happen, so let the ideas flow…

The Pitches

This post will be updated with links to sessions we liveblog.

AI and the open web: what the hell is happening?

Is any web content basically freeware that people can do anything they want with? That was the position of the Microsoft AI CEO Mustafa Suleyman.

How do Open Data Camp 9 people feel about this?

For some people, it wasn’t clearcut. What’s our responsibility around using and crediting content on the open web?

Others have been through this before — having to educate people about the copyright status of web images. (It remains copyright of the creator, unless it is given an explicit licence.) The problem with the approach of many AI companies is that they’ve already done it.

The ethics of AI training

Is it unethical? Some people think so — and they think it’s unfortunate that big corporate players using content this way normalises it. But how do you resolve this after they’ve already ingested millions of points of data? The horse has long bolted.

Should we be actively trying to steer people towards the tools that have trained their models ethically on data they have legal permission for? Even they are facing pressure to use more data. There’s a desperation to catch up with the market leaders and ethics is getting lost on the way. But, if what the market leaders are doing is made illegal, they can’t do it any more. So we could start to address these issues. If we stop it being possible to scrape the whole internet, the edge you get from doing that disappears.

However, we’re talking about huge, rich and powerful companies here. They can probably delay legislation for years through lobbying and legal challenges. One attendee has been on the receiving end of that before. You need to differentiate between a corporate being wilfully ignorant of concepts, and being unintentionally so.

The permissions you gave for your data a decade ago make little sense now because you had no idea of the sorts of tools that could be applied to it now.

Have we already lost this battle? Some people thought so.

Is using AI risky?

Even personal productivity tools like Microsoft’s Co-pilot could be training itself on corporate data that the user doesn’t actually have permission to use. Will it amplify the biases of staff members? In which case, where does liability lie? Can they guarantee that the data will stay, for example, within the UK? No.

There’s a push in artistic circles to poison their work against AI training. If this spreads, will companies start pressuring politicians to allow them to use the data they already store for companies, as long as it is anonymised? Certainly, people can see that happening.

There’s an inherent tension between the open data community, which is a loose aggregation of people seeking public good through opening things up, and then big corporates seeking competitive advantage.

But some people are content creators too, and their work is being taken and used without permission and compensation. What might it take to get the big AI companies to step back from this? It might be something huge — but this conversation needs to be had.

The limits of LLMs

Here’s a heretical view: these LLMs are not very good. They throw up results like what’s the best glue to put on pizza. Could this be a bubble that will burst in a couple of years, and the companies will maximise shareholder profit by turning off those expensive servers…?

Perhaps — but even if it is a bubble, they’ve already normalised taking content in this way. We need to address that at the very least.

If we know that AI is wrong 15% of the time, we have to persuade other people who don’t really understand data but want AI that this is a problem. You can’t have people making decisions based on this data — what if you have to go to court to defend that decision? What if an FOI request comes in that requires you to give the justification for a decision?

We’ve already seen examples of machine learning algorithms making terrible decisions.

People still underestimate how often they just get things wrong. And they’re black boxes — we can’t see how they made the decision, we can only see the inputs and outputs from them. Do we want digital sociopaths living among us?