Energy – is there data to model the future, and is it open?


Daniel Kenning, who said he works in “this thing called transition engineering”, which is about helping organisations, communities and individuals “to work towards a desirable future”, is particularly interested in energy, and fossil fuels, and alternatives.

He told the final session of day one that there is lots of information about energy use in the past. But very little about the future. So, he asked, “is it possible to take all that data from the past, and use it to create a map for the future?” At the moment, he warned, there is a belief that “we can carry on in the same direction” but we can’t, so can we map a safe way forward, that doesn’t just say “we need to do this because of the climate.”

Speakers pointed out that this raises a profound question. Are we talking about energy use? In that case, we’ll be modelling observable trends, such as people’s growing use of smartphones or companies driving AI, to work out how much energy might be needed in the future. Or are we talking about energy supply? In that case, we’ll be trying to work out what we will be able to use safely, allowing for the fact that we could, for example, meet demand by burning coal, but don’t want to.

Working around Sturgeon’s Law

Daniel said the latter. But, of course, it’s hard. One speaker suggested that one reason is that immensely complex systems, like the interaction between energy availability and use, will be subject to Sturgeon’s Law (90% of things are crap).

So, we try and make the computing system more efficient. But then something like Bitcoin comes along. Or a lot of effort will go into things like hydrogen. Only for solar panels to get good enough to power underground stations.

Fine, Daniel agreed, but at the moment, governments are talking about ‘net zero’ and how consumers and suppliers can get to it. When what they really need to think about is how reliant they are on energy, and how they could cope if it becomes hugely expensive or not available.

As an example, he said he goes to two barbers: one relies on overhead lights and uses power clippers, and one has a big window and scissors. One can carry on during a power cut and is much more sustainable than the other. “We need to be brave, and say what would happen if we couldn’t do business as usual? How would we carry on?”

Or, to put it another way, instead of saying we do this, and we need to use less energy while doing it, we need to say we have this amount of energy, what are we going to do with it?

Fighting the market

Speakers were sceptical, given the role of market forces in driving demand. They suggested that only excessive costs would create change.

However, Daniel argued that market forces could still have a role, and a different role, by providing an incentive for companies to work out how to do things more efficiently in line with the direction of travel, which would give them a competitive advantage over companies that failed to adapt.

The data is there – but it’s not open

Another participant said that if the question is where is the data to wrestle with these questions, the energy crisis of a couple of years ago could provide a guide. At the time, she said, she had a client who asked for some future modelling of price setting. “And we had to say the data was not available, because it is commercially collected, but it is closed.

“Producers have forecasts, to future proof their businesses, but it is not available to us.” So the first thing to do is to ask for the data to be made open. “The question is not: can we do it? But can we get the data that will allow us to do it?”

Managing multilingual datasets: what you need to consider

More people than you might think have to deal with data in multiple languages. Work in Scotland? English and Gaelic are on your agenda. You’ve built a database to support English — and then it struggles when someone has an accent in their name. Transliterations from other scripts are often not consistent.

Here is one group’s workshopped list of the key issues people need to consider.

Technical considerations

Encoding standards

Unicode / UTF-8: you can say every character is a number, but some characters are more than a single letter, such as accented letters and ligatures. It’s even more pronounced in Arabic and Chinese. It really matters if you want to make them consistently searchable.

These are complicated, cursed and political. There’s an American evangelical organisation that gets to decide on language standards…
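As a rough illustration of the searchability point, here is a minimal Python sketch using the standard library's unicodedata module. The function name search_key and the sample name are assumptions made for this example. Stripping accents is lossy and will be wrong for some languages, so treat it as one possible matching strategy rather than a recommendation.

```python
import unicodedata


def search_key(text: str) -> str:
    """Return a normalised, accent-insensitive key for matching text."""
    # NFKD splits characters such as "ë" into a base letter plus a combining
    # mark, and expands compatibility characters such as the "ﬁ" ligature.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks, keeping only the base characters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # casefold() is a more thorough lower() intended for comparisons.
    return stripped.casefold()


# The same name stored two ways: one code point for "ë", or "e" plus a
# combining diaeresis. They look identical on screen but are not equal.
composed = "Zo\u00eb"
decomposed = "Zoe\u0308"

print(composed == decomposed)                           # the raw strings differ
print(search_key(composed) == search_key(decomposed))   # the keys match
print(len(composed), len(composed.encode("utf-8")))     # characters vs UTF-8 bytes
```

If the accents need to be preserved, normalising both sides to the same form (for example NFC) before comparing is the gentler option.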

Text direction

Right to left, left to right, top to bottom, and more…
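For horizontal direction, Unicode assigns each character a bidirectional class, which Python exposes through unicodedata.bidirectional. The sketch below is only a simple check for right-to-left characters, and the helper name has_rtl is an assumption for this example. Vertical, top-to-bottom text is a layout decision and is not covered by this property.

```python
import unicodedata


def has_rtl(text: str) -> bool:
    """Return True if any character has a right-to-left bidirectional class."""
    # "R" covers scripts such as Hebrew; "AL" covers Arabic letters.
    return any(unicodedata.bidirectional(ch) in ("R", "AL") for ch in text)


print(has_rtl("hello"))
print(has_rtl("שלום"))
print(has_rtl("مرحبا"))
```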

Orthography

Same language, different alphabet

Cursive text

It can be hard for humans to read easily — let alone machines.

Human factors

Language structure

  • What needs to be captured
  • Names
  • Constructed languages (from Esperanto to Elvish to Klingon…)

Some languages are so different from one another, it’s challenging to make direct translations. It’s difficult to find matching elements of a sentence. The verb might not be what gives you a sense of when something is happening in time. Mandarin has no tenses or singular or plural.

Audio tones

The way people say things can have an impact on meaning

Code-switching

Changing the way you speak, your vocabulary or so on, to fit in with a group you perceive as more dominant in any social situation.

Language shifts

Language changes over time, the semantic meaning of words drifts.

Closed language practices

Some communities use a language to make sure out-group people don’t understand them, and will change elements that are discovered.

Variant distinctions

Do you fancy eggplant or aubergine for dinner? It depends on where you are…

Slang

Informal usage, which again tends to shift over time.

Sociolects

Variants by social groups, rather than Dialects, which are variants by location.

Non-verbal/non-written language

Lingua Franca Mix

Mixing up words from different languages to create distinct language variants

What is a language?

For some people, it’s an ISO 639 code, for others it’s something people speak. It depends on where and how you need to draw a line.


Time to deliver open addresses?

The case for open addresses has been made at successive Open Data Camps. “What we are talking about is the UK national address dataset,” said session leader Owen Boswarva.

“Going back to the early days of open data, this is one of the data sets that people have argued for. It is available in other countries, like France, for example. But it has been a hard nut to crack in the UK, where we have not got over the hurdle of political will.”

What’s the issue?

So, what are open addresses? Owen started with a quick primer. “We are talking about non-personal data,” he said. “Addresses, and things like the postcode, which are assigned by Royal Mail, and point co-ordinates. But not the name of the person who lives there, or anything like that.

“At the moment, addresses are created by local authorities, when a property is being planned, and this information is fed into a national database, run by GeoPlace, which is co-owned by Ordnance Survey and an organisation that represents local authorities. It’s made available in various ways… such as AddressBase, which is used by local authorities, and the Postcode Address File, which is used by the Royal Mail to deliver mail.

“These products are available to buy for some purposes. The open data community have argued they should be freely available. But as I said at the start, there has been a lack of political will to make it happen.”

One participant asked how much money the owners of the data make from it. Owen said the funding and revenue involved is opaque, but might be several million pounds a year. However, he argued, the data sets would not vanish if their owners could no longer generate this revenue.

Going around the houses to avoid the lack of data

There is some debate within the open source community over whether there are alternatives to solving the national problem. One participant argued that it is possible to crowd source the information.

Arguably, Open Street Map does this. In fact, another participant said he had worked on a project to get people living on a new development that wasn’t being served by Royal Mail to register their data on Open Street Map – and get their parcels delivered.

It’s also possible to triangulate public data sets with location data sets. This is done by councils, when they want to check and use health and other data. However, there are some surprising restrictions on what can be done, to avoid undermining the proprietary data owned by councils and Royal Mail. Photos of things like parcels that reveal their location can be used by individuals and companies to prove delivery, but not to build a national resource.

An idea for a new government looking for new ideas

Also, Owen argued that crowd sourcing will not create an authoritative, national data file. Instead, he and other speakers felt the way forward will be to engage the new government in the idea that opening up address data will deliver benefits.

“I am in favour of this in principle,” Owen said. “But we know that organisations and businesses across the UK own their own data sets, that they have collected from people who have made enquiries or bought things from them, but to cleanse that, they [need to compare the data with the national file]. So, it would help to improve all these small data sets.

“There is another data quality piece, which is that people go onto websites and look for their address, and find it is not there, or not right, because it has been collected from different sources.” Other speakers argued open address data should generate growth benefits. As one example, a speaker argued that if the UK wants to be a leader in drone technology, it will need a file of where drones might go.

There is also concern that if the Royal Mail is sold to a Czech billionaire, a valuable national resource will become owned by a foreign operator. Might this prompt action from the government? It might, Owen argued, if ministers are aware of it. Clearly, the open data community need to make sure they are. “The people who we need to influence are never in these discussions,” Owen acknowledged. “But I hope that we are all more aware, and can bring that awareness to bear.”


AI: Ethics and implications for open data

It’s almost impossible to talk about data without talking about AI now. And in some contexts, the large language models that underlie Generative AI can be very impressive. One attendee had some personal finance data in Excel. He took a screenshot of it, popped it into ChatGPT, and asked it if he was getting better or worse at saving. It worked — and was right. It had extracted the numbers from the screenshot, written some Python code, and run that to create the plot.

Obviously, whenever you’re working with AI you need to check its output, because these tools do hallucinate. For most use cases, there’s a need for editors and checkers. You’re not necessarily replacing people, just allowing them to produce more by letting the GenAI do the work, and checking it. One attendee always asks the AI to show him how it did the work.

To some degree, you need the same skills the AI was using to check it. So, is it actually helping people without these skills? Is this just an awkward stage the technology is going through? Will it ever end? There’s no understanding in these tools, they just give what they statistically predict is the correct answer based on what they’ve seen before.

The hidden ethical costs of AI

Remember that AI uses both electricity and water. To what extent is what we’re doing with them actually needed? There’s a climate cost. The biggest part of the cost comes from training, though, so the new race is towards the same quality of response from smaller training data, and thus smaller models. And smaller models mean you can fit them on phones.

Currently, these costs are partially being disguised by the fact that the companies aren’t — yet — passing those costs onto the consumer. And those emissions can never be clawed back. They’re out there now. And the climate cost gets worse with each new model.

The big models are stuck in time, at the moment their training stops. Each time they come out with a new model, they have to train from scratch again. And it’s a big assumption that things get better: people are finding ways of preventing their content being used for training, or even “poisoning” the data, so it hurts the training.

The data risks of AI

We need to make people more aware of the risks: everyone’s heard of AI, everyone wants to use it to make their lives easier. And so they upload, say, a legal document to get it summarised. What happens to that file then? Who knows?

The Excel example above was inherently anonymous. But if you’re going to upload data to ChatGPT, you need to strip out personally identifying information, if you’re going to stay ethical.
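As a hedged sketch of what stripping identifying information might look like before anything is uploaded, the Python below drops named columns and masks email addresses found in free text. The column names, the sample row and the helper strip_pii are all invented for illustration. Real de-identification needs much more than this, as names, addresses and rare combinations of values can all identify someone.

```python
import re

# A deliberately simple pattern for email addresses; it will miss edge cases.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def strip_pii(rows, drop_columns):
    """Return copies of the rows without the named columns or email addresses."""
    cleaned = []
    for row in rows:
        kept = {}
        for column, value in row.items():
            if column in drop_columns:
                continue  # remove directly identifying columns entirely
            if isinstance(value, str):
                value = EMAIL_RE.sub("[email removed]", value)
            kept[column] = value
        cleaned.append(kept)
    return cleaned


# Hypothetical savings data of the kind described in the Excel example.
rows = [
    {"name": "A. Person", "month": "2024-05", "saved": 120,
     "note": "statement sent to a.person@example.com"},
]

print(strip_pii(rows, drop_columns={"name"}))
```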

The example of Air Canada’s AI chatbot giving a customer a discount that the company was legally obliged to honour shows how the way you tune an AI can have an impact. There’s been a rapid switch from chatbots as public-facing tools, to internal co-pilots instead, because of these legal risks.

Bias or oppression?

The world is biased, so training LLMs on general data makes them biased. It just exacerbates the existing bias in the data. Because of this, it’s very difficult to buy products off the shelf without knowing how they were trained. Many companies can’t afford to buy black boxes they can’t understand.

However, as one attendee pointed out, humans are black boxes even to themselves. We all make decisions every day based on biased data.

One attendee suggested that we should avoid the word “bias”: it’s actually a form of oppression. She recommends the book Data Feminism, which explores this. People are trying to address these issues, but occasionally, they end up over-correcting. And it remains a persistent problem: the USA produces the most data in the world, so the models will lean towards US ways of being.

We need AI seatbelts

We’re in the phase where we’re driving cars without seatbelts. What will be the digital equivalent of the crashes that led to seatbelt legislation? It may already be happening: LLMs are being used to produce misinformation and extremist content. And so the companies behind the LLMs are working to stop them being used for that. The machines have no sense of ethics, so will produce material statistically likely to be harmful if asked.

There are plenty of ways of producing harmful materials online. But AI accelerates and scales that — one person can rapidly produce vast volumes of propaganda.

But, to go back to the seatbelt example, they’re an open design anyone can use. If OpenAI found a way of vaccinating its AI against producing propaganda, it probably wouldn’t share that with the market. Will we end up with a situation like we have with the internet, where there’s a web and a dark web, for things illegal on the “main” web? AI and DarkAI…?

Open data, communities, and the role of gamification


After lunch on day one, there was a discussion of how open data can work for communities, and people who are not data people, led by Pauline Roche from Digital WM Productions and Sam Milsom from Open Data Manchester.

Pauline explained that her company works with communities, and wants them to be able to say why data is important. While Sam said his organisation runs a whole programme for communities to help people find information that matters to them. “I am interested to have a conversation about what people want and what is out there,” he said. “Because I think there is a lot of information available that should be of interest to local communities.”

Data changes the questions people ask

Sam talked about some of the projects he has worked on. One involved teaching local councillors to use publicly available data sources, such as those published by the ONS. “I loved hearing the stories,” he said. “We had one councillor saying he didn’t realise how deprived his ward was, because it contained a few big houses. While another was outraged by the levels of bike-theft he uncovered.”

Another involved teaching local people to do traffic counts. This often started with a perception that traffic was terrible. The counts might show it wasn’t, but it was fast. “Having the data could change the kind of questions that they asked.”

What else had people done? Sam asked. What had they found useful? What techniques could be used to get people to engage? Pauline said one of the things her company did was to work with communities on Wikipedia entries. “It sounds really basic, but it’s important to look at what is being said, and to tell people how they can change that.”

Getting and maintaining participation: tips and tricks from leaderboards to badges

Terence Eden, who runs the Open Benches project to enable people to record memorial benches, said a powerful driver was a “leaderboard” – to let people see how many posts they had made, or how many photos they had uploaded. “Creating friendly competition was a really good thing to do.”

This led to a lively conversation on the role of badges, rewards and other forms of gamification. Although Sam wondered if there were any dangers. For example, he said Open Data Manchester has done a lot of work to find out the routes that people take. The council wanted to know about routes to school, and how traffic affects them. They used tap technology to collect data – which was fun, and encouraged kids to interact with data – but ran the danger of distorting the information, because the kids would go out of their way to get more taps.

There was also concern that these techniques cost time and money, and can be hard to maintain over time. “We often find that there is a really useful data set that was developed eight years ago, and has not been developed since,” one speaker said. Pauline agreed that resourcing was a problem. “We have never had the resourcing that the private and public sector have had, and that makes me sad.”

Is experience enough?

Some projects generate at least some of their own funds. Linda Humphries said she had bought a t-shirt to support Open Benches. She also said that she was motivated to fill in gaps in its database: and used this as a reason to go for walks.

Which enchanted another speaker, who felt that “giving people a lovely experience” was a great reason and reward for getting them involved in data projects. Jez Nicholson said Open Street Map started with this kind of ethos. Although another speaker pointed out that this could also distort data, because only a “small bunch of people” with a particular interest might get involved (Open Street Map might not be great for capturing information about deprivation, he suggested).

Sam said there was some interest in government in providing direct rewards to people to widen participation. For example, he asked the group whether they thought that giving people an incentive to walk places, like money off their council tax, would work.

A further participant suggested the open data community should be doing more to get young people involved. Many would be much more interested in learning about open data and what could be done with it than traditional maths, she suggested.

Pauline agreed that she was “all for open data being fun.” Other rewards and incentives for taking part in community open data projects can clearly work – but need careful thought.


Data sources – how to find them

Alex McCutcheon – Valtech.
Gozde Karahan – PhD student, Turkey

Gozde explained that as part of her PhD she has been looking for data. In Turkey, and some other countries, it is easy. In the UK, not so much.

Alex explained that he had worked on a project for Glasgow Chamber of Commerce, which after the Covid-19 pandemic, wanted to know how well and how fast the city was recovering.

He said it was able to pull together a lot of data on traffic, and the number of buses running, and footfall in the city centre. But it wasn’t easy. “It was search the web using Google to find data sources and work out what was reliable,” he said. “It was very time-consuming, and I think it should be easier to find out what there is to use, and where it is, and what format it is in.”

LG Inform Plus – a good starting point?

Martin Howitt said the Local Government Association has a data portal – LG Inform Plus – which is semi-open, in that it can be searched and a certain amount of data can be downloaded for free via an API.

However, he acknowledged there are limitations to it. It doesn’t go down to address level, for example. So it’s not possible to enter an address and find out information like: what air quality is like, or how many trees there are, or what the index of multiple deprivation looks like at that level.

Gozde asked where all the data comes from. Martin said different places: the census, central government, DEFRA, the Department of Health, Public Health England. “There are 20,000 data sources, and there are issues, but it seems like a pretty good starting point,” he said. However, Alex asked how researchers like Gozde would find it. And Martin admitted that it was set up for people working in local authorities, who are LGA members, so it’s not super-easy, although it is accessible through web search.

The only constant is change (and that’s a problem)

Another researcher pointed out that even on portals, data formats can be very different and data sets can be out of date. He said he is working on a portal for Oxford, which also wants to give residents information about things like traffic and footfall, but “some [government] data sets are from 2022 or 2023” – and the way they are coded can change over time.

The project has been exploring other sources, such as Strava information, although it has limitations, in that it’s used by a self-selecting group. Alex said Glasgow was lucky that it had already invested in sensors around the city to capture movement information. Although Martin said this wasn’t a complete solution – “there’s a surprisingly high attrition rate, as people drive bicycles into them, and all kinds of things.”

And the Oxford researcher said there can be issues with using this kind of information on a real-time portal, because some sensors will report constantly, and some will only update information occasionally.

Alex and other speakers said changes to data are also a significant problem. “Often,” one speaker said, “there are good reasons for the changes” but it can still be a major piece of engineering work to accommodate them. At the very least, she suggested, data publishers could inform people properly that a change is coming. “Often, something just goes out on Twitter.”

Martin said LG Inform Plus requires people to register, which makes it easier to communicate when something significant, “like boundary changes, which happen every time there’s a change of government”, comes along.

Is there a solution?

Towards the end of the session, Alex asked campers where they went for data. And like the people who had already spoken, most said they spent a lot of time on searches, going to individual departments, and finding that “some are good” and some aren’t. The ONS has tried to create an integrated data platform, but given the dis-aggregated nature of data collection and publication in the UK, it’s been a long, tough job.

Plus: it’s still hard to make it work for everyone from data scientists to occasional users, such as journalists or people who just want to know something about their area. The government has tried running “bootcamps” to train more people as data scientists: but not everyone wants to be a data scientist; and some people might not need to be, if the data owners thought more carefully about their data and the costs and benefits of publishing it in accessible formats.

Alex joked that a data scientist is like the driver of an F1 car: they rely on a huge team of managers and mechanics. And in some ways, it’s the managers and mechanics who are key to open data success.

Open Data and the new Labour Government

The day before Open Data Camp 9, the UK government changed. Keir Starmer’s Labour Party swept to power in a landslide. So, what’s the new government’s attitude to open data going to be, and what do we think they should do?

Some people did a search of the Labour manifesto for “data”. It’s mentioned five times, three of them in the same paragraph. They do mention a “data library” — what will that be, and how will it help? The likelihood is that the people planning Labour’s first 100 days in power know very, very little about data — it’s our responsibility to up-skill them.

Maybe they should ban PDFs…

Dashboards have become a dirty word, but they’re very good at showing how often data is being updated. We need them to publish data about the data, so we can see which departments are publishing data too slowly to be useful.

Return of the Data Protection and Digital Information Bill

The baseline will be what was in the Data Protection and Digital Information Bill, which failed to make it into law before the election. Many of the smart data provisions in there had value, and Labour might think along the same lines. It encourages business to share data, and those same powers could be used to unlock open data.

Some of the contents of the bill were ideologically motivated, though, and many people present would be happy to see those elements dropped.

The Open Data Institute launched a manifesto before the election. It lists some small differences that could have a big impact. (The manifesto can be downloaded… as a PDF.)

How to sell open data to a new government

Decision-making on data is very centralised, and driven from within the London bubble. It needs to be more attentive to regional capability and needs.

Data might be a route to building more trust between the electorate and politicians again. Want to recruit thousands of new teachers? Put that on a dashboard, and let us see how you’re doing. The people in this room could be critical to conversations like this.

Some data has to be opt in, though, like genetic data or medical data. Do we require a public information campaign on this? Data literacy in the country — the world — as a whole is incredibly poor. How can we expect the government to make a good decision about data when they don’t understand it themselves? We need data literacy embedded in education generally — but that’s a long-term objective. A basic data literacy education might be as important as computer literacy has become. Data ethics, too.

There’s a real dearth of information about how open data is actually improving people’s lives.

Where should responsibility for open data live?

What we need is a name: an MP or, better, a minister responsible. Unless it’s on someone’s plate, it’s not going to be done. There is one in Canada, there is one in France, we used to have one in the UK. We need to have the name of the minister who is responsible for data right across the departments.

There was an open data white paper in 2012, with a minister, Francis Maude, along with Matt Hancock, but it got undermined by the Treasury. We need to tie the use of open data to economic growth, or they won’t listen to us. Does your AI start-up need open data? Tell them! Open data isn’t cool any more, but AI is!

Reuse of public data sits with the Cabinet Office. Does that make sense? Probably not. But equally, policy is being driven by technologists, and that’s probably not ideal either.

Policy needs to sit alongside operational implementation — it can’t be abstract. Writing to your MP can be powerful — but especially if you can explain to them how it will help your business or help economic growth.

Open data as infrastructure

Open data should be part of Keir Starmer’s commitment to infrastructure. Previous efforts to share data between departments have not gone well. To fix that, we need the minister, and we need collective responsibility to pay for it. And there needs to be a cultural commitment to it across government.

Right now, the three biggest property datasets are not open. It would take a massive commitment to open them up, but it would make a massive difference. Should the utilities companies pay for Ordnance Survey, rather than commercial use of closed data? Those who modify the data should pay for it being updated and opened up, and the utilities are the obvious example here — they’d barely notice it.

A controversial topic: national ID cards. Does the immigration issue being so hot open the door to do it? Could that solve some data problems?

100 days to get open data on the agenda

The new MP for Kensington and Bayswater, Joe Powell, has a history with open data, via Open Government. Should we be talking to him?

At the moment, sharing between departments is a complex process of memorandums of understanding and agreements. It’s not a neat process. It’s not a fast process. Not only that, but it’s custom every time. Open data is just published. It’s so much simpler. It solves so many problems.

We need a continual push on data standards, so the data can flow more freely both in and outside government, and we can identify and solve problems.

The first 100 days are likely to be quite topsy-turvy. Take the opportunity of that chaos to get open data on the agenda. But don’t rely on a single MP. Liz Truss was a champion of it at one point…


Open Data 101: Open Day for Newbies (2024 edition)

What is open data?

Data that’s not private and closed — it’s published in some way. It has to be accessible to the general public. It’s open to anyone.

One definition:

Open data is data that can be freely used, reused and redistributed by anyone, subject at most only to the requirement to attribute and share alike.

Open data licensing

It needs to be published under an open data licence. You, if you’re able, publish it with that licence applied to it. A common one in the UK is the Open Government Licence (the OGL). It is very permissive, requiring little more than attribution. There are others, including Creative Commons.

The “free” when applied to open data is akin to the “free” in “free speech”. Someone may have paid to produce, publish and share the data — but usually, it can be used for no or marginal cost.

Commercial organisations can sell products built around the data — that’s not the same thing as selling the data. For example, they can make it much more searchable for the average user, or make connections between elements of the data. If they add value, they can charge for it — but it is a fine line.

Beware, though: some licences restrict or ban commercial use.

Getting open data

It can be as simple as a file you download from a website, like data.gov.uk. Sometimes it will be a static file in the CSV format — a non-proprietary format for data.
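To give a feel for how little is needed to work with a CSV file, here is a small Python sketch using the standard csv module. The sample data, its column names and the helper total_by_area are invented for illustration, and an in-memory string stands in for a downloaded file.

```python
import csv
import io

# Stand-in for a downloaded CSV file; the columns and figures are made up.
SAMPLE_CSV = """area,year,cycle_count
North,2023,1200
South,2023,950
North,2024,1350
"""


def total_by_area(csv_file):
    """Sum the cycle_count column for each area, reading one row at a time."""
    totals = {}
    # DictReader uses the first line as column names for every later row.
    for row in csv.DictReader(csv_file):
        totals[row["area"]] = totals.get(row["area"], 0) + int(row["cycle_count"])
    return totals


totals = total_by_area(io.StringIO(SAMPLE_CSV))
print(totals)
```

With a real download, the io.StringIO call would be replaced by opening the file, for example with open(path, newline="", encoding="utf-8"). Because the rows are read one at a time, the same approach copes with files that are too big to load in one go.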

Some pre-digital data has been digitised and made available, but far from all. Digitising that data allows us to analyse it more easily using software, especially really large data sets. However, occasionally, you will end up with more data than your hardware can handle. There’s a movement now to allow access to data through a tool or API, to get just the data you are looking for. APIs can be quite intimidating, though.

Some forms of data are more available than others: there is lots and lots of geospatial data available, for example. Of course, that’s in file formats more suited to geospatial work.

APIs

An API (Application Programming Interface) is code on a data store, which allows you to request data from it programmatically using an agreed language. It provides the advantage of being able to offer data in real time: for example, weather data is better in real time. General election data is another recent example.
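In practice, requesting data programmatically often means building a URL with a few parameters and reading back structured text such as JSON. The Python sketch below shows both halves without touching the network. The endpoint, the parameter names and the sample response are all hypothetical, so check the documentation of any real API for what it actually expects.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint, used only to show the shape of a request.
BASE_URL = "https://api.example.org/v1/observations"


def build_request_url(place, since):
    """Build a request URL asking only for the data we are interested in."""
    query = urlencode({"place": place, "since": since, "format": "json"})
    return f"{BASE_URL}?{query}"


# An invented response in the sort of JSON format many APIs return.
SAMPLE_RESPONSE = """
{"place": "Manchester",
 "observations": [
   {"time": "2024-07-06T09:00", "temp_c": 14.5},
   {"time": "2024-07-06T10:00", "temp_c": 15.8}
 ]}
"""


def latest_temperature(response_text):
    """Return the temperature from the most recent observation."""
    data = json.loads(response_text)
    # Timestamps in this format sort correctly as plain strings.
    latest = max(data["observations"], key=lambda obs: obs["time"])
    return latest["temp_c"]


url = build_request_url("Manchester", "2024-07-06")
print(url)
print(latest_temperature(SAMPLE_RESPONSE))
```

Against a real service, the response text would come from fetching the URL, for example with urllib.request.urlopen, rather than from a sample string.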

OpenStreetMap is an international community of volunteers who build a constantly updated open map at street level. It’s a very good, mature and robust set of data. The community takes it very seriously, and it is offered through an API.
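To make the idea of requesting data programmatically a little less intimidating, here is a minimal Python sketch of what an API request can look like. The endpoint URL and the filter names are hypothetical, chosen only for illustration; every real API documents its own addresses and parameters.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint, used purely for illustration; it is not a real service.
BASE_URL = "https://api.example.org/v1/observations"

def build_request_url(base_url, **filters):
    """Encode the filters as a query string so only matching records are requested."""
    query = urllib.parse.urlencode(sorted(filters.items()))
    return f"{base_url}?{query}" if query else base_url

def fetch_json(url, timeout=10):
    """Send a GET request and decode the JSON body of the response."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))

url = build_request_url(BASE_URL, place="Manchester", date="2024-07-06")
print(url)
# Calling fetch_json(url) would return the decoded data if the endpoint existed.
```

The point is that the filters travel in the request itself, so the data store sends back just the records you asked for rather than the whole data set.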

Using Open Data

There are many tools you can use to work with open data. It very much depends on what you want to do:

  • find an answer to a question
  • build a product that allows people to ask questions
  • present it in a way that intrigues people.

Some tools

Coding languages:

  • Python is a coding language that’s great for working with data
  • R is another language. People from a statistics background tend to prefer R.

Learn one or the other, but not both at the same time.

Software:

  • There’s Power BI from Microsoft, which is very powerful.
  • Tableau you can use for free, and it does something very similar to Power BI.

At some point, if you want to make something useful for others, you’ll need to learn JavaScript.

There are plenty of tutorials on the internet to help you, as well as some useful books.

Community support

Sadly, the sorts of open data community leader roles that used to exist in local government are disappearing, because of the financial crisis in local government. Generally, open data now falls within the remit of the GIS (geographical information systems) team, which is still funded because it is so necessary.

The push for open data originally came from central government, back in 2010. But it’s not just for government: it’s for any community that might benefit from making data available.

Book: Open Data for Everybody

A useful book by Nathan Coyle.


Open data for health and early years

Matt Thompson – UK Health Security Agency
Mor Rubinstein – freelance consultant, London

Matt – “I am coming from the place of being a data provider for the health sector. And I want to know what we can do to do more to get health data out there, and what questions could you answer if you had more data. What good could you do?”
Mor – “I am coming from the demand side, as a data scientist and mum. If I want to know what early years services are out there, and whether they have places, and whether they do special needs, can I get that? And the answer is no.”

So, the synergy: how do we put data out there, and make sure people can find and use it?

Who decides what is interesting?

An open data camper argued that, at the moment, the government publishes data, and it decides what is interesting. So, the question is how to work out what users want, and then how to publish that, without, as she put it, “swamping” users.

Mor agreed that finding out what people want is key. “The question of how we do user research on data is an issue, and I don’t think it is something we do well,” she said. “And when it comes to access, so people can use it in an easy way, I don’t think it is done well.”
Another speaker argued there are some good examples of this being done. But publication is a challenge: most users can’t be expected to interrogate databases, so a lot of the information that is published is put out in visual formats that can’t be interrogated.

A speaker from Manchester said the council is getting much better, because citizens have asked for data on issues like traffic. “They were getting asked for it, and they had some data, so they put it out as open data.”

So, he argued, that “bottom-up” approach is critical. Mor agreed, but suggested that sometimes public bodies could just think like their users.
Ofsted, she said, conducts a big and expensive census of early years providers every couple of years that is used internally. “But they don’t think perhaps mums would like this information.”

And what’s commercially viable?

The session moved on to discuss the economics of open data publishing. A speaker pointed out that it would once have been difficult to find out information about the facilities offered by hotels. But now, big, commercial aggregator sites have solved this: initially by scraping individual hotel websites.

On the other hand, they have the economies of scale to do it, and to keep the information up to date. These kinds of incentives aren’t always available to public bodies.
Speakers argued, though, that government and councils will be much more likely to publish information if they see a clear benefit.

Mor argued there were clear use cases in health as well as early years. “I used to work for Parkinson’s UK, and they used to run a census of people with Parkinson’s in the UK,” she said.

But health authorities already have this information. They just don’t publish it – or make it available through sites like OpenSAFELY. If they did, it would “save a lot of money” for charities that they could spend on other things.

Unfortunately, a further speaker from Manchester said its local health trust had just invested millions in an electronic patient record system – but it doesn’t interoperate with other trust IT systems, or the IT used by other trusts.

So collecting information at a national level is far from straightforward. Industry standards would help a lot. However, Mor, Matt, and others pointed out that successive strategies have made the case for interoperability and using platforms like openEHR, but little has changed.

Also, the trust speaker argued, there are other drivers that might lead the NHS away from open data, as it realises the value of the data that it owns. The council speaker said he didn’t want to sound “too doomy” but the NHS is working with a big, US company on a federated data platform to make use of its data – and it’s certainly not open.

In short, there are practical and commercial challenges to getting public bodies to publish open data. Perhaps the only way forward is to keep asking – and to be ready to demonstrate the commercial or other benefit of doing it. Mor – “As users and providers we need to come together and push.”


ODCamp 9: The Pitches (Day One)

If you’re new to unconferences, you might have been puzzled by the lack of a schedule. Well, that’s because the schedule is determined on the day, via the pitches people make. And those pitches end up as an insight into exactly what the open data community is thinking about each year.

Here’s what was on people’s minds in 2024:

The pitches


Some sessions have links to liveblog posts about what was discussed.

One for tomorrow

  • Joy Diversions — there will be one tomorrow morning for people who want fresh air and to explore for a while. Bring comfy shoes and a coat.