AI: Ethics and implications for open data

It’s almost impossible to talk about data without talking about AI now. And in some contexts, the large language models that underlie Generative AI can be very impressive. One attendee had some personal finance data in Excel. He took a screenshot of it, popped it into ChatGPT, and asked it if he was getting better or worse at saving. It worked, and it was right. It had extracted the numbers from the screenshot, written some Python code, and run that to produce a plot answering the question.
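
As a rough illustration, the script the chatbot generated would have looked something like the sketch below. The figures and labels here are invented; the real numbers came from the attendee’s screenshot.

```python
# A minimal sketch of the kind of script a chatbot might generate for this task.
# The monthly figures below are hypothetical stand-ins for the screenshot data.
import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
savings = [120, 150, 90, 200, 230, 260]  # amount saved each month

# Simple check: compare the average of the later months with the earlier ones.
first_half = sum(savings[:3]) / 3
second_half = sum(savings[3:]) / 3
trend = "improving" if second_half > first_half else "getting worse"
print(f"Average savings went from {first_half:.0f} to {second_half:.0f}: {trend}")

# Plot the monthly figures so the trend is visible at a glance.
plt.plot(months, savings, marker="o")
plt.title("Monthly savings")
plt.ylabel("Amount saved")
plt.show()
```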

Obviously, whenever you’re working with AI you need to check its output, because these models do hallucinate. For most use cases, there’s a need for editors and checkers. You’re not necessarily replacing people, just allowing them to produce more by letting the GenAI do the work and then checking it. One attendee always asks the AI to show him how it did the work.

To some degree, you need the same skills the AI is using in order to check its work. So, is it actually helping people without those skills? Is this just an awkward stage the technology is going through? Will it ever end? There’s no understanding in these tools; they just give what they statistically predict is the correct answer, based on what they’ve seen before.

The hidden ethical costs of AI

Remember that AI uses both electricity and water. To what extent is what we’re doing with these models actually needed? There’s a climate cost. The biggest part of that cost comes from training, though, so the new race is towards getting the same quality of response from smaller training datasets, and thus smaller models. And smaller models mean you can fit them on phones.

Currently, these costs are partly disguised by the fact that the companies aren’t yet passing them on to the consumer. And those emissions can never be clawed back. They’re out there now. And the climate cost gets worse with each new model.

The big models are stuck in time, frozen at the moment their training stopped. Each time the companies come out with a new model, they have to train it from scratch again. And it’s a big assumption that things will keep getting better: people are finding ways of preventing their content from being used for training, or even “poisoning” it so that it hurts the training.

The data risks of AI

We need to make people more aware of the risks: everyone’s heard of AI, everyone wants to use it to make their lives easier. And so they upload, say, a legal document to get it summarised. What happens to that file then? Who knows?

The Excel example above was inherently anonymous. But if you’re going to upload data to ChatGPT, you need to strip out personally identifying information if you want to stay ethical.
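
As a rough sketch of what that might look like in practice (the file and column names here are hypothetical), you could drop the directly identifying fields before the data goes anywhere near a chatbot:

```python
# A minimal sketch, assuming the data is a CSV with identifying columns
# such as "name" and "email" (hypothetical names for illustration).
import pandas as pd

PII_COLUMNS = ["name", "email", "phone", "address"]  # adjust to your data

df = pd.read_csv("finances.csv")  # hypothetical input file

# Keep only the columns that are not directly identifying.
safe = df.drop(columns=[c for c in PII_COLUMNS if c in df.columns])

safe.to_csv("finances_anonymised.csv", index=False)
print(f"Removed {len(df.columns) - len(safe.columns)} identifying columns")
```

For anything more sensitive you would also want to think about indirect identifiers, but stripping the obvious fields before upload is the minimum.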

Air Canada’s AI chatbot offering a customer a discount that the company was then legally obliged to honour is an example of how the way you tune an AI can have real consequences. There’s been a rapid switch from chatbots as public-facing tools to internal co-pilots instead, because of these legal risks.

Bias or oppression?

The world is biased, so training LLMs on general data makes them biased too; it just exacerbates the existing bias in the data. Because of this, it’s very difficult to buy products off the shelf without knowing how they were trained. Many companies can’t afford to buy black boxes they can’t understand.

However, as one attendee pointed out, humans are black boxes even to themselves. We all make decisions every day based on biased data.

One attendee suggested that we should avoid the word “bias”: what we’re describing is actually a form of oppression. She recommends the book Data Feminism, which explores this. People are trying to address these issues, but occasionally they end up over-correcting. And it remains a persistent problem: the USA produces the most data in the world, so the models will lean towards US ways of being.

We need AI seatbelts

We’re in the phase where we’re driving cars without seatbelts. What will be the digital equivalent of the crashes that led to seatbelt legislation? It may already be happening: LLMs are being used to produce misinformation and extremist content, and the companies behind the LLMs are working to stop them being used for that. The machines have no sense of ethics, so they will produce harmful material if asked.

There are plenty of ways of producing harmful materials online. But AI accelerates and scales that — one person can rapidly produce vast volumes of propaganda.

But, to go back to the seatbelt example, seatbelts are an open design anyone can use. If OpenAI found a way of vaccinating its AI against producing propaganda, it probably wouldn’t share that with the rest of the market. Will we end up with a situation like we have with the internet, where there’s a web and a dark web for things that are illegal on the “main” web? AI and DarkAI…?
