Data for trustworthy AI

A conversation on a big topic:

Bill Roberts (@billroberts): This occurred to me because I have been reading a book called Made by Humans by Ellen Broad. People think that AI is magic, but it is created by algorithms that are written by humans, which means that sometimes it works really well and sometimes it does not. I wanted to discuss how to choose data for particular uses, how to interrogate biases, and how data publishers can address some of these issues. AI is second only to blockchain in the hype cycle at the moment. So how do we make sure that we make good choices about something that might have a big effect on people’s lives?

Hidden biases

Participant x: The data that goes into AI is like any other piece of data. I think the best thing to do is to publish your learning set: what are you teaching it?

xx: We all have biases and we think all we can do to tackle them is to publish the algorithms. So people in a niche can step in and say: ‘This is not working for me’. The only way is to be open about the algorithm.

xxx: Amazon had this problem: it had an AI to pick candidates, and it was biased towards men… because historically the ‘best’ employees were men. So the danger of AI is that it replicates biases already out in the ‘real’ world.

Bill Roberts: What these issues tend to expose is that you think there is an unbiased data set, because you take out information about people being ‘males’ and so on, but there are other indicators in there that drive you in the same direction.

xxxx: Another good example is a justice AI that tried to decide whether somebody was likely to reoffend. It suggested that colour was a factor, and that was because the data included postcode, which acted as a proxy.
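
As a rough illustration of the point about postcode, here is a minimal Python sketch using entirely synthetic records. Everything in it is an assumption made for illustration: the postcode areas "A" and "B", the groups "x" and "y", the counts, and the very simple "model" that just learns the historical flag rate per postcode. It is not how any real justice system's tool works. It only shows that dropping the protected attribute from the inputs does not stop the scores from differing by group.

```python
from collections import defaultdict

# Entirely synthetic records: (postcode_area, group, historical_outcome).
# "group" stands in for a protected attribute; outcome 1 means the person
# was flagged in the historical data. Within each postcode the flag rate
# is the same for both groups, but the groups live in different places.
records = (
    [("A", "x", 1)] * 40 + [("A", "x", 0)] * 40
    + [("A", "y", 1)] * 10 + [("A", "y", 0)] * 10
    + [("B", "x", 1)] * 4 + [("B", "x", 0)] * 16
    + [("B", "y", 1)] * 16 + [("B", "y", 0)] * 64
)


def train_rate_by_postcode(rows):
    """'Train' a toy model that never sees the group column.

    It only learns the historical flag rate for each postcode area.
    """
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for postcode, _group, outcome in rows:
        totals[postcode] += 1
        flagged[postcode] += outcome
    return {p: flagged[p] / totals[p] for p in totals}


def mean_score_by_group(rows, model):
    """Average the model's score per group, to audit for disparity."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for postcode, group, _outcome in rows:
        sums[group] += model[postcode]
        counts[group] += 1
    return {g: sums[g] / counts[g] for g in counts}


model = train_rate_by_postcode(records)
scores = mean_score_by_group(records, model)

for group in sorted(scores):
    print(group, round(scores[group], 2))
```

With these made-up numbers, group "x" ends up with an average score of about 0.44 and group "y" about 0.26, even though the model was never given the group. The gap comes entirely from where people live, which is why publishing the learning set matters: without it, nobody outside can run this kind of check.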

Is being open the answer?

Bill Roberts: One of the things that Broad talks about in her book is being able to replicate the methods used. Which sounds good, but there may not be enough method in data science to do that. We don’t know enough about how something will react to its inputs; what is happening between input and output is not well understood. We need to be able to get into that to review it.

xx: Having the data will allow us to do that and improve. Everybody needs to be able to challenge the data, based on their understanding and their needs. Then we can see how data science is helping the whole society.

Bill Roberts: This has also made me think about normal data for decision making. GDP figures are used in all kinds of influential decision making. But when you read them, you think ‘my god’. People think that 3% GDP is absolute truth, but it isn’t. It’s an estimate based on a lot of choices.

xxxxxx: Some of this is going to depend on education. Do people know how AI affects their bank balance or job prospects? If they go on Experian, do they know it will affect their credit rating?

Bill Roberts: If you are going to have a right to challenge, you need to understand what you might need to challenge.

xxxxxxx: The good news is GDPR has a right to explanation in it. French law has a tougher version: if you are affected by a decision, you can always ask for the basis of that decision, and that includes the algorithm and the data. So there is some good practice on this. In the UK, we have a subject access request. That doesn’t apply so much to private companies, but in the US, you have the right to challenge your credit rating.

xxxxxxxx: I think people think AI means the robots are coming for their jobs. Whereas AI, in the form of machine learning, is already pervasive in quite a boring way that has a big impact on them. I think there needs to be much more understanding of that.

xxxxxxx: Projects by IF have done some work on design. So, if you go on Experian and you get offered the chance to enter your Nectar card and get a discount, should it be telling you that the information will be used for other decisions? Telling people that or not is a design issue.

A vital conversation to have now

xxxxxxxx: I think it is great to be having this conversation now. Because whether or not legislation is the right way forward, it is a big issue that needs to be thought about.

xxxxxxxxx: I should mention the DCMS data ethics framework, which is an attempt to build some ethics into projects like this. There are seven steps that encourage people to think through what they have done. It says if you are transparent and have taken bias into account you are likely to be doing better. So it is an attempt to address these questions. But this was for the use of data. So the question is whether we need something specific for AI, or whether we need to talk about data in general.

Bill Roberts: One of the problems with that is that people who are already ethical will tend to use it, whereas people who are not so inclined to work in that way will not.

xxx: Which is why we need to understand that AI cannot solve our problems. We shouldn’t use it to tell us how to live our lives. It should support us in doing that.

xxxxxxx: I hate the term AI, because it makes it sound like magic. I prefer machine classification. I think that makes it much easier to understand what it is.

Bill Roberts: People do think it is magic. And the movies don’t help. Really, it is fancy pattern recognition. It cannot predict the future unless things continue to be the way they have been in the past, and unexpected things are likely to come up.

xxxxxxx: The problem is that this is moving so fast that in the next year AI will progress far faster than people can understand.

xxxxxxxxx: But I think we have been quite negative. There are lots of examples of AI having huge benefits. There are massive upsides to this. We know of a lot of online gambling companies that are shutting off problem gamblers earlier than they used to because of machine learning. Water companies are not cutting people off, because they can find out that they have problems.

Bill Roberts: So cheering some good examples and some good practice is part of this as well.

[Session Notes]