Alex Ivanov, a data scientist from Faculty, wanted to talk about some of the technology that has been making waves in the press recently.
Usefully, he started by defining a few terms. “LLMs are a subset of AI models,” he said. “They are trained on vast amounts of text data and they can learn the intricacies of human language to do things like answer questions or search databases. At heart, they are trained to predict the next piece of text.
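The idea of being “trained to predict the next piece of text” can be illustrated with a deliberately tiny sketch. This is not how production LLMs work (they use neural networks over subword tokens, trained on vast corpora); it is just a toy bigram model over a made-up sentence, counting which word tends to follow which:

```python
from collections import Counter, defaultdict

# Toy illustration of next-text prediction: count which word follows
# which in a tiny corpus, then predict the most likely continuation.
# (Hypothetical example corpus; real LLMs learn these statistics with
# neural networks over billions of tokens.)
corpus = "open data can provide the raw material for open models".split()

following = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    following[current][nxt] += 1

def predict_next(word):
    """Return the most frequent word seen after `word` in the corpus."""
    counts = following.get(word)
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("raw"))  # → material
```

Even this trivial model captures the core training objective Alex describes: given some text, predict what comes next.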
“Generative AI is a broader category that can create things that are new, including text, images, and even drug molecules: it is very broad. So, in any AI, we are talking about a machine learning from data. And the main difference between traditional AI and generative AI is the output.
“In traditional AI, we focus on prediction and classification, to predict things like whether someone will develop diabetes, or even house prices. Whereas with generative AI, we create data that was not there already.
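The distinction Alex draws can be made concrete with a small, entirely hypothetical sketch: a traditional model maps an input to a fixed prediction (a label or a number), while a generative model produces new data that was not in its input. The threshold rule and template grammar below are illustrative stand-ins, not real models:

```python
import random

def predict_risk(age, bmi):
    """Traditional AI style: classify an input into a fixed label.
    (Illustrative threshold only, not a real diabetes model.)"""
    return "high risk" if age > 45 and bmi > 30 else "low risk"

def generate_sentence(rng):
    """Generative AI style: sample new text from a toy template grammar."""
    subjects = ["Open data", "A trained model", "The dataset"]
    verbs = ["supports", "enables", "informs"]
    objects = ["new research", "better predictions", "public services"]
    return f"{rng.choice(subjects)} {rng.choice(verbs)} {rng.choice(objects)}."

rng = random.Random(0)
print(predict_risk(50, 32))    # a fixed prediction: "high risk"
print(generate_sentence(rng))  # newly generated text, different each seed
```

The classifier’s output space is fixed in advance; the generator’s output is new data, which is the difference in output Alex is pointing to.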
“Where open data comes in is that these models are often trained on big datasets, so open data can provide the raw material. However, there are certain challenges. One is data quality. If you just pick up lots of data without thinking about its quality, that can cause problems.
“Then, there is privacy. Most open data doesn’t identify individuals, but there are some cases where that can happen. You need standardisation to bring all these sources together. Scalability can be an issue. And there are legal issues.
“And we need to think about transparency: some of these AIs are like black boxes, their outputs are almost like magic, so we need to understand what kind of output they are likely to have, and what impact that is likely to make.
“So, I’d like to think about how open data works in this context, and how we address some of these issues around transparency and bias.”