In the summer of 2022, those who were diving into the deepest waters of artificial intelligence (researchers, industry employees, AI directors at companies) were well aware that OpenAI was preparing to launch its next GPT, its large language model (LLM). But no details were known: not when it would arrive, nor who would have access, nor what new capabilities it would show compared with its predecessor, GPT-3, whose use had been restricted. That was where José Hernández-Orallo and Cèsar Ferri stood when, in September, Lama Ahmad, a policy researcher at OpenAI, invited them to join the external team that would evaluate GPT-4.

Hernández-Orallo and Ferri, both professors in the Department of Information Systems and Computing at the Universitat Politècnica de València (UPV), belong to the same research group and have extensive experience evaluating artificial intelligence systems. That may be why they were among the just over 40 people worldwide whom OpenAI selected to test its new language model. The goal was to find flaws in the system during the six months before its launch in March 2023.

“Since GPT-3 they have always given us free access to their systems, sometimes before launch, so that we can do research,” says Hernández-Orallo, who has been collaborating with OpenAI for four years and highlights the good communication between the company and researchers who want to analyze its systems. Last year, during that summer when the next GPT was rumored to be coming, the relationship grew closer. The UPV researchers organized a workshop at the International Joint Conference on Artificial Intelligence, one of the year's most prestigious artificial intelligence events, and there they met more people from OpenAI. The call came in September.

“They gave us a lot of freedom,” says Ferri. “We only had broad guidelines about what to look for, such as responses containing dangerous, sexist or racist text. The purpose was to keep the tool from generating text that could cause problems. We were playing around, trying different prompts (instructions) that could provoke that kind of response.” The researchers formed a team made up of the two of them and three students: Yael Moros, Lexin Zhou and Wout Schellaert.

José Hernández-Orallo, expert in artificial intelligence at the Polytechnic University of Valencia.

“They knew they were going to launch it and have millions of users, so the more strange things you try, the more of the space of crazy things people might do you can cover,” explains Hernández-Orallo. The idea was to trip GPT-4 up and see whether it would stumble. From the computers in their laboratory at the UPV, they entered texts that in some way invited the system to give a dangerously biased response.

In search of faults

Ferri admits that it was exciting for him to have early access to the tool. GPT-3 (released with restricted access in 2020) already worked very well, so the researchers knew they had the state of the art in generative artificial intelligence in their hands.

There was a lot to try, and each of them experimented in the area that interested them most. Hernández-Orallo explored reliability: “The system fails where you least expect it. And this is quite common with language models. It solves a differential equation, but then it gets a five-digit addition wrong. The average person gains confidence when it correctly solves a first-year university differential equation. But in the last step of the problem it has to add two vectors, and it fails.” The UPV professor describes this problem as a mismatch between user expectations and the actual capabilities of the AI.

Not all of the experts selected by OpenAI to evaluate GPT-4 had a computing background. Some had training in law, medicine, human rights or defense against chemical weapons. The goal was to polish the system. According to the technical report OpenAI published on GPT-4, one of the evaluators used a prompt to get the system to write step-by-step instructions for synthesizing a dangerous chemical compound at home. Responses of this kind were blocked so that they would not persist in the version released to the public.

And in the middle of this behind-the-scenes review process, the storm broke. On November 30, 2022, OpenAI launched ChatGPT. “For us it was a surprise. Nobody had told us there was a parallel project,” says Hernández-Orallo. “ChatGPT appeared overnight, and we weren't even sure whether it was the version we were evaluating or not.” After a few days it became clear that the publicly launched system was based on GPT-3.5, an earlier model than the one they were evaluating.

The researchers continued their work. There were still a few months left before the launch of GPT-4, and they were still astonished. “We saw that it was capable of solving a word search, where you have to find words that run vertically or diagonally. It was something unexpected. Nobody expected it to work like this,” says Ferri.

Cèsar Ferri, professor in the Department of Information Systems and Computing at the Polytechnic University of Valencia. Photo: Monica Torres

ChatGPT now lets you include images in a query, but at the time the researchers couldn't do that. To test its capabilities, they gave it spatial coordinates that together formed a figure. “We told it, 'I'm going to give you the coordinates of some strokes.' You explained that the first line went from (0,0) to (5,5), and so on,” says Ferri. “If you give this to a human, it's hard for them; we have to draw it. And GPT-4 was able to guess shapes such as squares, rectangles and more elaborate drawings, such as a car or a plane.” It was a capacity for abstraction that had not been seen before in artificial intelligence. The researcher sums it up like this: “We had passed the text barrier.”

“With GPT-4 you can break things”

ChatGPT, initially based on GPT-3.5 and now also on GPT-4, was the first advanced text generation system to reach the masses. And the researchers were aware that this meant a qualitative leap riddled with uncertainty. “It is irresponsible from a cognitive point of view,” Hernández-Orallo says of releasing the tool to the general public. “Not so much because the system is going to get out of control or swear,” he adds. What worries him is that “these systems could lead to cognitive atrophy, or to people using this system as their therapist or life partner. These kinds of things are happening at a much lower level than they could have, but they are happening.”

This concern is linked to the upheaval at OpenAI, when the board of directors fired CEO Sam Altman, only to reinstate him after a few days of intense instability. From what has emerged, at the heart of this struggle was a fight over whether or not to prioritize AI safety over commercial deployment.

The researchers see the sense in this debate: “Until now we had not reached such an advanced level in AI, so there weren't many things that could break. With GPT-4 we do see that things can break, so we still need to take it slowly,” says Ferri, referring to the call from the research community to pause the AI race and gain time to evaluate its social impact.

