The human brain has a key property that makes language possible and allows us to develop sophisticated thoughts: compositional generalization. It is the ability to combine already known elements with newly learned ones in novel ways. For example, once a child knows how to jump, they understand perfectly what it means to jump with their hands up or with their eyes closed. In the 1980s, it was theorized that artificial neural networks, the engine that underpins artificial intelligence and machine learning, would be incapable of establishing these connections. An article published in the journal Nature has now shown that they can, which potentially opens up a great avenue for improvement in the discipline.
The authors of the study have developed an innovative training method, which they have named meta-learning for compositionality (MLC), in which the neural network is continually updated and guided through a series of episodes so that it learns to relate experiences. They then ran experiments with human volunteers who took the same tests as the machines. The results show that the machine was able to generalize as well as or better than people.
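The episodic structure described above can be sketched in code. This is a hedged illustration, not the study's actual implementation: in each episode the learner is shown a few "study" examples drawn from a fresh toy mini-grammar and must generalize to a held-out "query" example. All names here are hypothetical; the real MLC method trains a neural network over such episodes.

```python
import random

def make_episode(rng: random.Random):
    """Build one toy episode: study examples plus a held-out query.

    A fresh vocabulary is sampled per episode, so a learner trained
    across many episodes must rely on composition rules rather than
    memorized word meanings.
    """
    verbs = {w: w.upper() for w in rng.sample(["jump", "sing", "walk", "spin"], 2)}
    counts = {"once": 1, "twice": 2}
    pairs = [(f"{v} {m}", [verbs[v]] * n)
             for v in verbs for m, n in counts.items()]
    rng.shuffle(pairs)
    # Last pair is held out: the learner never sees its answer.
    return pairs[:-1], pairs[-1:]

study, query = make_episode(random.Random(0))
print(len(study), len(query))  # 3 study examples, 1 query example
```

The point of the setup is that no single episode contains enough data to memorize; only the shared compositional structure persists across episodes.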
“For 35 years, researchers in cognitive science, artificial intelligence, linguistics, and philosophy have debated whether neural networks can achieve systematic, human-like generalization. We have shown for the first time that they can,” says Brenden Lake, associate professor at the Center for Data Science and the Department of Psychology at NYU and one of the authors of the work.
Large language models, such as ChatGPT, are capable of generating coherent, well-structured texts from the instructions given to them. The problem is that, before they can do this, they have to be trained on a huge amount of data. That is, very large text collections are processed by machine learning algorithms that extract patterns and learn, for example, that there is a very high probability that the words “The grass is colored” will be followed by “green.”
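The statistical idea behind that training can be shown with a minimal sketch. Real models use neural networks with billions of parameters, but the principle of learning which word is likely to follow which from data is the same; the toy corpus and function names below are illustrative only.

```python
from collections import Counter, defaultdict

# Tiny toy corpus; a real model trains on trillions of words.
corpus = "the grass is green . the sky is blue . the grass is green".split()

# Count which word follows which.
follow_counts: dict = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follow_counts[prev][nxt] += 1

def predict(word: str) -> str:
    """Return the most frequently observed continuation of `word`."""
    return follow_counts[word].most_common(1)[0][0]

print(predict("is"))  # 'green' (seen twice, versus 'blue' once)
```

Everything such a model knows about "green" following "grass is" comes from counting patterns in the data, which is why so much of it is needed.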
These training processes are slow and very costly in terms of energy. Training a model like ChatGPT, which has more than 175 billion parameters, requires enormous computational power: several data centers (industrial warehouses full of computers) running day and night for weeks or months.
“We propose a partial solution to this problem based on an idea from the cognitive sciences,” explains Marco Baroni, ICREA researcher and professor in the Department of Translation and Language Sciences at Pompeu Fabra University in Barcelona and co-author of the study, speaking by phone. “Humans can learn very quickly because we have the faculty of compositional generalization. That is to say, if I have never heard the phrase ‘jump twice,’ but I know what ‘jump’ means and what ‘twice’ means, I can understand it. ChatGPT is not capable of that,” says Baroni. OpenAI’s star tool has had to learn separately what it means to jump once, jump twice, sing once, sing twice, and so on.
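The "jump twice" idea can be made concrete with a toy interpreter that composes the meaning of each word independently. This is an illustrative sketch, not the grammar used in the study; all names in it are hypothetical.

```python
# Known building blocks: a few verbs and a few repetition modifiers.
PRIMITIVES = {"jump": "JUMP", "sing": "SING", "walk": "WALK"}
MODIFIERS = {"once": 1, "twice": 2, "thrice": 3}

def interpret(command: str) -> list:
    """Map a two-word command to a sequence of actions by composing
    the meaning of the verb with the meaning of the modifier."""
    verb, modifier = command.split()
    return [PRIMITIVES[verb]] * MODIFIERS[modifier]

# Even if "sing thrice" was never seen as a whole phrase, knowing the
# parts is enough to interpret it:
print(interpret("sing thrice"))  # ['SING', 'SING', 'SING']
```

A system that memorizes whole phrases, by contrast, needs each combination in its training data, which is the gap the authors argue compositional generalization closes.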
The type of training proposed by Lake and Baroni can help large language models learn to generalize with less training data. The next step, Baroni says, is to show that their approach is scalable. They have already shown that it works in a laboratory context; now it is time to try it with a conversational model. “We don’t have access to ChatGPT, which is a proprietary product of OpenAI, but there are many smaller and very capable models developed by academic centers. We will use one of them,” Baroni emphasizes.
One of the authors’ intentions is, in fact, to “democratize artificial intelligence.” The fact that large language models require enormous amounts of data and computing power limits the number of providers to a handful of companies with the necessary infrastructure: Microsoft, Google, Amazon, Meta, and so on. If Lake and Baroni’s proposal proves its worth in training this type of model, it would open the door for more modest operators to develop systems of their own that have nothing to envy in ChatGPT or Bard.
The advance presented by these two scientists can also be useful in other disciplines. “Brenden and I come from the field of psycholinguistics. We do not believe that machines think like human beings, but we do believe that understanding how machines work can tell us something about how humans do,” Baroni notes. “In fact, we showed that when our system makes mistakes, its errors are not like ChatGPT’s, but rather similar to those that people make.”
This happened, for example, with a failure related to iconicity, a linguistic phenomenon present in all the languages of the world whereby word order mirrors the order of events: if you say “I leave the house and go to eat,” that means I leave the house first and then go to eat. “In experimental tasks, if you teach a human subject that, when you say A and B, the correct order is B and then A, errors usually appear. Our system makes that same type of error,” the Italian researcher explains.
What future does the method devised by Lake and Baroni have? Everything will depend on what happens when it is tested with large language models. “I couldn’t say whether it is a line of research that is going to offer great advances in the short or medium term,” says Teodoro Calonge, professor in the Department of Computer Science at the University of Valladolid, who has reviewed the code used in the experiments. And he adds, in statements to the SMC Spain platform: “Of course, I don’t think it will answer the questions currently being raised in the field of the ‘explainability of artificial intelligence.’”