Date: 2024-07-01

Each opponent had to invent 30 movie titles. Both then had to write about 600 words for each title, and the texts would be evaluated by a panel of six critics and academics. One contender was 48-year-old Argentine writer Patricio Pron. The other was the most advanced language model at the time of testing, GPT-4 Turbo.

“These duels have a long tradition in artificial intelligence, like Kasparov against Deep Blue or AlphaGo against Lee Sedol,” says Julio Gonzalo, a professor at UNED and one of the authors of the experiment. For the writer, the task was somewhat more delicate. Did he feel the burden of defending humanity against the machine on his shoulders? It was not just about winning or losing; it was also about submitting to a detailed, numerical assessment, which is rare in the world of letters. “It was very funny to imagine myself carrying the fate of humanity on my shoulders,” says Pron. “I didn’t have previous duels like Kasparov’s in mind, but I did remember that the machine had won. So at some point I started to get nervous. At first I accepted with great enthusiasm, but then I started to feel a little pressure, not from the weight of humanity, but perhaps from the prospect of discovering that I am not as good as the machine. I began to wonder about the fate of my books if it turned out that I couldn’t even defeat a kind of stochastic parrot that repeats the nonsense people tell it,” he adds.

Luckily for Pron, the results were overwhelming. He won in all the expected categories, especially in creativity and own voice, but also in original and attractive style. Just by looking at the titles it is easy to understand the difference that exists today between a writer and the best language model. These are some of Pron’s proposals: “After everything I almost did for you”, “Mental illness three days a week”, “The Lego Woman” and “Pick any card. No, not that one, another one.” Here are some titles from ChatGPT: “Fragments of an invisible yesterday”, “The inverted city”, “The forgotten melody”, “The last flight of the butterfly” and “Footprints in the sea of sand”. All the texts will appear, with a new prologue and epilogue, in a book that the Delirio publishing house will publish this year.

Was this victory of human creativity predictable? Yes, but that does not mean that ChatGPT is not creative. “It has been proven that AI can be creative: AlphaGo invented new strategies for playing Go, which have since been imitated by all the masters. But the field of art is very different from that of a board game,” says Gonzalo. Even so, the outcome was not a foregone conclusion: “There are people who are surprised by it, including academics, even in my field [natural language processing]. Nobody had done it at this level, against a top writer,” says Gonzalo. It also mattered that the jury were specialists in literature: “In reality, they are titles that don’t sound bad, they are the ones you find when you go to the bestseller section at El Corte Inglés,” says Gonzalo.

Many details of the experiment matter. In an earlier study, Carlos Gómez Rodríguez, a professor at the University of A Coruña, asked several models to write a fight between the protagonist of the novel A Confederacy of Dunces and a pterodactyl. The result was much more even: “It has been proven that at least under some particular conditions, AI can write stories as good as a human,” says Gómez Rodríguez. “But there are two nuances. One, it depends a lot on the conditions of the task (language, genre or length), and two, if we compare them with an outstanding writer like Patricio Pron, they are still far behind.”

English is ahead, too

The experiment had a second objective: to see the distance in quality between ChatGPT in English and Spanish. ChatGPT also made its creations in English, which scored 30% better than in Spanish. The experiment received public funding from the Odesia project, framed within the National AI Strategy.

These types of challenges show that the gap between how well the models are trained in different languages is notable: “For simple things, like answering an easy question, it is normal that we do not notice the difference between asking ChatGPT in Spanish or in English. But it is when you try more complicated things that you notice the difference, and this is a clear example,” explains Gómez Rodríguez.

Ever since ChatGPT appeared, it has been perceived as a threat to creative work. But experiments like this show that for now it is above all a tool that depends a lot on who writes the request and how: ChatGPT produced better stories with Pron’s titles than with its own titles. In other words, the more original the request, the more creative ChatGPT was.

The authors wanted to avoid giving the machine this initial advantage; it had to work things out by itself. The goal was to evaluate the model as it is, not to adjust the request until it produced what they wanted. “We were very careful to make sure that the competition was on equal terms for both,” says Gonzalo. “We had to assume that the machine was capable of interpreting our request and solving it without tweaking it, because otherwise it was a way to start co-creating,” he adds.

The ceiling of creativity

A reasonable question is whether future models will improve on this specific capability or whether models by definition have this ceiling. Pron is clear that there is not much to do: “There is nothing creative about the way ChatGPT works. Besides, the machine already seems to be good enough for the people who use it. Technology tends to promise us that a camel will fit through the eye of a needle, but most of the time only a hair or two of the camel will fit through and we are led to believe that is all there is. ChatGPT will become the standard in written communication, but only because many people are irritated and filled with fear and doubt by the variety, the diversity of the world. They prefer to concentrate on thinking that the hair is a camel. And ChatGPT can give them that now.”

This possible artistic limitation also has a technical explanation, at least for now. These sophisticated machines work with probabilities, and their goal is to imitate human text. The most common example: if we say “the sky is”, the machine will tend to continue with “blue”, says Guillermo Marco, professor at UNED and co-author of the article. “Because of this, it moves away from the way we create, which involves sequences of text that have low probability but deep meaning. If we take less likely words, ChatGPT moves away from the meaning and starts generating junk text,” explains Marco.
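The trade-off Marco describes can be illustrated with a toy sketch. It is not how ChatGPT is implemented: the candidate words and their scores below are invented for illustration, and the only technique shown is a standard softmax with a "temperature" setting, which many language models use to decide how strongly to favor the most likely next word. At low temperature almost all the probability goes to "blue"; at high temperature unlikely words gain weight, whether they are poetic or simply junk.

```python
import math


def softmax_with_temperature(scores, temperature=1.0):
    """Turn raw scores into probabilities.

    A higher temperature flattens the distribution, so unlikely
    words are chosen more often; a lower one sharpens it.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = {word: s / temperature for word, s in scores.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {word: math.exp(s - peak) for word, s in scaled.items()}
    total = sum(exps.values())
    return {word: e / total for word, e in exps.items()}


def most_likely(probs):
    """Return the word with the highest probability."""
    return max(probs, key=probs.get)


# Hypothetical scores for the word that follows "the sky is".
# These numbers are made up; a real model scores tens of thousands of tokens.
TOY_SCORES = {
    "blue": 5.0,
    "clear": 3.5,
    "falling": 1.5,
    "a wound": 0.5,
    "spoon": -2.0,
}

if __name__ == "__main__":
    for temperature in (0.5, 1.0, 2.0):
        probs = softmax_with_temperature(TOY_SCORES, temperature)
        summary = ", ".join(f"{w}: {p:.3f}" for w, p in probs.items())
        print(f"T={temperature}: {summary}")
```

In this sketch, "blue" stays the single most likely continuation at every temperature shown, but raising the temperature hands more probability both to a suggestive option like "a wound" and to a meaningless one like "spoon". The model has no way of telling the two apart beyond their scores, which is one way to read Marco's point about junk text.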

Beyond this tendency towards homogeneity, machine creation has another problem: it matters who the sender of the message is. “Art is a process of communication,” says Gonzalo. “The receiver interprets the message based on his own context and expectations about the sender. The same poem will resonate very differently if the reader thinks it comes from a machine than if it comes from a writer fatally wounded in a duel at dawn outside Florence. We humans understand art as the artist’s way of communicating emotions to us, and we know that the machine’s purpose is only to please us,” he adds. In a previous experiment by the same authors, with a model that predates ChatGPT, synopses invented by machines were rated lower when the jury knew that their author was a machine.

Another avenue that the authors want to explore is what happens when the assessment comes not from specialists but from ordinary readers. With the same texts, they believe that the results may be different. Teresa Mateo-Girona, professor at the Complutense University and also a co-author, explains why, and gives an idea of how ChatGPT can work for many artistic purposes that are not as specific as this experiment: “One, an expert detects commonplaces and lack of originality. A less experienced person may find any unfamiliar literary motif surprising. Two, an expert tries to evaluate professionally, looking for features of style and plot that generate interest, whereas a non-specialized reader may base the judgment more on personal taste, which would make it more variable. And three, style can influence the understanding of texts. Compared to ChatGPT texts, which are simple and understandable, the more complex and rich prose of a writer can be appreciated by experts, but may be difficult for a common reader to understand,” explains Mateo-Girona.

Even for co-creation it is a delicate tool. Another study, carried out with digital artists, found that when they used ChatGPT they were able to generate art that was more attractive to the community, with more likes. “But diversity dropped a lot, in the end it became uniform. It’s like a teacher at a certain school, the school of maximum probability,” summarizes Marco.
