In the 1980s, Andrew Barto and Rich Sutton were considered eccentric devotees of an elegant but seemingly doomed idea: getting machines to learn, as humans and animals do, from experience.
Decades later, with the technique they pioneered increasingly critical to modern artificial intelligence and to programs such as ChatGPT, Barto and Sutton have been awarded the Turing Award, the highest honor in the field of computer science.
Barto, a professor emeritus at the University of Massachusetts Amherst, and Sutton, a professor at the University of Alberta, pioneered a technique known as reinforcement learning, which involves coaxing a computer to perform tasks through experimentation combined with positive or negative feedback.
“When I started working on this, it wasn’t fashionable,” Barto recalls with a smile, speaking over Zoom from his home in Massachusetts. “It has been extraordinary that it has achieved some influence and attention,” he adds.
Reinforcement learning was perhaps most famously used by Google DeepMind in 2016 to build AlphaGo, a program that taught itself to play the incredibly complex and subtle board game Go at an expert level. This demonstration sparked new interest in the technique, which has since been used in advertising, in optimizing the energy use of data centers, in finance, and in chip design. The method also has a long history in robotics, where it can help machines learn to perform physical tasks through trial and error.
More recently, reinforcement learning has been crucial to guiding the output of large language models (LLMs) and producing extraordinarily capable chatbot programs. The same method is being used to train AI models that mimic human reasoning and to create more capable AI agents.
Sutton points out, however, that the methods used to guide LLMs involve humans providing goals rather than an algorithm learning purely through its own exploration. In his view, having machines learn for themselves may prove more fruitful. “The big division is whether [AI is] learning from people or learning from its own experience,” he says.
The work of Barto and Sutton “has been a linchpin of progress in AI in recent decades,” says Jeff Dean, a senior vice president at Google, in a statement from the Association for Computing Machinery (ACM), which awards the Turing Award annually. “The tools they developed are still a central pillar of the AI boom and have brought great advances.”
Reinforcement learning has a long and checkered history within AI. It was present at the dawn of the field, when Alan Turing suggested that machines could learn through experience and feedback in his famous 1950 paper “Computing Machinery and Intelligence,” which examines the idea that a machine might one day think like a human being. Arthur Samuel, an AI pioneer, used reinforcement learning to build in 1955 one of the first machine learning programs, a system capable of playing checkers.
Despite these early successes, reinforcement learning and related work on artificial neural networks fell out of favor, and for years they were eclipsed by efforts to build AI using logical symbols and rules rather than learning from scratch.
Barto, Sutton, and others nevertheless persevered, inspired by work in biology and psychology, such as the experiments carried out by Edward Thorndike in the early 1900s, which showed that animal behavior is shaped by stimuli. They also drew on neuroscience and control theory to develop algorithms that allow computers to imitate this type of learning.