Date: 2023-11-15

A new algorithm called Student of Games can win at a range of tabletop games, including chess, Go, Texas Hold’em poker and the strategy board game Scotland Yard. The artificial intelligence program combines guided search, machine learning and game-theoretic reasoning, according to the researchers who developed it, in a study published this Wednesday in the journal Science Advances. Until now, the AlphaZero algorithm could only master perfect-information games such as chess and Go, in which all players have access to the same information. It could not, however, win at poker, an imperfect-information game in which the opponents’ cards are hidden.

The research was carried out while the authors worked at Google DeepMind, Google’s artificial intelligence research division. Several team members left Google in January 2022, and the company laid off most of the remaining team in January 2023.

The tool can win at both perfect- and imperfect-information games with minimal prior knowledge. “Our algorithm is capable of reasoning from the rules of the games alone. For example, it learns to play all of them (chess, poker, Go or Scotland Yard) given only the rules, without any further information,” explains Finbarr Timbers, a researcher at Midjourney and an author of the study. “From the rules, it determines which actions it can take and whether it has won or lost,” he continues.

To choose its move at each point, the algorithm relies on a technique called “counterfactual regret minimization,” which evaluates all the possible plays. “Regret,” according to Timbers, means “how well you could have done if you had played optimally, minus how well you actually played.” For example, if in poker you won 200 chips with one line of play but could have won 1,000 with another, your regret is 800 chips. The objective of Student of Games is therefore to shrink that 800-chip gap as much as possible. It takes into account every scenario consistent with the cards that are face up, that is, the public information, and averages over them.
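
The arithmetic in the poker example, and the way accumulated regret can be turned into a strategy, can be sketched in a few lines of Python. This is a minimal illustration, not the code from the study: it uses regret matching, the standard update rule in counterfactual regret minimization methods, and the function names and the three-action regret values are hypothetical.

```python
def regret(best_possible: float, actual: float) -> float:
    """Regret as Timbers defines it: optimal result minus actual result."""
    return best_possible - actual


def regret_matching(cumulative_regrets):
    """Play each action in proportion to its positive accumulated regret."""
    positive = [max(r, 0.0) for r in cumulative_regrets]
    total = sum(positive)
    if total <= 0:
        # No action has positive regret: fall back to a uniform strategy.
        return [1.0 / len(cumulative_regrets)] * len(cumulative_regrets)
    return [p / total for p in positive]


# The example from the text: 200 chips won, 1,000 were possible.
poker_regret = regret(1000, 200)  # 800

# Hypothetical accumulated regrets for three actions.
strategy = regret_matching([600.0, 200.0, -50.0])  # [0.75, 0.25, 0.0]
```

Actions that have accumulated more regret are played more often the next time, and actions with negative regret are dropped.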

Driving regret down in this way makes the strategy converge toward a Nash equilibrium, a concept named after the American mathematician John Nash. In a Nash equilibrium, each player’s strategy maximizes their own payoff given the strategies of the others, so no one can gain by changing strategy alone. Timbers and his colleagues have relied on this idea so that the algorithm searches for an optimal strategy in most situations.
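
The link between regret and the Nash equilibrium can be illustrated with a toy example. The sketch below is an assumption-laden illustration, not the method of the study: two regret-matching players repeatedly face each other in a hypothetical rock-paper-scissors variant (rock beating scissors pays double), and the code measures how much a best-responding opponent could still gain against their average strategies. The payoff matrix and all function names are invented for the example.

```python
# Row player's payoffs; the column player receives the negative (zero-sum).
# Hypothetical variant of rock-paper-scissors: rock beating scissors pays 2.
PAYOFF = [
    [0, -1, 2],   # rock     vs rock, paper, scissors
    [1, 0, -1],   # paper
    [-2, 1, 0],   # scissors
]


def regret_matching(regrets):
    """Play each action in proportion to its positive accumulated regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total <= 0:
        return [1.0 / len(regrets)] * len(regrets)
    return [p / total for p in positive]


def action_values_row(col_strategy):
    """Expected payoff of each row action against a column strategy."""
    return [sum(PAYOFF[i][j] * col_strategy[j] for j in range(3)) for i in range(3)]


def action_values_col(row_strategy):
    """Expected payoff of each column action against a row strategy."""
    return [-sum(PAYOFF[i][j] * row_strategy[i] for i in range(3)) for j in range(3)]


def self_play(iterations):
    """Run regret matching for both players and return their average strategies."""
    regrets = [[0.0] * 3, [0.0] * 3]
    strategy_sums = [[0.0] * 3, [0.0] * 3]
    for _ in range(iterations):
        row = regret_matching(regrets[0])
        col = regret_matching(regrets[1])
        values = [action_values_row(col), action_values_col(row)]
        for player, strategy in enumerate((row, col)):
            expected = sum(p * v for p, v in zip(strategy, values[player]))
            for a in range(3):
                regrets[player][a] += values[player][a] - expected
                strategy_sums[player][a] += strategy[a]
    return [[s / iterations for s in sums] for sums in strategy_sums]


def exploitability(row_avg, col_avg):
    """Total gain available to best responses; zero at a Nash equilibrium."""
    return max(action_values_row(col_avg)) + max(action_values_col(row_avg))


row_avg, col_avg = self_play(20000)
gap = exploitability(row_avg, col_avg)
```

In this variant the equilibrium mixes rock, paper and scissors with probabilities 1/4, 1/2 and 1/4, so the average strategies should drift toward those values while `gap` shrinks toward zero as the number of iterations grows.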

Each game poses a different kind of problem. In chess, when you are in a given position on the board, you can search through the possible moves to find the best one. Poker does not work like that. Timbers explains that you have to consider the impact of your plays on other situations: “If you bet high every time you have a strong hand, your aggressive betting will reveal to your opponent that you have a good hand. Likewise, if you stop betting when you have a weak hand, you will reveal to your opponent what your hand is.”

DeepMind, the British company owned by Google since 2014, has also developed an algorithm called R-NaD that plays Stratego at the level of a human expert. Stratego is a popular game with 40 pieces per side in which players must capture the opponent’s flag or leave the opponent without movable pieces. R-NaD relies on algorithmic tricks to achieve good performance, but it does not use search. For this reason, it is not as strong an algorithm as Student of Games. “The literature has historically shown that algorithms that search through possible actions are often better at games than algorithms that do not use search, but they are slower and more expensive to train,” says Timbers.

Competitive artificial intelligence is used to measure the effectiveness of computer programs and to deliver a better gaming experience, but it can also have negative implications: “It is very possible that cheating occurs on poker betting websites and in similar games. Many competitive video games will try to be strict about the software allowed on each player’s computer to ensure that an artificial intelligence is not playing, something that Riot Games already does with Valorant (2020),” says Diego Rodríguez-Ponga Albalá, founder and director of Póntica. To that end, he adds, it is foreseeable “that very sophisticated artificial intelligence will be developed to automatically detect whether the player is human or not.”

Gema Ruiz, head of innovation at Softtek EMEA, also points out limitations of the algorithm, such as its use of betting abstractions in poker and its “computational expenses.” An abstraction groups similar plays and treats them in the same way to reduce the complexity of the game. When Student of Games trains on poker, it uses random betting abstractions to reduce the number of actions from 20,000 to 4 or 5. In the future, the study suggests, their use could be replaced by “a broader policy that can handle a variety of actions in game situations with a large number of possible decisions,” says Ruiz. Furthermore, enumerating all the possible moves is costly for the algorithm, and to address this the study proposes a “generative model.” This generates samples of world states [strategies] and operates on the selected subset of samples, rather than enumerating all possible hand combinations.
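
The idea of a random betting abstraction can be sketched as follows. This is only an illustration under stated assumptions, not the procedure used in the study: the legal bets are assumed to be whole chip amounts from 1 to 20,000, the abstraction is a uniform random sample of five of them, and an observed bet is grouped with the closest bet that was kept. The function names are hypothetical.

```python
import random


def random_bet_abstraction(legal_bets, num_actions, rng):
    """Keep a small random subset of the legal bet sizes (assumed uniform sampling)."""
    if num_actions >= len(legal_bets):
        return sorted(legal_bets)
    return sorted(rng.sample(legal_bets, num_actions))


def nearest_abstract_bet(bet, abstract_bets):
    """Group an arbitrary bet with the closest bet kept in the abstraction."""
    return min(abstract_bets, key=lambda b: abs(b - bet))


rng = random.Random(0)                  # fixed seed so the run is repeatable
legal_bets = list(range(1, 20001))      # assumed: 20,000 possible bet sizes
abstract_bets = random_bet_abstraction(legal_bets, 5, rng)

# A bet of 7,300 chips is treated like whichever kept bet is closest to it.
grouped = nearest_abstract_bet(7300, abstract_bets)
```

The search then only has to reason about five bet sizes instead of 20,000, at the price of treating many different bets as if they were the same.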

Despite this, Ruiz considers the tool “a promising contender in the field of gaming algorithms driven by artificial intelligence.” She highlights “its ability to improve performance with increased computational resources, together with solid theoretical foundations.”

