What do the polls say about the elections in Mexico? Sheinbaum has an 86% chance of winning

The average of polls prepared by EL PAÍS keeps Claudia Sheinbaum as the favorite to win the presidency of Mexico, with 56% of the estimated vote, ahead of Xóchitl Gálvez (36%) and Jorge Álvarez Máynez (7%).

One month before the vote, Morena’s candidate is a firm favorite. But what chances does she have exactly?

To answer that, we use a prediction model, like the ones EL PAÍS has used in dozens of elections, including those in Mexico in 2018. The model works in three steps that are detailed in the methodology: (1) we start from the average of surveys; (2) we add uncertainty around it based on the historical error of the polls and the time until the vote; and (3) we simulate the election 20,000 times to assign probabilities of victory.

After this process, our prediction is that Claudia Sheinbaum has an 86% chance of victory, compared to 14% for her rival Xóchitl Gálvez.

It is important to interpret these probabilities well. Sheinbaum is a clear favorite. But Gálvez is not ruled out, because 14% probability events happen sometimes. The surprise is as easy (or difficult) as seeing an elite shooter like Cristiano Ronaldo miss a penalty. Essentially, what we are modeling is the probability that the polls will be wrong, or off, enough to cause a surprise.

Gálvez closes distance, but slowly

The poll average has been moving for months to reduce Sheinbaum’s advantage over Gálvez, from the 32-point difference in December to the current 20. And there are pollsters like Altica that already reduce that distance to 10 points.

Since February, a slow decline of the ruling party's candidate has been evident. However, Gálvez has only partially taken advantage of it, because in these months the third candidate, Álvarez Máynez, has also risen, going from 5% to 7% of the vote on average.

Other forecasts agree in seeing Sheinbaum as a firm favorite

According to the prediction market Polymarket, Sheinbaum wins with 95% probability. And according to the Metaculus prediction community, which I hold in high regard for its accuracy, she would have an 89% chance, closer to our own model.

However, it is interesting to look at how the forecasters have moved in recent months: even though Sheinbaum has fallen in the polls, her chances of winning according to Metaculus have been increasing. As the weeks passed without data or news unfavorable to the candidate, confidence in her victory increased.

Methodology

Predictions are produced by a statistical model based on surveys and their historical accuracy. It is similar to the ones we used in Spain in 2023 and twice in 2019, and in Andalusia, Catalonia or Madrid. We also used it in Mexico six years ago, in France and in the United Kingdom. The model works in three steps: 1) aggregate and average the polls, 2) incorporate expected uncertainty, and 3) simulate 20,000 elections to calculate probabilities.

Step 1. Average the surveys. Our average takes dozens of polls into account to improve its accuracy. The data has been collected mostly from the website Oraculus.mx. The average is weighted to give a different weight to each survey according to two factors: the polling firm (companies without experience have less weight; those that do not publish their data with the INE are excluded) and the date. We want to give more weight to recent polls when calculating the average, so that on the last day only the latest poll published by each pollster matters. For this, we assign weights to the polls according to an exponential decay law, and we define an exclusion band that ignores surveys more than 30 days old. In addition, we penalize repeated surveys by the same pollster: when calculating the average on a given date, the closest survey from each firm has a weight of one, but the rest of its surveys are almost ignored.

Averages like ours can be viewed as a consensus estimate. Instead of relying on a single pollster, they combine the criteria and hypotheses of many. Averages reduce noise, preventing trends from jumping up and down by chance. And above all, they have been shown to improve accuracy.
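As an illustration of the weighting described in Step 1, here is a minimal Python sketch. It is not the code behind the EL PAÍS average. The function name `weighted_average`, the seven-day half-life, the 0.05 penalty for older polls from the same firm, and the sample polls are all assumptions made for the example; only the 30-day exclusion band comes from the text. It also assumes every poll reports the same candidates.

```python
from datetime import date

# Assumed values: the article does not publish its decay rate or penalty.
HALF_LIFE_DAYS = 7.0    # hypothetical half-life of the exponential decay
REPEAT_PENALTY = 0.05   # hypothetical multiplier for older polls of the same firm
MAX_AGE_DAYS = 30       # exclusion band stated in the article


def weighted_average(polls, on_date, house_weights=None):
    """Average poll shares on a given date.

    polls: list of dicts with keys 'house', 'date' and 'shares'
           (a dict mapping candidate name to percentage).
    house_weights: optional dict mapping a firm to a quality weight (default 1).
    """
    house_weights = house_weights or {}
    # Drop polls from the future or older than the exclusion band.
    recent = [p for p in polls if 0 <= (on_date - p["date"]).days <= MAX_AGE_DAYS]
    # Newest first, so the first poll seen for each firm is its closest one.
    recent.sort(key=lambda p: p["date"], reverse=True)

    seen_houses = set()
    totals, weight_sum = {}, 0.0
    for poll in recent:
        age = (on_date - poll["date"]).days
        weight = 0.5 ** (age / HALF_LIFE_DAYS)           # exponential decay
        weight *= house_weights.get(poll["house"], 1.0)  # firm quality
        if poll["house"] in seen_houses:
            weight *= REPEAT_PENALTY                     # older poll, same firm
        seen_houses.add(poll["house"])
        for candidate, share in poll["shares"].items():
            totals[candidate] = totals.get(candidate, 0.0) + weight * share
        weight_sum += weight

    if weight_sum == 0:
        raise ValueError("no polls inside the 30-day window")
    return {c: total / weight_sum for c, total in totals.items()}


# Made-up polls, for illustration only.
sample = [
    {"house": "A", "date": date(2024, 5, 1), "shares": {"X": 58.0, "Y": 42.0}},
    {"house": "A", "date": date(2024, 4, 20), "shares": {"X": 62.0, "Y": 38.0}},
    {"house": "B", "date": date(2024, 4, 28), "shares": {"X": 54.0, "Y": 46.0}},
]
print(weighted_average(sample, date(2024, 5, 2)))
```

With a short half-life and a strong repeat penalty, the average on any date is dominated by the most recent poll from each firm, which is the behavior the text describes.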

Step 2. Incorporate survey uncertainty. This is the most complicated and most important step. We need to estimate the expected accuracy of polls in Mexico. How large are common errors? How likely are errors of 3, 5, or 15 points to occur? To answer these questions, dozens of surveys in Mexico and thousands internationally are studied.

Calibrate expected errors. First I have estimated the error of the surveys in Mexico. I have built a database with surveys from seven elections since 2000. The mean absolute error (MAE) of the survey averages in Mexico, by candidate or party, considering those with more than 10% of the vote, has been around 3.8 points in the presidential elections and 2.2 points in the legislative elections. That is, deviations of four or five points were common and the margin of error (95%) was around nine points. As seven elections are not enough to draw strong conclusions, we also reviewed around twenty elections in other Latin American countries, where the MAE rose to 4.1 points. In the end, following a principle of caution, I have decided that our model assumes an MAE of 3.8 points in Mexico.
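The step from an MAE of 3.8 points to a margin of error of around nine points can be checked with a short calculation. The sketch below assumes normally distributed errors, for which the standard deviation equals the MAE times the square root of π/2. That is a simplification used only for this check (the model itself uses heavier-tailed distributions), and the function names are hypothetical.

```python
import math


def mae_to_sigma(mae):
    """Standard deviation of a zero-mean normal error with the given MAE."""
    return mae * math.sqrt(math.pi / 2)


def margin_95(mae):
    """Half-width of the 95% interval under the normal assumption."""
    return 1.96 * mae_to_sigma(mae)


print(round(mae_to_sigma(3.8), 1))  # about 4.8 points
print(round(margin_95(3.8), 1))     # about 9.3 points
```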

Furthermore, this uncertainty is modulated by two additional factors: the size of the candidate or party (because it is easier to estimate a party’s vote if it is around 5% than if it is close to 50%) and the proximity of the election (because polls taken near the end are almost always more accurate). To adjust this part of the model I have used the Jennings and Wlezien database, published in Nature, and analyzed the errors of 4,100 surveys in 241 elections in 19 Western countries.

Choice of distribution type. To incorporate uncertainty into the vote for each candidate or party in each simulation I use a multivariate distribution. I use Student's t distributions instead of normal ones so that they have heavier tails (higher kurtosis), which makes very extreme events more likely. Nate Silver has explained the advantages of that assumption. I have estimated the level of kurtosis with the previous database. Then I define the covariance matrix of these distributions so that the sum of the votes does not exceed 100% (an idea by Chris Hanretty). Finally, the covariance matrices must be scaled so that the resulting vote distributions have the MAE and standard deviation expected from the calibration.
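One way to picture this construction is the sketch below, written with NumPy. It is a simplified stand-in, not the model's actual code: it imposes the sum constraint by removing the common component of the errors in each simulation, so that they add up to zero, instead of writing out the covariance matrix explicitly. The function names, the five degrees of freedom and the per-candidate spreads are assumptions chosen for illustration.

```python
import numpy as np


def draw_errors(sigmas, df, n_sims, rng):
    """Draw heavy-tailed polling errors (in points) that sum to zero per simulation.

    sigmas: assumed spread for each candidate before the constraint.
    df: degrees of freedom of the Student's t (lower means heavier tails).
    """
    sigmas = np.asarray(sigmas, dtype=float)
    z = rng.standard_normal((n_sims, len(sigmas))) * sigmas
    # Remove the shared component so each row sums to zero:
    # one candidate's gain is the others' loss and the total vote is unchanged.
    z -= z.mean(axis=1, keepdims=True)
    # Dividing every row by a common chi-square factor turns the
    # multivariate normal draw into a multivariate Student's t draw.
    g = np.sqrt(rng.chisquare(df, size=(n_sims, 1)) / df)
    return z / g


def scale_to_mae(errors, target_mae, column=0):
    """Rescale all errors so the chosen candidate's MAE matches the target."""
    current = np.mean(np.abs(errors[:, column]))
    return errors * (target_mae / current)


rng = np.random.default_rng(0)
raw = draw_errors([4.0, 4.0, 2.0], df=5, n_sims=20_000, rng=rng)
errors = scale_to_mae(raw, target_mae=3.8)
print(np.abs(errors).mean(axis=0))  # MAE per candidate after scaling
```

The final rescaling mirrors the last sentence of the paragraph above: whatever the starting spreads, the draws are stretched or shrunk until the simulated MAE matches the calibrated value.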

Step 3. Simulate. The last step is to run the model 20,000 times. Each iteration is a simulation of the elections with vote percentages that vary according to the distribution defined in the previous step. The results in these simulations allow us to calculate the probabilities that each candidate has of being the most voted and achieving the presidency.
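The counting in Step 3 can be sketched as follows. The poll averages are the ones quoted in this article, but the errors are placeholder independent normal draws with an assumed spread of 4.8 points, not the calibrated multivariate distribution of Step 2, so this sketch is not expected to reproduce the published 86%. The function name `win_probabilities` is hypothetical.

```python
import numpy as np


def win_probabilities(poll_average, simulated_errors):
    """Share of simulations in which each candidate gets the most votes.

    poll_average: dict mapping candidate name to average share (%).
    simulated_errors: array of shape (n_sims, n_candidates), in points.
    """
    names = list(poll_average)
    base = np.array([poll_average[name] for name in names])
    simulated_votes = base + simulated_errors      # one row per simulated election
    winners = simulated_votes.argmax(axis=1)       # index of the top candidate
    counts = np.bincount(winners, minlength=len(names))
    return dict(zip(names, counts / len(simulated_votes)))


N_SIMS = 20_000  # number of simulations stated in the article
average = {"Sheinbaum": 56.0, "Gálvez": 36.0, "Álvarez Máynez": 7.0}

rng = np.random.default_rng(2024)
# Placeholder errors (assumption): independent normals, 4.8-point spread.
errors = rng.normal(0.0, 4.8, size=(N_SIMS, len(average)))
print(win_probabilities(average, errors))
```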

Why surveys? This model is based entirely on surveys. There is a perception that the polls are not reliable, but the truth is that surveys work. Polls are rarely perfect, but there is no alternative that has been proven better.
