Date: 2021-01-30

When we talk about statistics, what usually comes to mind is averages, polls, percentages and some class missed in high school. Literature is probably the last thing anyone would associate with it. But a writer as universal as Jorge Luis Borges touches on so many sciences that this scientific discipline, founded in the 20th century, turns out to have a lot to do with The Aleph, Pierre Menard, golems and labyrinths.

That is what Walter Sosa Escudero understood. A professional statistician and attentive reader of the work of the world's best-known Argentine writer, he put the idea into practice in Borges, big data y yo (Borges, Big Data and I): an attempt to popularize statistics in relation to the new phenomenon of big data. All in a Borgesian key.

“Borges ‘flirts’ with mathematics. A very nice book by the writer Guillermo Martínez (Borges and Mathematics) says that Borges knew relatively little mathematics. Certainly more than most people, but much less than a mathematician or physicist. What is surprising, perhaps magical, about Borges is his ability to create a universe out of that ‘little’ knowledge,” Escudero explains to Clarín.

From these flirtations, Sosa Escudero reconstructs problems of statistics, “a very important discipline in the history of science, and which is capable of guiding the process of interaction with complex data such as big data”.

Jorge Luis Borges was born in 1899.

Now, as the book itself points out, there is a problem: statistics is “the science of the part for the whole; its objective is not to ‘analyze data’ but to see what lies behind it: it tries to give useful answers at low cost, in situations with limited information,” he writes.

Big data, on the other hand, belongs to “data science”, which “is a name that, when used well, refers to this millennial version of statistics in times of big data and powerful algorithms.”

So what do we need statistics for, a discipline that reaches the whole through the part, in a world where we have “all the data”?

“Hidden in a Borges story is the argument for why big data is not and never will be all the data, no matter how much of it there is,” Escudero clarifies.

Here are some reflections on the eternal writer and statistics, a discipline that, according to Escudero, “is more alive than ever.”

“I always liked applied mathematics, and social issues and computing caught my attention. Statistics offered me the chance to work in that intermediate territory, where exact science coexists with technology and social problems.” Walter Sosa Escudero, statistician and writer

─You claim that “big data” is actually “new data”. Why?

─It’s a little play on words. The idea of “big” suggests that the data revolution comes from quantity. It seems to me that what is truly revolutionary, more than the quantity, is that we now have information about what people do (voters, consumers, etc.) that not many years ago it was unthinkable to have. In addition, it seems to me that this type of data is no longer the same as what we had before but of a new nature (anarchic, spontaneous), which requires new ways of processing and studying it. Big data is not more of the same but a different phenomenon.

─Why do you say in the book that “the problem of statistics is the problem of identification”?

─It is a complex problem. Statistics assumes that different things behave differently. When this is the case, the problem is said to be “identified”: for example, a person who has coronavirus behaves very differently from one who does not, and a person who favors a political candidate behaves differently from one who does not. Only in that circumstance is it possible to learn something about people by observing what they do.

─And how can Borges help to illustrate this?

─In Pierre Menard, Author of Don Quixote, Borges poses the bizarre situation of an author (Pierre Menard) who manages to “write” a part of Don Quixote. In other words, he does not plagiarize it, but rather reproduces a series of circumstances that lead him to write, word for word, a part of Don Quixote. From this perspective, equipped only with the “data” (the texts) produced by Cervantes and Menard, we cannot learn who the author was. Statistically, the problem of authorship “is not identified.” Pierre Menard is the worst nightmare for statistics.

─Funes is “big data without statistics”, you point out. Why? What does this mean?

─Funes is a character from a Borges story (Funes the Memorious) who can and wants to remember everything. Funes does not process data, he passively replicates it. As Borges says in the story, “to think is to forget differences”, which is exactly what science does: see what is behind the data, its regularities, and separate the signal from the noise. The phrase comes from Stephen Stigler, perhaps the most important historian of statistics, who suggests that the big data revolution is not about data but about data and its systematic analysis.

“Borges’s work involves all relevant aspects of life. And one of them is science: he is perhaps the favorite author of all scientists.” Walter Sosa Escudero, statistician and writer

─What problems in the search for information can be illustrated with The Library of Babel?

─The Library of Babel is a universal library, which contains everything that can be written. That is, all the books that have been written and all those that could be written. Trivially, whoever searches the Library of Babel finds something. Borges clearly cautions about this. Something similar happens with Google: whoever seeks, finds. It follows that searching without specific questions can lead to dangerous conclusions, especially when, despite well-known confirmation biases, many people use the results to confirm the biases they already had.

─How do you explain the Gaussian bell curve with Evaristo Carriego?

Borges, in the house of the poet Carriego, to whom he dedicated a book.

─In a rare footnote to Evaristo Carriego (a long biographical essay on the poet's life and times), Borges raises the idea of measuring the passage of time by the rate at which events accumulate: “If time is succession, we must recognize that where there is a greater density of events, more time passes, and that the most abundant time is that of this inconsequential side of the world”, which suggests that in this part of the world time has passed very quickly, in light of the events that occurred in Borges’s day.

─And this is even related to the famous phrase “flattening the curve” from the pandemic.

─Sure, because the idea of “accumulation” is exactly what a density function measures, in particular the “Gaussian bell curve”. The phenomenon of “flattening” the curve that was discussed at the beginning of the pandemic has to do precisely with this idea: how cases accumulate over time.

“The need to deal with data is as old as information and societies and, consequently, so is statistics. But it is in the twentieth century that it comes of age, hand in hand with Ronald Fisher, one of its founding fathers.” Walter Sosa Escudero, statistician and writer

─How do you explain censuses and samples based on The Garden of Forking Paths?

─Almost everyone tends to believe that a census is more complete than a sample. In The Garden of Forking Paths, Borges proposes a temporal labyrinth where all of us who exist coexist with all those who could have existed: those who are reading this article and the versions of those very people who decided not to read it. From this perspective, a census is also a sample, because it captures “only” one manifestation of the circumstances in which the census taker sees us. The census is something like a sample of the Garden of Forking Paths, necessarily incomplete, which shows that the apparent perfection of a census compared with a sample is relative.

Borges at the Westin hotel in Rome in 1981. Photo Marcello Mencarini – Leemage

─What is the relationship between golems and algorithms?

─The Golem is a character from Jewish culture, to which Borges dedicates a beautiful poem (The Golem). It is a creature made by men, limited to doing some basic tasks. The remarkable statistician Richard McElreath uses the figure of the Golem to illustrate the fact that algorithms, understood as iterative methods, are in the long run a product of whoever designed and programmed them.

─You speak in the introduction of a “Borgesian Gestapo”. What advice would you give someone who wants to start reading him?

─It’s a joke, I hope it isn’t taken too far out of context. I mean the attitude of many who, possibly involuntarily, get angry with newcomers to Borges’s literature. Borges’s literature is very circular, “self-referential”. My suggestion is that with Borges’s work you have to do the same as when you consider where to get on a carousel: anywhere. It is once you are on that you have to decide what to do.

─What are you looking for with this book?

─I wrote this book so that readers are encouraged to start reading Borges, wherever they want. Some entry points are simpler than others (Funes is much more “digestible” than Tlön, Uqbar, Orbis Tertius), but, as with everything complex, it is all a matter of starting, and I don’t know if it matters so much where. The trick with Borges is not so much reading as rereading.

Borges, big data and I / Siglo XXI Editores / 176 pages / 750 pesos

Borges, big data y yo, published by Siglo XXI Editores, is available in print and as an ebook.