Date: 2024-09-27

Last week’s baffling random numbers prompted interesting comments from my kind readers. To the question “Why is it more likely that the number of inhabitants of a city begins with 1 than with 9?”, Jonathan Arnold (presumably from an English-speaking country, judging by his name and the absence of accents in his text) answers:

“If we think about the population of an urban center of, for example, 800 or 900 inhabitants, it will only need to grow by 100 to 200 inhabitants to reach 1,000, a number that begins with a 1. From there, it will need to grow by another 1,000 inhabitants for the first digit to go from n to n+1.”
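Arnold’s argument can be checked numerically. The following is a minimal sketch, not anything taken from the reader’s comment: it assumes a hypothetical town that grows by a fixed percentage at every step (the starting size, the 1% rate and the function name `leading_digit_shares` are all illustrative choices) and counts how often each first digit appears along the way.

```python
import math

def leading_digit_shares(start=100.0, rate=0.01, steps=100_000):
    """Share of time steps a steadily growing population spends on each first digit.

    Assumption: growth by a constant percentage per step (illustrative only).
    """
    counts = [0] * 10
    log_pop = math.log10(start)          # work in log10 to avoid overflow
    log_step = math.log10(1.0 + rate)
    for _ in range(steps):
        mantissa = 10 ** (log_pop % 1.0)     # value between 1 and 10
        digit = min(9, int(mantissa))        # first digit of the population
        counts[digit] += 1
        log_pop += log_step
    return {d: counts[d] / steps for d in range(1, 10)}

if __name__ == "__main__":
    for digit, share in leading_digit_shares().items():
        print(digit, round(share, 3))
```

Under this assumption the town spends far more steps with a first digit of 1 than with a 9, because getting from 1,000 to 2,000 means doubling, while getting from 9,000 to 10,000 takes only about an 11% increase.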

And Xoaquín Fernández adds:

“If we apply Zipf’s law by population ranges (from 10,000 to 99,999; from 100,000 to 999,999…) we could explain the result well. Within each range, the number of towns in the initial segment is greater than in the next. My explanation would emphasize agglomeration economies acting on a more balanced initial distribution of the population; it would be a cumulative process, which would select some point, sometimes by chance, and reinforce it.”

Zipf’s law was formulated in the middle of the last century by the American linguist George K. Zipf, who applied statistical analysis to the study of different languages and found that the frequency with which words appear follows a pattern similar to that established by the Benford-Newcomb law (but that’s another article).

For her part, Adelaida López brings up an interesting anecdote in relation to the topic at hand:

“There are simple statistical tricks to detect certain types of fraud. For example, Hill (the first to mathematically formalize Benford’s law) set his students the homework exercise of tossing a coin 200 times and recording when it came up heads and when it came up tails. The laziest, and the most inclined to cheat, did not bother to actually toss the coin 200 times and wrote down heads and tails at random in a fairly uniform manner, but it never occurred to any of them to write down heads or tails 6 times in a row, because intuitively they did not consider such long consecutive runs likely, which is false when 200 real tosses are made. Because of that mistake, Hill detected the cheaters.”

In fact, the probability that flipping a coin 200 times will at some point produce 6 heads or 6 tails in a row is about 96% (can you calculate it?), so too even a distribution of heads and tails was an (almost) sure sign of cheating.
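For readers who would rather check the figure by machine than by hand, here is one possible sketch. It is only one way of doing the calculation: it assumes a fair coin and independent tosses, and tracks the length of the current streak toss by toss. The function name `prob_run_at_least` and its parameters are illustrative.

```python
from fractions import Fraction

def prob_run_at_least(n_flips=200, run_len=6):
    """Probability that n_flips fair tosses contain run_len equal results in a row.

    Assumes a fair coin, independent tosses, n_flips >= 1 and run_len >= 2.
    """
    half = Fraction(1, 2)
    # state[k] = probability that no long run has appeared so far and the
    # current streak (of heads or of tails) has length k, for k = 1..run_len-1.
    state = [Fraction(0)] * run_len
    state[1] = Fraction(1)               # after the first toss the streak is 1
    for _ in range(n_flips - 1):
        new = [Fraction(0)] * run_len
        for k in range(1, run_len):
            p = state[k]
            new[1] += p * half           # next toss differs: streak restarts
            if k + 1 < run_len:
                new[k + 1] += p * half   # next toss matches: streak grows
            # otherwise the streak reaches run_len and drops out of "state"
        state = new
    return 1 - sum(state)                # whatever dropped out had a long run

if __name__ == "__main__":
    print(float(prob_run_at_least(200, 6)))
```

Exact fractions are used so that no rounding creeps in over the 200 steps; the result should come out in line with the roughly 96% quoted above.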

Counting words, surnames, people…

In future installments we will deal (this is not a royal “we”: I count, as usual, on the collaboration of my sagacious readers) with the aforementioned Zipf’s law (and the Pareto principle, to which it is closely related) and, as a warm-up, I propose the following exercise: choose any text of a certain length (a chapter of a book, a story, a long article…), write down the number of times the five most frequent words appear and try to establish a relationship between these frequencies. If onomastics attracts you more than linguistics, you can do the same with the five most frequent surnames. Or with any other set that lends itself to ordering its subsets by the number of elements, such as the populations of the most populous cities in a country.
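Counting by hand is tedious for a long text, so the tallying part of the exercise can be handed to a few lines of code. This is only a sketch: the function name `top_words` is made up for illustration, and the way it splits text into words (lowercased runs of letters, so that “don’t” becomes “don” and “t”) is a simplifying assumption. Finding the relationship between the frequencies is still left to the reader.

```python
import re
from collections import Counter

def top_words(text, n=5):
    """Return (rank, word, count) for the n most frequent words in text.

    Assumption: a "word" is any run of letters, compared in lowercase.
    """
    words = re.findall(r"[^\W\d_]+", text.lower())
    ranked = Counter(words).most_common(n)
    return [(rank, word, count)
            for rank, (word, count) in enumerate(ranked, start=1)]

if __name__ == "__main__":
    sample = "the cat and the dog and the bird"   # replace with your own text
    for rank, word, count in top_words(sample):
        print(rank, word, count)
```

The same function works for surnames if the input is simply a list of surnames joined by spaces.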