Once the sections were identified, another algorithm classified them, distinguishing, for example, between the victims' paternal and maternal surnames. This algorithm also identified each person's nationality, marital status, address, occupation and other characteristics.
Finally, they tested different optical character recognition (OCR) tools, which transcribe printed documents into editable text, similar to a Word document. The program that transcribed the cards most accurately was the open-source software Calamari. However, the best results came only after Mateo and his colleagues adjusted the tool to better suit the type of characters it was reading.
At each stage of the methodology, Mateo ran tests to measure the effectiveness of the different tools. To do this, he and his team took samples from the microfilm rolls and checked how much of the information the chain of algorithms extracted correctly.
In identifying the sections of each card, the success rate was 95%, and section classification reached the same overall accuracy. Character recognition produced more errors before the research team trained it, but after the adjustments its success rate rose considerably: in a sample of 99 documents, Calamari initially transcribed 89 correctly, and after correction and fine-tuning that number rose to 95.
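To put the OCR figures above in percentage terms, here is a minimal sketch that turns the reported counts into success rates. It assumes "correct transcription" means a whole card counted as correct, which the article does not define; the function name `success_rate` is hypothetical and not part of the team's tooling.

```python
def success_rate(correct: int, total: int) -> float:
    """Percentage of sampled documents counted as correctly transcribed.

    Assumption: each document is scored as simply correct or incorrect.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * correct / total


SAMPLE_SIZE = 99  # documents in the evaluation sample reported in the article
before = success_rate(89, SAMPLE_SIZE)  # Calamari before the team's adjustments
after = success_rate(95, SAMPLE_SIZE)   # Calamari after correction and fine-tuning

print(f"before: {before:.1f}%  after: {after:.1f}%  gain: {after - before:.1f} points")
# prints "before: 89.9%  after: 96.0%  gain: 6.1 points"
```

By this reading, the six extra correct cards raise the success rate from roughly 90% to roughly 96%.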
After all that work, Mateo earned his master’s degree. For their part, those interested in reviewing the OCOA documents, such as Ignacio Errandonea, now have a search engine that will save them hours of manual document review.
Technology at the service of the archive in Latin America
Mateo’s work is not the only one of its kind. At the University of the Republic, his colleagues have been working on other aspects of the dictatorship archives as part of the Cruzar project, an initiative launched in 2018 in which faculty and students from the Faculty of Information and Communication and the Faculty of Engineering joined forces to systematize the information in the dictatorship's archives.
One example of these efforts comes from Gregory Randall, a professor in the Faculty of Engineering and Mateo’s advisor. His students developed a tool that recognizes the different stamps the repressive forces used to mark their documents, so that identifying a stamp reveals which organization a sheet came from.
The case of Argentina
Outside Uruguay, there are also similar projects that use technology to analyze archives from dictatorships. One of the longest-standing partnerships is the one between the Provincial Memory Archive and the National University of Córdoba (UNC), in Argentina.