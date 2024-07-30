The technology already exists: if necessary, this report could have been written without typing, simply by dictating the text to the word processor. However, it is still far from convenient: one would have to go back over the text to correct (and possibly add) punctuation and to change words that were misunderstood. After rereading, the result would probably need a general revision as well, since we do not speak the same way we write, even if, while dictating, we keep in mind that the result will be a written text. These are some of the problems that graphic designer Miriam Inza encountered when writing, for the magazine Intangible Design, the article Writing with your mouth: voice dictation as a writing practice. In the text, some of the consequences of writing by dictation can be detected, since the machine sometimes misunderstands or fails to detect words: “For this article to make sense, for it to really involve the implementation of a type of writing axe with the mouth, I self-imposed the rule of not correcting what is being written.”

“Perhaps one of the aspects in which [voice-to-text technologies] have yet to make a huge qualitative leap is automatic punctuation,” Inza confirmed in an email she typed. “At the moment, to write by voice you have to dictate the punctuation marks or, in the case of transcribing an interview, for example, enter them manually. Some tools have automatic punctuation, though only in some languages, but work is being done on it,” she says. Even so, what is missing are only “details: being able to write at the speed at which one speaks without using one’s hands is already the future of the present,” she says.

One of the keys to the great advance that voice-to-text technologies have made in recent years has been the arrival of Whisper, the Automatic Speech Recognition (ASR) model that OpenAI released at the end of 2022. The tool is controversial: according to an investigation by the New York Times, OpenAI created Whisper when it ran out of text on the internet to feed its AI. With Whisper, the door to all of YouTube opened for them, giving them more natural, conversational material with which to train GPT-4, their most advanced language model. This use, however, could have violated YouTube’s rules, not to mention the privacy of the users who appear in those videos (Google, which owns the online video service, also uses that material to train its own AI).

Technological wars aside, “Whisper has changed everything,” says José María Fernández Gil, head of the Digital Accessibility Unit at the University of Alicante. “AI tries to transcribe entire sentences, with their full stops, commas, exclamations, questions… And it is not going to make, or would only residually make, contextual errors such as ‘the gray hair is very comfortable’ [in Spanish, cana, gray hair, in place of cama, bed], because it has not distinguished between the N and the M,” he says by way of example. At the University of Alicante itself, they have used the model to subtitle nearly 1,800 hours of video with “impressive” precision.

As for what still needs to be improved, Fernández Gil points out that there is still a lack of vocabulary and that it makes mistakes with some acronyms, although “much less than traditional systems.” However, Whisper’s computational cost is very high, something that is “out of the reach of most people.”

Another issue that has not yet been resolved is the processing of different accents and dialects, “especially if they are used locally or regionally,” adds Dayana Ribas, scientific director of Business Telecommunication Services (BTS), a telecommunications company that is also using these technologies in various projects. Ribas mentions that transcription also fails when words in different languages are used, a situation “frequent in the daily life of practically bilingual countries, such as Puerto Rico.” The fact that these types of details are still missing is a clear example of the problem of bias, she points out.

There are also pending issues such as the transcription of audio in realistic, everyday scenarios “that present a mix of distortions of various kinds, for example, telephone calls with their ambient noises,” the automatic correction of errors and the “constant and growing” need to address the issue of security and privacy, adds the expert.

Are we going to move on to writing by dictation?

With technology already at its peak, the next question arises: will there come a time when the first option when we want to write a text is to dictate it to a machine? All the experts interviewed agree that we speak and write differently, so this is something that must always be taken into account. Dayana Ribas believes that dictation can be practical for more creative tasks or writing drafts, since “it facilitates speed and naturalness in the production and saving of ideas” and we can do it while doing “other semi-automatic things for humans, such as walking or cooking, and it requires less effort.” However, “to generate more precise ideas that require concentration, such as writing a technical report or a novel, it is likely that sitting down to type offers adequate time to think and produce ideas with more control,” she adds.

In this regard, Miriam Inza recalls Roland Barthes, who said “that the distance between the head and the hand is greater than that between the head and the mouth, and that time can be used to think.” One of the things she noticed in her research on “writing with the mouth” is that it also changes the way we speak. “To write a text using voice dictation, a specific way of dictating must be adopted,” she explains.

It is also quite possible that a generational gap will be seen in all this. Compared to people who are used to typing quickly on a computer keyboard, “the new generations have seen the microphone icon for dictation since they were little and they use it a lot,” says José María Fernández Gil. He gives the example of his niece, who is a teenager and, when she uses her mobile phone, “she usually prefers to dictate in the applications rather than write.” From what she tells her uncle, this is something that is widespread in her generation.

On the other hand, a change in the writing instrument will produce texts with different characteristics. Virginia Woolf, for example, complained when she wrote a letter on a typewriter (she tried not to) about how the instrument cut and broke the sentences that were very clear and beautiful in her head. Related to all this, using AI tools to write also has its impact: a recent study by Harvard University concluded that texts written using predictive text are “more succinct, more predictable and less colorful” than those that do not use it. There are still no studies on what texts written “with the mouth” will be like.

A revolution for accessibility

Developing voice-to-text technology does not only mean progress in terms of convenience or speed when carrying out certain tasks, but it will also be an option that helps many people. The head of the Digital Accessibility Unit at the University of Alicante gives some examples: it will help people with hearing impairments who, thanks to the generalisation of automatic subtitles, will be able to “hear (read)” what they do not hear; it will improve the integration of people from other countries and cultures by combining the recognition of spoken language with translation; it will allow “people who do not know how to write well (educational, cultural, socioeconomic level…)” to write well, as well as making life much easier for people who, due to motor problems, cannot or have difficulties writing using their hands.

Dayana Ribas also highlights the possibilities that open up from the point of view of learning, since “it strengthens the educational system with tools that make it easier to take notes and study.” It can also change many things in the field of customer service. In a health center, for example, doctors could better care for patients while the computer transcribes what they are saying.

When it comes to simply producing a text like this, dictation will be another option. “Having options is always an advantage. The choice of one way or another to produce text will be very personal and will in any case be filtered by each person’s auditory, visual or reproductive characteristics to inspire or better fix ideas,” says the scientific director of BTS.

Perhaps the images of writers, which went from depicting them with a pen in hand to showing them behind a screen, will become photographs of people walking and talking at the same time in a few years. Or perhaps not. “Voice dictation technology is having and will have a strong positive impact on the various writing tasks. But just as some of us prefer to write certain things by hand rather than typing them on a mobile phone or computer, there will also be those who find typing more pleasant than dictation. Even if it is only for the pleasure of being able to write in silence,” concludes Inza.

