Speech is very personal and reveals a lot about us. Therefore, the use of voice data collected by companies must be governed by law.
In January 2021, the Swedish music streaming service Spotify received a patent for its speech analysis program. It identifies the user’s emotional state, gender, age and accent from a voice recorded through the phone. Using this data, Spotify’s algorithms could suggest new music for the customer to listen to.
The service sounds harmless, but in terms of data and privacy protection, consumers would be entering new territory. When the results of speech analysis are combined with a user’s listening history and the tastes of his or her friends, the company gains not only a huge amount of data about its users but also the opportunity to manipulate consumer sentiment by offering a certain type of music.
The online retailer Amazon has also developed an algorithm that, according to the patent, is used to gather information about speakers’ perceived gender, age, ethnic origin, health, and emotions.
Information about such projects has caused concern and debate, especially because technology companies are reluctant to tell us what they use the collected information for.
Speech will be an increasingly important part of digital services and goods in the future. Synthetic speech already differs little from genuine speech, and speech analysis is also evolving at a rapid pace. The faster voice-based applications and user interfaces spread, the more information we provide about ourselves to service providers and device manufacturers.
Speech tells us a lot: it communicates thoughts, feelings, and states of mind. It contains information about bodily functions and diseases. Ethnic background as well as class and cultural differences are revealed in the rhythm, intensity, emphasis, and pitch variations of speech. Sounds, accents, and intonations are part of our identity.
On the other hand, a wide variety of audio data must be collected for the technology to work as reliably as possible. For example, the Donate Speech campaign is intended to develop speech recognition in Finnish.
One problem is the richness, nuance, and ambiguity of speech, which make speech recognition inevitably inaccurate. Crude speech analysis combined with consumer profiling algorithms easily produces misleading and stereotypical information about individual users.
Another concern is that self-learning deepfake algorithms can already modify and model speech very convincingly. In the future, identity theft is likely to rely increasingly on speech that people have handed over. Speech recognition services can also be misused with recorded or synthesized voices.
As voice services become more commonplace, we may become too careless about how we speak. Just as with text messages and selfies, we will need to think about what kind of voice we use to create impressions of ourselves. We may even adopt different ways of speaking for different services. Devices that may listen to us in public places, at home, and at work are likely to increase discomfort, suspicion, and mistrust.
In addition, it has been found that the more human-like AIs seem, the more openly we talk to them about our feelings, bodies, thoughts, and states of mind. Humans have a need and a desire to humanize other beings, both living and non-living. For this reason, algorithms are already being developed that can filter sensitive information out of recorded speech.
We cannot yet even imagine all the risks. It is therefore important to create ethical and legal rules of the game for the use of speech, taking into account how multidimensional the information contained in speech can be.
Speech, like other biometric data, is very personal, and its collection, as well as its commercial and official use, must be governed by law. Ultimately, the question is whether we own our own voice or whether it is owned by someone else.
Pertti Grönholm and Kimi Kärki
The authors are docents at the Department of History, Culture and Art Studies of the University of Turku.
Guest columns are contributions by experts selected by the HS editorial board for publication. The opinions expressed in guest columns are the authors’ own views, not the position of HS. Writing instructions: www.hs.fi/vieraskyna/.