Voice Recognition, Data Collection, and Voice Deepfakes Have Sparked a Push for Speech Protections

The way you speak reveals far more about you than you might realize. To the human ear, your voice can instantly give away your mood; it’s easy to tell whether you’re enthusiastic or upset.

Machines can go further, inferring your age, gender, ethnicity, socioeconomic status, health conditions, and more. Researchers have even generated images of people’s faces from their voice data.

Companies benefit as machines improve their ability to understand you through your speech.

Voice recognition systems have proliferated in recent years, from Siri and Alexa to those that use your voice as a password, as AI and machine learning have unlocked the ability to understand both what you’re saying and who you are.

Big Voice might be a 20 billion USD industry within a few years. As the market expands, privacy-focused experts are increasingly looking for ways to stop people’s voice data from being exploited against them.

A Threat to Privacy

According to Emmanuel Vincent, a senior research scientist focusing on voice technologies at France’s National Institute for Research in Digital Science and Technology (Inria), machines can use both the words you say and how you say them to identify you.

“It is just the beginning. Your voice can also reveal information about your emotions and medical state,” Vincent says. “These additional pieces of information help develop a complete picture,” he adds, “and then this would be used for all kinds of targeted ads.”

Beyond feeding into the enormous universe of data used to show you online adverts, your voice data could also be stolen: hackers might gain access to wherever it is stored and use it to mimic you.

A few such voice-cloning cases have already occurred, demonstrating how valuable your voice can be. Simple robocalls have also been used to record people saying “yes” so that the confirmation can be reused in payment scams.

Last year, TikTok updated its privacy policy to say it may collect voiceprints (a broad term for the data contained in your voice) along with other biometric data, such as your faceprint, from users in the United States.

Call centers are also employing AI to assess people’s “behavior and emotion” during phone calls and evaluate the “tone, speed, and pitch of every word” to build profiles of people and boost sales.

“We are at the point where the systems to recognize who you are and link everything together exist, but the protection isn’t there, and it’s still a long way off from being useable,” says Henry Turner, a University of Oxford researcher who has studied the security of speech systems.

Speech Anonymization Efforts

Anonymization aims to keep your voice sounding human while removing as much as possible of the information that machines can use to recognize you. There are two separate strands: anonymizing the content of what someone says, by deleting or altering any sensitive phrases in files before they are saved, and anonymizing the voice itself.
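To make the first strand concrete, here is a minimal Python sketch of content anonymization. It assumes the speech has already been transcribed to text, and the three patterns are hypothetical examples chosen for illustration; they are not taken from any system mentioned in this article, and a real deployment would need far broader coverage.

```python
import re

# Hypothetical patterns, for illustration only.
SENSITIVE_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),         # US Social Security number format
    re.compile(r"\b\d(?:[ -]?\d){12,15}\b"),      # runs of 13 to 16 digits, e.g. card numbers
    re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),  # email addresses
]


def redact_transcript(text, placeholder="[REDACTED]"):
    """Replace every match of a sensitive pattern before the text is stored."""
    for pattern in SENSITIVE_PATTERNS:
        text = pattern.sub(placeholder, text)
    return text


if __name__ == "__main__":
    print(redact_transcript("Reach me at jane@example.com, card 4111 1111 1111 1111."))
```

Pattern matching of this kind only catches phrases with a predictable shape. Names, addresses, and other free-form details would need a different approach.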

The majority of current speech anonymization initiatives entail running someone’s voice through experimental software that modifies some of the characteristics of the voice signal to make it sound different. This can include altering the pitch, replacing portions of speech with data from other voices, and synthesizing the final product.
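The simplest of these modifications, a change of pitch, can be sketched in a few lines of Python. This is a deliberately crude illustration, not the method used by the experimental software described here: it shifts pitch by resampling, which also changes how long the audio lasts, and a pure tone stands in for a voice. The function names and the 200 Hz test tone are assumptions made for the example.

```python
import math


def resample_pitch_shift(samples, factor):
    """Raise (factor > 1) or lower (factor < 1) pitch by reading the signal
    `factor` times faster, using linear interpolation between samples.
    Side effect: the output is shorter or longer than the input."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    if not samples:
        return []
    out_len = int((len(samples) - 1) / factor) + 1
    shifted = []
    for i in range(out_len):
        pos = i * factor
        left = min(int(pos), len(samples) - 1)
        right = min(left + 1, len(samples) - 1)
        frac = pos - left
        shifted.append((1 - frac) * samples[left] + frac * samples[right])
    return shifted


def estimate_frequency(samples, sample_rate):
    """Rough frequency estimate: count upward zero crossings per second."""
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if a < 0 <= b)
    return crossings * sample_rate / len(samples)


if __name__ == "__main__":
    rate = 16000
    tone = [math.sin(2 * math.pi * 200 * n / rate) for n in range(rate)]
    higher = resample_pitch_shift(tone, 1.25)
    print(estimate_frequency(tone, rate), estimate_frequency(higher, rate))
```

A shift like this is easy to undo and leaves most speaker characteristics intact, which is one reason research systems go further and resynthesize the voice.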

But is anonymization technology effective? “Without entirely changing the voice, effective anonymization is not achievable,” says Rita Singh, an associate professor at Carnegie Mellon University’s Language Technologies Institute. “It’s not the same voice if you change it entirely.”

Despite this, Singh believes it is still worthwhile to build voice-privacy technology because no privacy or security system is completely secure.

Fingerprint and face recognition technologies on iPhones, for example, have been hacked in the past, yet they remain a reliable means of protecting people’s privacy.