

“It’s your boss’s voice. It sounds like the person you talk to every day,” said Giacopuzzi, StoneTurn’s senior consultant for cyber investigations, intelligence and response. “Sometimes it can be very successful.”

It’s a new twist on the impersonation tactics long used in social engineering and phishing attacks, but most people aren’t trained to distrust their own ears.

So while many companies still regard cyberattacks using deepfakes of various kinds as a future threat, some are learning the hard way that they are already here, experts told Protocol.

“It’s your boss’s voice. It sounds like the person you talk to every day.”

Among cybersecurity professionals who focus on responding to cyberattacks, two-thirds of those recently surveyed by VMware said deepfakes — including audio and video fabrication — were a component of attacks they had investigated over the past year. That was an increase of 13% over the previous year’s study.

The survey of 125 cybersecurity professionals did not reveal what proportion of deepfake attacks were successful, and VMware did not provide details on specific incidents. But Rick McElroy, senior cybersecurity strategist at VMware, said he spoke to two security chiefs at companies whose businesses have fallen victim to deepfake audio attacks in recent months.

In both cases, the attacks resulted in six-figure sums being transferred, McElroy said. Other publicly reported cases include an incident in 2020 in which a Hong Kong bank executive was reportedly tricked into transferring $35 million to attackers using deepfake audio.

Right now, responding to deepfakes isn’t a part of most security awareness training, McElroy noted.

“Generally speaking, [deepfakes] are likely to be treated as something ‘funny’ – unless you were actually attacked,” he said.

Cloning voices

With a short audio sample of a person speaking and a publicly available tool on GitHub, a human voice can be cloned today without any AI expertise. And it might not be long before faking someone else’s voice in real time becomes possible.

“Real-time deepfakes are the biggest threat on the horizon” in this area, said Yisroel Mirsky, head of the Offensive AI Research Lab at Ben Gurion University.

Mirsky — who previously led a study on how deepfake-altered medical images could lead to misdiagnosis — told Protocol that his attention has recently shifted to the threat of voice deepfakes.

The aforementioned GitHub tool, available since 2019, uses deep learning to clone a voice from just a few seconds of audio. The tool then enables the cloned voice to “speak” entered phrases using text-to-speech technology.

“Real-time deepfakes are the biggest threat on the horizon.”

Mirsky provided Protocol with an audio deepfake he created with the tool, using three seconds of someone’s voice. The tool is too slow to use in a real-time attack, but an attacker could create likely phrases in advance and then play them back as needed, he said.

Thanks to advances in voice deepfake generation, the problem for attackers is now less whether they can clone a voice and more how to use the cloned voice in real time. The ideal scenario for an attacker would be to speak instead of type and have their speech converted to the cloned voice.

But advances are being made on that front as well. Mirsky pointed to a vocoder that is reportedly able to perform audio signal conversion, a key part of the process and its biggest bottleneck, with a delay of just 10 milliseconds.
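As a rough illustration of why a 10-millisecond conversion delay matters for real-time attacks: a streaming voice converter processes audio in short chunks, and the output keeps up with live speech only if each chunk is converted faster than new audio arrives. The sample rate and chunk size below are assumptions chosen for illustration, not figures from Mirsky’s lab.

```python
# Illustrative real-time budget check for streaming voice conversion.
# The sample rate and chunk size are assumptions for illustration.

SAMPLE_RATE_HZ = 16_000      # common telephony-quality sample rate
CHUNK_MS = 20                # process audio in 20 ms chunks
CONVERSION_DELAY_MS = 10     # reported vocoder conversion delay

def is_real_time(chunk_ms: float, processing_ms: float) -> bool:
    """A streaming pipeline keeps pace with live speech only if each
    chunk is converted in less time than the chunk itself covers."""
    return processing_ms < chunk_ms

samples_per_chunk = SAMPLE_RATE_HZ * CHUNK_MS // 1000
print(samples_per_chunk)                              # 320 samples per chunk
print(is_real_time(CHUNK_MS, CONVERSION_DELAY_MS))    # True: 10 ms < 20 ms
```

Under these assumptions, a 10 ms conversion delay leaves half of each 20 ms chunk’s time budget to spare, which is what makes a live, conversational deepfake plausible.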

In other words, a real-time voice deepfake might be achievable in the near future, if it isn’t already.

Hi-Fi social engineering

Attacks like deepfakes that target the “human attack surface” will no doubt take time for people to adjust to, said Lisa O’Connor, managing director at Accenture Security.

For example, when they hear a familiar voice on the phone, most people “haven’t built the muscle memory to really think to challenge that,” O’Connor said.

But judging by the advancement of voice cloning technology, it seems like we should start.

All in all, Mirsky sees audio deepfakes as a “much bigger threat” than video deepfakes. Video, he noted, only works in limited contexts, but fabricated audio can be used to call anyone.

And while “it might not sound quite like the individual, the urgent pretense will be enough to get [the target] to fall for it in many cases,” Mirsky said. “It’s a very powerful social engineering tool. And that’s a very big concern.”

In response, Mirsky said his lab at Ben Gurion University is currently focusing on the deepfake audio threat, with the goal of developing a way to detect cloned voice attacks in real time.

According to McElroy, training and changes to business processes will also be critical in defending against this type of threat. In wire transfer cases, for example, companies may want to add another step to the process, such as a security warning, he said.
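A minimal sketch of what such an added step could look like, assuming a hypothetical payments workflow in which voice-initiated transfer requests above a threshold are flagged for out-of-band confirmation. The function name, channel labels, and threshold here are all invented for illustration, not taken from McElroy:

```python
# Illustrative policy check: flag voice-initiated wire transfers for
# out-of-band verification. Channel names and the threshold are
# invented for this sketch.

APPROVAL_THRESHOLD_USD = 10_000
HIGH_RISK_CHANNELS = {"phone", "voicemail"}

def needs_out_of_band_check(amount_usd: float, channel: str) -> bool:
    """Require a second, independent confirmation (e.g., a callback
    to a known number) when a large transfer request arrives over a
    channel where a cloned voice could be used."""
    return channel in HIGH_RISK_CHANNELS and amount_usd >= APPROVAL_THRESHOLD_USD

print(needs_out_of_band_check(250_000, "phone"))   # True: verify before sending
print(needs_out_of_band_check(500, "phone"))       # False: below threshold
print(needs_out_of_band_check(250_000, "email"))   # False: other controls apply
```

The point of a rule like this is that it removes the decision from the person hearing the voice: however convincing the caller sounds, the transfer cannot proceed without a confirmation over a separate, attacker-independent channel.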

But that becomes more difficult when a deepfake is left as a voicemail, especially in what appears to be a high-pressure situation, he conceded.

Giacopuzzi, StoneTurn’s cyberattack investigator, said the “sense of urgency” that is a cornerstone of social engineering attacks carries over to deepfake audio attacks. “The same buttons are still being pushed,” he said.

And that, said Giacopuzzi, is the most worrying thing of all: “It plays with our psychology.” As a result, “there are successes. So I think it’s only going to get worse.”

