Article Details

Scrape Timestamp (UTC): 2025-06-19 19:32:00.739

Source: https://www.theregister.com/2025/06/19/voice_altering_vishing_jammer/

Original Article Text


Boffins devise voice-altering tech to jam 'vishing' schemes

To stop AI scam callers, break automatic speech recognition systems.

Researchers based in Israel and India have developed a defense against automated call scams. ASRJam is a speech-recognition jamming system that uses a sound-modification algorithm called EchoGuard to apply natural audio perturbations to the voice of a person speaking on the phone. It subtly distorts human speech in a way that baffles most speech recognition systems but not human listeners.

The tech is needed because recent advances in machine learning, text-to-speech (TTS), and automatic speech recognition (ASR) have made it easy to place automated phone calls intended to scam or defraud. These "vishing" attacks – like email-based phishing, but using voice instead of text – see criminals and scammers use TTS to create a realistic-sounding voice that speaks words they hope will lure victims. If the recipient of a call responds, the crook's ASR system attempts to convert the vocal response to text, so the back-end model can decipher what was said, devise a reply, and sustain a conversation long enough to elicit sensitive information or prompt the victim to take an action.

Vishing increased 442 percent between the first and second half of 2024, according to CrowdStrike's 2025 Global Threat Report. During the first half of that year, the US Federal Communications Commission declared that robocalls using AI-generated voices are illegal. As Crystal Morin, former intelligence analyst for the US Air Force and cybersecurity strategist at infosec vendor Sysdig, told The Register in December 2024, voice-based phishing is becoming harder to detect as AI models improve.
Freddie Grabovski (Ben-Gurion University of the Negev), Gilad Gressel (Amrita Vishwa Vidyapeetham), and Yisroel Mirsky (Ben-Gurion University of the Negev) have come up with a defense against vishing, described in a pre-print paper titled "ASRJam: Human-Friendly AI Speech Jamming to Prevent Automated Phone Scams." They argue that the ASR component of the scammers' setups is the weakest link.

"Our key insight is that by disrupting ASR performance, we can break the attack chain," they explain in their paper. "To this end, we propose a proactive defense framework based on universal adversarial perturbations, carefully crafted noise added to the audio signal that confuses ASR systems while leaving human comprehension intact."

The researchers believe they're the first to propose a proactive defense against automated voice scams that's practical enough to deploy. ASRJam defends against vishing by running the EchoGuard algorithm in real time on end-user devices, which makes the tool invisible to attackers and therefore harder to circumvent. EchoGuard is also universal – it works, to varying degrees, against any ASR model – and zero-query, meaning it doesn't require sample ASR output to generate an audio perturbation capable of breaking the model.

The authors say that while other ASR jamming techniques have been proposed over the past few years (AdvDDoS, Kenansville, and Kenku), "none are suitable for interactive scenarios; their perturbations, though often intelligible, are perceptually harsh and impractical for interactive scenarios." ASRJam is better, they argue, because EchoGuard modifies the voice in three ways: reverberation, microphone oscillation, and transient acoustic attenuation.
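The article names the three perturbation families but not the algorithm itself, so the following is purely an illustrative sketch of what such natural-sounding perturbations could look like as signal operations. The function name `echo_guard_like`, the impulse response, the 0.5 Hz gain wobble, and the attenuation schedule are all made-up assumptions, not the authors' implementation:

```python
import math
import random

def echo_guard_like(signal, sample_rate=16000, seed=0):
    """Apply three EchoGuard-style perturbations to a mono float signal.

    Illustrative only: the impulse response, modulation rate, and
    attenuation windows are hypothetical parameters, not the paper's.
    """
    rng = random.Random(seed)

    # 1. Reverberation: convolve with a tiny synthetic impulse response
    #    (direct path plus two quieter echoes).
    impulse = [1.0, 0.0, 0.0, 0.35, 0.0, 0.15]
    out = [0.0] * len(signal)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse):
            if i + j < len(out):
                out[i + j] += s * h

    # 2. "Microphone oscillation": a slow sinusoidal gain, as if the
    #    microphone distance were drifting (0.5 Hz wobble, +/-20% gain).
    out = [s * (1.0 + 0.2 * math.sin(2 * math.pi * 0.5 * i / sample_rate))
           for i, s in enumerate(out)]

    # 3. Transient acoustic attenuation: randomly duck short 20 ms windows.
    win = sample_rate // 50
    for start in range(0, len(out) - win, win * 10):
        if rng.random() < 0.3:
            for i in range(start, start + win):
                out[i] *= 0.6
    return out
```

The point of operations like these is that they mimic natural acoustic variation (room echo, a moving handset) that human listeners unconsciously compensate for, while still shifting the audio away from what an ASR model expects.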
By altering sound-reflection characteristics, simulating changes in microphone position, and subtly shortening certain sounds, the researchers claim their method "strikes the best balance between clarity and pleasantness," based on a survey they conducted with an unspecified number of participants. They've published a website that includes an original speech sample alongside copies processed with EchoGuard and other algorithms for comparison.

The researchers evaluated ASRJam/EchoGuard and the other techniques against three public datasets (Tedlium, SPGISpeech, and LibriSpeech) and six ASR models (DeepSpeech, Wav2Vec2, Vosk, Whisper, SpeechBrain, and IBM Watson). "Across the board, EchoGuard consistently outperforms all baseline jammers," the authors state in their paper. "Our method achieves the highest attack success rate on every ASR system tested, across all datasets, with only one minor exception: SpeechBrain (SB), where it is slightly outperformed by the others." The authors consider this acceptable since SpeechBrain isn't common in real-world deployments and its general ASR performance is unremarkable.

They also note that all of the speech recog jamming techniques tested underperform against OpenAI's Whisper model, which they suggest is better at filtering out adversarial noise because it was trained on a particularly large dataset that included many noisy samples. Nonetheless, EchoGuard does better than the other jammers against Whisper. "Importantly, while the absolute attack success rate on Whisper may seem modest (e.g., 0.14 on LibriSpeech), this still implies that 1 in 6 transcriptions is significantly corrupted, a degradation level that could be sufficient to disrupt scam conversations, especially in the context of interactive dialogue where misrecognition of key terms or intents can derail an LLM’s generation," they claim.
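The evaluation above hinges on an "attack success rate": the fraction of transcriptions corrupted badly enough to count as a jamming success. A minimal sketch of how such a metric could be computed, assuming success is defined as word error rate (WER) above a chosen threshold — the 0.5 cutoff and both function names are hypothetical, not taken from the paper:

```python
def wer(ref, hyp):
    """Word error rate: word-level Levenshtein distance over reference length."""
    r, h = ref.split(), hyp.split()
    d = list(range(len(h) + 1))           # DP row: edits for empty reference
    for i in range(1, len(r) + 1):
        prev, d[0] = d[0], i              # prev holds the diagonal cell
        for j in range(1, len(h) + 1):
            cur = d[j]
            d[j] = min(d[j] + 1,          # deletion
                       d[j - 1] + 1,      # insertion
                       prev + (r[i - 1] != h[j - 1]))  # substitution/match
            prev = cur
    return d[-1] / max(len(r), 1)

def attack_success_rate(pairs, threshold=0.5):
    """Fraction of (reference, transcription) pairs whose WER exceeds
    the threshold, i.e. that are 'significantly corrupted'."""
    hits = sum(wer(ref, hyp) > threshold for ref, hyp in pairs)
    return hits / len(pairs)
```

Under a definition like this, the quoted 0.14 figure for Whisper on LibriSpeech means roughly one in six or seven utterances crosses the corruption threshold, which is how a "modest" absolute rate can still derail an interactive scam dialogue.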
Lead researcher Grabovski told The Register that he believes future work will improve how ASRJam and EchoGuard perform against Whisper. "ASRJam is currently a research project, but we're actively working on improvements with the goal of commercializing it in the near future," he said.

Daily Brief Summary

MISCELLANEOUS // New Defense Tech Developed to Counteract Voice-Scamming AI

Researchers from Israel and India have created ASRJam, a system leveraging EchoGuard to disrupt automatic speech recognition (ASR) systems used in voice phishing (vishing).

Vishing attacks surged 442% between the first and second half of 2024, prompting the development of technologies like ASRJam that can hinder AI-driven scam calls.

ASRJam introduces subtle audio modifications that confuse ASR systems without affecting human understanding, thus breaking the scam communication loop.

While vishing involves criminals using realistic AI-generated voices, ASRJam counters by inducing errors in the scam's text conversion process, which relies on ASR technologies.

ASRJam operates in real time on user devices, remains hidden from attackers, and works universally across ASR models without requiring sample model output.

EchoGuard, the algorithm behind ASRJam, is designed to modify voice signals through reverberation, microphone oscillation, and transient acoustic attenuation, balancing clarity and pleasantness for the listener.

The effectiveness of ASRJam and EchoGuard was tested against multiple datasets and ASR models, showing superior results in disrupting ASR processes compared to other existing techniques.

The developers are planning further enhancements to ASRJam, with the aim of commercial rollout to effectively mitigate escalating AI-enabled voice scams.