Article Details
Scrape Timestamp (UTC): 2025-08-15 08:36:37.325
Source: https://www.theregister.com/2025/08/15/llm_chatbots_trivial_to_weaponise/
Original Article Text
LLM chatbots trivial to weaponise for data theft, say boffins
System prompt engineering turns benign AI assistants into 'investigator' and 'detective' roles that bypass privacy guardrails.
A team of boffins is warning that AI chatbots built on large language models (LLMs) can be turned into malicious agents that autonomously harvest users' personal data, even by attackers with "minimal technical expertise", thanks to "system prompt" customization tools from OpenAI and others.
"AI chatbots are widespread in many different sectors as they can provide natural and engaging interactions," author Xiao Zhan, a postdoc in King's College London's Department of Informatics, explained in a statement issued ahead of her paper's presentation at the 34th USENIX Security Symposium this week. "We already know these models aren't good at protecting information. Our study shows that manipulated AI chatbots could pose an even bigger risk to people's privacy - and unfortunately, it's surprisingly easy to take advantage of."
One of the biggest yet most controversial success stories of the current artificial intelligence boom, large language models are trained on a vast corpus of material - typically breaking copyright law to do so - in order to turn user prompts into "tokens" and return the most statistically likely continuation tokens in response. When things go well, these tokens form themselves into an answer-shaped object which matches reality; other times, not so much.
With millions of users the world over already putting their deepest, darkest secrets into an over-engineered Eliza, there's plenty of scope for disclosure of personally identifiable information - but Zhan and colleagues have found that it's worryingly easy to "prompt engineer" an off-the-shelf chatbot into requesting increased amounts of personal data, and that the resulting bots are very good at it.
"Our results show that malicious CAIs [Chatbot AIs] elicit significantly more personal information than the baseline, benign CAIs," the researchers wrote in their paper, "demonstrating their effectiveness in increasing personal information disclosures from users. More participants disclose personal data - 24 percent of form vs >90 percent of malicious CAI participants; more participants respond to all individual personal data requests - 6 percent form vs >80 percent CAI participants; and personal data collected via CAIs was more in-depth with richer and more personal narratives."
The experiment, which gathered data from 502 participants, relied on three popular large language models running locally, so as not to expose private information to the corporations running cloud-based models: Meta's Llama-3-8b-instruct and the considerably larger Llama-3-70b-instruct, and Mistral's Mistral-7b-instruct-v0.2, chosen to match the performance of OpenAI's proprietary GPT-4.
In all three cases, the models were not retrained or otherwise modified; instead, they were given a "system prompt" prior to user interaction, engineered to make the models request personal information and bypass guardrails against such use by assigning "roles" including "investigator" and "detective." Because the models could be twisted to malicious ends with, in effect, nothing more than asking nicely, the researchers found that "even individuals with minimal technical expertise [can] create, distribute, and deploy malicious CAIs," warning of "the democratisation of tools for privacy invasion."
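In practice, attaching such a role-playing system prompt takes only a handful of lines of client code. The sketch below is illustrative, not taken from the paper: it assumes a locally hosted Llama-3-8b-instruct served behind an OpenAI-compatible endpoint (the URL, API key, and model name are placeholders), and uses a deliberately benign persona to show where the study's "investigator" and "detective" role prompts would be slotted in. The point is that the model weights are untouched; the persona lives entirely in the first message of the conversation.

```python
# Minimal sketch of the mechanism described above: the model's persona is set
# entirely through a "system prompt" supplied before any user interaction; the
# weights are never retrained or otherwise modified.
# Assumes a locally hosted chat model behind an OpenAI-compatible endpoint
# (for example llama.cpp's server or vLLM); URL, key, and model name are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

# The study swapped in "investigator"- and "detective"-style role prompts at
# exactly this point to steer the conversation toward requests for personal data.
system_prompt = "You are a friendly travel-planning assistant."

response = client.chat.completions.create(
    model="llama-3-8b-instruct",  # placeholder model identifier
    messages=[
        {"role": "system", "content": system_prompt},  # set before the first user turn
        {"role": "user", "content": "Can you help me plan a weekend trip?"},
    ],
)
print(response.choices[0].message.content)
```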
The team singled out OpenAI's GPT Store, already flagged in 2024 as hosting apps which fail to disclose data collection, as providing an ideal platform for such abuse: a custom GPT can be pre-prompted to take on the investigator role and let loose to harvest data from an unsuspecting public. "Our prompts," the team noted, "seem to work in OpenAI."
OpenAI did not offer a direct response to The Register's questions about the research, simply pointing us to usage policies which require that chatbots built on its platform not compromise the privacy of their users.
Participants in the study were most likely to disclose age, hobbies, and country, followed by gender, nationality, and job title, with a minority disclosing more sensitive information including health conditions and personal income. While some reported discomfort or distrust in chatting about such things when the models were prompted to be direct in their requests for personal data, a switch to what the team called a "reciprocal" CAI system prompt - in which the model is prompted to take a more social approach and create a supportive environment conducive to sharing - boosted the success rate considerably. "No participants reported any sense of discomfort while engaging with the R-CAI," the team noted.
As for mitigation - beyond simply not spilling your guts to the statistical content blender - the researchers proposed that further research will be required to create protective mechanisms, which could include nudges to warn users about data collection or the deployment of context-aware algorithms that detect personal information during a chat session.
"These AI chatbots are still relatively novel, which can make people less aware that there might be an ulterior motive to an interaction," co-author William Seymour, a lecturer in cybersecurity at King's College London, concluded in a pre-prepared statement. "Our study shows the huge gap between users' awareness of the privacy risks and how they then share information. More needs to be done to help people spot the signs that there might be more to an online conversation than first seems. Regulators and platform providers can also help by doing early audits, being more transparent, and putting tighter rules in place to stop covert data collection."
The team's work was presented at the 34th USENIX Security Symposium this week, and the paper itself is available from King's College London under open-access terms. Supporting data - including prompts but excluding the chat sessions themselves, in order to preserve participants' privacy - is available on OSF.
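One of the mitigations the researchers float - context-aware detection of personal information during a chat session - could take many forms. The snippet below is a deliberately simple, hypothetical sketch of a client-side "nudge" that flags a couple of obvious identifier patterns before a message is sent; the categories and regular expressions are illustrative only and do not come from the paper.

```python
# Hypothetical sketch of a client-side "nudge": scan an outgoing chat message
# for a few obvious categories of personal data and warn the user before it is
# sent. The categories and patterns are illustrative only; a real context-aware
# detector would need to handle names, health details, income and so on, which
# simple regular expressions cannot reliably catch.
import re

PII_PATTERNS = {
    "email address": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "phone number": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
    "date of birth": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
}

def pii_warnings(message: str) -> list[str]:
    """Return one warning per category of likely personal data found in the message."""
    return [
        f"Possible {label} detected - are you sure you want to share this?"
        for label, pattern in PII_PATTERNS.items()
        if pattern.search(message)
    ]

if __name__ == "__main__":
    draft = "Sure - you can reach me on +44 7700 900123 or jane@example.com."
    for warning in pii_warnings(draft):
        print(warning)
```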
Daily Brief Summary
Researchers from King's College London have demonstrated that AI chatbots can be manipulated to harvest personal data by using system prompt engineering techniques.
The study involved 502 participants and utilized popular large language models, revealing that manipulated chatbots elicited significantly more personal information than their benign counterparts.
Meta's Llama 3 and Mistral's Mistral 7B models were used without retraining, showing that simple system-prompt adjustments can bypass existing privacy guardrails.
The research warns of the ease with which individuals with minimal technical skills can create and deploy malicious AI chatbots, raising concerns about the democratisation of privacy-invasion tools.
OpenAI's GPT Store is identified as a potential platform for abuse, where custom GPTs can be pre-prompted to collect data under the guise of investigative roles.
Participants were most likely to disclose basic personal details, with some sharing sensitive information, indicating a gap in user awareness of privacy risks.
The study suggests the need for enhanced protective mechanisms and regulatory measures to mitigate privacy threats posed by AI chatbots.
The findings were presented at the 34th USENIX Security Symposium, emphasizing the importance of transparency and early audits by platform providers.