Article Details

Scrape Timestamp (UTC): 2023-10-09 12:26:13.414

Source: https://thehackernews.com/2023/10/i-had-dream-and-generative-ai-jailbreaks.html

Original Article Text

"I Had a Dream" and Generative AI Jailbreaks. "Of course, here's an example of simple code in the Python programming language that can be associated with the keywords "MyHotKeyHandler," "Keylogger," and "macOS," this is a message from ChatGPT followed by a piece of malicious code and a brief remark not to use it for illegal purposes. Initially published by Moonlock Lab, the screenshots of ChatGPT writing code for a keylogger malware is yet another example of trivial ways to hack large language models and exploit them against their policy of use. In the case of Moonlock Lab, their malware research engineer told ChatGPT about a dream where an attacker was writing code. In the dream, he could only see the three words: "MyHotKeyHandler," "Keylogger," and "macOS." The engineer asked ChatGPT to completely recreate the malicious code and help him stop the attack. After a brief conversation, the AI finally provided the answer. "At times, the code generated isn't functional — at least the code generated by ChatGPT 3.5 I was using," Moonlock engineer wrote. "ChatGPT can also be used to generate a new code similar to the source code with the same functionality, meaning it can help malicious actors create polymorphic malware." AI jailbreaks and prompt engineering The case with the dream is just one of many jailbreaks actively used to bypass content filters of the generative AI. Even though each LLM introduces moderation tools that limit their misuse, carefully crafted reprompts can help hack the model not with strings of code but with the power of words. Demonstrating the widespread issue of malicious prompt engineering, cybersecurity researchers have even developed a 'Universal LLM Jailbreak,' which can bypass restrictions of ChatGPT, Google Bard, Microsoft Bing, and Anthropic Claude altogether. The jailbreak prompts major AI systems to play a game as Tom and Jerry and manipulates chatbots to give instructions on meth production and hotwiring a car. The accessibility of large language models and their ability to change behavior have significantly lowered the threshold for skilled hacking, albeit unconventional. Most popular AI security overrides indeed include a lot of role-playing. Even ordinary internet users, let alone hackers, constantly boast online about new characters with extensive backstories, prompting LLMs to break free from societal restrictions and go rogue with their answers. From Niccolo Machiavelli to your deceased grandma, generative AI eagerly takes on different roles and can ignore the original instructions of its creators. Developers cannot predict all kinds of prompts that people might use, leaving loopholes for AI to reveal dangerous information about recipes for napalm-making, write successful phishing emails, or give away free license keys for Windows 11. Indirect prompt injections Prompting public AI technology to ignore the original instructions is a rising concern for the industry. The method is known as a prompt injection, where users instruct the AI to work in an unexpected fashion. Some use it to reveal that Bing Chat internal codename is Sydney. Others plant malicious prompts to gain illicit access to the host of the LLM. Malicious prompting can also be found on websites that are accessible to language models to crawl. There are known cases of generative AI following the prompts planted on websites in white or zero-size font, making them invisible to users. 
Prompt injections are dangerous because they are so passive. Attackers don't have to take control of the model to change its behavior: ordinary text on a page is enough to reprogram the AI without its knowledge. And AI content filters only help when the chatbot knows what it is doing at any given moment. With more apps and companies integrating LLMs into their systems, the risk of falling victim to indirect prompt injection is growing rapidly. Even though major AI developers and researchers are studying the issue and adding new restrictions, malicious prompts remain very difficult to identify.

Is there a fix?

Due to the nature of large language models, prompt engineering and prompt injection are inherent problems of generative AI. Searching for a cure, major developers update their technology regularly but tend not to discuss publicly known loopholes or flaws in any detail. Fortunately, even as threat actors exploit LLM security vulnerabilities to scam users, cybersecurity professionals are looking for tools to study and prevent these attacks. As generative AI evolves, it will gain access to even more data and integrate with a broader range of applications. To reduce the risk of indirect prompt injection, organizations that use LLMs will need to define clear trust boundaries and implement a series of security guardrails. These guardrails should give the LLM the minimum access to data it needs and restrict its ability to make changes to what is strictly required.
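As a concrete illustration of the "minimum access" principle described above, the hypothetical Python sketch below applies an allow-list check to every tool call a model proposes; the tool names, scopes, and policy structure are invented for this example rather than taken from any specific product.

    # A minimal sketch of a trust-boundary guardrail between an LLM and its tools.
    # Everything the model asks for is checked against an explicit allow-list;
    # all tool names and data scopes here are hypothetical.
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class ToolPolicy:
        name: str
        read_only: bool            # deny anything that mutates state
        allowed_scopes: frozenset  # data sources this tool may touch

    # Allow-list: any tool not listed is rejected outright.
    POLICIES = {
        "search_docs": ToolPolicy("search_docs", read_only=True,
                                  allowed_scopes=frozenset({"public_kb"})),
        "read_ticket": ToolPolicy("read_ticket", read_only=True,
                                  allowed_scopes=frozenset({"support_tickets"})),
    }

    def authorize(tool_name: str, scope: str, mutates: bool) -> bool:
        """Check a model-proposed tool call before it is ever executed."""
        policy = POLICIES.get(tool_name)
        if policy is None:
            return False                       # unknown tool: default deny
        if mutates and policy.read_only:
            return False                       # the model may not change state here
        return scope in policy.allowed_scopes  # least-privilege data access

    # A prompt-injected attempt to mail customer data out is simply refused:
    assert authorize("send_email", scope="crm", mutates=True) is False
    assert authorize("search_docs", scope="public_kb", mutates=False) is True

The design choice is deliberate: the check runs outside the model, so even a fully jailbroken chatbot cannot grant itself permissions that the surrounding application never gave it.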

Daily Brief Summary

CYBERCRIME // Generative AI Contributing to Security Vulnerabilities through Prompt Engineering and Injection Methods

Generative AI models, such as OpenAI's ChatGPT, are increasingly being exploited by hackers, who can trick them into generating malicious code.

Cybersecurity researchers highlight hackers' ability to craft prompts that exploit the models' generative capabilities for malicious purposes.

Cases have been highlighted where AI models were manipulated into generating code for keylogger malware or helping create polymorphic malware that can evade detection.

A method dubbed the 'Universal LLM Jailbreak' is gaining attention; it bypasses the restrictions of ChatGPT, Google Bard, Microsoft Bing, and Anthropic Claude at once, manipulating these systems into giving instructions for illicit activities such as meth production and hot-wiring cars.

The method of 'prompt injection' is also a growing concern: users manipulate the AI into behaving in unforeseen ways with potentially harmful results, such as revealing sensitive information like Bing Chat's internal codename.

The emergence of these practices calls for stricter regulation and built-in security measures around generative AI to prevent misuse. Suggested measures include implementing security guardrails and limiting the AI's access to data, thereby reducing the risk of exploitation.