Article Details
Scrape Timestamp (UTC): 2024-10-20 09:02:32.741
Source: https://www.theregister.com/2024/10/20/python_zero_day_tool/
Original Article Text
Open source LLM tool primed to sniff out Python zero-days.
The static analyzer uses Claude AI to identify vulns and suggest exploit code.
Researchers with Seattle-based Protect AI plan to release a free, open source tool that can find zero-day vulnerabilities in Python codebases with the help of Anthropic's Claude AI model.
The software, called Vulnhuntr, was announced at the No Hat security conference in Italy on Saturday.
"The tool does not simply paste some code from the project and ask for analysis," explained Dan McInerney, lead AI threat researcher at Protect AI, who developed the software with colleague Marcello Salvati. "It automatically finds project files that are likely to handle remote user input, Claude analyzes that for potential vulnerabilities, then for each potential vulnerability Claude is given a vulnerability-specific highly optimized prompt and enters a loop."
"In this loop it intelligently requests functions/classes/variables from elsewhere in the code continually until it completes the entire call chain from user input to server output without blowing up its context window. The advantage of this over current static code analyzers is a massive reduction in false positives/negatives since it can read the entire call chain, not just little code snippets one at a time."
This approach, McInerney claims, can reveal complex, multi-step vulnerabilities, as opposed to flagging functions like eval() with known security implications.
"The tool was originally designed using Claude and used Claude's best practices in prompt engineering so it performs by far the best using Claude," said McInerney. "We included the option to use [OpenAI's] GPT-4 and we tested it with GPT-4o but got poorer results. Modifying the prompts to better fit GPT-4o is very straightforward and using the GPT-4o model is just a change in 1 line of code. By open sourcing it, we hope to encourage modifications such as these as new models come out."
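As a rough illustration of the loop McInerney describes, the sketch below starts from one entry-point function and keeps pulling in requested definitions until nothing in the call chain is missing. It is not Vulnhuntr's code: the toy SOURCES table and the names stub_model and trace_call_chain are hypothetical, and the stub merely stands in for a real Claude call by spotting called names with a regular expression.

```python
from __future__ import annotations

import re
from typing import Callable

# Hypothetical toy "codebase": symbol name -> source text (not from Vulnhuntr).
SOURCES = {
    "handle_request": (
        "def handle_request(req):\n"
        "    return render(load_file(req.args['path']))"
    ),
    "load_file": "def load_file(path):\n    return open(path).read()",
    "render": "def render(text):\n    return '<pre>' + text + '</pre>'",
}


def stub_model(context: dict[str, str]) -> dict:
    """Stand-in for an LLM call (assumption): request any project symbol
    that is called in the context but whose source is not yet included."""
    called: set[str] = set()
    for src in context.values():
        called.update(re.findall(r"\b([A-Za-z_]\w*)\(", src))
    missing = sorted(n for n in called if n in SOURCES and n not in context)
    return {"request": missing, "done": not missing}


def trace_call_chain(
    entry: str,
    ask_model: Callable[[dict[str, str]], dict],
    max_rounds: int = 10,
) -> dict[str, str]:
    """Grow the context one round at a time until the model stops asking.

    max_rounds is a crude guard against runaway context growth.
    """
    context = {entry: SOURCES[entry]}
    for _ in range(max_rounds):
        reply = ask_model(context)
        if reply["done"]:
            break
        for name in reply["request"]:
            context[name] = SOURCES[name]
    return context


if __name__ == "__main__":
    chain = trace_call_chain("handle_request", stub_model)
    print(" -> ".join(chain))
```

In this toy run the context grows from the request handler to the file-reading and rendering helpers, which is the point at which a path traversal from user input to output would become visible to whatever is reading the chain.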
So far, McInerney says, Vulnhuntr has found more than a dozen zero-day vulnerabilities in large, open source Python projects. "All of these vulnerabilities were not previously known or reported to the project maintainers," he said.
The tool presently focuses on seven types of remotely exploitable vulnerabilities.
Affected projects include:
Other projects with vulnerable code spotted less than 90 days ago have not been identified, to give maintainers time to fix things. Ragflow, said McInerney, is the only project he's aware of that has fixed its identified bug.
Vulnhuntr has some limitations. It only works on Python code at the moment and it depends on access to a Python static analyzer. As a result, the tool is more likely to generate false positives when scanning Python projects that incorporate code in other languages (e.g. TypeScript).
When generating a proof-of-concept (PoC) exploit, the software generates a confidence score ranging from 1 to 10. A score of 7 means it's probably a valid vulnerability, though the PoC code may need some refinement. A score of 8 or more is highly likely to be valid. Scores of 6 or less are unlikely to be valid. The output looks something like this:
Another issue is that LLMs aren't deterministic (they may provide different results for the same prompt at different times), so multiple runs may be required. Nonetheless, McInerney says that Vulnhuntr is a significant improvement over the current generation of static analyzers.
There's also some cost involved since the Claude API isn't free.
"My average use of it is to identify the one or two files in a project that handle remote user input and tell the tool to do analysis on just those couple files," said McInerney. "When used this way, it averages less than $0.50 of token usage. It will automatically find these network-related files as well, but it's a broad search that often sees it scanning 10-20 files instead of the 1-2 that give the best results usually. Depending on project size, scanning all the network-related files will still only cost ~$1-$3."
McInerney says he believes Vulnhuntr's discoveries represent the first time actual zero-day vulnerabilities have been identified in public projects by an AI-assisted tool.
"There are multiple papers purporting this and all are misleading because their AI did not discover zero-days, it was merely fed known vulnerable targets or code that it wasn't trained on and then said this was evidence their AI can find zero-days," he said. "As far as our research can tell, the release of Vulnhuntr will be the first time LLMs have actually found zero-days in the wild."
As an example, he pointed to a paper by academic researchers whose work we've covered previously. Daniel Kang, assistant professor of computer science at the University of Illinois Urbana-Champaign, and a co-author on the cited paper and similar ones, told The Register that relying on simulated data is a common practice in security research.
"It is widely accepted that simulations of real-world environments are acceptable proxies for the real world," he said. "I can link to hundreds of security papers and press releases where security tools are used in simulated environments or on past real-world vulnerabilities and no one disputes these findings. The correct thing to say is that we simulate the zero-day setting, but again, this is widely accepted as common practice."
Kang, whose paper describes using teams of LLM agents to exploit zero-day vulnerabilities, noted that Vulnhuntr doesn't handle exploitation. He also said that in the absence of an analysis of false positives or a comparison to tools like ZAP, Metasploit, or BurpSuite, it's difficult to say how the tool compares to existing open source or proprietary alternatives.
According to McInerney, the vulnerabilities identified by Vulnhuntr are very easy to exploit once identified. "The tool gives you a proof-of-concept exploit once it finds a vulnerability," he said. "It's not uncommon to need to make some kind of minor adjustment to the PoC to make it work, but it's obvious what adjustments to make after reading the analysis the LLM gives you as to why it's vulnerable." We're told Vulnhuntr will be released on GitHub, presumably through a repo associated with Protect AI. The biz is also encouraging budding bug hunters to try the tool on open source projects listed on its bug bounty website, huntr.com.
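For readers triaging PoC output, the 1-to-10 confidence scale described in the article can be read as a simple three-way rule. The sketch below encodes only the thresholds reported above; the function name triage and the label strings are hypothetical and are not part of Vulnhuntr.

```python
def triage(score: int) -> str:
    """Map a 1-10 confidence score to the reading given in the article."""
    if not 1 <= score <= 10:
        raise ValueError("confidence score must be between 1 and 10")
    if score >= 8:
        return "highly likely valid"
    if score == 7:
        return "probably valid; PoC may need refinement"
    return "unlikely to be valid"  # scores of 6 or less


if __name__ == "__main__":
    for s in (9, 7, 4):
        print(s, triage(s))
```

Because LLM output is not deterministic, a finding near the boundary may score differently from one run to the next, so a rule like this is best applied across several runs.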
Daily Brief Summary
Researchers at Protect AI introduced a new open-source tool, Vulnhuntr, designed to find zero-day vulnerabilities in Python projects using Anthropic's Claude AI model.
Vulnhuntr reduces false positives by analyzing entire call chains rather than isolated code snippets, offering a comprehensive view of potential security flaws.
The tool has successfully identified more than a dozen previously unknown zero-day vulnerabilities in major open-source Python projects.
Vulnhuntr was designed and tuned for Claude; it also includes an option to use OpenAI's GPT-4, though tests with GPT-4o gave poorer results, and the developers hope open sourcing will encourage prompt adaptations as new models come out.
It has limitations: it works only on Python code, depends on a Python static analyzer, and is more likely to generate false positives when projects include code in other languages such as TypeScript.
The tool provides a confidence score for potential vulnerabilities, guiding users on the likelihood of actual security risks.
Claude API costs are modest: targeted scans of one or two files average under $0.50 in token usage, and scanning all network-related files in a project costs roughly $1 to $3, depending on project size.
Protect AI believes Vulnhuntr's findings are the first zero-days discovered in real-world public projects by an AI-assisted tool, arguing that earlier papers relied on known or simulated vulnerabilities.