Article Details
Scrape Timestamp (UTC): 2024-07-24 05:21:21.709
Source: https://www.theregister.com/2024/07/24/crowdstrike_preliminary_incident_report/
Original Article Text
CrowdStrike blames a test software bug for that giant global mess it made

Something called 'Content Validator' did not validate the content, and the rest is history

CrowdStrike has blamed a bug in its own test software for the mass-crash event it caused last week.

A Wednesday update to its remediation guide added a Preliminary Post Incident Review (PIR) that offers the vendor's view of how it brought down 8.5 million Windows boxes.

The explanation opens by detailing that CrowdStrike's Falcon Sensor ships with "Sensor Content" that defines its capabilities. The software is updated with "Rapid Response Content" that allows it to detect and collect info on new threats.

Sensor Content relies on "Template Types" – code that includes pre-defined fields for threat detection engineers to leverage in Rapid Response Content. Rapid Response Content is delivered as "Template Instances," which CrowdStrike describes as "instantiations of a given Template Type." Each Template Instance maps to specific behaviors for the sensor software to observe, detect or prevent.

In February 2024, CrowdStrike introduced a new "InterProcessCommunication (IPC) Template Type" that the vendor designed to detect "novel attack techniques that abuse Named Pipes." The IPC Template Type passed testing on March 5, and a Template Instance that used it was then released. Three more IPC Template Instances were deployed between April 8 and April 24. All ran without crashing 8.5 million Windows machines – although, as we reported earlier this week, Linux machines had problems with CrowdStrike in April.

On July 19, CrowdStrike introduced two more IPC Template Instances. One included "problematic content data" – but made it into production anyway, because of what CrowdStrike described as "a bug in the Content Validator." The post doesn't detail Content Validator's role – we'll assume it's supposed to do what the name suggests.
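The PIR, as quoted here, describes how Template Types relate to Template Instances but not how the Content Validator works. The Python sketch below is a minimal, hypothetical model of those terms: a Template Type is a fixed list of fields, a Template Instance supplies matching criteria for some of them, and a validator checks each instance before release. The class names, the `inputs_supplied` parameter, and the specific checks are assumptions made for illustration only; they are not CrowdStrike's design, and the PIR does not say which check failed.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TemplateType:
    """Hypothetical stand-in for a Template Type: a fixed set of pre-defined fields."""
    name: str
    fields: tuple[str, ...]  # field names, in a fixed order


@dataclass
class TemplateInstance:
    """Hypothetical stand-in for Rapid Response Content: criteria for some fields."""
    template_type: TemplateType
    criteria: dict[int, str] = field(default_factory=dict)  # field index -> pattern


def validate_instance(instance: TemplateInstance, inputs_supplied: int) -> list[str]:
    """Assumed validator: return a list of problems; an empty list means "OK to ship".

    inputs_supplied (an assumption) is how many input values the sensor will
    actually hand to the instance at runtime.
    """
    problems = []
    defined = len(instance.template_type.fields)
    for index in sorted(instance.criteria):
        if not 0 <= index < defined:
            # The instance refers to a field its Template Type never declared.
            problems.append(f"field {index} is not defined by {instance.template_type.name}")
        elif index >= inputs_supplied:
            # Declared, but the sensor will not provide a value for it at runtime.
            problems.append(
                f"field {index} is defined but the sensor supplies only {inputs_supplied} inputs"
            )
    return problems


if __name__ == "__main__":
    ipc = TemplateType("IPC", ("pipe_name", "process", "parent"))
    good = TemplateInstance(ipc, {0: "suspicious_pipe"})
    bad = TemplateInstance(ipc, {0: "suspicious_pipe", 2: "cmd.exe"})
    print(validate_instance(good, inputs_supplied=2))  # []
    print(validate_instance(bad, inputs_supplied=2))
    # ['field 2 is defined but the sensor supplies only 2 inputs']
```

The point of the sketch is only that a validator is useful when it checks content against what the runtime will actually provide, not just against what the template declares; whether CrowdStrike's tool was meant to do either is not stated in the report.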
Whatever the Validator does or is supposed to do, it did not prevent the release of the July 19 Template Instance, despite it being a dud. That happened because CrowdStrike assumed that the tests the IPC Template Type passed in March, and the subsequent successful deployments of related IPC Template Instances, meant the July 19 release would be OK.

History tells us that was a very bad assumption. The problematic content "resulted in an out-of-bounds memory read triggering an exception." "This unexpected exception could not be gracefully handled, resulting in a Windows operating system crash." On around 8.5 million machines.

The incident report includes promises to test future Rapid Response Content more rigorously, stagger releases, offer users more control over when to deploy it, and provide release notes. You read that right: release notes. Be still your beating heart.txt.

The report also includes a pledge to release a full root cause analysis once CrowdStrike has finished its investigation. Take all the time you want: some of us are still busy rebuilding machines you broke.
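The PIR says the bad content triggered an out-of-bounds read whose exception "could not be gracefully handled." As a hedged illustration of the general difference between trusting content and checking it, this hypothetical Python sketch compares a matcher that indexes its inputs with whatever field numbers the content names against one that bounds-checks first. Python raises `IndexError` rather than reading past the end of a list, so it only models the idea; in native code such as C, an unchecked read can touch whatever memory lies beyond the array. The function names and the prefix-matching rule are assumptions, not CrowdStrike's code.

```python
def match_unchecked(criteria: dict[int, str], inputs: list[str]) -> bool:
    """Trusts the content: indexes inputs with whatever field numbers it names."""
    # A criterion naming a field the sensor never supplied raises IndexError here;
    # nothing in this function handles it.
    return all(inputs[i].startswith(prefix) for i, prefix in criteria.items())


def match_checked(criteria: dict[int, str], inputs: list[str]) -> bool:
    """Bounds-checks every field index before reading it."""
    for i, prefix in criteria.items():
        if not 0 <= i < len(inputs):
            # Malformed content: treat as a non-match instead of failing.
            return False
        if not inputs[i].startswith(prefix):
            return False
    return True


if __name__ == "__main__":
    inputs = ["suspicious_pipe_01", "cmd.exe"]  # the sensor supplies two values
    criteria = {0: "suspicious_pipe", 2: "x"}   # content names a third field
    print(match_checked(criteria, inputs))      # False
    try:
        match_unchecked(criteria, inputs)
    except IndexError as exc:
        # prints: unchecked read failed: list index out of range
        print("unchecked read failed:", exc)
```

Treating malformed content as a non-match keeps the host running at the cost of silently skipping a detection; a real product would presumably also want to report the bad instance so it can be pulled.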
Daily Brief Summary
CrowdStrike has blamed a bug in its own test software for the global outage that crashed 8.5 million Windows machines.
The problem arose from a bug in the "Content Validator," which failed to flag problematic content data in one of two IPC Template Instances released on July 19.
Those instances used the "InterProcessCommunication (IPC) Template Type," introduced in February 2024 to detect attacks that abuse Named Pipes.
Because the IPC Template Type had passed testing in March and earlier IPC Template Instances had deployed without crashing Windows machines, CrowdStrike assumed the July 19 release would also be safe; instead, the faulty content caused an out-of-bounds memory read and an unhandled exception that crashed Windows.
CrowdStrike has acknowledged the fault and committed to more rigorous testing of Rapid Response Content, staggered releases, greater customer control over when updates are deployed, and release notes.
It has also promised a full root cause analysis once its internal investigation concludes.