
CrowdStrike Outage


On Friday, July 19th, 2024, a new definition file was pushed to over 8.5 million Windows devices simultaneously, causing them all to crash with a Blue Screen of Death (BSOD).


What happens when banks, airlines, airports, TV networks, and healthcare systems all crash at the same time?

This is what happened on the morning of July 19th.

To explain how this happened, we first have to understand how CrowdStrike works when it's installed on an endpoint (a customer computer).

When a program is run on a Windows computer it runs in one of two modes: Ring 0 (Kernel mode) or Ring 3 (User mode). Programs running in Ring 0 have the most privileges and are usually device drivers. Programs like Firefox and Blender run in Ring 3 and have fewer privileges. When a Userland program crashes, only that program goes down and you get an alert. But when a Kernel driver crashes, it takes the whole system with it and you get the blue screen of death. As you can probably guess, CrowdStrike runs in Kernel mode.

So how did it actually fail?

CrowdStrike loads a kernel driver called CSAgent.sys, which works by reading files in a specific directory and then using the data in those files as definitions for detecting anomalies. The outage was caused by a malformed definition file deployed to this directory, not by an update to the agent itself. For a good breakdown of how it works you can see this thread on X.

What's the fix?

The easy fix is to boot into Safe Mode, which prevents CSAgent.sys from running, then go to %WINDIR%\System32\drivers\CrowdStrike, remove all files that match C-00000291*.sys, and reboot.
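
If you have many machines to clean up, the file removal step can be scripted. Below is a minimal sketch in Python, assuming a Python interpreter is available on the affected machine (which may well not be the case in Safe Mode). The function names and the dry_run flag are made up for illustration and are not part of any CrowdStrike or Microsoft tool. The directory and file pattern come straight from the steps above.

```python
from __future__ import annotations

import os
from pathlib import Path

# Pattern for the faulty definition files named in the manual fix.
CHANNEL_FILE_PATTERN = "C-00000291*.sys"


def default_crowdstrike_dir() -> Path:
    """Build %WINDIR%\\System32\\drivers\\CrowdStrike (falls back to C:\\Windows)."""
    windir = os.environ.get("WINDIR", r"C:\Windows")
    return Path(windir) / "System32" / "drivers" / "CrowdStrike"


def remove_channel_files(directory: Path, dry_run: bool = True) -> list[Path]:
    """Find files matching the pattern; delete them only when dry_run is False.

    Returns the sorted list of matching paths either way, so the result of a
    dry run can be reviewed before anything is actually removed.
    """
    if not directory.is_dir():
        return []
    matches = sorted(p for p in directory.glob(CHANNEL_FILE_PATTERN) if p.is_file())
    if not dry_run:
        for path in matches:
            path.unlink()
    return matches
```

Calling remove_channel_files(default_crowdstrike_dir()) only lists the matching files. Passing dry_run=False deletes them, after which the machine still needs a reboot. Defaulting to a dry run is deliberate, since deleting files from a drivers directory is not something you want to do by accident.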

Microsoft has also released their own tool to recover from this, here