Event Overview
Background
CrowdStrike, based in Austin, Texas, is a global cybersecurity firm founded in 2011. According to the company's website, it offers almost a dozen security and IT tools and works with about 300 of the Fortune 500 companies, six of the top 10 health care providers, eight of the top 10 financial services firms, and eight of the top 10 technology firms. CrowdStrike offers an advanced cloud-native platform for protecting endpoints, cloud workloads, identities and data. The CrowdStrike Falcon platform[1] leverages real-time indicators of attack, threat intelligence on evolving adversary tradecraft and enriched telemetry from across the enterprise to deliver hyper-accurate detections, automated protection and remediation, elite threat hunting and prioritized observability of vulnerabilities, all through a single, lightweight agent.
What Happened
On July 19, 2024, at 04:09 UTC, CrowdStrike released a sensor configuration[2] update for Falcon on Windows systems. The configuration update triggered a logic error, resulting in a system crash and blue screen (also known as the BSOD, or blue screen of death) on impacted Microsoft Windows systems. By 05:27 UTC on July 19, 2024, the sensor configuration update had been remediated. CrowdStrike customers who were online between the times referenced and running Falcon sensor for Windows version 7.11 and above could have been impacted. Systems that automatically downloaded the updated configuration between the times referenced were susceptible to a system crash. The sensor configuration update does not affect Linux or macOS systems.
Per CrowdStrike's blog on the outage, these configuration files are referred to as "Channel Files" and are part of the behavioral protection mechanisms used by the Falcon sensor. Updates to the Channel Files are routine and occur several times per day in response to novel tactics, techniques and procedures discovered by CrowdStrike. Each channel file is assigned a number as a unique identifier; the impacted file's name starts with "C-00000291-" and carries a .sys extension, but it is not a kernel driver. Channel 291[3] controls how Falcon evaluates named pipe execution on Windows systems. Named pipes are used for normal interprocess or intersystem communication in Windows. The update at 04:09 UTC was intended to target newly observed, malicious named pipes being used by common C2 (command and control) frameworks.
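As an illustration only, the sketch below (assuming a Windows host and Python 3) enumerates the named pipe namespace and flags names matching example patterns that have been associated with some C2 tooling. The patterns are placeholders, and this is not how the Falcon sensor performs its evaluation.

```python
# Illustrative only: enumerate the Windows named pipe namespace and flag
# names matching example (placeholder) patterns associated with some C2 tooling.
import os
import re

# Hypothetical example patterns; real C2 pipe names vary and change frequently.
SUSPICIOUS_PIPE_PATTERNS = [
    re.compile(r"^msagent_[0-9a-f]{2,}$", re.IGNORECASE),
    re.compile(r"^postex_[0-9a-f]{4,}$", re.IGNORECASE),
]


def list_named_pipes() -> list[str]:
    # On Windows, the named pipe namespace can be listed like a directory.
    return os.listdir(r"\\.\pipe")


if __name__ == "__main__":
    pipes = list_named_pipes()
    print(f"{len(pipes)} named pipes found")
    for name in pipes:
        if any(p.match(name) for p in SUSPICIOUS_PIPE_PATTERNS):
            print("name matches a suspicious pattern:", name)
```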
The configuration update caused a logic error that resulted in the operating system crashing. The channel file "C-00000291*.sys" with a timestamp of 04:09 UTC is the problematic version; the channel file "C-00000291*.sys" with a timestamp of 05:27 UTC or later is the reverted (good) version.
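A minimal sketch of how an administrator might check which version is present on a host follows; it assumes Python 3 on Windows and the default Falcon driver directory cited in CrowdStrike's remediation guidance, and it uses the file modification time as a proxy for the timestamp referenced above.

```python
# Illustrative sketch: locate Channel File 291 and report whether its timestamp
# falls in the problematic window described above. The directory is the default
# location cited in CrowdStrike's remediation guidance; adjust if yours differs.
from datetime import datetime, timezone
from pathlib import Path

CHANNEL_DIR = Path(r"C:\Windows\System32\drivers\CrowdStrike")
BAD_BUILD = datetime(2024, 7, 19, 4, 9, tzinfo=timezone.utc)     # problematic version
FIXED_BUILD = datetime(2024, 7, 19, 5, 27, tzinfo=timezone.utc)  # reverted (good) version

for channel_file in CHANNEL_DIR.glob("C-00000291*.sys"):
    mtime = datetime.fromtimestamp(channel_file.stat().st_mtime, tz=timezone.utc)
    if mtime >= FIXED_BUILD:
        status = "reverted (good) version"
    elif mtime >= BAD_BUILD:
        status = "problematic version; follow CrowdStrike remediation guidance"
    else:
        status = "predates the incident window"
    print(f"{channel_file.name}: modified {mtime:%Y-%m-%d %H:%M} UTC -> {status}")
```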
Impact
Microsoft has estimated that 8.5 million Windows devices were affected.[4] The broad economic and societal impacts reflect the use of CrowdStrike by enterprises that run many critical services.
The air travel industry had more than 3,000 flights[5] cancelled and a reported 23,900 flights delayed[6] due to disruptions to ticketing, operations and other services at airports. The healthcare[7] industry was also impacted by the outage. In the U.S., some emergency call centers were affected, and healthcare providers had services disrupted: elective hospital procedures, procedures requiring anesthesia, and medical visits were cancelled or paused. In the UK, the number used to call for emergency ambulances was not impacted, but health provider offices experienced problems with the appointment and patient record system used across the health service.
The financial industry was also impacted. In the U.S., some banks reported[8] login issues, and trades on the stock exchange were delayed because bankers could not access their work systems. In the UK, news updates about the exchange could not be published, although the exchange itself remained operational. In South America, customers of a bank[9] may have had issues accessing digital services due to the outage and unstable services during that time.
Aon’s Threat Intelligence Analysis
This incident highlights how interconnected and dependent companies across the globe are, and how a single error (in this case, a non-malicious one) can impact business operations. Vendors should have processes and procedures in place when updating software. These should encompass how the update is developed, tested on development systems, pushed out to production systems in stages, and monitored for any adverse effects; an illustrative staged-rollout sketch appears below. How that process scales for a security vendor pushing updates multiple times per day without issues will be something to watch going forward.
Companies should assess any third- and fourth-party exposure they have to this incident. Even if your organization was not impacted or has already been remediated, there may be external parties your organization relies on that remain affected. Understanding those relationships is important. Companies should have a proactive plan for gaining visibility across the supply chain, in addition to considering scenarios that may impact the operational resilience of the supply chain.
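The following is a minimal, hypothetical sketch of the kind of staged (canary) rollout gate described above: a configuration update is shipped to progressively larger cohorts, crash telemetry is checked after each stage, and the push halts if a threshold is exceeded. The stage names, fleet fractions and thresholds are illustrative assumptions, not CrowdStrike's actual process.

```python
# Hypothetical staged-rollout gate: deploy to progressively larger cohorts,
# checking crash telemetry between stages. Names and thresholds are illustrative.
from dataclasses import dataclass


@dataclass
class RolloutStage:
    name: str
    fleet_fraction: float    # share of endpoints receiving the update
    max_crash_rate: float    # abort threshold for this stage


STAGES = [
    RolloutStage("internal test ring", 0.001, 0.0),
    RolloutStage("early adopter ring", 0.01, 0.0005),
    RolloutStage("general availability", 1.0, 0.0005),
]


def observed_crash_rate(stage: RolloutStage) -> float:
    """Placeholder for real telemetry, e.g., crash reports from the cohort."""
    return 0.0


def run_rollout() -> bool:
    for stage in STAGES:
        print(f"Deploying to {stage.fleet_fraction:.1%} of fleet ({stage.name})")
        rate = observed_crash_rate(stage)
        if rate > stage.max_crash_rate:
            print(f"Halting rollout at {stage.name}: crash rate {rate:.3%} exceeds threshold")
            return False
    print("Rollout completed")
    return True


if __name__ == "__main__":
    run_rollout()
```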
It is strongly recommended that insureds follow the guidelines from CrowdStrike and Microsoft to remediate system crashes or system unavailability due to the outage.
(Re)Insurance Implications
Key Implications for Cyber (Re)Insurance
- This is reported to be a non-malicious event, meaning that “system failure” coverage, where offered, within
cyber (re)insurance policies is the relevant loss trigger.
- Business interruption (loss of income and extra expenses incurred), where offered due to system failure, is
expected to be the most directly affected head of damage, subject to applicable waiting periods.
- Dependent business interruption, data restoration, incident response and voluntary shutdown costs may also be
applicable and contribute to reinsured losses.
- At the individual risk level, Aon expects this event to trigger greater attention to system failure coverage
grants and business interruption waiting periods.
- At the portfolio level, Aon sees this event as an opportunity for the market to react by improving granularity
on codifying policy information important for understanding portfolio accumulation risks stemming from certain
coverage grants, to allow more nuanced event loss estimation and accumulation scenario analysis.
- The industry has developed specific (re)insurance and bond products which this event will test, both from an
event definition and loss quantum perspective.
Insurance Coverage Focus
Portfolio Accumulation
This is likely to be the most important cyber accumulation loss event since NotPetya in 2017. However, the overall
loss quantum is currently uncertain and will primarily depend on:
- The prevalence of coverage for system failure, which varies across the market.
- The duration until successful manual remediation at each affected insured, versus the applicable waiting periods on their cyber policies (a simple numerical illustration of this interaction follows this list).
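To make that interaction concrete, the following uses hypothetical figures and assumes the waiting period acts as a time deductible; actual policy mechanics vary by wording.

```python
# Hypothetical illustration: business interruption recovery typically begins
# only once the outage outlasts the policy's waiting period.
WAITING_PERIOD_HOURS = 12     # assumed waiting period under the cyber policy
OUTAGE_HOURS = 30             # assumed time to manually remediate all hosts
HOURLY_INCOME_LOSS = 50_000   # assumed loss of income per hour of downtime

covered_hours = max(0, OUTAGE_HOURS - WAITING_PERIOD_HOURS)
covered_bi_loss = covered_hours * HOURLY_INCOME_LOSS
print(f"Covered business interruption loss: ${covered_bi_loss:,}")  # $900,000
```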
This event brings into focus the need for greater transparency of system failure coverage grants and waiting periods, and in general a more granular approach to tracking coverage items relevant for monitoring aggregations at the portfolio level. For example, distinguishing between coverage, limits and waiting periods for each of the business interruption coverage grant subsets below (one way of codifying these is sketched after the list):
Business interruption coverage grant subsets:
- Security failure – own IT
- System failure – own IT
- Dependent security failure – IT providers (named vs unnamed)
- Dependent system failure – IT providers (named vs unnamed)
- Dependent security failure – non-IT providers (named vs unnamed)
- Dependent system failure – non-IT providers (named vs unnamed)
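As a sketch only, the following shows one way a portfolio team might codify the subsets above so they can be queried consistently; the field and enum names are illustrative assumptions, not a market standard.

```python
# Illustrative data model for tracking business interruption coverage grant
# subsets at portfolio level. Names and fields are assumptions, not a standard.
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Trigger(Enum):
    SECURITY_FAILURE = "security failure"
    SYSTEM_FAILURE = "system failure"


class Dependency(Enum):
    OWN_IT = "own IT"
    IT_PROVIDER = "dependent: IT provider"
    NON_IT_PROVIDER = "dependent: non-IT provider"


@dataclass
class BICoverageGrant:
    trigger: Trigger
    dependency: Dependency
    named_providers_only: Optional[bool]  # None when the grant is not dependent
    sublimit: Optional[float]             # None means the full policy limit applies
    waiting_period_hours: int


# Example record: dependent system failure via unnamed IT providers,
# the subset most directly relevant to this event.
grant = BICoverageGrant(
    trigger=Trigger.SYSTEM_FAILURE,
    dependency=Dependency.IT_PROVIDER,
    named_providers_only=False,
    sublimit=5_000_000,
    waiting_period_hours=12,
)
```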
Event Definitions
Specific coverage for events with widespread impact such as this is a developing area of the cyber market, featuring
in a subset of original policies, reinsurance treaties and catastrophe bonds. This event will bring into focus:
- The wording aspect of these products/covers (e.g., “are non-malicious events covered?”)
- The threshold aspect: Does the event “qualify” as an event of required magnitude and will the attachment points
of cover be reached?