CrowdStrike blames global IT outage on bug in checking updates
A faulty software update that crashed Windows computers around the world, upending air travel and hospital care, was caused by a bug in cybersecurity firm CrowdStrike’s quality-control system and by inadequate testing, the company said Wednesday.
In a preliminary review, the Texas-based company said a program for validating updates missed “problematic content” in what was supposed to be a minor adjustment to CrowdStrike software previously installed on more than 8 million machines.
The bad data triggered a memory problem that “could not be gracefully handled,” causing some Windows operating systems to crash and display the light-blue, full-screen warning known as the Blue Screen of Death, according to CrowdStrike’s report.
The report failed to impress some security experts who said they were appalled to learn that CrowdStrike had not first deployed the update to a full-fledged computer running Windows and then rolled it out gradually, so that any mistake would have been detected before it disabled computers around the world.
“It is very alarming when patches and updates that are intended for systems that have true operational impact are not tested and validated before going into production,” said Steve Kelly, former senior director for cybersecurity at the White House National Security Council. “This is a wake-up call for accountability.”
A long-running annual award from expert hackers for “most epic fail,” due to be presented with other awards at the Def Con hacking conference next month, was instead announced Wednesday and given to CrowdStrike.
CrowdStrike said it would improve testing in the future, stagger distribution and give customers control over whether small updates are installed immediately.
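For readers curious what a staggered distribution can look like in practice, the following is a minimal sketch in Python. It is not CrowdStrike's actual process: the function name `staged_rollout`, the callbacks `apply_update` and `is_healthy`, and the 1%, 10%, 100% stage sizes are all hypothetical choices made for illustration. The idea is only that an update reaches a small group first and stops spreading as soon as one machine reports a problem.

```python
from typing import Callable, List, Sequence, Tuple


def staged_rollout(
    hosts: Sequence[str],
    apply_update: Callable[[str], None],   # hypothetical: pushes the update to one host
    is_healthy: Callable[[str], bool],     # hypothetical: checks the host after updating
    stage_fractions: Sequence[float] = (0.01, 0.10, 1.0),  # assumed stage sizes
) -> Tuple[List[str], bool]:
    """Push an update in widening stages and halt at the first unhealthy host.

    Returns the list of hosts that received the update and a flag saying
    whether the rollout completed without a failed health check.
    """
    updated: List[str] = []
    done = 0  # number of hosts covered by earlier stages
    for fraction in stage_fractions:
        # Each stage covers a cumulative share of the fleet, and always
        # advances by at least one host while any remain.
        target = min(len(hosts), max(done + 1, int(len(hosts) * fraction)))
        for host in hosts[done:target]:
            apply_update(host)
            updated.append(host)
            if not is_healthy(host):
                # Stop immediately so later stages never see the bad update.
                return updated, False
        done = target
        if done >= len(hosts):
            break
    return updated, True
```

In this sketch, an update that breaks every machine it touches would reach only the first host in a fleet of 100 before the rollout halts, while a healthy update would proceed through all three stages.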
The new disclosures came as some damage estimates for lost revenue ran over $1 billion, which would put Friday’s incident among the most costly outages, even though it didn’t lead to the permanent deletion of data or destruction of computers.
Parametrix Insurance, which provides data on availability of technology services and covers losses from cloud computing outages, predicted that Fortune 500 companies would lose more than $5 billion from the CrowdStrike failure. The company said airlines would prove to have been hit the worst and that 80% or more of the overall losses would not be covered by insurance.
The disaster renewed calls for greater accountability from software companies, which are generally held to be immune from product liability lawsuits because software is licensed instead of sold. They can still be sued for negligence, but that approach is typically limited by the licensing agreement.
CrowdStrike’s general agreement says that an outage entitles customers to a refund for the CrowdStrike subscription costs only on those days and that it is not responsible for any lost profit, revenue or business opportunity.
Such commonplace clauses “protect developers from bearing the full brunt of the damages their products might cause,” said Brian Fox, chief technology officer of Sonatype, which helps companies manage their software supply chains. “This legal framework, while designed to protect innovation, has also led to a lack of accountability.”
Biden administration officials have been pushing software vendors to voluntarily adopt safer coding practices and are requiring better service when the government buys their products. But restrictions on which misdeeds can be absolved by licensing contracts might need action in Congress.
“The feds and big buyers need to be more active in mandating [vendor] diversity and resiliency. Those things will never happen on their own,” said Amit Yoran, a former top cybersecurity official at the Department of Homeland Security who is chief executive of security firm Tenable.
CrowdStrike is widely used by government agencies and large companies, which deploy its Falcon sensors to employee machines. Like other security platforms, it attaches deeply within the Windows operating system, giving it greater power to thwart innovative hackers and greater ability to trigger a full computer failure.
After its founding in 2011, CrowdStrike became known for its involvement in post-mortem investigations into several high-profile cyberattacks, including the 2014 Sony Pictures hack by North Korea and Russian hacks of top Democratic Party officials in 2016. It attracted investment from Google and went public in 2019.
More than a hundred companies forced offline have already notified insurance companies that they may file claims for lost business, according to Marsh, a large insurance broker.
“It’s definitely being looked at as the cyber-hurricane,” said John Kerns, executive managing director at insurance broker Brown & Brown.
As cyber-insurance policies have expanded over the past decade to include additional kinds of problems, more companies have obtained protection for business interruptions, including those caused by third parties. But such coverage can be expensive, and not all have it, especially for accidental outages, insurers said.
Friday’s outage could trigger sterner conversations with software vendors about licensing terms, said Yvette Connor, risk advisory leader at CohnReznick. It might change risk management terms to say “if you cause my business to go down, there should be damages,” Connor said. “Maybe this is a tipping point for more intensive ‘service-level agreement’ discussions.”
Although most businesses saw their operations return to normal over the weekend, some ripple effects continued to be felt this week, with one major airline, Delta, still canceling flights Tuesday. Other customers were still down Wednesday.
CrowdStrike pledged to publish a more in-depth report on what went wrong once its full investigation is complete.