SentinelOne said on Saturday that a global service disruption a few days earlier was the result of a software flaw in the company’s infrastructure control system that led to a widespread loss of network connectivity.
In a root-cause analysis report, the company said Thursday’s major connectivity loss — which crippled its services worldwide — was not the result of a cyberattack. Instead, critical network routes and DNS resolver rules were deleted due to a software flaw in an automated process.
SentinelOne is in the process of transitioning its production system to a new cloud-based architecture built on the principles of infrastructure as code. The company said a control system that will soon be deprecated was triggered by the creation of a new account. A software flaw caused that control system’s configuration comparison function to misidentify discrepancies and apply what it believed to be the correct configuration state, overwriting prior network settings.
The Mountain View, Calif.-based company said customer endpoints continued to operate but security teams were unable to access management consoles and other related services. This loss of access “significantly impacted their ability to manage their security operations and access important data,” the company said.
SentinelOne assured enterprise customers that their endpoints were protected and that no SentinelOne security data was lost during the outage.
“A core design principle of the SentinelOne architecture is to ensure protection and prevention capabilities continue uninterrupted without constant cloud connectivity or human dependency for detection and response — even in the case of service interruptions, of any kind, including events like this one,” the company said.
The incident did not impact SentinelOne’s federal customers, including those using GovCloud, according to the company, which said that it nonetheless alerted federal customers for situational-awareness and transparency purposes.
The company provided a detailed timeline of the outage, showing that it began at 9:37 a.m. ET and was declared resolved by 4:05 p.m. ET.
Analysts said the outage raised immediate concerns about transparency for customers regarding the status of their security environments.
“Vendors must communicate quickly and transparently with customers during outages so they can appropriately prepare, plan, and communicate with executives about it,” Allie Mellen, principal analyst for security and risk at Forrester, told Cybersecurity Dive via email. “Further, it’s crucial that vendors have some out-of-band communication methods (for example, an independent, public status page) for updates on outages like these.”
The outage comes at a time when software integrity and business continuity have become ongoing concerns in cybersecurity and the broader software industry. In July 2024, a flawed software update from CrowdStrike, a major SentinelOne competitor, crippled more than 8.5 million Microsoft Windows computers.
In a July 2024 conference call, SentinelOne boasted about how it was fielding new customer inquiries in the aftermath of the CrowdStrike outage. CEO Tomer Weingarten said the concerns raised by that outage would “play out for years” as companies addressed the liabilities and risk issues linked to the incident.