At 20:04 UTC on October 12, 2023, hosts across multiple data centers alerted for a failing component in their software stack. Our initial investigation determined that this failure was an unexpected side effect of routine maintenance work that had just been completed.
We activated our incident response procedures at 20:12 UTC and executed an action plan for remediation at 20:21 UTC. Behavior began to improve by 20:25 UTC. Additional remediation actions began at 20:55 UTC and were completed by 22:41 UTC, fully resolving the incident.
We have identified how the component in question failed and are taking steps to ensure that this type of maintenance does not cause the same failure in the future.