At 15:31 UTC on November 6, 2023, a code change was deployed to Linode's environment without several prerequisite changes that it depended on. As a result, Linode's DNS resolvers in all regions except Washington, D.C. (us-iad3) began blocking all DNS traffic over both UDP and TCP. The outage did not begin everywhere at once; it spread gradually as the change propagated to each location.
The change had been tested in a development environment without any issues and was then deployed to Washington, D.C. (us-iad3) as the first phase of a staggered rollout. No problems were observed in us-iad3 because that deployment included the additional changes needed to ensure proper behavior. The rollout to the remaining data centers did not include those additional changes, and without them the resolvers began blocking all DNS traffic.
Our administrators began receiving alerts for our DNS resolvers at 15:39 UTC. After an initial investigation, we formally initiated our incident response procedures at 15:50 UTC. Our subject matter experts had developed a remediation strategy by 16:05 UTC, and the impact was alleviated in all data centers by 16:32 UTC.
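For readers who want the elapsed times, the following is a minimal sketch that computes how long after the change each milestone occurred. It uses only the timestamps stated in this report. The helper names (`at`, `minutes_since_change`) and the event labels are illustrative and not part of any Linode tooling. The last two timestamps are reported as "by" times, so the corresponding figures are upper bounds.

```python
from datetime import datetime, timezone

# All timestamps are UTC on November 6, 2023, as stated in this report.
DAY = (2023, 11, 6)

def at(hour, minute):
    """Build a UTC timestamp on the day of the incident (hypothetical helper)."""
    return datetime(*DAY, hour, minute, tzinfo=timezone.utc)

# Event labels are illustrative summaries of the milestones described above.
timeline = {
    "change deployed": at(15, 31),
    "first resolver alerts": at(15, 39),
    "incident response initiated": at(15, 50),
    "remediation strategy ready": at(16, 5),    # reported as "by 16:05"
    "impact alleviated everywhere": at(16, 32),  # reported as "by 16:32"
}

start = timeline["change deployed"]

def minutes_since_change(event):
    """Whole minutes between the change and the named milestone."""
    return int((timeline[event] - start).total_seconds() // 60)

for event in timeline:
    print(f"{event}: +{minutes_since_change(event)} min")
```

Because the outage spread gradually as the change propagated, these figures describe the overall incident window and not the duration of impact in any single data center.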
This change reached our production environment because of an oversight in our change control process. To prevent a recurrence, we have identified the part of the change control process that allowed this to happen, and we will explore and implement a number of technical and procedural measures to address it.