At 18:40 UTC on January 25, 2021, the Linode Network Operations team began receiving alerts and reports from the Frankfurt data center indicating a possible network outage. After investigating, we determined that a core switch was missing active sections of its interface configuration, specifically the sections that connect the switch to its downstream hosts. Once we identified the problem, we restored the missing configuration from a backup, which repaired network connectivity for affected customers. The incident was confirmed to be fully resolved for all customers at 20:46 UTC on January 25, 2021.
The post-incident investigation revealed a vendor bug in this core switch that corrupted the device configuration as it was being written, leaving behind an incomplete snapshot. A routine automated job then accepted a configuration rollback to that incomplete snapshot. The rollback left the device in an inconsistent state, which disrupted networking services in Frankfurt.
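As an illustration of the failure mode described above, an automated job could check a snapshot for completeness before accepting a rollback to it. The following is a minimal sketch only and does not represent Linode's tooling or the vendor's software: the config syntax (stanzas beginning with "interface <name>"), the function names, and the sample data are all hypothetical.

```python
import re

# Assumed (hypothetical) config syntax: each interface stanza
# begins with a line of the form "interface <name>".
INTERFACE_RE = re.compile(r"^interface\s+(\S+)", re.MULTILINE)


def interfaces_in(config_text):
    """Return the set of interface names declared in a config text."""
    return set(INTERFACE_RE.findall(config_text))


def missing_interfaces(running_config, snapshot):
    """Interfaces present in the running config but absent from the snapshot."""
    return interfaces_in(running_config) - interfaces_in(snapshot)


def safe_to_roll_back(running_config, snapshot):
    """Allow a rollback only if the snapshot drops no configured interface."""
    return not missing_interfaces(running_config, snapshot)


# Hypothetical sample data: a running config and a truncated snapshot.
running = """\
interface eth1
  description downstream-host-a
interface eth2
  description downstream-host-b
"""

truncated = """\
interface eth1
  description downstream-host-a
"""

print(safe_to_roll_back(running, running))      # True
print(safe_to_roll_back(running, truncated))    # False
print(sorted(missing_interfaces(running, truncated)))  # ['eth2']
```

A check like this only catches snapshots that lose whole interface stanzas. A real guard would likely also need to compare the contents of each stanza, which this sketch does not attempt.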
With this bug identified, the Network Operations team has ensured that no rollbacks are performed on this switch, and will conduct network maintenance to address the bug.
NOTE: This incident, while close in time to the previous outage in Frankfurt, has proven to be unrelated.