On February 2, 2022, we began receiving reports of intermittent interruptions in IPv6 connectivity in our Frankfurt data center. Our engineers narrowed down the root cause to insufficient resource allocation for IPv6 routing on our internal LAN equipment. At 19:45 UTC on February 2, we began work to reconfigure the resource allocation parameters on this equipment.
Routing within our Frankfurt LAN is primarily handled by a pair of redundant routers. We applied the necessary configuration changes and rebooted router A without disruption. Problems arose following the reboot of router B, which came back online at 21:01 UTC.
Router B was explicitly configured to handle some control plane responsibilities for the redundant pair. However, router A had temporarily taken over these responsibilities during router B's reboot. Once router B came back online, the pair was left in a split-brain condition, with the two routers in conflict over these roles. They began making unpredictable forwarding decisions that ultimately led to a complete loss of connectivity for all Linodes served by these devices.
Our network engineers immediately detected the routers' split-brain condition and began working to restore both routers to their correct roles. At 21:48 UTC, connectivity for affected Linodes was fully restored.
We know that our customers depend on our status announcements for transparent, up-to-date information about incidents, and our response time during this incident missed the mark. We've made some improvements to our process to ensure more effective, coordinated responses going forward. We've also revised our procedure for rebooting our LAN routers to ensure that this sort of role conflict does not occur during future maintenance work.
If you have any further questions for us, please open a Support ticket for assistance.