On 7 May 2024, between 12:44 UTC and 14:38 UTC, customers using Compute services in the London (Lon1) region experienced significant connectivity issues after an internal router migration inadvertently introduced a routing loop. The initial mitigation, disabling inbound advertisements on the Lon1 IEN private peer router at 14:38 UTC, partially alleviated the problem. However, intermittent connectivity issues persisted until 17:19 UTC, when all problematic BGP sessions between London data centers were terminated, fully resolving the routing loop. We continued monitoring until 20:54 UTC to confirm that these steps were effective. The incident affected Compute services including Linode VMs, NodeBalancers, and Cloud Firewalls.
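For readers unfamiliar with the term, a routing loop occurs when routers forward traffic for a destination to one another in a cycle, so packets never reach it. The following is a minimal illustrative sketch only: the router names (`dc-a`, `dc-b`, `edge`) and the `find_forwarding_loop` helper are hypothetical and do not describe the actual Lon1 topology or any tooling used during the incident.

```python
from typing import Dict, List, Optional


def find_forwarding_loop(
    next_hop: Dict[str, Optional[str]], start: str
) -> Optional[List[str]]:
    """Follow next hops for one destination, starting at `start`.

    Returns the cycle of routers if traffic loops, or None if the
    path terminates (a next hop of None means "delivered locally").
    """
    path: List[str] = []
    seen: Dict[str, int] = {}  # router name -> position in path
    node: Optional[str] = start
    while node is not None:
        if node in seen:
            # Revisited a router: everything from its first visit is the loop.
            return path[seen[node]:] + [node]
        seen[node] = len(path)
        path.append(node)
        node = next_hop.get(node)
    return None


# Hypothetical state with a loop: two routers point at each other.
looped = {"dc-a": "dc-b", "dc-b": "dc-a"}

# Hypothetical state after the offending route is withdrawn.
healthy = {"dc-a": "dc-b", "dc-b": "edge", "edge": None}

print(find_forwarding_loop(looped, "dc-a"))
print(find_forwarding_loop(healthy, "dc-a"))
```

In this toy model, withdrawing the route that points `dc-b` back at `dc-a` breaks the cycle. Terminating the affected BGP sessions is assumed here to have a comparable effect, since it removes the advertisements that sustained the loop.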
To prevent similar issues in the future, we have reinforced our alerting and monitoring processes and are reviewing our migration procedures. We will also implement a stricter pre-migration assessment to minimize disruptions during future infrastructure changes.
This summary provides an overview of our current understanding of the incident based on the information available. Our investigation is ongoing, and any information herein is subject to change.