On 2 November 2021 at 01:48 UTC, a hardware failure took down one of the edge routers in our Mumbai data center. These routers are configured as a redundant pair, so the loss of a single device should not have affected connectivity for our Mumbai customers.
However, due to a misconfiguration, a specific /24 subnet in Mumbai was reachable only via a single edge router. When that router went down, customers using IPs in that subnet experienced a total loss of connectivity.
Our engineering team became aware of the impact of the offline device on 2 November at 02:43 UTC, about 55 minutes after the failure. Work then began to bring the router back online, which we completed on 2 November at 20:06 UTC. At this point, connectivity to this range of IPs was restored. While the maintenance necessary to bring the device back online was time-consuming, other factors also contributed to the extended time between detection and resolution, including blind spots in our monitoring and response policies. We have taken steps to address these issues and reduce the time to resolution for future incidents.
Further investigation revealed that when we made these IP addresses available to customers, only a portion of our upstream Internet providers had accepted the new prefix. The loss of one of our edge routers was therefore sufficient to take any Linodes using IP addresses within this range offline.
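As an illustration only, the kind of redundancy check that would flag this condition can be sketched as follows. This is a minimal sketch under stated assumptions, not a description of our actual tooling: the router names, provider names, the 203.0.113.0/24 documentation prefix, and both function names are hypothetical. It assumes we know which upstream providers each edge router connects to and which providers have accepted a given prefix.

```python
from typing import Dict, List, Set


def routers_carrying_prefix(
    router_upstreams: Dict[str, Set[str]],
    accepted_by: Set[str],
) -> List[str]:
    """Return the edge routers with at least one upstream that accepted the prefix."""
    return sorted(
        router
        for router, upstreams in router_upstreams.items()
        if upstreams & accepted_by
    )


def is_redundant(
    router_upstreams: Dict[str, Set[str]],
    accepted_by: Set[str],
    min_routers: int = 2,
) -> bool:
    """A prefix is treated as redundant only if enough routers can carry it."""
    return len(routers_carrying_prefix(router_upstreams, accepted_by)) >= min_routers


# Hypothetical topology: two edge routers, each with its own upstream providers.
router_upstreams = {
    "edge-a": {"transit-1", "transit-2"},
    "edge-b": {"transit-3"},
}

# Hypothetical state for a placeholder prefix such as 203.0.113.0/24:
# only some providers have accepted it, and all of them sit behind edge-a.
accepted_by = {"transit-1", "transit-2"}

carrying = routers_carrying_prefix(router_upstreams, accepted_by)
redundant = is_redundant(router_upstreams, accepted_by)
print(carrying, redundant)
```

In this hypothetical state the prefix is carried by a single router, so the check fails and the range would not be released to customers until every router in the pair can carry it.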
This was an oversight on our part, and we have taken steps to ensure that provisioning new IP addresses does not impact our customers in this way going forward.
If you have any further questions regarding this incident, please reach out to Linode Support.