On April 24, 2025, between 17:32 and 17:55 UTC, we identified a brief period of impact to network connectivity in our US-LAX (Los Angeles) data center. During the period of impact, customers experienced intermittent connection timeouts and errors for all services deployed in this data center.
Akamai found that one of the border routers had been exhibiting elevated memory utilization since the impact began. As a preventative measure, we rebooted this border router to return it to a stable state.
We determined that the router's kernel behavior could be reliably reproduced with a traffic pattern that overwhelms its memory buffers, leading to self-initiated reboots. We identified a configuration fix that limits the rate of subnet route probes and began applying it.
The network remained stable until a brief period of impact between 19:55 and 19:58 UTC on April 25, 2025. At approximately 19:47 UTC, a configuration change had been applied to one of the border routers; BGP-related instability then recurred, with the first symptoms observed at 19:54 UTC.
The resulting investigation revealed that the performance degradation was due to a misconfiguration that allowed a high rate of failed requests to consume CPU resources.
To fully mitigate the impact, we developed and implemented a rate-limiting configuration fix. We rolled out the fix in phases, beginning with the most heavily affected border routers, and completed the rollout at approximately 20:40 UTC on April 30, 2025. Following an extended monitoring period, we verified that the issue had been resolved.
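The specific router configuration used for the fix is not public. As a general illustration only, rate limiting of request or probe processing is commonly implemented as a token bucket: requests consume tokens, tokens refill at a fixed rate, and excess requests are dropped before they can occupy CPU resources. The class and parameter names below are hypothetical, not taken from the incident.

```python
class TokenBucket:
    """Illustrative token-bucket rate limiter (not the actual router config).

    rate:     tokens replenished per second (sustained request rate)
    capacity: maximum stored tokens (allowed burst size)
    """

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity   # start full: an initial burst is allowed
        self.last = 0.0          # timestamp of the last allow() call

    def allow(self, now: float) -> bool:
        """Return True if a request at time `now` (seconds) may proceed."""
        # Refill tokens for the time elapsed, capped at the burst capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # over the limit: drop instead of consuming CPU


# Example: 100 requests/s sustained, bursts of at most 10.
bucket = TokenBucket(rate=100, capacity=10)
# An instantaneous burst of 25 requests: only the first 10 pass.
allowed = sum(bucket.allow(0.0) for _ in range(25))
```

In this sketch, a sudden flood of failed requests exhausts the bucket quickly, so the excess is rejected cheaply rather than occupying the control plane, which matches the intent of the mitigation described above.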
This summary provides an overview of our current understanding of the incident given the information available. Our investigation is ongoing and any information herein is subject to change.