On November 22, 2024, at 21:43 UTC, our Compute Operations Center received alerts indicating that the DHCP daemons responsible for assigning IP addresses to customers' virtual machines were not running on ~20 compute hosts in our US-East (Newark) data center. This issue affected customers relying on DHCP for network configuration, potentially resulting in downtime for virtual machines once their DHCP leases expired. Approximately 2,500 customer virtual machines resided on the affected compute hosts during the impact window, but not all were impacted.
The investigation revealed that two services (rad-unnumbered and DHCP-unnumbered) were not restarted on these ~20 hosts following a routine security patch deployment, while the same updates were applied to the remaining ~9,000 hosts without any issues. For immediate mitigation, we identified and restarted the impacted daemons on the affected hosts to restore DHCP functionality. The issue was mitigated at 23:16 UTC on November 22, 2024. Akamai has concluded the root cause investigation and found that the ~20 affected hosts had missed a prerequisite update that would have ensured the DHCP daemons restarted correctly. This discrepancy left the hosts vulnerable during the patch deployment. We ran a query and determined that 110 other hosts had the same out-of-date configuration, making them susceptible to the same issue upon restart; the work to update the configuration on all machines is now complete.
This summary provides an overview of our current understanding of the incident given the information available. Our investigation is ongoing, and any information herein is subject to change.