On April 1, 2025, users of the Linode Kubernetes Engine (LKE) began experiencing issues connecting to their clusters. Internal cluster services continued to function, but a DNS resolution problem impacted the LKE Dashboard and external access to clusters.
The issue was traced to an internal dependency of our LKE DNS system. An earlier update to a database caused the DNS service to hang, which left the DNS servers associated with the LKE Dashboard unresponsive.
Once the root cause was identified, we restarted the affected DNS services. This action restored full functionality, and access to LKE services returned to normal. We monitored the system to ensure stability and officially resolved the incident on April 3, 2025.
To prevent this issue from happening again, we are improving how our DNS services recover from transient connectivity failures to their dependencies. We are also enhancing our monitoring to detect similar problems sooner and updating our deployment process to better coordinate changes between dependent systems. We sincerely apologize for the disruption and thank you for your patience during this incident.
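As an illustration only, and not a description of the actual LKE DNS implementation, recovery from a transient dependency failure is commonly handled by retrying with capped exponential backoff and jitter. The minimal sketch below assumes the dependency call raises a retryable error; the names `TransientError` and `call_with_backoff` and all parameter values are hypothetical.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for a dependency failure that is worth retrying."""


def call_with_backoff(operation, max_attempts=5, base_delay=0.1,
                      max_delay=5.0, sleep=time.sleep):
    """Call operation(), retrying transient failures with capped backoff.

    All defaults are illustrative assumptions, not tuned values.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                # Give up and surface the error so that a supervisor or
                # health check can react instead of the service hanging.
                raise
            # Exponential delay (base_delay, 2x, 4x, ...) capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Full jitter: wait a random time up to the capped delay so that
            # many clients do not retry in lockstep.
            sleep(random.uniform(0, delay))
```

A caller would wrap the dependency access, for example `call_with_backoff(lambda: query_database())`, where `query_database` is a placeholder for whatever call may fail transiently. Bounding the number of attempts means a persistent failure is reported rather than retried indefinitely.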
This summary provides an overview of our current understanding of the incident, based on the information available. Our investigation is ongoing, and any information herein is subject to change.