Linode Status

Service Issue - Intermittent connection drops on LKE pod to pod traffic

Incident Report for Linode

Postmortem

While troubleshooting a customer issue on the LKE-E side, we became aware of an intermittent issue causing pod-to-pod connection timeouts on LKE clusters across all data centers. The investigation at the time pointed to "noisy network neighbors" on the hosts as the cause of the timeouts. Further investigation indicated that the issue had existed since approximately January 20th, 2025.

Our LKE engineering team began testing on standard LKE tier server sets and was able to reproduce the issue for 3 hours in the Los Angeles data center.

Akamai ultimately discovered two separate issues that led to the observed behavior. For server sets running Dedicated Linode plans, we traced most of the occurrences to problems with the underlying host. In most cases the cause was memory pressure on the host, which affected networking for all guests running on it. We correlated the customer’s reports with their decision to change all premium nodepools to dedicated nodepools at the beginning of the year.

The networking problems we noticed on premium nodepools were in fact being drowned out by the noisy dedicated server sets. Once we isolated the premium nodepools, we were able to correlate the customers' reports with a known issue in our Envoy proxy configuration.

To mitigate the issue, we released a patch with a fix.

Akamai will schedule a meeting to outline lessons learned and next steps to ensure similar incidents do not happen in the future.

This summary provides an overview of our current understanding of the incident given the information available. Our investigation is ongoing and any information herein is subject to change.

Posted Jun 05, 2025 - 23:36 UTC

Resolved

We haven't observed any additional issues with the Linode Kubernetes Engine (LKE), and will now consider this incident resolved. If you continue to experience issues, please contact us at 855-454-6633 (+1-609-380-7100 Intl.), or send an email to support@linode.com for assistance.
Posted Apr 15, 2025 - 00:35 UTC

Monitoring

At this time, we have been able to correct the issue affecting the Linode Kubernetes Engine (LKE). We will be monitoring this to ensure that the service remains stable. If you are still experiencing issues and are unable to open a Support ticket, please call us at 855-454-6633 (+1-609-380-7100 Intl.), or send an email to support@linode.com.
Posted Apr 14, 2025 - 22:24 UTC

Update

We are continuing to investigate this issue and will provide additional updates as progress is made.
Posted Apr 12, 2025 - 00:35 UTC

Update

We are continuing to investigate this issue and will provide additional updates as progress is made.
Posted Apr 11, 2025 - 00:10 UTC

Update

We are continuing to investigate this issue and will provide additional updates as progress is made.
Posted Apr 09, 2025 - 23:45 UTC

Update

We are continuing to investigate this issue and will provide additional updates as progress is made.
Posted Apr 09, 2025 - 00:42 UTC

Update

We are continuing to investigate this issue and will provide additional updates as progress is made.
Posted Apr 07, 2025 - 22:56 UTC

Update

The investigation continues. At this time, we have been able to confirm that the issue is causing intermittent connection timeouts of 1-2 minutes in pod-to-pod traffic. These timeouts have been identified in only 2 of our data centers, with only a few occurrences, and we continue to work to identify the cause. Due to the intermittent nature of the issue, the investigation will take additional time, and we will provide additional updates as progress is made. Should you notice issues with symptoms that align with this, please contact us at 855-454-6633 (+1-609-380-7100 Intl.), send an email to support@linode.com, or open a Support ticket for assistance.
Posted Apr 04, 2025 - 20:34 UTC

Update

We are continuing to investigate this issue and will provide additional updates as progress is made.
Posted Apr 04, 2025 - 19:34 UTC

Update

We are continuing to investigate this issue.
Posted Apr 04, 2025 - 17:14 UTC

Investigating

Our team is investigating an issue affecting the Linode Kubernetes Engine (LKE). We will share additional updates as we have more information.
Posted Apr 04, 2025 - 16:42 UTC
This incident affected: Linode Manager and API.