Linode Status

Connectivity Issues - Newark

Incident Report for Linode

Postmortem

Postmortem Summary

At approximately 12:00:00 EST on February 21st, 2019, our monitoring systems detected widespread networking issues in our Newark, NJ data center. We determined that a power feed to one of the redundant data center routers (Router1) was interrupted. Router1 has full chassis power redundancy and runs in a redundant configuration with Router2. When Router1 came back online and re-formed its adjacency with Router2, instability in some traffic flows was detected. Engineers immediately began troubleshooting the impacted router and isolated the problem to a corrupted neighbor table on Router1. The table was flushed and service was restored.

Timeline of Events

12:00:00 EST - A-side power on Router1 interrupted

12:05:00 EST - Linode Network Operations alerted to widespread network-related data center outage

12:06:00 EST - Incident response plan activated

12:15:00 EST - Router1 back online, reachability issues in the DC still apparent

12:25:00 EST - vPC consistency verified on router pair

12:35:00 EST - FIB consistency verified on router pair

12:50:00 EST - Router1 isolated from WAN routing, no change to impacted connectivity

13:00:00 EST - Router2 isolated from WAN routing, no change to impacted connectivity

13:25:00 EST - Router2 adjacency table flushed, no change to impacted connectivity

13:30:00 EST - Router1 adjacency table flushed

13:35:00 EST - Service restored

Further Follow Up Still Needed

It is still not clear why we experienced a prolonged outage when a router was removed from the redundant pair. These routers have sustained many reboots during upgrades and are designed to maintain functionality when one is dropped from the pair. It is also not clear why it was necessary to flush Router1's adjacency table to restore connectivity when it came back up. Linode Network Operations plans to replicate the Newark environment in our lab and work with Cisco to find the root cause of these multiple failures.

Posted Feb 22, 2019 - 16:09 UTC

Resolved

We have been able to correct the issues affecting our Newark data center. We will be closely monitoring connectivity in the Newark data center to ensure our services remain stable. A full post-mortem of the event will be available at a later date.

Posted Feb 21, 2019 - 20:31 UTC

Monitoring

We have been able to correct the issues affecting our Newark data center. We will be monitoring this issue to ensure our services remain stable.

Posted Feb 21, 2019 - 19:05 UTC

Update

Our team is still investigating this issue. We will continue to provide additional updates as the issue develops.

Posted Feb 21, 2019 - 18:36 UTC

Update

We're continuing to work to restore normal connectivity in our Newark data center, and we'll continue to provide updates here.

Posted Feb 21, 2019 - 17:56 UTC

Investigating

We are aware of connectivity issues affecting Linodes in our Newark data center and are currently investigating. We will continue to provide additional updates as this incident develops.

Posted Feb 21, 2019 - 17:18 UTC

This incident affected: Regions (US-East (Newark)).
