On Sunday, April 24th at approximately 7:22am EDT (GMT-4), our network monitoring systems alerted us to packet loss reaching some destinations in our London datacenter. The loss was occurring upstream of Linode, and the issue was immediately reported to our upstream IP transit provider, Telecity. Telecity responded that the packet loss was cosmetic and was not affecting production traffic. Linode engineers pushed back, and it was eventually identified that an upstream interface to Linode was down, causing congestion on the “B” side of Linode’s network. Linode engineers were able to shift some traffic to the “A” side of the network, which cleared the packet loss. Telecity was eventually able to identify and fix the down link on their network and normalize our capacity on April 29th; the down link was attributed to a DWDM issue between facilities.
Due to the duration of the outage and the lack of communication coming from Telecity, Linode asked that Telecity conduct an investigation of the trouble. Linode is satisfied with the findings of the investigation and confident that a prolonged event of this nature will not occur in the future.
The following is a snippet from the Telecity investigation:
"Following our investigation we have identified that there were failures to communicate and provide updates to Linode throughout the issue, which led to Linode having to chase for information. This has been addressed internally and the teams involved have been dealt with to ensure that this does not reoccur. The issue has also highlighted process improvements that are currently being reviewed by the management team. Moving forward all DWDM issues will be escalated by Support to the IP Support team for further investigation. Equinix sincerely apologise for any inconvenience this incident may have caused."