At approximately 19:35:00 UTC on July 18, 2018, utility power to our Newark data center was interrupted and critical load was immediately transferred to generator power. At that time, the Uninterruptible Power Supply (UPS) system servicing Linode’s deployment alerted our colocation provider to a possible failure, and UPS technicians were dispatched on site to diagnose the issue. It was determined that one of the UPS units in the N+1 system was damaged, and the unit was taken offline for repair. Because the UPS system was able to run in an N configuration (enough capacity to carry the load, but with no redundant unit), our colocation provider decided to transfer back to utility power once it was restored.
At 23:04:00 UTC, data center power was switched from generator power back to utility power, but the two remaining UPS units failed to take the load and switched to bypass. As a result, the power system servicing Linode’s deployment lost power for approximately three minutes.
Power was fully restored at approximately 23:07:00 UTC, and Linode staff then worked to bring our infrastructure and affected customer instances back online. Most Linode infrastructure and customer instances were online by approximately 02:34:00 UTC on July 19, 2018, and the incident was deemed resolved.
Our colocation provider has conducted a full investigation and determined that a power surge during the utility power failure damaged the inverter on one of the UPS units in the N+1 set servicing Linode’s deployment. This unit was taken offline, leaving the UPS system operating in an N state. During the switch back from generator power to utility power, the remaining UPS units were unable to handle the load due to a control board malfunction and went into bypass mode, which caused the three-minute power interruption.
The malfunctioning control boards have since been checked and verified as operational, and the UPS system is currently functioning in an N state. A maintenance window has been scheduled to replace the damaged inverter on the failed UPS unit, as well as the control boards on all UPS units in the set. This will restore the UPS system to N+1 redundancy and eliminate the potential for further control board malfunctions.
To provide additional protection, our colocation provider has performed an audit of all power circuits servicing our deployment. Over the next quarter, we will be transitioning critical hardware to power configurations with increased redundancy.
We do not foresee any further issues with the Newark facility at this time. Thank you for your patience and understanding. We apologize for any inconvenience this interruption has caused.