At approximately 04:23:00 UTC on June 21, 2018, utility power was interrupted to our Fremont data center, and the facility's Uninterruptible Power Supply (UPS) system engaged. However, the UPS unit serving a sizeable portion of Linode's hardware deployment failed. As a result, a subset of the hardware that hosts our customer instances lost power and rebooted.
Utility power was restored at approximately 05:16:00 UTC on June 21, 2018. Linode staff then worked to bring our infrastructure and the affected customer instances back online. Most Linode infrastructure and customer instances were online by approximately 08:10:00 UTC, at which point the incident was deemed resolved.
Our colocation provider is working with their UPS vendor to conduct a full investigation into the cause of the failure. We will provide more detail as it becomes available. In the meantime, the affected UPS has been repaired and is operating normally. We have confirmed that further power loss events occurred early last week and that the repaired UPS operated normally during them.
This is the second outage we've experienced at this facility in the last 6 months, and we do not take these downtime events lightly. To reduce the impact of future power loss, we are moving our critical network infrastructure to a new area of the data center facility. This new area provides fully redundant power feeds, which should prevent a full outage if an issue like this recurs. We anticipate completing this phase within the next 30-60 days. Additionally, we plan to move the rest of Linode's hardware deployment to the new area to take advantage of its additional power redundancy. We do not yet have an ETA for the completion of this phase.
We do not foresee any further issues with the Fremont facility at this time. We appreciate your patience while we await the official Reason for Outage (RFO), root cause, and mitigation plan from our colocation provider.
Update 2018-07-11
Our colocation provider and their UPS vendor have completed a full investigation. Inspection of the UPS system revealed failed and burned components, and further diagnostics determined that a rectifier had failed. The faulty component has since been replaced, and the unit has been verified to be working properly.