At approximately 01:30 UTC on May 30, 2015, the local power utility, PG&E, experienced an outage affecting our Fremont datacenter. Seven of the facility’s eight generators started correctly and provided uninterrupted power. Unfortunately, one generator suffered an electromechanical failure and did not start. This caused an outage that affected our entire deployment in Fremont.
PG&E was in contact and provided an initial estimated time of restoration (ETR) of 04:30 UTC for utility power. This was later revised to 05:00 UTC and then to 06:30 UTC. Utility power was actually restored at 06:05 UTC.
The generator's maintenance vendor dispatched a technician to the datacenter, who determined that a battery used to start the generator had failed under load. The technician subsequently replaced the batteries. The generators are tested monthly, and the failed generator passed all of its checks two weeks before the outage. It was also tested under load earlier in the month.
The UPS system and its batteries did not suffer a failure.
As soon as the outage occurred, Linode engineers verified that it was power-related and remained on standby for over four hours while waiting for power to be restored. Critical Linode infrastructure was brought back online immediately after power was restored, and customer Linodes were then booted.
Several servers did not survive the sudden loss of power and needed individual attention. Linode engineers continued working well after power was restored to repair these systems and return them to service, using both hot and cold spare components. We were able to recover every system.
Linode apologizes for this power interruption and any inconvenience it has caused you. We sincerely appreciate your business and are committed to providing the best service possible. Our colocation provider is reevaluating its maintenance procedures and adding tests for this battery failure condition.