Starting at 13:56 UTC on March 4th, 2023, scheduled power maintenance in the Dallas data center cut power to networking equipment that controls a segment of our internal networking. This segment handles communications between infrastructure within the data center. At 16:26 UTC, power to the networking equipment was restored and internal networking resumed.
During this period, customers would have been unable to interact with their Linodes via the Cloud Manager or API. Linodes would also have been unable to communicate with Linode services on that network, such as Block Storage. Linodes with Block Storage volumes attached may have been unable to boot.
Additionally, three cabinets of hosts in the data center were cabled on non-redundant power, and as a result 30 hosts lost power during the maintenance. By 17:00 UTC, remote hands in the data center had placed the cabinets on redundant power and powered the hosts back on. One host did not power back up fully at first and was booted manually at 18:15 UTC.
This incident was initially marked as resolved at 19:10 UTC. However, after receiving several reports from customers, the Support team identified lingering effects on the Block Storage service. After power was restored, the Block Storage service took additional time to complete its recovery tasks. Access to the Block Storage service was restored at 22:12 UTC, at which point impacted customers would have been able to boot their Linodes.
The root cause of this incident was twofold. First, certain networking equipment in the data center was on a single power source because the equipment itself has only one power port, which resulted in the loss of internal networking. Second, three cabinets were improperly cabled on installation by remote hands at the data center, leaving their hosts on non-redundant power.
To address the single-power equipment, we are in the process of installing automatic transfer switch (ATS) equipment to place the networking on redundant power. In addition, we have reached out to the data center to discuss this past maintenance as well as upcoming maintenance events to ensure this will not recur. We will have staff on-site during power maintenance in the data center this Saturday, March 11th, 2023, from 13:00 UTC until 20:00 UTC to oversee the maintenance and ensure there will not be issues.