At 15:00 UTC, our administrators were alerted to a failure in a portion of the Object Storage infrastructure in Singapore. This made the Object Storage service inaccessible to customers until 15:10 UTC, at which point the cluster began to stabilize. During stabilization, customers would have seen intermittent 502 errors.
At 16:04 UTC, our administrators were alerted to additional 502 errors in the cluster, caused by a portion of the infrastructure that had not recovered as expected. An administrator removed the problematic component from the production environment, which recovered the cluster and restored access.
As a result, we are working to implement additional monitoring so that we can address issues with our Object Storage infrastructure more quickly. We are also working to identify procedural changes to address similar issues moving forward.