As part of the DDoS mitigation actions taken during last month's attacks, Linode's Atlanta datacenter has temporarily been using transit acquired directly from a single tier 1 provider. Because relying on a single provider is not ideal, we have been working with our colocation provider on a longer-term interim transit solution to carry us until our 200G upgrade is complete.
While we were working toward this solution, remote-hands maintenance on a port that should not have been active at the time caused one of our routers to degrade for several minutes. During that window, its BGP session with the aforementioned transit provider flapped repeatedly. These flaps triggered route flap dampening within the provider's network for some or all of our advertised prefixes, resulting in major, intermittent connectivity problems.
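For readers unfamiliar with route flap dampening, the sketch below shows why a short burst of flaps can keep prefixes unreachable well after the session stabilizes. It follows the general RFC 2439 model: each flap adds a penalty, the penalty decays exponentially with a configured half-life, a prefix is suppressed once the penalty crosses a suppress limit, and it is only reused after the penalty decays below a reuse limit. The parameter values are assumed Cisco IOS-style defaults, not our provider's actual settings (which we do not know). The flap timings in the usage example are hypothetical, and the maximum-suppress-time ceiling is omitted for brevity.

```python
import math
from typing import List, Optional, Tuple

# Assumed dampening parameters (common vendor defaults), NOT the
# transit provider's real configuration.
PENALTY_PER_FLAP = 1000
SUPPRESS_LIMIT = 2000
REUSE_LIMIT = 750
HALF_LIFE_MIN = 15.0


def decay(penalty: float, minutes: float) -> float:
    """Exponentially decay a penalty over `minutes` using the half-life."""
    return penalty * 0.5 ** (minutes / HALF_LIFE_MIN)


def simulate(flap_times: List[float]) -> Tuple[Optional[float], Optional[float]]:
    """Return (suppressed_at, reusable_at), in minutes, for a series of flaps.

    A prefix is suppressed once its penalty exceeds SUPPRESS_LIMIT and is
    reused once the penalty decays below REUSE_LIMIT. The max-suppress-time
    ceiling used by real implementations is deliberately left out.
    """
    penalty = 0.0
    last = 0.0
    suppressed_at: Optional[float] = None
    for t in sorted(flap_times):
        # Decay the accumulated penalty since the previous flap, then add this one.
        penalty = decay(penalty, t - last) + PENALTY_PER_FLAP
        last = t
        if suppressed_at is None and penalty > SUPPRESS_LIMIT:
            suppressed_at = t
    if suppressed_at is None:
        return None, None
    # Solve penalty * 0.5 ** (dt / HALF_LIFE_MIN) == REUSE_LIMIT for dt.
    dt = HALF_LIFE_MIN * math.log2(penalty / REUSE_LIMIT)
    return suppressed_at, last + dt


if __name__ == "__main__":
    # Hypothetical: four flaps one minute apart.
    suppressed, reusable = simulate([0, 1, 2, 3])
    print(f"suppressed at {suppressed} min, reusable at ~{reusable:.1f} min")
```

With these assumed values, four flaps within three minutes suppress the prefix at the third flap, and it stays suppressed for roughly another 35 minutes after the last flap, even though the router itself has recovered.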
For reference, our monitoring detected a full outage beginning at 11:40 AM EST and observed full or partial outages for the following 1 hour and 33 minutes (until approximately 1:13 PM EST).