On March 3, 2026, at approximately 20:00 UTC,, a cron job responsible for email notifications from Cloud Manager began failing due to timeouts. As the job exceeded the request timeout interval, it repeatedly failed and restarted, causing the process to resume from the beginning of its queue each time. This resulted in customers receiving multiple duplicate email notifications and some delay in those notifications.
This issue only affected event-based notification emails related to host jobs such as VM shutdowns, startups, deletions, or deployments. Other notification emails set up by the customer, such as CPU usage thresholds, and the host jobs themselves remained unaffected. To address the issue, the cron job was temporarily stopped, associated services were restarted, and a processing limit was introduced to control the number of records handled per run. These actions stabilized the system, and the incident was considered mitigated at approximately 19:00 UTC on March 4, 2026.
After mitigation of the root cause, a backlog of approximately 2 million pending events had to be processed, which caused further delays in notification delivery over the following days for some customers. The system worked through this backlog and gradually returned to normal processing. As of March 10, 2026, the backlog has been cleared and notification emails are being delivered as expected.
Since this incident, we have already tested and implemented additional cron job optimizations to ensure consistent performance. We are committed to making continuous improvements to make our systems better and prevent recurrence.
This summary provides an overview of our current understanding of the incident given the information available. Our investigation is ongoing and any information herein is subject to change.