Summary
At 8:30 AM today, one of our monitoring systems alerted us to a problem with the storage service on one node. The problem disrupted the virtual machines (VMs) running on that node.
Timeline
Cause
The disruption was traced to a faulty hard drive in the storage system.
Resolution
The faulty drive was removed and replaced. The storage service has remained stable since, and no further issues are expected.
Next Steps
We are improving our monitoring and alerting to catch signs of drive degradation earlier, which should help us prevent similar issues in the future.
Current Status
All services are fully operational.