Emergency Node Reboot: NZ-AKL2-6O20W9

Incident Report for SiteHost

Postmortem

Summary
At 8:30 AM on 25 September 2025, one of our monitoring systems alerted us to a problem with the storage service on node NZ-AKL2-6O20W9. This disrupted the virtual machines (VMs) running on that hardware.

Timeline

8:30 AM – Monitoring detected the issue.
8:45 AM – Our team determined a reboot was the fastest way to restore service.
8:55 AM – The node was rebooted and VMs began coming back online.
10:55 AM – Services confirmed stable; incident closed.

Cause
The disruption was traced to a faulty hard drive in the storage system.

Resolution
The faulty drive was removed and replaced. The storage service has remained stable since, and no further issues are expected.

Next Steps
We are improving our monitoring and alerting so that early signs of drive degradation are detected sooner, helping us prevent similar issues in the future.

Current Status
All services are fully operational.

Posted Sep 25, 2025 - 15:33 NZST

Resolved

The node and its guests have remained stable, and we will continue to monitor for further issues. If you are still having issues with a guest on this node, please get in touch at support@sitehost.co.nz.

Posted Sep 25, 2025 - 10:55 NZST

Monitoring

We have completed the emergency maintenance and are currently monitoring the situation. All services hosted on the node should now be restored.

Posted Sep 25, 2025 - 08:57 NZST

Identified

We have experienced hardware instability on one of our nodes (NZ-AKL2-6O20W9). As a result, we need to perform emergency maintenance to reduce the likelihood of further issues. This will involve rebooting the servers hosted on this node.

Posted Sep 25, 2025 - 08:30 NZST