Windows VPS Issues - North Shore
Incident Report for SiteHost
Postmortem

At approximately 08:00 on Friday 23 June, one of our Windows VPS nodes (NZ-AKL2-V5L1DJ) suffered multiple drive failures, taking some customer servers offline.

Our engineers were notified of the outage immediately and attempted to restore service by repairing the RAID array on the node. This ultimately failed, and we decided to restore onto fresh drives using our most recent backup.

At that point we discovered that the drives had started failing earlier that morning, during our standard backup window. As a result, the backups taken that morning were either incomplete or corrupt for roughly half of the impacted customers. We had to restore those customers from the previous night's backups, causing up to 24 hours of data loss in some cases.

By 10:05 we had restored the first batch of customer servers, and all customers were back online by 14:07.

——

Incidents such as this are a reality of what we do. We are picky about the hardware that we use in our nodes, but parts fail. How we recover is the true test.

In this situation we want to do better. We lost more time than we would have liked simply waiting on data to transfer from our backup servers to the new drives.

We would like to improve restore times from backup as part of a larger backup project that we are currently undertaking.

Finally, we would like to apologise to the impacted customers, especially those whose servers had to be recovered from an older backup. You were all extremely understanding during the outage, and for that we’re thankful.

As always, if you have any questions regarding this outage, or want to discuss disaster recovery options, please get in touch and we will be happy to help.

Quintin Russ

Posted Jun 27, 2017 - 16:33 NZST

Resolved
The affected hardware node is now stable and service should be restored.
Posted Jun 23, 2017 - 18:17 NZST
Monitoring
We now believe all servers are back online and running correctly. Please note that we had to restore servers from the most recent backup available, so get in contact if you're experiencing any problems.
Posted Jun 23, 2017 - 14:07 NZST
Update
We are continuing to restore customer servers impacted by this node failure and hope to have this issue resolved later this afternoon.
Posted Jun 23, 2017 - 12:34 NZST
Identified
Engineers have identified the issues on the node and are working to resolve them. Some customers will be back online already and we are currently working to restore the remainder.
Posted Jun 23, 2017 - 10:10 NZST
Investigating
We are currently experiencing issues with one of our Windows VPS nodes (NZ-AKL2-V5L1DJ) based in our North Shore facility, resulting in some servers being offline. Our apologies for the inconvenience here - we are actively working to resolve this as quickly as possible and will advise when we know more.
Posted Jun 23, 2017 - 08:39 NZST