Outage - Windows Node (Auckland, North Shore)
Incident Report for SiteHost
Postmortem

On Wednesday 26th September at approximately 9:30am we experienced performance issues on one of our North Shore hardware nodes, NZ-AKL2-9QQQZ9. The node appeared to be running out of memory and, as a result, the virtual servers on the node went offline. Our monitoring notified us of this immediately and we began to investigate, restarting the node and bringing the virtual servers back up. Unfortunately the same problem recurred, so we restarted the node again, this time bringing up the virtual servers one by one to try to identify which server was causing the memory issues.

Initially, all servers started successfully and the memory issue did not immediately recur. Approximately one hour later, however, the problem reappeared and the virtual servers went down once more.

While tackling this latest outage our team noticed a corresponding uptick in network traffic, which upon further investigation pointed to a Distributed Denial-of-Service (DDoS) attack being performed against one of the virtual servers on the node. We then blocked this traffic upstream, and worked with the customer to reconfigure the targeted server to prevent the attack from continuing. Once this was complete the node and the virtual servers it hosts remained stable.

These incidents are very disruptive and we apologise for the downtime that customers experienced. This was an unusual DDoS attack that presented itself in a way which didn’t trigger our usual network monitoring or our upstream provider’s monitoring. As a result, we will be making some changes to monitor for this style of attack, which will help us identify problems faster. In addition, our team is reviewing what changes can be implemented to prevent these sorts of attacks from impacting our services in the future.

If you have any questions about this incident or would like further detail, please get in touch with us at support@sitehost.co.nz.

Posted 2 months ago. Sep 27, 2018 - 17:09 NZST

Resolved
This node has continued to be stable and we believe this incident is resolved. We will post a full postmortem once we have completed our review process.
Posted 3 months ago. Sep 26, 2018 - 15:29 NZST
Monitoring
Engineers believe the hardware node is now stable and all customer servers should be back online. We'll be monitoring the node closely but are not expecting further disruption. If you are still experiencing issues please get in touch with support.
Posted 3 months ago. Sep 26, 2018 - 13:29 NZST
Update
Our engineers believe they have identified the root cause of the instability and are working to mitigate it while bringing the last few remaining servers back online.
Posted 3 months ago. Sep 26, 2018 - 12:33 NZST
Update
Engineers are still isolating the root cause of the instability on this hardware node. Some customers will see their servers online for periods of time as we continue our work but they may not be stable at this time.
Posted 3 months ago. Sep 26, 2018 - 12:06 NZST
Identified
We have had a recurrence of the outage and are actively working on bringing servers back online. We will continue to update this incident with our progress.
Posted 3 months ago. Sep 26, 2018 - 11:38 NZST
Monitoring
We have successfully started all the virtual servers on this node and they appear to be running normally. We will continue to monitor the node for further issues and will follow up with an incident report. If you are still experiencing issues as a result of this outage, please get in touch with us at support@sitehost.co.nz.
Posted 3 months ago. Sep 26, 2018 - 11:32 NZST
Update
Our engineers are continuing to investigate the outage and are working to identify why customer servers are not starting. We will continue to update here as we learn more.
Posted 3 months ago. Sep 26, 2018 - 10:35 NZST
Investigating
We have experienced an unscheduled outage on the node NZ-AKL2-9QQQZ9 in the Auckland North Shore datacenter that is impacting a few customers. Our engineers are investigating this at the moment and we will update when we know more.
Posted 3 months ago. Sep 26, 2018 - 09:54 NZST