Microsoft Azure’s UK South storage region suffered an outage yesterday, just a day after the company debuted its Azure Data Box Disk.
Just after lunchtime, customers started reporting their services were down. Some said their Azure accounts were unavailable, while others said they could only see a spinning wheel when trying to access the cloud service.
The problems began on Azure Storage but spread to other services, including App Service and Virtual Machines, with the company’s status page showing a blanket outage for all services after the issue was first reported. Its Azure UK West storage had not been affected at the time of writing.
“Starting at 13:19 UTC on 10 Jan 2019, a subset of customers leveraging Storage in UK South may experience service availability issues. In addition, resources with dependencies on Storage, may also experience downstream impact in the form of availability issues. Engineers have been engaged and are actively investigating. The next update will be provided in 60 minutes, or as events warrant,” the company’s service status page reported yesterday. An update later confirmed the issue continued until “approximately 05:30 UTC on 11 Jan 2019.”
The Azure Support team later confirmed on Twitter that the issue had been resolved, saying: “Mitigated: Engineers have confirmed that the Storage availability issue in UK South is resolved. Any customers experiencing residual impact will receive communications to their portal. A full Root Cause Analysis will be provided in approximately 72 hours.”
However, some customers were unhappy that, while services were offline and engineers scrambled to fix the issues, the company had communicated little beyond its original message.
So an update on the UK South stuff would be good @AzureSupport . Been more than 2 hours since you last said next update in an hour…. *yawn* It's not like your VMs, storage, site recovery and storage apps are all broken. Oh wait…..
— George Wilson (@GW1992) January 10, 2019
Hi there, sorry for the delay. Our engineers are working very hard to resolve this issue and will update the status page as soon as an update is available. Also, could you DM https://t.co/ObUanPWteA me your subID so we can better investigate this? Thank you for your patience. ^AU
— Azure Support (@AzureSupport) January 10, 2019
The support team followed up with another response two hours later, saying: “Hi there, as [we] continue working to resolve this issue, we are wondering if you have seen any signs of recovery yet?”
In terms of what caused the outage, Microsoft said: “Engineers determined that a number of factors, initially related to a software error, caused several nodes on a single storage scale unit to become temporarily unreachable. This, along with the increase in load on the scale unit caused by the initial issue, resulted in impact to customers with Storage resources located on this scale unit.”