With all the hype about how cloud delivery brings new levels of flexibility and availability, many organisations may be falling for misleading reports claiming that moving to a cloud model somehow diminishes the need to worry about business continuity (BC) or disaster recovery (DR).
Nothing could be further from the truth.
As in the parody disaster movie Airplane!, it only takes one thing to go wrong for you to find yourself in quite a pickle, but without any of the humour. And you will need more than an inflatable autopilot to get back on course.
Some cloud providers have done a good job of clarifying that responsibility for the continuity and recovery of the supported IT service remains with the customer. Cloud providers do offer some help – for example, multiple availability zones – and organisations should consider this at the outset, along with the benefits of a multi-vendor cloud strategy, to reduce the risk of any critical services going down.
Addressing BC and DR requires attention to three areas: continuity planning, to ensure the relevant events, risks and business impacts are represented in the solution requirements; recovery solutions that are well designed, professionally managed and inspire confidence through effective testing; and crisis management, to escalate incidents through a clear process, up to DR invocation and beyond, to repatriation to the primary, business-as-usual (BAU) state.
While continuity planning is clearly the responsibility of the business rather than the IT function, it is still important for IT to ensure its service impact assessments cater for all relevant risks. Cloud service dependencies may need more analysis, because several low-criticality business services sharing the same infrastructure-as-a-service (IaaS) platform can together incur a greater impact than a single high-criticality service.
The role of the IT function – and particularly the IT incident desk – in crisis management is obvious and requires close collaboration with the business. IT has to recognise quickly any underlying or growing crisis-level events and invoke the necessary action plans. Never underestimate the importance of executing those action plans in a controlled manner. A certain degree of panic is unavoidable in a crisis, and it severely limits the ability to analyse situations and make the right decisions. Simple, broadly understood and well-rehearsed action plans mitigate this risk.
Recovery solutions are the only area that falls squarely within IT’s remit. Here it is also true that cloud suppliers are likely to manage technology platforms better than most of their customers (it is their core business, after all), but these platforms are merely ingredients in the bigger solution.
While we may expect modest improvements in platform resilience after moving to the cloud, solution availability and recovery remain the responsibility of the IT function. The benefits of a cloud model can include the use of multiple availability zones, or even multiple suppliers, as well as the management and support of these platform components.
AWS, Google, Oracle and others have promoted availability zones for some time. Recognising the importance of this customer need, Microsoft recently announced it would follow suit. The growth of multi-cloud strategies also reflects the customer requirement for flexibility between suppliers, including protection against an outage or other problems at any single vendor.
Even when all this is taken into account, organisations must consider consolidated service impacts. Cloud providers’ SLAs will likely reflect minimal periods of downtime, but it is the organisation’s responsibility to address any common points of failure within the business impact analysis, such as multiple services relying on the same infrastructure provision service.
This becomes more critical as companies undergo digital transformation and a growing number of business-critical applications come to rely on the same cloud services. Suddenly that minimal outage period becomes a major business headache.
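To make the consolidated-impact point concrete, here is a minimal Python sketch of how a business impact analysis might total impact per shared dependency rather than per service. The service names, impact scores and dependency labels are hypothetical, chosen only for illustration; a real assessment would draw on the organisation’s own impact ratings and dependency mapping.

```python
from collections import defaultdict

# Hypothetical catalogue: service -> (impact score per hour of outage, shared dependency).
# Names and scores are illustrative assumptions, not real data.
SERVICES = {
    "payroll": (10, "iaas-cluster-a"),
    "intranet": (5, "iaas-cluster-a"),
    "hr-portal": (8, "iaas-cluster-a"),
    "expenses": (7, "iaas-cluster-a"),
    "order-processing": (25, "iaas-cluster-b"),
}


def consolidated_impact(services):
    """Total the impact scores of all services that rely on each dependency."""
    totals = defaultdict(int)
    for impact, dependency in services.values():
        totals[dependency] += impact
    return dict(totals)


def highest_single_impact(services):
    """Return the largest impact score of any individual service."""
    return max(impact for impact, _ in services.values())


def flag_shared_risks(services):
    """List dependencies whose combined impact exceeds the worst single service."""
    threshold = highest_single_impact(services)
    return sorted(
        dep for dep, total in consolidated_impact(services).items() if total > threshold
    )


if __name__ == "__main__":
    for dependency, total in consolidated_impact(SERVICES).items():
        print(f"{dependency}: combined impact {total}")
    print("Shared risks:", flag_shared_risks(SERVICES))
```

With these illustrative figures, the four low-criticality services on `iaas-cluster-a` add up to 30, more than the 25 of the single highest-criticality service. That is exactly the kind of shared point of failure a purely per-service view would miss.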
IT’s domain extends beyond the data centre, and IT is increasingly responsible for workplace recovery through digital workspace ‘work anywhere’ practices. Extending this further, a business function may mistakenly regard its adoption of software-as-a-service as falling outside the IT function. However, as with other shadow IT, management of the service wrapper and service inter-dependencies must remain the remit of the IT function.
While IT is accountable for recovery solutions, and partially responsible for crisis management, it is clear that it cannot own business recovery. It is down to the individual business functions to manage their working practices to support service continuity through to complete business recovery. Since IT cannot address this, cloud certainly cannot.
The SIAM effect
With DR, and particularly BC, we are reminded that risks to people and process are at least as significant as those from technology. This is where the service integration and management (SIAM) model and the IT operating model have to be updated, with an eye on end-to-end service delivery and skills coverage.
When IT service layers are supported across multiple internal or external suppliers, it is vital that all incidents are clearly tracked and consistently managed end-to-end through a single SIAM function. This is critical for effective crisis management as well as being a cornerstone of successful cloud adoption.
In fact, unless it is managed well, cloud adoption can have a negative impact on people and process, driven by the false view that cloud is simply another technology platform. For example, cloud adoption will involve a significant shift in the knowledge base and focus areas of IT staff. The IT function must recognise how this affects DR and BC solutions and ensure access to the expertise it needs, whether internally or externally.
Although cloud adoption should never be thought of as a solution to an organisation’s DR and BC challenges, it certainly can play a positive role. The combination of an effective SIAM model, an evolved cloud operating model, diligent IT service impact assessments and the use of availability zones across multiple cloud suppliers can help organisations ensure their businesses have the built-in resilience to survive outages and disasters. If IT and the business work hand in hand on the planning and execution of DR and BC strategies, you can leave your inflatable autopilot at home.
Editor’s note: This is the first in a three-part series exploring common cloud myths and misconceptions. Stay tuned to CloudTech for the next instalment.