Hardly anybody talks about lifecycles in IT, least of all me. I don’t see the end of use of any device as a special occasion to be marked and celebrated: I still have working PCs from the late 1990s. Even so, I had to stop and pay attention when I heard a senior exec from Arm – the world’s most popular CPU designer, no less – mention that major cloud players are now reinvesting in their data centres on a yearly basis.
This is an incredibly short lifecycle, but when it comes to the cloud there are multiple things that might need to be retired, upgraded or otherwise cycled. One is the data centre hardware itself; this might seem like a very fundamental refresh, and it could transform the customer experience, making things either faster or slower. But, in these days of virtual machines and serverless design, it might equally be completely invisible from the outside, except where it leads to a change in tariffs.
Then there are upgrades to the orchestrator or container OS. These tend to happen with little or no notice, excused by the urgency of applying the latest security updates. As a result, any dependencies on old code or deprecated features may only come to light on the day of the switch. As a savvy cloud customer, your best defences against such upheaval are to spread your systems across multiple suppliers, maintain portfolios of containers running different software versions and take a strong DevOps approach to your own estate.
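One practical way to put that DevOps defence into effect is to make version drift visible: keep a manifest of the platform versions you last tested against, and diff it against whatever is actually running after any update. Here's a minimal sketch of that idea in Python; the component names and version numbers are invented for illustration, not taken from any real orchestrator.

```python
# Compare a tested-version manifest against what's currently running,
# so an unannounced orchestrator or container-OS upgrade shows up
# immediately rather than on the day something breaks.
# Component names and versions are illustrative, not real products.

def find_drift(tested: dict[str, str], running: dict[str, str]) -> dict[str, tuple]:
    """Return components whose running version differs from the tested one."""
    drift = {}
    for component, expected in tested.items():
        actual = running.get(component, "missing")
        if actual != expected:
            drift[component] = (expected, actual)
    return drift

tested = {"orchestrator": "1.24", "container-os": "3.17", "runtime": "20.10"}
running = {"orchestrator": "1.27", "container-os": "3.17", "runtime": "20.10"}

print(find_drift(tested, running))  # {'orchestrator': ('1.24', '1.27')}
```

Run on a schedule, a check like this turns the provider's silent upgrade into an alert you see before your dependencies on deprecated features do.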
Other scenarios include the sort of big move when a beta site is finally promoted and becomes the main site, and the eventuality of a cloud provider being taken over by another, resulting in a burst of service changes and tariff renegotiation. Remember, lots of high-tech businesses operate with the express intention of being acquired at some point, once they have a good portfolio of customers, a steady revenue stream and hence a high share price. Such a strategy is music to the ears of venture capital backers, eager to recoup their investment and take their profits; I will leave you to consider whether it’s well suited to cloud services, which place a high emphasis on continuous and uninterrupted service. There’s a reason why many cloud company contracts are all about inhibiting customer mobility.
It’s clear that, when we’re talking about the cloud, “lifecycle” entails a spread of quite different activities, and bringing them all together under one banner doesn’t do you much good: the lessons learnt from going through one of the above events won’t do much to help with others.
However, the situation doesn’t have to be complicated – at least not if you actually have developers, and aren’t just stuck with a narrow selection of package providers. If you are in this lucky position, and you’ve been listening to at least the tone of your development team’s comments on the various fads and fashions in development, there’s a fair chance that your IT portfolio will have been built with the sorts of tools that produce nice, mobile and tablet-friendly, infinitely resizeable, bandwidth-aware, cloud-scalable websites. If that’s what you’re working with, it can be relatively easy to ride out lifecycle events.
Unfortunately, this is by no means universally the case, especially not for systems that have been around long enough for large parts of the business to have been built on them. If you already have a code base that works, it can be tough to secure the development time and cost commitment to move it from (say) QuickBASIC or COBOL onto Ruby on Rails, Java or PHP.
Yet this is itself one of the most significant lifecycle events, or at least part of it. It may seem a stretch to refer to code migration as a lifecycle end, but when you first unleash your prototype on a public cloud platform, nobody really knows how it’s going to perform, or how resource-hungry it might be, and your production systems person is not going to want those kinds of unknowns messing up their carefully controlled production space. The requirements for releasing that prototype into the big bad world thus emerge from the development and testing process.
That output ought to, at least, incorporate a statement about what needs to be done, and after how long, with an eye on three quite distinct systems. First, there’s the prototype in its current state, which at this point is probably still languishing on Amazon or Azure. Then, of course, there’s the predecessor system, which is going to hang around for a couple of quarters at least as your fallback of last resort. Then there’s the finished, deployed product – which, despite your diligent testing, will still have bugs that need finding and fixing. Redevelopment involves managing not one, but three overlapping lifecycles.
If you’re wondering how much of this is specific to the cloud, you have a good point. You would have had very similar concerns as a project manager in 1978, working in MACRO-11 or FORTRAN. Those systems lack the dynamic resource management aspect of a cloud service, but while cloud suppliers may seek to sell the whole idea of the “journey to the cloud”, for most businesses reliability, rather than flexibility, remains the priority.
The question, indeed, is whether your boringly constant compute loads are actually at the end of their unglamorous lifecycle at all. It’s possible to bring up some very ancient operating systems and app loads entirely in cloud-resident servers, precisely because many corporates have concluded that their code doesn’t need reworking. Rather, they have chosen to lift and shift entire server rooms of hardware into virtual machines, in strategies that can only in the very loosest sense be described as “cloud-centric”.
Fun with the law
Despite the best efforts of man and machine, cloud services go down. And when it happens, it’s remarkable how even grizzled business people think that legally mandated compensation will be an immediate and useful remedy. Yes, of course, you will have confirmed your provider’s refund and compensation policy before signing up, but remember that when they run into a hosting issue, or when their orchestrator software is compromised by an infrastructure attack, they will suddenly be obliged to pay out not just for you, but for everybody on their hosting platform. What’s the effect going to be on their bottom line, and on future charges?
If you’ve been good about developing a serverless platform, hopping from one cloud host to another isn’t going to be a big issue. Even if you’re in the middle of a contract, you may be able to reduce your charges from the cloud provider you’re leaving, simply by winding down whatever you were previously running on their platform. After all, the whole point of elastic cloud compute is that you can turn the demand up and down as needed.
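That wind-down can be planned as a staged handover: reduce capacity on the outgoing provider in steps while ramping up the incoming one, so you're never paying for, or depending on, more than the total you need. A toy sketch of the arithmetic follows; the step size and instance counts are assumptions for illustration, not anyone's real migration schedule.

```python
# Sketch of a staged migration between two elastic cloud platforms:
# wind demand down on the old provider while ramping the new one up
# by the same amount at each stage.
# The step size and capacities are illustrative assumptions.

def migration_plan(total: int, step: int) -> list[tuple[int, int]]:
    """Return (old_capacity, new_capacity) pairs for each migration stage."""
    stages = []
    old, new = total, 0
    while old > 0:
        moved = min(step, old)
        old -= moved
        new += moved
        stages.append((old, new))
    return stages

print(migration_plan(10, 4))  # [(6, 4), (2, 8), (0, 10)]
```

The point of the exercise is the invariant: at every stage the two columns sum to the same total capacity, which is exactly the promise elastic compute makes.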
Sometimes you might end up in the opposite situation, where you reach the end of a fixed-term contract and have no option but to move on. This comes up more often than your classic techie or development person imagines, thanks to the provider’s imperative to get the best value out of whatever hardware is currently sitting in the hosting centre. If there’s spare capacity in the short term, it makes sense for the vendor to cut you a time-limited deal, perhaps keeping your cloud portfolio on a hosting platform from a few years ago and thereby avoiding any overlap with the reinvestment costs of their newer – possibly less compatible – platform.
Hardware and software changes
For some reason that nobody seems minded to contest, it’s assumed in the cloud industry that customers will be agile enough to handle cloud vendors making root and branch changes to the software platform with effectively no notice. You come into the office with your coffee and doughnuts, to be greeted by a “please wait” or a similarly opaque error, which means that your cloud login and resources are now being managed by something quite new, and apparently untested with at least your password database, if not the content of your various memberships and virtual machines.
Most people active in IT operations management would not want to characterise this as a lifecycle opportunity. That particular field of business is particularly big on control and forward planning, which are somewhat at odds with the idea of giant cloud suppliers changing environments around without warning. When you and 100 million other users are suddenly switched to an unfamiliar system, the behaviour you have to adopt comes not from the cloud vocabulary, but rather from the British government: we’re talking about cyber-resilience.
If that sounds like a buzzword, it sort of is. Cyber-resilience is a new philosophy, established more in the UK than the US, which encourages you to anticipate the problem of increasingly unreliable cloud services. It’s not a question of what plan B might look like: it is, rather, what you can say about plan Z. And that’s sound sense, because finding your main cloud supplier has changed its software stack could be as disastrous for your business as a ransomware attack. It can also mark a very sharp lifecycle judgement, because your duty isn’t to meekly follow your provider’s software roadmap: it’s to make sure that a rational spread of cloud services, and a minimalist and functionally driven approach to your own systems designs, gives you the widest possible range of workable, reachable, high-performance assets.
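A "plan Z" audit of that spread can be surprisingly mechanical: list which provider each service runs on, then ask which services you would lose outright if any single provider went dark overnight. A minimal sketch, in which the service and provider names are invented for illustration:

```python
# Sketch of a "plan Z" audit: given which provider(s) each service
# runs on, report the services hosted only in one place - i.e. the
# ones an abrupt platform change would take with it.
# Service and provider names are invented for illustration.

from collections import defaultdict

def single_points_of_failure(deployments: dict[str, set[str]]) -> dict[str, list[str]]:
    """Map each provider to the services hosted *only* there."""
    exposure = defaultdict(list)
    for service, providers in deployments.items():
        if len(providers) == 1:
            exposure[next(iter(providers))].append(service)
    return dict(exposure)

deployments = {
    "billing": {"aws"},
    "web":     {"aws", "azure"},
    "reports": {"azure"},
}
print(single_points_of_failure(deployments))
# {'aws': ['billing'], 'azure': ['reports']}
```

An empty result is the goal; anything else is a candidate for redeployment before, not after, the supplier surprises you.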
If you’re already invested in cloud infrastructure, this talk might seem fanciful; in reality, few businesses experience the full force of all these different scenarios. The biggest difficulties with the cloud usually involve remembering where you left all your experiments, who has copies of which data sets, and how to identify your data once it skips off to the dark web. The dominant mode here is all about things that live on too long past their rightful end, and that’s slightly more manageable than the abrupt cessations of access or service we’ve been discussing.
Even so, it’s important to carry out the thought experiments, and to recognise that lifecycles can be chaotic things that require a proactive mindset. One could even say that the lifecycle of the “lifecycle” – in the sense of a predictable, manageable process – is coming to an end, as the new era of resilience dawns.