x86 i undeniably the king of datacentre compute architecture, but there’s good reason to believe old architectures are making a comeback
When you ask IT pros to think of cloud the first thing that often comes to mind is web-delivered, meter-billed virtualised compute (and increasingly storage and networking) environments which, today, tends to imply an x86-centric stack built to serve up mostly any workload. But anyone watching this space closely will see x86 isn’t the only kid on the block, with SPARC, ARM and Power all vying for a large chunk of the scale-out market, as enterprises seek to squeeze more power out of their cloud hardware. What will the cloud stack of tomorrow look like?
Despite the dominance of x86 in the datacentre it is difficult to ignore the noise vendors have been making over the past couple of years around non-x86 architectures like ARM (ARM), SPARC (Oracle) and Power (IBM), but it’s easy to understand why: simply put, the cloud datacentre market is currently the dominant server market, with enterprises looking to consume more software as a service and outsource more of their datacentre operations than ever before.
Sameh Boujelbene, director of server research at Dell’Oro Group says over 50 per cent of all servers will ship to cloud service providers by 2018, and the size of the market (over $40bn annually by some estimates) creates a massive opportunity for new – and in some cases old non-x86 vendors aiming to nab a large chunk of it.
The nature and number of workloads is also changing. The number of connected devices sending or requesting data that needs to be stored or analysed, along with
the number and nature of workloads processed by datacentres, will more than double in the next five years, Boujelbene explains. This increase in connected devices and workloads will drive the need for more computing capacity and more physical servers, while driving exploration of more performant architectures to support this growing workload heterogeneity.
This article appeared in the March/April edition of BCN Magazine. Click here to download the issue today.
But it’s also important to recognise how migration to the cloud is impacting the choice of server form factors, choice of server brand and the choice of CPU architecture from the datacentre or cloud service provider perspective. Needless to say, cloud service providers have to optimise their datacentre efficiency at every turn.
“Generally, they are moving from general purpose servers to workload optimised servers,” Boujelbene explains. “We see cloud accounts going directly to white box servers shipped by ODMs directly to cloud accounts not only to cut costs but also because ODMs allow customisation; traditional server OEMs such as Dell, HP and IBM simply didn’t want to provide customised servers few years ago.”
Boujelbene sees big opportunities for alternative architectures to x86 such as ARM, SPARC or Power because they provide better performance to run specific types of workloads, and Intel is reacting to that trend by making customised CPUs available to some large cloud accounts. The company has about 35 customised CPU SKUs, and growing, and late last year won a pretty large contract to supply Amazon Web Services, the largest and most established of the public cloud providers, with custom Intel Xeon E5-2666 v3 (Haswell) processors.
Others in the ecosystem, some likely to have joined the fray at some point and others less so, are being enticed to get involved. Mobile chip incumbent Qualcomm announced plans ‘with its own ARM-based offerings’ in November last year to enter the server chip market at some point over the next two years, which the company believes represents a $15bn opportunity over the next five years.
And about a month before the Qualcomm announcement HP unveiled what it called the first “enterprise-grade ARM-based server,” its Moonshot range – the first to support ARM’s v8 architecture. Around the same time, Dell’s chief executive officer and founder Michael Dell intimated to a room of journalists his company, a long time Intel partner, would not be opposed to putting ARM chips in its servers.
SPARC and Power are both very compelling options when it comes to high I/O data analytics – where they are notably more performant than commodity x86. ARM’s key selling points have more to do with the ability to effectively balance licensing, design and manufacturing flexibility with power efficiency and physical density, though the company’s director of server programmes Jeff Underhill says other optimisations – being driven by cloud – are making their way to the CPU level.
“Cloud infrastructure by its very nature is network and storage-centric. So it is essential it can handle large numbers of simultaneous interactions efficiently optimising for aggregate throughput rather than just focusing on the outright performance of a single server. Solutions with integrated high performance networking, as well as storage and domain specific accelerators augmenting their general processor capabilities, offer significantly improved throughput versus traditional general purpose approaches,” Underhill says.
Underhill explains that servers are actually becoming more specialised, though there is and will continue to be a need for general-purpose servers and architectures to support them.
“The really interesting thing to look at is the area where networking and server technologies are converging towards a more scalable, flexible and dynamic ‘infra- structure’. Servers are becoming more specialised with advanced networking and storage capabilities mixed with workload specific accelerators,” he says, adding that this is pushing consolidation of an increasing number of systems (particularly networking) onto the SoC.
Hedging Their Bets
Large cloud providers – those with enough resource to write their own software and stand up their own datacentres – are the primary candidates for making the architectural shift in the scale-out market because of the cost prohibitive nature of making such a move (and the millions of dollars in potential cost-savings if it can be pulled off well).
It’s no coincidence Google, Facebook and Amazon have, with varying degrees of openness, flirted with the idea of shifting their datacentres onto ARM-based or other chips. Google for instance is one of several service providers steering the direction of the OpenPower Foundation (Rackspace is another), a consortium set up by IBM in December 2013 to foster cross-industry open source development of the Power architecture.
Power, which for IBM is the core architecture under- lying its high-end servers and mainframes as well as its more recently introduced cognitive computing as a service platform Watson, is being pitched by the more than 80 consortium members as the cloud and big data architecture of choice. Brad McCredie, IBM fellow and vice president of IBM Power Systems Development and president of the OpenPower Foundation says there is a huge opportunity for the Power architecture to succeed because of barriers in how technology cost and performance at the CPU level is scaling.
“If you go back five or six years, when the base transistor was scaling so well and so fast, all you had to do was go to the next–gen processor to get those cost-to-performance takedowns you were looking for. The best thing you could do all things considered or remaining equal is hop onto the next gen processor. Now, service providers are not getting those cost take-down curves they were hoping for with cloud, and a lot of cloud services are run on massive amounts of older technology platforms.”
The result is that technology providers have to pull on more and more levers – like adding GPU acceleration or enabling GPU virtualisation, or enabling FPGA attachment – to get cost-to-performance to come down; that is driving much of the heterogeneity in the cloud – different types of heterogeneity, not just at the CPU level.
There’s also a classic procurement-related incentive for heterogeneity among providers. The diversity of suppliers means spreading that risk and increasing competitiveness in the cloud, which is another good thing for cost-to-performance too.
While McCredie says that it’s still early days for Power in the cloud, and that Power is well suited to a particular set of data-centric workloads, he acknowledges it’s very hard to stay small and niche on one hand and continue to drive down cost-to-performance. The Foundation is looking to drive at least 20 to 30 per cent of the scale- out market, which – considering x86 has about 95 per cent share of that market locked up – is fairly ambitious.
“We have our market share in our core business, which for IBM is in the enterprise, but we also want share in the scale-out market. To do that you have to activate the open ecosystem,” he says, alluding to the IBM-led consortium.
It’s clear the increasingly prevalent open source mantra in the tech sector is spreading to pretty much every level of the cloud stack. For instance Rackspace, which participates with both OpenStack and Open Compute Project, open source cloud software and hard- ware projects respectively, is actively working to port OpenStack over to the Power architecture, with the goal of having OpenStack running on OpenPower / Open Compute Project hardware in production sometime in the next couple of years. It’s that kind of open ecosystem McCredie says is essential in cloud today and, critically, that such openness need not come at the cost of loose integration or consequent performance tax.
SPARC, which has its roots in financial services, retail and manufacturing, is interesting in part because it remains a fairly closed ecosystem and largely ends up in machines finely-tuned to very specific database workloads. Yet despite incurring losses for several years following its acquisition of Sun Microsystems, the architecture’s progenitor (along with Motorola), Oracle’s hardware business mostly bucked that trend (one experienced by most high-end server vendors) throughout 2014 and continues to do so.
The company’s 2015 Q2 saw its hardware systems grow 4 per cent year on year to roughly $717m, with the SPARC-based Exalogic and SuperCluster systems achieving double-digit growth.
“We’ve actually seen a lot of customers that have gone from SPARC to x86 Linux now very strongly come back to SPARC Solaris, in part because the technology has the audit and compliance features built into the architecture, they can do one click reporting, and be- cause the virtualisation overhead with Solaris on SPARC is much lower when compared with other virtualisation platforms,” says Paul Flannery, senior director EMEA product management in Oracle’s server group.
Flannery says openness and heterogeneity don’t necessarily lead to the development of the most per- formant outcome. “The complexity of having multiple vendors in your stack and then having to worry about the patching, revision labels of each of those platforms is challenging. And in terms of integrating those technologies – the fact we have all of the databases and all of the middleware and the apps – to be able to look at that whole environment.”
Robert Jenkins, chief executive officer of CloudSigma, a cloud service provider that recently worked with Oracle to launch one of the first SPARC-as-a-Service platforms, says that ultimately computing is still very heterogeneous.
“The reality is a lot of people don’t get the quality and performance that they need from public cloud because they’re jammed through this very rigid frame- work, and computing is very heterogeneous –which hasn’t changed with cloud,” he says. “You can deploy simply, but inefficiently, and the reality is that’s not what most people want. As a result we’ve made efforts to go beyond x86.”
He says the company is currently hashing out a deal with a very large bank that wants to use the latest SPARC architecture as a cloud service – so without having to shell out half a million dollars per box, which is roughly what Oracle charges, or migrate off the architecture altogether, which is costly and risky. Besides capex, SPARC is well suited to be offered as a service because the kinds of workloads that run on the architecture tend to be more variable or run in batches.
“The enterprise and corporate world is still focused on SPARC and other older specialised architectures, mainframes for instance, but it’s managing that heterogeneous environment that can be difficult. Infrastructure as a service is still fairly immature, and combined with the fact that companies using older architectures like SPARC tend not to be first movers, you end up in this situation where there’s a gap in the tooling necessary to make resource and service management easier.”
Does It Stack Up For Enterprises?
Whereas datacentre modernisation during the 90s entailed, among other things, a transition away from expensive mainframes running Unix workloads towards lower-cost commodity x86 machines running Linux or Microsoft-based software packages on bare metal, for many large enterprises, much of the 2000s focused on virtualising the underlying hardware platforms in a bid ￼to make them more elastic and more performant. Those hardware platforms were overwhelmingly x86-based.
But, many of those same enterprises refused to go “all- in” on virtualisation or x86, maintaining multiple compute architectures to support niche workloads that ultimately weren’t as performant on commodity kit; financial services and the aviation industry are great examples of sectors where one can still find plenty of workloads running on 40-50 year old mainframe technology.
Andrew Butler, research vice president focusing on servers and storage at Gartner and an IT industry veteran says the same trend is showing up in the cloud sector, as well as to some extent the same challenges.
“What is interesting is that you see a lot of enter- prises claiming to move wholesale into the cloud, which speaks to this drive towards commoditisation in hardware – x86 in other words – as well as services, fea- tures and decision-making more generally. But that’s definitely not to say there isn’t room for SPARC, Power, mainframes or ARM in the datacentre, despite most of those – if you look at the numbers – appearing to have had their day,” Butler says.
“At the end of the day, in order to be able to run the workloads that we can relate to, delivering a given amount of service level quality is the overriding priority – which in the modern datacentre primarily centres on uptime and reliability. But while many enterprises were driven towards embracing what at the time was this newer architecture because of flexibility or cost, performance in many cases still reigns supreme, and there are many pursuing the cloud-enablement of legacy workloads, wrapping some kind of cloud portal access layer around a mainframe application for instance.”
“The challenge then becomes maintaining this bi-mod- al framework of IT, and dealing with all of the technology and cultural challenges that come along with all of this; in other words, dealing with the implications of bringing things like mainframes into direct contact with things like the software defined datacentre,” he explains.
A senior datacentre architect working at a large American airline who insists on anonymity says the infrastructure management, technology and cultural challenges alluded to above are very real. But they can be overcome, particularly because some of these legacy vendors are trying to foster more open exposure of their APIs for management interfaces (easing the management and tech challenge), and because ops management teams do get refreshed from time to time.
What seems to have a large impact is the need to ensure the architectures don’t become too complex, which can occur when old legacy code takes priority simply because the initial investment was so great. This also makes it more challenging for newer generations of datacentre specialists coming into the fold.
“IT in our sector is changing dramatically but you’d be surprised how much of it still runs on mainframes,” he says. “There’s a common attitude towards tech – and reasonably so – in our industry that ‘if it ain’t broke don’t fix it’, but it can skew your teams towards feeling the need to maintain huge legacy code investments just because.”
As Butler alluded to earlier, this bi-modality isn’t particularly new, though there is a sense among some that the gap between all of the platforms and archi- tectures is growing when it comes to cloud due to the expectations people have on resilience and uptime but also ease of management, power efficiency, cost, and so forth. He says that with IBM’s attempts to gain mind- share around Power (in addition to developing more cloudy mainframes), ARM’s endeavour to do much the same around its processor architecture and Oracle’s cloud-based SPARC aspirations, things are likely to remain volatile for vendors, service providers and IT’ers for the foreseeable future.
“It’s an incredibly volatile period we’re entering, where this volatility will likely last between seven years possibly up to a decade before it settles down – if it settles down,” Butler concluded