
What can companies learn from object storage pioneers?


Lindsay Clark

28 Jan, 2020

The shift to the cloud is encouraging enterprises to rethink their options on storage. According to a June 2019 study from IHS Markit, 56% of organisations said they plan to increase investment in object storage, putting it ahead of unified storage at 51%, storage-area networks at 48% and network-attached storage at 36%. Most object storage is in the cloud, with popular examples including AWS S3, Azure Blob Storage and Google Cloud Platform (GCP) Cloud Storage.

But shifting to a new storage architecture at the same time as the cloud move is not entirely painless.

At the beginning of the decade, Moneysupermarket.com, the consumer online comparison and information site for financial services, was using a combination of SQL databases and a SAS analytics environment. By 2014, it had moved to AWS for website hosting and data analytics, including use of S3 object storage and the Vertica data warehouse. By May 2019, it had moved its data and analytics to GCP, using the BigQuery data warehouse and Cloud Storage object storage. The website itself remains on AWS.

Harvinder Atwal, Chief Data Officer at MoneySuperMarket, tells Cloud Pro: “One of the good things about the cloud is the initial learning curve is very shallow: it’s easy to start. But then you get to the point where it’s very much steeper and you need to understand some of the complexities involved.”

One example of those complexities is the introduction of object lifecycle policies. The idea is to define policies that manage objects throughout the time the organisation needs them. That might mean moving them to cheap long-term storage such as AWS Glacier, or expiring them altogether. Getting these rules right from the outset can save costs.

“That’s one of the things that maybe we should put a little more effort into from the very beginning,” Atwal says.
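For illustration, here is a minimal sketch of the kind of lifecycle rule described above, using the AWS SDK for Python (boto3). The bucket name, prefix and retention periods are hypothetical placeholders, not MoneySuperMarket's actual configuration; Google Cloud Storage offers equivalent lifecycle rules.

```python
import boto3

s3 = boto3.client("s3")
s3.put_bucket_lifecycle_configuration(
    Bucket="example-analytics-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-then-expire-raw-exports",
                "Filter": {"Prefix": "raw-exports/"},
                "Status": "Enabled",
                # Move objects to cheaper archival storage after 90 days...
                "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
                # ...and delete them entirely after a year.
                "Expiration": {"Days": 365},
            }
        ]
    },
)
```

Defining a rule like this when a bucket is created, rather than retrofitting it once data has piled up, is the "effort from the very beginning" Atwal is referring to.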

Other advice for those moving to object storage in the cloud includes avoiding biting off more than the team can chew.

“I would not do the migration all in one go,” Atwal says. “I think the bigger project and the more money and resources it uses, the more likely it is to fail. I would encourage people to think of their use case and application and build a minimal viable product around that.”

It’s worth getting advice about the transition from independent third parties, which the cloud platform vendors can recommend. Moneysupermarket.com, for example, used a consultancy called DataTonic for its transition to Google Cloud Platform.

Lastly, there can be a cultural change in store for the IT department, Atwal says. “The IT function can be very traditional in its thinking around how you use data. They think you must cleanse it, put it into a relational schema and only then can users access it. But with data today, the value in analytics comes from actually being able to use data from many sources and join them together, and IT has to learn to ditch its historic mindsets.”

Nasdaq, the tech stock market, began working with AWS in 2012. It stores market, trade and risk data on the platform using S3 and Glacier. It uploads raw data to Amazon S3 throughout the trading day, and a separate system running in the cloud converts that raw data into Parquet files and places them in their final S3 location. This way, the system is able to scale elastically to meet the demands of market fluctuations. Nasdaq also uses Amazon Redshift Spectrum to query data to support billing and reporting, and Presto and Spark on Elastic MapReduce (EMR) or Athena for analytics and research.
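A rough illustration of that conversion step, assuming a simple CSV-to-Parquet flow with pandas and boto3 rather than Nasdaq's actual pipeline; the file names, bucket and partitioned key layout are hypothetical.

```python
import pandas as pd
import boto3

# Read a day's raw trade records (pandas writes Parquet via pyarrow).
raw = pd.read_csv("trades_2019-11-06.csv")
raw.to_parquet("trades_2019-11-06.parquet")  # columnar format for query engines

# Copy the converted file to its final, partitioned S3 location.
s3 = boto3.client("s3")
s3.upload_file(
    "trades_2019-11-06.parquet",
    "example-market-data",                           # hypothetical bucket
    "parquet/trade_date=2019-11-06/trades.parquet",  # hypothetical final key
)
```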

“Migrating to Amazon S3 as the ‘source of truth’ means we’re able to scale data ingest as needed as well as scale the read side using separate query clusters for transparent billing to internal business units,” says Nate Sammons, assistant vice president and lead cloud architect at Nasdaq.

But getting the scale of analytics solutions right for the problem has been a challenge, he says. “We currently operate one of the largest Redshift clusters anywhere, but it’s soon to be retired in favour of smaller purpose-specific clusters. Some of the custom technologies we developed [in the early days] have since been retired as cloud services have matured. Had technologies like Amazon Redshift Spectrum existed when we started, we would have gone straight to Amazon S3 to start with, but that was not an option.”

The advantage of using S3, though, was that it made the organisation less concerned about individual machine outages or data centre failures, Sammons says. “If one of the Amazon Redshift Spectrum query clusters fail, we can just start another one in its place without losing data. We don’t have to do any cluster re-sizing and we don’t require any CPU activity on the query clusters to do data ingest.”
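To show why those query clusters hold no data of their own and can simply be replaced, here is a hedged sketch of defining a Redshift Spectrum external table over Parquet files in S3. Redshift speaks the PostgreSQL wire protocol, so psycopg2 is used here; the connection details, columns and S3 path are hypothetical, and it assumes an external schema named spectrum has already been created against the data catalogue.

```python
import psycopg2

conn = psycopg2.connect(
    host="example-cluster.abc123.eu-west-1.redshift.amazonaws.com",
    port=5439, dbname="analytics", user="analyst", password="example-password",
)
conn.autocommit = True  # CREATE EXTERNAL TABLE cannot run inside a transaction

cur = conn.cursor()
cur.execute("""
    CREATE EXTERNAL TABLE spectrum.trades (
        symbol     VARCHAR(12),
        price      DOUBLE PRECISION,
        traded_at  TIMESTAMP
    )
    STORED AS PARQUET
    LOCATION 's3://example-market-data/parquet/';
""")
cur.close()
conn.close()
```

Because the table is only metadata pointing at S3, a replacement cluster can recreate it and resume querying without any data movement.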

Rahul Gupta, IT transformation expert at PA Consulting, says those exploiting object storage in the cloud should know that apparent scalability and elasticity does not remove the need to do some basic housekeeping on data.

“A lot of people feel storage is cheap, so they build systems with vast amounts of data and think the impact on cost is not that great. They push the data into S3, or an equivalent, and then once it’s in there, they feel that they can impose structure on the data, which is not the right thing to do,” he says.

He says that by understanding data structure upfront and creating governance such as role-based access, organisations will not have to revisit the architecture once the data grows.
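As a hypothetical illustration of that kind of upfront governance, the sketch below applies an S3 bucket policy granting read access on a curated prefix to a single IAM role; the account ID, role name, bucket and prefix are invented for the example.

```python
import json
import boto3

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AnalyticsReadersOnly",
            "Effect": "Allow",
            # Only this role may read objects under the curated prefix.
            "Principal": {"AWS": "arn:aws:iam::123456789012:role/analytics-reader"},
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::example-data-lake/curated/*",
        }
    ],
}

boto3.client("s3").put_bucket_policy(
    Bucket="example-data-lake",
    Policy=json.dumps(policy),
)
```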

Just because so many organisations are moving storage to the cloud does not mean they all get the same value from the transition. The considerable investment in cloud infrastructure, storage and analytics applications will offer the greatest returns to those who understand the storage lifecycle upfront, create governance rules around access and understand data structure from the outset.

Managing the cloud money pit


Lindsay Clark

24 Jan, 2020

Despite the growing popularity of cloud computing, organisations still struggle to control costs. Research from Flexera, a provider of IT management software, shows 84% of enterprises find optimising cloud costs a growing challenge. The 2019 survey of 786 technology professionals also found organisations underestimate their wastage, putting it at 27% of cloud spending whereas the real figure — according to Flexera — is around 35%.

Part of the problem is the shift from the old world to the new, as organisations lift more of their applications and infrastructure to the cloud, according to Adrian Bradley, cloud advisory lead at KPMG.

“With on-premise contracting, you get a set of unit prices and service levels, you negotiate with a provider, and then that will be a one-off procurement exercise that lasts three to five years. You set the value within that initial negotiation, and everyone can go home and leave it to relatively junior people to execute the contract because the unit prices have protection,” he tells Cloud Pro.

In the cloud, however, decision-making can often be handed down to junior developers and infrastructure managers, but pricing can be very dynamic with complex discount arrangements. “The consequence is that the cost of cloud has actually been more than expected,” Bradley explains.

The challenge of controlling costs comes in two parts. Firstly, can organisations exploit the standard price structures of the big cloud vendors to help them get better value for money? And secondly, can organisations try to negotiate their own ‘special’ deals from cloud vendors, and get better value than the standard price structures offer?

On the first question, KPMG’s Bradley says mature organisations are finding ways to get more bang for their bucks.

In AWS, for example, savings are mostly baked into reserved instances, where users commit to a certain level of compute for a certain period, with incremental discounts in line with how much they commit.

“Mature users on cloud have become quite sophisticated in planning effectively around that,” he says.
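To make the mechanics concrete, here is a back-of-the-envelope comparison in Python. The hourly rate and discount are invented for illustration, not AWS list prices; the point is simply how commitment translates into savings.

```python
HOURS_PER_YEAR = 24 * 365

on_demand_rate = 0.10     # $/hour, hypothetical
reserved_discount = 0.40  # 40% off for a one-year commitment, hypothetical
instances = 20

on_demand_cost = instances * on_demand_rate * HOURS_PER_YEAR
reserved_cost = on_demand_cost * (1 - reserved_discount)

print(f"On-demand: ${on_demand_cost:,.0f}/year")
print(f"Reserved:  ${reserved_cost:,.0f}/year "
      f"(saving ${on_demand_cost - reserved_cost:,.0f})")
```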

Organisations can also get better deals depending on when they reserve computing power. “There’s an element of arbitrage, because it’s a bit like looking for holidays. If you book early, you get a good deal.”

But, in a similar vein, there are bargains to be had in last-minute deals in cloud computing spot markets via companies such as Spotinst and Cloudability, Bradley says.

“And just like a holiday, if you book late and you’re unfussy about where you go, then you can also get a great deal. That’s not something that’s part of the initial negotiation, and what you have to work out is how you can best make use of the economic models the cloud providers have created,” he says.
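Those brokers aside, the underlying mechanism on AWS is the spot request itself. A minimal, hypothetical sketch with boto3, in which the AMI, instance type and price ceiling are placeholders:

```python
import boto3

ec2 = boto3.client("ec2")
ec2.request_spot_instances(
    InstanceCount=2,
    SpotPrice="0.05",  # maximum hourly price we are willing to pay
    LaunchSpecification={
        "ImageId": "ami-0123456789abcdef0",  # hypothetical AMI
        "InstanceType": "m5.large",
    },
)
```

The trade-off is that spot capacity can be reclaimed at short notice, which is why it suits flexible, interruptible workloads rather than anything latency-critical.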

The second question addresses whether organisations can negotiate away from the standard price structure. It’s possible, but only for the world’s largest corporations such as multinational consumer goods firms and banks, Bradley says.

“If you hit that threshold of scale and you’re talking about really substantial workloads, then you can have a specific negotiation. That does get you a little bit further,” he says.

To lower prices further still, large businesses can propose creative deals with cloud providers. For example, BP, the oil and energy company, has agreed to supply AWS with 170 MW of renewable energy as part of its cloud computing contract.

But to negotiate, organisations have to prepare. Their chances of success depend as much on the measures they put in place internally as they do on their approach to suppliers.

Mike Jette, industry lead for telecoms, media and technology at procurement outsourcing firm GEP says: “In the early days, it was like the wild west. In a lot of organisations, tons of different people were buying cloud services in an uncoordinated fashion, trying to align with their strategic objectives with very little structured governance or procurement. It was just a lot of people trying to say, ‘hey look I moved to the cloud’.”

How organisations manage their cloud consumption is half the challenge in getting more value, he says. “You need to be thoughtful on the buy side, but the management side is really important to maintaining costs and getting value out of the service providers.”

This means understanding how much the organisation is consuming, and how that might vary, he says. “To get leverage [with suppliers] you have to have management and controls in place. The early adopters have gone through this exercise and they’ve taken 30%-plus of the cost out.”

If they go to market with enough volume, there is always room to negotiate, he says. “You need to have a sense of what the estate looks like and where it’s going to grow to, but there’s definitely an opportunity to negotiate. The cloud service providers like to talk about their market share: they’re in the business of buying volume now,” he says.

To get the best deals from suppliers, organisations need to understand and predict the volumes they will require. It can be a thankless task and even goes against some of the advantages of cloud computing, says Matt Yonkovit, chief experience officer at Percona, an independent consulting company that helps move open source databases to the cloud.

Although organisations can create guardrails to try to guide developers to certain platform providers and solutions, many still want the freedom to choose. Meanwhile, the cloud providers offer so many services (as many as 180 from AWS, for example), each with a separate pricing structure, that estimates of demand are often inaccurate, he says.

While there are machine learning tools that can help, some organisations want to burst out workloads to support business demand: ecommerce companies supporting Christmas shopping, for example.

Just as important as forecasting demand is ensuring applications and databases are configured for the cloud environment to minimise consumption, he says. “People don’t understand the shared responsibility model, and that causes most of the extra spend. Understanding the technology and optimising systems can reduce costs.”

The big three cloud providers – which command more than half the market between them – may have the upper hand in negotiating with customers. But buyers are strengthening their position by better understanding and controlling their demand, exploiting spot markets, and better configuring their technology. Excelling in these areas will build value for cloud buyers.

How Johnson & Johnson boosted its performance by lifting Teradata to AWS


Lindsay Clark

6 Nov, 2019

Data has become the engine that drives modern business, and collating and analysing that data is a crucial component of many IT departments’ duties. Most turn to enterprise data warehouse (EDW) technologies, which offer platforms that allow businesses to centralise their data for easier analysis and processing.

Teradata is among the most well-known EDW platforms on the market, having spent the last 40 years building its reputation providing on-premise EDW hardware and software for customers including General Motors, P&G, eBay and Boeing. It has since transitioned to a cloud-first model and is now available on all three major public cloud providers, following the addition of Google Cloud Platform support on 22 October 2019.

Back in 2017, however, the company’s cloud credentials were not so well established. That’s why, when healthcare and pharmaceuticals giant Johnson & Johnson (J&J) decided to move its data stores to a Teradata-powered cloud infrastructure, the plan was met with surprise and scepticism. In the years leading up to the project, says J&J’s senior manager for data and analytics, Irfan Siddiqui, the company became aware its on-premise platform would not support its burgeoning data analytics demands at an affordable price for much longer.

“We [had] been experiencing some challenges and thinking about how we transform the traditional data warehouse into a more modern service, particularly around the flexibility, scalability and cost, and we were searching for a solution,” he told a Teradata conference in Denver, Colorado earlier this year.

And so, in 2017 it started to look at migrating its enterprise data warehouse (EDW) system to the cloud, eventually landing on Teradata as the most promising solution provider for its problems.

At that time, the offer of Teradata on AWS was not widely considered mature enough for an enterprise environment, Siddiqui tells Cloud Pro.

Six lessons from Johnson & Johnson’s EDW cloud migration

Identify all the stakeholders involved and begin discussions to identify potential challenges

Start with a small proof of concept to test all aspects of the potential solution

Understand as early as possible the network bandwidth and latency between your on-premise and cloud solutions

Expect some things to go wrong the first time you try them

Engage a strong project manager, who is good with timelines and risk, to be the single point of contact for communicating progress

Practise processes over and over again, including failure scenarios

“When Teradata released its first machine on AWS, and I said I wanted to do a proof of concept for Teradata in the cloud, people who knew Teradata, their first reaction was, ‘What? Why? Really?’.”

However, the commitment from Teradata to show its systems could work in the cloud was strong enough that Siddiqui found the confidence to go ahead with a proof of concept. Initial trials showed promise.

The 80-terabyte a-ha moment

“Most of us know doing a capacity expansion or migration to new hardware takes in the order of six months but [with AWS] we were able to spin up a formal system with 80TB of data in just 20 minutes. That was one of the ‘a-ha moments’ for us which became the driving force for us to take another step,” he says.

J&J set itself five goals in lifting Teradata to the cloud, Siddiqui says: to migrate three data environments and all its applications by the halfway point of 2019; to offer the same or improved performance compared with the on-premise system; and to increase flexibility and scalability while reducing cost.

This posed a sizeable challenge for Siddiqui’s team, which aimed to move about 300TB of storage, 50 business applications and 2,500 analytics users on to a system capable of handling more than 200 million queries per month.

It also raised some significant questions.

“How are our applications going to perform? How do we migrate? What happens with downtime, and stability and security?” he says. “We had to address these questions, not just for our leadership team, but all the stakeholders across J&J. We had to show how it would benefit each one of us.”

Most applications stay on-prem

Although all the data warehouse workloads would be in the cloud, most of the related analytics applications and data visualisation tools, including Qlik, Talend, Informatica, and Tibco, remained on-premise.

Some applications were split between the cloud and on-premise servers. For example, J&J wanted to spin up application development environments in the cloud when they were required and only pay when using them. “That is the flexibility we did not have with our own servers,” Siddiqui says.
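As a generic illustration of that pay-only-when-running pattern, sketched here with a plain EC2 instance rather than a Teradata environment and with a hypothetical instance ID, a development environment can simply be started for working hours and stopped afterwards so compute is billed only while it runs.

```python
import boto3

ec2 = boto3.client("ec2")
DEV_INSTANCE = "i-0123456789abcdef0"  # hypothetical dev-environment instance

def start_dev_environment():
    """Bring the development environment up; billing starts now."""
    ec2.start_instances(InstanceIds=[DEV_INSTANCE])

def stop_dev_environment():
    """Shut it down out of hours; compute charges stop while storage persists."""
    ec2.stop_instances(InstanceIds=[DEV_INSTANCE])
```

In practice this kind of start/stop logic would likely be scheduled or automated rather than run by hand.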

Given the migration had to follow an upgrade to the data warehouse production environment, deadlines became tight. The team worked for three months more or less continuously. But by the end of June of 2019, it was able to decommission the on-premise data warehouse hardware systems.

The hard work has paid off for Siddiqui and his team. Extract-transform-load jobs now take half the time compared to the on-premise system. Large Tableau workload performance has improved by 60% and another application’s data loading was cut from more than three hours to 50 minutes.

Beware the desktop data hoarders

Claudia Imhoff, industry analyst and president of Intelligence Solutions, says it makes sense to put enterprise data warehousing in the cloud in terms of scalability and performance, but there are caveats.

“It’s a wonderful place if you have all the data in there. But, unless you’re a greenfield company, nobody has all of their data in the cloud. Even if most operational systems are in the cloud, there are so many little spreadsheets that are worth gold to the company, and they’re on somebody’s desktop,” she says.

“There are arguments for bringing the data into the cloud. It is this amorphous thing, and you don’t even know where the data is being stored. And you don’t care, as long as you get access to it. Some of it’s in Azure, some of it’s in AWS, and some of it is in fill-in-the-blank cloud. And, by the way, some of it is still on-premise. Can you bring the data together virtually and analyse it? Good luck with that,” she adds.

To succeed in getting data warehousing and analytics into the cloud, IT must convince those hoarding data on desktop systems that it is in their interest to share their data. The cloud has to do something for them, she says.

Despite the challenges, enterprise IT managers can expect to see more data warehouse deployments in the cloud. In April, IDC found the market for analytics tools and EDW software hosted on the public cloud would grow by 32% annually to represent more than 44% of the total market in 2022. These organisations will have plenty to learn from J&J’s data warehouse journey.