OpenStack and NVMe-over-Fabrics: Getting higher performance for network-connected SSDs

What is NVMe over Fabrics (NVMe-oF)?

The evolution of the NVMe interface protocol has been a boon to SSD-based storage arrays, enabling solid state drives (SSDs) to deliver higher performance and lower latency for data access. The NVMe over Fabrics network protocol extends these benefits further, retaining NVMe's advantages when the storage array is accessed remotely over a network fabric. Let's understand how.

Although the NVMe protocol works well with storage arrays built from high-speed NAND and SSDs, latency crept in when NVMe-based storage arrays were accessed through shared storage or storage area networks (SANs). In a SAN, data must be transferred between the host (initiator) and the NVMe-enabled storage array (target) over Ethernet, RDMA technologies (iWARP/RoCE), or Fibre Channel. The added latency came from translating SCSI commands into NVMe commands during data transport.

To address this bottleneck, NVM Express introduced the NVMe over Fabrics protocol as a replacement for iSCSI as the storage networking protocol. It carries the benefits of NVMe across network fabrics in a SAN-style architecture, creating a complete end-to-end NVMe-based storage model that is highly efficient for modern workloads. NVMe-oF supports all the major network fabric technologies, such as RDMA (RoCE, iWARP), Fibre Channel (FC-NVMe), InfiniBand, future fabrics, and the Intel Omni-Path architecture.

NVMe over Fabrics and OpenStack

As we know, OpenStack is a collection of open source projects for the centralised management of data centre operations, and it provides an ideal environment for implementing an efficient, high-throughput NVMe-based storage model. OpenStack Nova and Cinder are the components used in the proposed NVMe-oF with OpenStack solution, which involves creating a Cinder NVMe-oF target driver and integrating it with OpenStack Nova.

OpenStack Cinder is the block storage service for OpenStack deployments, used mainly to provide persistent storage to cloud-based applications. It offers APIs that let users access storage resources without exposing where the storage is located.

OpenStack Nova is the component within OpenStack that provides on-demand access to compute resources such as virtual machines, containers, and bare metal servers. In the NVMe-oF with OpenStack solution, Nova attaches NVMe volumes to VMs.
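As a rough illustration of that division of labour, here is a minimal sketch using the openstacksdk Python client: Cinder creates the volume, and Nova attaches it to an instance. The cloud name, server name and size are hypothetical, and whether the volume is actually served over NVMe-oF depends on the volume type and backend configured in the deployment.

```python
# Minimal sketch of the Cinder/Nova workflow described above (not the code
# demonstrated at the summit session). Cloud name, server name and size are
# hypothetical placeholders.
import openstack

conn = openstack.connect(cloud="mycloud")   # assumes a clouds.yaml entry

# Cinder: create a 10 GiB volume (optionally pinned to an NVMe-oF backed type).
volume = conn.block_storage.create_volume(size=10, name="nvmeof-demo-vol")
conn.block_storage.wait_for_status(volume, status="available")

# Nova: attach the new volume to an existing instance.
server = conn.compute.find_server("demo-instance")
conn.compute.create_volume_attachment(server, volume_id=volume.id)
```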

Support for NVMe-oF in OpenStack has been available since the 'Rocky' release. The proposed solution requires RDMA NICs and supports a kernel initiator and kernel target.

NVMe-oF targets supported

Based on the proposed solution above, there are two ways to implement NVMe-oF with OpenStack: first, with the kernel NVMe-oF target driver, supported as of the OpenStack 'R' (Rocky) release; and second, with Intel's SPDK (Storage Performance Development Kit) based implementation, which comprises the SPDK NVMe-oF target driver and the SPDK LVOL (logical volume) backend and is anticipated in the OpenStack 'S' release.

Kernel NVMe-oF target: This implementation supports the kernel target and kernel initiator. However, the kernel-based NVMe-oF target is limited in the number of IOPS it can deliver per CPU core. It also suffers latency penalties from CPU interrupts, the many system calls needed to read data, and the time spent transferring data between threads.

Fig – Kernel-Based NVMe-oF + OpenStack Implementation
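For orientation, the sketch below uses Python's configparser to generate a hypothetical cinder.conf backend section for an LVM backend exported through the kernel NVMe-oF target. The option names and values (target_helper, target_protocol, the address and port) are assumptions based on the Rocky-era driver, so check them against the release documentation for your deployment before use.

```python
# Illustrative sketch only: writes a hypothetical cinder.conf backend section
# for an LVM backend exported over the kernel NVMe-oF target. Option names and
# values are assumptions drawn from the Rocky-era driver and may differ by release.
import configparser

config = configparser.ConfigParser()
config["lvm-nvmet"] = {
    "volume_driver": "cinder.volume.drivers.lvm.LVMVolumeDriver",
    "volume_group": "cinder-volumes",
    "target_helper": "nvmet",          # kernel NVMe-oF target (assumed value)
    "target_protocol": "nvmet_rdma",   # RDMA transport (assumed value)
    "target_ip_address": "192.168.1.10",
    "target_port": "4420",             # conventional NVMe-oF RDMA port
}

with open("cinder.conf.sample", "w") as f:
    config.write(f)
```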

SPDK NVMe-oF target: Why SPDK? The SPDK architecture achieves high performance for NVMe-oF with OpenStack by moving all the necessary application drivers into userspace (out of the kernel), operating in polled mode instead of interrupt mode, and processing locklessly (avoiding CPU cycles spent synchronising data between threads).

Let’s understand what it means.

In the SPDK implementation, the storage drivers used for operations such as storing, updating and deleting data are isolated from the kernel space where general-purpose processes run. Moving these drivers out of the kernel saves kernel processing time and lets CPU cycles be spent executing the storage drivers in userspace, avoiding interrupts and lock contention with other drivers in kernel space.

In a typical I/O model, the application issues a read/write request and waits for the I/O cycle to complete. In polled mode, once the application submits a request, it carries on with other work and comes back at defined intervals to check whether the earlier request has completed. This reduces latency and processing overhead, and improves the efficiency of I/O operations.
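The toy Python sketch below (not SPDK code) contrasts the two models: a blocking call that waits for the result, and a polled loop that keeps doing other work and periodically checks whether the request has completed. The fake_read helper is a hypothetical stand-in for a real I/O submission.

```python
# Toy illustration of blocking vs. polled completion checking (not SPDK code).
import concurrent.futures
import time

def fake_read(block_id):
    """Pretend device read that takes a little while to complete."""
    time.sleep(0.05)
    return f"data-for-block-{block_id}"

executor = concurrent.futures.ThreadPoolExecutor(max_workers=4)

# Interrupt/blocking style: submit and wait; the caller is idle until done.
blocking_result = executor.submit(fake_read, 1).result()

# Polled style: submit, keep doing useful work, and periodically check
# whether the request has completed instead of sleeping on it.
request = executor.submit(fake_read, 2)
other_work_done = 0
while not request.done():
    other_work_done += 1          # stand-in for useful foreground work
polled_result = request.result()

print(blocking_result, polled_result, other_work_done)
executor.shutdown()
```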

To summarise, SPDK is designed specifically to extract performance from non-volatile media. It contains tools and libraries for building scalable, efficient storage applications from userspace, polled-mode components that enable millions of I/Os per core. The SPDK architecture is a set of open source, BSD-licensed building blocks optimised for extracting high throughput from the latest generations of CPUs and SSDs.

Fig – SPDK Architecture

Why SPDK NVMe-oF target?

The NVMe-oF performance benchmarking report for SPDK shows that:

  • Throughput scales up and latency decreases almost linearly as SPDK NVMe-oF target and initiator I/O cores are scaled
  • The SPDK NVMe-oF target delivered up to 7.3x more IOPS/core than the Linux kernel NVMe-oF target when running a 4K 100% random write workload with an increasing number of connections (16) per NVMe-oF subsystem
  • The SPDK NVMe-oF initiator is 3x faster than the kernel NVMe-oF initiator with a null bdev-based backend
  • SPDK reduces NVMe-oF software overheads by up to 10x
  • SPDK saturates 8 NVMe SSDs with a single CPU core

Fig – SPDK vs. Kernel NVMe-oF I/O Efficiency

SPDK NVMe-oF implementation

This is the first NVMe-oF implementation integrated with OpenStack (Cinder and Nova) that leverages the SPDK NVMe-oF target driver and an SPDK LVOL (logical volume)-based software-defined storage backend. It provides a high-performance alternative to kernel LVM and the kernel NVMe-oF target.

Fig – SPDK-Based NVMe-oF + OpenStack Implementation

The implementation was demonstrated at OpenStack Summit 2018 Vancouver. You can watch the demonstration video here.

Compared with the kernel-based implementation, SPDK reduces NVMe-oF software overheads and yields higher throughput and performance. We will see how this lands in the upcoming OpenStack 'S' release.

This article is based on a session at OpenStack Summit 2018 Vancouver – OpenStack and NVMe-over-Fabrics – Network connected SSDs with local performance. The session was presented by Tushar Gohad (Intel), Moshe Levi (Mellanox) and Ivan Kolodyazhny (Mirantis).

The post OpenStack and NVMe-over-Fabrics – Getting High Performance for Network Connected SSDs appeared first on Calsoft Inc. Blog.

The key to ‘elite’ DevOps success in 2018: Culture, cloud infrastructure, and abandoning caution

If you think your organisation has gotten serious about DevOps, the bad news is that a small group of companies are raising the bar higher than ever. But things can change depending on how you implement cloud infrastructure.

That is the primary finding from DORA (DevOps Research and Assessment Team) in its most recent State of DevOps report, which polled almost 1,900 global professionals and was put together primarily alongside Google Cloud, with a stellar cast list of secondary sponsors including Microsoft Azure, Amazon Web Services (AWS) and Deloitte.

There are various comparisons which can be made between this and Puppet’s State of DevOps report, released at similar times. Both reports are framed around comparing high and low performers; in this case, almost half (48%) are considered high performers compared with 37% and 15% for medium and low respectively.

Yet the DORA report for the first time introduces a new group – 'elite' performers (7%). These companies naturally deploy multiple times per day, but they also take less than an hour to go from code commit to code running in production, as well as to restore their service in the event of failure. For comparison, those in the previous high performer category could take up to a week for changes, and up to a day for service restoration.

That gulping sound you just heard? IT engineers wondering how to push their companies to these super-high performance levels.

But fear not – there are steps organisations can take. First of all, try to remove the shackles and become less cautious. For instance, Capital One says it deploys 50 times per day, while Google and Netflix – albeit across their hundreds of services in production environments – go into the thousands.

The report dives into a typically ‘conservative’ organisation’s mindset. “Releasing code infrequently can be an effective strategy as they use the extra time between deployments for testing and quality checks to minimise the likelihood of failure,” the report notes. “[Yet] developing software in increasingly complex systems is difficult and failure is inevitable.

“When failures occur, it can be difficult to understand what caused the problem and then restore service. Worse, deployments can cause cascading failures throughout the system. Those failures take a remarkably long time to fully recover from.

“While many organisations insist this common failure scenario won’t happen to them, when we look at the data, we see 5% of teams doing exactly this – and suffering the consequences.”

Examining further drivers of improvement, the report assessed cloud and multi-cloud usage. While AWS (52%) was the most popular provider, ahead of Azure (34%) and Google Cloud Platform (18%), two in five said they were using multiple cloud providers.

The key here is that companies who exhibit all signs of cloud readiness – on-demand self-service, broad network access, resource pooling, elasticity and measured service – are significantly more likely to be in the elite group instead of the lowest performers.

Being keen on providing platform as a service to developers, as well as adopting infrastructure as code and open source policies, is also likely to help. Take Capital One again as an example. It is not just the embrace of open source that is vital, but the culture that goes with it. This was an important part of the Puppet analysis: while it appears to be a slower process, one in five of the highest performers said they had strong DevOps culture across multiple departments.

The report assesses this against Ron Westrum’s model of organisational cultures – power-oriented, rule-oriented, and performance-oriented. “When teams have a good dynamic, their work benefits at the technology and organisational level,” the report notes. “Our research has confirmed this for several years and we caution organisations not to ignore the importance of their people and their culture in technology transformations.”

You can read the full report here (email required).

Read more: Puppet State of DevOps 2018: DevOps continues to evolve – but resist temptation to skip steps

Mac Computers in Your Organization: How to Find the Right Management Tool

Once IT and executives have decided it‘s time to subject the corporate Mac computers to adequate device administration, the issue is to find a suitable solution for just that. Admins looking for a tool to handle Mac computers in their organization need to consider a number of priorities. On the one hand, there are certain […]

The post Mac Computers in Your Organization: How to Find the Right Management Tool appeared first on Parallels Blog.

Why Unmanaged Mac Computers Can Cost You

Whether it’s due to bring-your-own-device (BYOD) policies, departments going rogue, or a superior’s whim, IT needs a solution for integrating already existing or to-be-deployed Mac® computers into an organization’s infrastructure. Even smaller companies regulate their PC clients using group policies and roll out smartphones via mobile device management—but in many cases, Mac devices are left […]

The post Why Unmanaged Mac Computers Can Cost You appeared first on Parallels Blog.

Parallels to Attend MacSysAdmin in Göteborg, Sweden

Guest blog post by Ian Appleby, Northern Europe Territory Manager for Cross Platform Solutions at Parallels Here at Parallels, we’re getting very excited about attending MacSysAdmin 2018 from October 2-5, 2018. Four days in Göteborg of pure Mac® goodness. Four days in the company of Mac admins, experts, and geeks. Four days to be reminded that—in […]

The post Parallels to Attend MacSysAdmin in Göteborg, Sweden appeared first on Parallels Blog.

Forbes Cloud 100 2018: Stripe holds off Slack to retain top private cloud title

Payments provider Stripe remains the number one privately owned cloud company ahead of social messaging firm Slack, according to the latest rankings from Forbes.

The media firm’s latest Cloud 100 list, celebrating the best private cloud firms – as in, cloud companies who are private – saw Stripe retain top spot with what Forbes calls ‘the online tool kit for digital payments, helping billions in transactions flow back into the economy.’

After Stripe and Slack however – which was third last year – there are significant changes at the top table. Dropbox, DocuSign and Adyen, which all made the top five in 2017’s list, have all since gone public. This publication noted in March, when Dropbox had filed for IPO, that the company had moved away from Amazon Web Services to its own infrastructure – a particularly long process.

The Cloud 100 was put together alongside Salesforce Ventures and Bessemer Venture Partners. The latter, perhaps not coincidentally, also produces a yearly report focusing on cloud and enterprise M&A trends. 2017’s most prominent IPO-ers were Cloudera, MongoDB, and Okta – an improvement on the previous year but still below historical averages.

It is too early, however, to see the VC firm’s primary prediction for this year bear fruit. The keynote of Bessemer’s State of the Cloud report was that serverless, APIs, and blockchain would shape the cloud landscape in 2018 and beyond. It will take a while for those technologies to infiltrate the wider landscape, though, as the Forbes 100 list continues to be dominated by SaaS firms.

Yet the rise of artificial intelligence (AI) is notable. Among the more interesting companies in this year’s crop are UiPath, Darktrace and Checkr. UiPath, a new entry at #14, is a robotic process automation vendor based in New York with 1,350 customer companies, which Forbes admits “absolutely came out of left field.” San Francisco-based Checkr (#47), meanwhile, aims to provide a solution for background checks utilising AI, to better classify records without threatening compliance.

Cybersecurity provider Darktrace (#36), whose team is led by former US and UK researchers and government agents, is one of only two companies holding up Britain’s end in the list. The launch of cyber-AI tool Antigena last year was met with reasonable fanfare; as sister publication IoT News put it at the time, using AI for threat monitoring offers “tangible benefits”, picking up on threats and reacting to them without the need for manual action.

It is worth noting the influence that being named on the Cloud 100 holds. Over the past three years, from MuleSoft to Cloudera and many more in between, every major cloud IPO or acquisition has meant a company departing the 100 list. The publishing of this year’s list will lead some in the industry to ponder future trajectories. Slack, which topped the list in 2016 before being usurped by Stripe for the past two editions, recently announced series H funding – to put this in perspective, the ‘record’ is series J, which big data firm Palantir Technologies took in 2014 – of $427 million, taking the company’s value to more than $7 billion.

“The 2018 Cloud 100 represents well over $135 billion in private shareholder value – an astonishing figure that reminds us yet again of the power of the cloud,” said Byron Deeter, partner at Bessemer Venture Partners. “The way we do business will be dramatically different as a result of these companies and I am honoured to celebrate the remarkable accomplishments of the founders and teams behind each company on the 2018 Cloud 100.”

The top 10 companies, in descending order, are Stripe, Slack, Zoom Video Communications, Tanium, Procore Technologies, CrowdStrike, Qualtrics, Squarespace, Elastic, and Eventbrite. Take a look at the full Forbes Cloud 100 list here and compare with 2016 and 2017’s verdicts.

How the Cloud Security Alliance Cloud Controls Matrix benefits financial institutions

The self-service and dynamic nature of cloud infrastructure creates challenges for risk and compliance professionals. Tools that worked well in the traditional data centre do not translate to the public cloud.  

Due to these concerns over regulatory compliance and security, as well as the complexity involved in replacing legacy systems, financial institutions are taking a more tentative approach to change – especially when it comes to implementing new technologies that could put compliance at risk.

So how can today’s financial service organisations embrace the many benefits of the cloud without opening up a Pandora’s box of risk relative to compliance and security?

Cloud native frameworks

One way that innovative financial service organisations are addressing this issue is by introducing cloud native frameworks to govern the cloud. The major cloud providers have been hard at work to ensure that there is a fundamental infrastructure for compliance in place, and new tools are available to ensure that the parameters are being followed and that financial institutions are in compliance.

Let’s explore one of these common frameworks and how it maps to the cloud.

Cloud Security Alliance Cloud Controls Matrix (CSA CCM)

The Cloud Security Alliance Cloud Controls Matrix (CSA CCM) framework provides fundamental security principles to guide cloud vendors and assist prospective cloud customers in determining the overall security risk of a cloud provider. The CSA CCM provides a controls framework with a detailed explanation of security concepts and principles that are aligned to the Cloud Security Alliance guidance in 13 domains.

As a framework, the CSA CCM provides organisations with the needed structure, detail, and clarity relating to information security tailored to the cloud industry.  It has also become the generally agreed upon standard of US-based financial services companies on how they will govern their use of the cloud.   Many financial institutions use the CSA CCM because it encompasses multiple security frameworks across multiple organisations and allows them to look at their legacy frameworks and determine which portions are covered.

The CSA CCM strengthens existing information security control environments in a number of ways:

  • It emphasises business information security control requirements;
  • It identifies and reduces consistent security threats and vulnerabilities in the cloud;
  • It provides standardised security and operational risk management; and
  • It seeks to normalise security expectations, cloud taxonomy and terminology, and security measures implemented in the cloud.

One reason it is such a powerful resource is that if you are compliant in one area, it can provide validation that you are compliant with numerous related frameworks. 

For example, control DIS-03, under the CCM domain of data security and lifecycle management (eCommerce transactions), requires data related to e-commerce that traverses public networks to be appropriately classified and protected from fraudulent activity, unauthorised disclosure, or modification, in such a manner as to prevent contract disputes and compromise of data. If an organisation is in compliance with DIS-03, there is a direct correlation with NIST 800-53, which addresses the same security requirements with controls including:

  • AC-14: Permitted actions without identification or authentication
  • AC-21: Information sharing
  • AC-22: Publicly accessible content
  • IA-8: Identification and authentication (non-organisational users)
  • AU-10: Non-repudiation
  • SC-4: Information in shared resources
  • SC-8: Transmission confidentiality and integrity
  • SC-9: Transmission confidentiality


CSA CCM and cloud management platforms

The CSA CCM includes directives AIS-04, BCR-07, BCR-10, BCR-11, IAM-01, IAM-12, IVS-01, and IVS-03, all of which require global API accounting to be configured so that API calls for your account are recorded and log files are delivered to you. The recorded information includes the identity of the API caller, the time of the API call, the source IP address of the API caller, the request parameters, and the response elements returned by the specific cloud service. Global API accounting provides a history of API calls for each account, including API calls made via the management console, SDKs, command-line tools, and other cloud services. Without this, you are in violation of the CSA CCM. With a cloud management platform, users can build automation to remediate; in AWS, for example, the cloud management platform would use API write credentials to turn on AWS CloudTrail for the resource in question.
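As an illustration of that kind of remediation, here is a minimal boto3 sketch (not any specific cloud management platform's implementation) that creates a CloudTrail trail if one is missing and makes sure it is logging. The trail and bucket names are hypothetical, and a real remediation would also validate the S3 bucket policy and log file validation settings.

```python
# Minimal sketch of a CloudTrail remediation with boto3. Trail and bucket
# names are hypothetical; the S3 bucket is assumed to already exist with a
# policy that allows CloudTrail to write to it.
import boto3

cloudtrail = boto3.client("cloudtrail")

TRAIL_NAME = "org-audit-trail"          # hypothetical trail name
LOG_BUCKET = "org-audit-trail-logs"     # hypothetical, pre-existing S3 bucket

def ensure_api_accounting():
    """Create the trail if it is missing, then make sure it is logging."""
    existing = {t["Name"] for t in cloudtrail.describe_trails()["trailList"]}
    if TRAIL_NAME not in existing:
        cloudtrail.create_trail(
            Name=TRAIL_NAME,
            S3BucketName=LOG_BUCKET,
            IsMultiRegionTrail=True,    # capture API calls in every region
        )
    status = cloudtrail.get_trail_status(Name=TRAIL_NAME)
    if not status["IsLogging"]:
        cloudtrail.start_logging(Name=TRAIL_NAME)

if __name__ == "__main__":
    ensure_api_accounting()
```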

Embracing cloud automation

The ability to automate the enforcement of best practices and standards will be a game changer for the financial services industry. Cloud automation tools provide organisations with continuous compliance and the ability to take the burden off of the IT department by automatically monitoring applications and identifying and fixing issues on the fly. They continuously scan the virtual infrastructure, identify non-compliant resources and remediate common cloud problems related to security, cost and compliance.

As financial institutions look to reinvent their IT organisations, they must ensure that security, governance and compliance are at the foundation of all decisions. Regulatory compliance and managing cyber risk do not need to be the enemy of innovation. For such a regulated industry, automated cloud services and frameworks can help financial service organisations advance IT innovation.

Datrium secures $60m series D funding to go beyond hyperconverged infrastructure

Datrium, a California-based hybrid cloud infrastructure provider, has raised $60 million (£45.8m) in series D funding, with the aim of helping enterprises ‘overcome major obstacles in data analysis and storage.’

The round was led by Samsung Catalyst Fund, as well as featuring new participation from Icon Ventures. NEA and Lightspeed Venture Partners – who regular readers of this publication would recognise as investors in Netskope, CloudBees and Zscaler among others over the years – also participated in the oversubscribed round.

The company’s primary offerings are based around its DVX product for cloud and on-premises environments, which promises 10 times the speed and scale of legacy hyperconverged infrastructure, alongside cloud backup and cloud disaster recovery orchestration.

Datrium claims it is pioneering the area of 2-layer infrastructure, which represents a step up from traditional hyperconverged infrastructure. As CEO Tim Page recently put it to The Silicon Review, the company provides ‘a single management interface across enterprise data centres and public cloud so IT can administer the hybrid cloud at the virtual machine level supported by real-time analytics and without all the detailed configuration time of traditional data centre infrastructure.’

Customers include Fortune 100 companies across industries such as financial services, healthcare, manufacturing and entertainment.

“We are thrilled to partner with Samsung and Icon Ventures to expand our technical and geographical momentum,” Page said in a statement. “Enterprises globally have the same problems in simplifying compute and data management across on-prem and cloud. Where SANs don’t even have a path to cloud, traditional HCI has too many trade-offs for core data centres – backup requires separate purchasing and administration, and cloud DR automation is seldom guaranteed. Larger enterprises are realising that Datrium software offers them a simpler path.”

The data centre landscape continues to change. Hyperscalers are ruling the roost, with capex continuing to rise. Cloud leads the way – Cisco said in February that cloud traffic will represent 95% of total data centre traffic by 2021 – so it’s a race against time for organisations trying to build through their legacy stacks with one hand while driving towards cloud with the other.

Total funding for Datrium now stands at $170 million.

Puppet State of DevOps 2018: DevOps continues to evolve – but resist temptation to skip steps

There are many paths to success in DevOps, but many more which lead to failure – so it’s important to get the evolution right.

That’s the key finding from Puppet’s recently released 2018 State of DevOps report. The study, which quizzed more than 3,000 global technology professionals, argues there are five key stages of good DevOps practice: having built the foundation, organisations normalise the technology stack, standardise and reduce variability, expand DevOps practices, automate infrastructure delivery, and provide self-service capabilities.

Sounds simple, doesn’t it? Yet comparatively few of the companies surveyed were hitting the heights of DevOps-friendliness. The report’s results were based on organisations’ responses to various practices, scored between one and five. These were then grouped into low, medium and highly evolved. Four in five (79%) respondents were categorised as medium, with low and high (10% and 11% respectively) on similar levels.

Despite the desire to get to a higher level of DevOps zen, it is a slow evolutionary process. For the majority of companies polled, in the medium bracket, 14% said they had strong DevOps culture across multiple departments or across a single department. For higher level players, these numbers change to 19% and 9%.

It’s a similar process with automation – indeed, the same number of low-level and high-level companies surveyed (8%) said most of their services were available via self-service. Yet while only 15% of low players said their teams collaborated to automate services for broad use, this number rises for higher players to 37%. “Past experience has shown us that the path from a low degree of IT automation to a high degree isn’t neat or linear,” the report notes.

The report argues that automation is a reasonable yardstick on the CAMS – culture, automation, measurement and sharing – DevOps framework model as it is easily understood by the technical side and has a relatively predictable path. Culture, meanwhile, is more difficult to pin down.

Assuming the foundations have been built around setting company culture, automation et al, step one for teams looking to drive DevOps forward is to reduce the complexity of their tech stack. This means, for new projects, building on set standards, as well as making source code available to other teams. Standardisation follows, which again advocates building on a standard set of technology, as well as a standard operating system, while expansion explores reusing deployment patterns for building apps and services.

The report advises against skipping a few of the earlier steps. “Anecdotally speaking, we have seen organisations start with stage four automation, without having been through normalisation, standardisation and expansion,” it explains. “These organisations do not achieve success – and we believe it’s because they lack a foundation of collaboration and sharing across team boundaries.

“That sharing is critical to defining the problems an organisation faces and coming up with solutions that work for all teams.”

Ultimately, for many organisations reading the report, it’s about working at one’s own pace and getting the building blocks firmly in place.

“While DevOps practices have become far more well known across our industry, organisations continue to struggle to scale pockets of DevOps success more broadly across multiple teams and departments,” said Nigel Kersten, Puppet VP of ecosystem engineering. “This year’s report explores the foundational practices that need to be in place in order to scale DevOps success, and proves that success can only scale when teams are enabled to work across functional boundaries.”

You can read the full report here (email required).

Azure post-mortems, RTOs and RPOs – and what to do with Hurricane Florence on the horizon

The first official post-mortems are starting to come out of Microsoft in regards to the Azure outage that happened last week. While this first post-mortem addresses the Azure DevOps outage specifically (previously known as Visual Studio Team Services, or VSTS), it gives us some additional insight into the breadth and depth of the outage, confirms the cause of the outage, and gives us some insight into the challenges Microsoft faced in getting things back online quickly. It also hints at some features/functionality Microsoft may consider pursuing to handle this situation better in the future.

As I mentioned in my previous article, features such as the new Availability Zones being rolled out in Azure might have minimized the impact of this outage. In the post-mortem, Microsoft confirms what I previously said.

The primary solution we are pursuing to improve handling datacenter failures is Availability Zones, and we are exploring the feasibility of asynchronous replication.

Until Availability Zones are rolled out across more regions, the only disaster recovery options you have are cross-region, hybrid-cloud or even cross-cloud asynchronous replication. Software-based #SANless clustering solutions available today will enable such configurations, providing a very robust RTO and RPO, even when replicating over great distances.

When you use SaaS/PaaS solutions you are really depending on the Cloud Service Provider (CSPs) to have an iron clad HA/DR solution in place. In this case, it seems as if a pretty significant deficiency was exposed and we can only hope that it leads all CSPs to take a hard look at their SaaS/PaaS offerings and address any HA/DR gaps that might exist. Until then, it is incumbent upon the consumer to understand the risks and do what they can to mitigate the risks of extended outages, or just choose not to use PaaS/SaaS until the risks are addressed.

The post-mortem really gets to the root of the issue…what do you value more, RTO or RPO?

I fundamentally do not want to decide for customers whether or not to accept data loss. I’ve had customers tell me they would take data loss to get a large team productive again quickly, and other customers have told me they do not want any data loss and would wait on recovery for however long that took.

It will be impossible for a CSP to make that decision for a customer. I can’t see a CSP ever deciding to lose customer data, unless the original data is just completely lost and unrecoverable. In that case, a near real-time async replica is about as good as you are going to get in terms of RPO in an unexpected failure.

However, was this outage really unexpected and without warning? Modern satellite imagery and improvements in weather forecasting probably gave fair warning that there was going to be significant weather related events in the area.

With Hurricane Florence bearing down on the Southeast US as I write this post, I certainly hope that if your data center is in the path of the hurricane you are taking proactive measures to gracefully move your workloads out of the impacted region. The benefits of proactive disaster recovery vs. reactive disaster recovery are numerous, including no data loss, ample time to address unexpected issues, and managing human resources such that employees can worry about taking care of their families, rather than spending the night at a keyboard trying to put the pieces back together again.

Again, enacting a proactive disaster recovery would be a hard decision for a CSP to make on behalf of all their customers, as planned migrations across regions will incur some amount of downtime. This decision will have to be put in the hands of the customer.

Hurricane Florence Satellite Image taken from the new GOES-16 Satellite, courtesy of Tropical Tidbits

So what can you do to protect your business critical applications and data? As I discussed in my previous article, cross-region, cross-cloud or hybrid-cloud models with software-based #SANless cluster solutions will go a long way toward addressing your HA/DR concerns, with an excellent RTO and RPO for cloud-based IaaS deployments. Instead of application-specific solutions, software-based, block-level volume replication solutions such as SIOS DataKeeper and SIOS Protection Suite replicate all data, providing a data protection solution for both Linux and Windows platforms.

My oldest son just started his undergrad degree in Meteorology at Rutgers University. Can you imagine a day when artificial intelligence (AI) and machine learning (ML) will be used to consume weather related data from NOAA to trigger a planned disaster recovery migration, two days before the storm strikes? I think I just found a perfect topic for his Master’s thesis. Or better yet, have him and his smart friends at the WeatherWatcher LLC get funding for a tech startup that applies AI and ML to weather related data to control proactive disaster recovery events.

I think we are just at the cusp of  IT analytics solutions that apply advanced machine-learning technology to cut the time and effort you need to ensure delivery of your critical application services. SIOS iQ is one of the solutions leading the way in that field.

Batten down the hatches and get ready, Hurricane season is just starting and we are already in for a wild ride.