Winning the IT availability war: How to combat costly downtime

Analysts predict global enterprises will spend nearly $2 trillion on digital transformation by 2022. With digital initiatives and technology now ubiquitous in business, one would think that companies would be more than ready for a world where virtually every touchpoint with customers is digital. Unfortunately, the fact that Target, British Airways, Facebook and Twitter all experienced major IT outages in 2019 suggests there is still work to do to keep services running smoothly and the customer experience intact.

To explore precisely what enterprises are doing to detect, mitigate and hopefully prevent outages, LogicMonitor commissioned an IT Outage Impact Study. The independent study surveyed 300 IT decision makers at organisations in the US, Canada, UK, Australia and New Zealand to discover whether IT leadership is concerned about “keeping the lights on” for their businesses. The research revealed a stark reality at odds with today’s omnipresent digitisation: IT teams are concerned about their ability to avoid costly outages, mitigate downtime, and reliably provide the 24/7 availability that customers and partners demand.

Are outages inevitable?

IT teams worldwide agree that performance and availability are the top two priorities for their department. These mission-critical priorities, in fact, beat out security and cost, which is surprising considering how much attention security gets in today’s data-breach-heavy environment.

Yet IT’s intense focus on keeping the network up and running at peak performance has not prevented downtime. In fact, 96% of survey respondents report experiencing at least one IT outage in the past three years, which is bad news if performance and availability are considered make-or-break areas for modern organisations.

Common causes of downtime include network failure, surges in usage, human error, software malfunction and infrastructure failure. What is surprising, however, is that enterprises report that more than half of the downtime they experience could have been prevented.

Worryingly, IT decision makers are pessimistic about their ability to influence all-important availability. More than half (53%) of the 300 IT professionals surveyed say they expect to experience a brownout or outage so severe that the national media will cover the story, and the same percentage say someone in their organisation will lose his or her job as a result of a severe outage.

This raises the question: if even the most skilled technical experts in IT can’t prevent outages, who (or what) can?

The true costs of downtime

Negative media coverage and career impacts aside, downtime comes with additional costs for organisations. Survey respondents identify lost revenue, lost productivity and compliance-related costs as other factors associated with IT outages and brownouts (periods of dramatically reduced or slowed service). And these costs add up quickly. Compared with organisations that have few or zero outages, organisations with frequent outages and brownouts:

Incur 16 times higher costs associated with mitigating downtime
Need nearly two times the number of team members to troubleshoot problems related to downtime
Take two times as long to troubleshoot problems related to downtime

How to win the availability war

If more than half of outages and brownouts are avoidable, according to 300 global IT experts, then every organisation should be taking proactive steps to prevent these disruptive events. The best-performing organisations are already working to prevent costly downtime. Consider taking the following actions to do the same:

Embrace comprehensive monitoring. In today’s digital world, many companies operate in a hybrid IT environment with infrastructure both on-premises and in the cloud. Trying to spot trends using siloed monitoring tools for each platform is inefficient and prone to error.
Identify and implement software that comprehensively monitors infrastructures, allowing the team to view IT systems through a single pane of glass. Consider extensibility and scalability during the selection process as well to ensure the platform integrates with all technologies – present and future.
Use a monitoring solution that provides early visibility into trends that could signify trouble ahead. Data forecasting can proactively identify future failures and ultimately prevent an outage before it impacts the business. Teams should also build a high level of redundancy into their monitoring systems as an additional safeguard against downtime, and focus on eliminating single points of failure that might cause a system to go down.
Don’t wait to create an IT outage response plan. Hopefully it will never be needed, but it’s critical to have a defined process for handling outages, from escalation and remediation to communication and root cause analysis. Decide in advance whom to involve (and when) to ensure IT can respond quickly if an outage does occur.
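
To make the idea of data forecasting more concrete, here is a minimal Python sketch of one simple approach: fitting a straight line to recent readings of a metric and estimating how long it will take to reach a critical threshold. It is an illustration only, not how any particular monitoring product works. The function name, the disk-usage figures and the 90% threshold are all hypothetical, and real platforms are likely to use more sophisticated models that account for seasonality and noise.

```python
from typing import Optional, Sequence


def hours_until_threshold(
    samples: Sequence[float],
    interval_hours: float,
    threshold: float,
) -> Optional[float]:
    """Estimate hours until a metric crosses `threshold`.

    Fits a least-squares line to evenly spaced samples (oldest first).
    Returns None when there is too little data or no upward trend,
    and 0.0 when the fitted line has already reached the threshold.
    """
    n = len(samples)
    if n < 2 or interval_hours <= 0:
        return None

    # Timestamps in hours, measured from the first sample.
    xs = [i * interval_hours for i in range(n)]
    mean_x = sum(xs) / n
    mean_y = sum(samples) / n

    # Ordinary least-squares slope and intercept.
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, samples))
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x

    if slope <= 0:
        return None  # flat or falling trend: no crossing predicted

    crossing_time = (threshold - intercept) / slope
    return max(crossing_time - xs[-1], 0.0)


# Hypothetical daily disk-usage readings, in percent.
daily_disk_usage = [70.0, 72.0, 74.0, 76.0]
remaining = hours_until_threshold(daily_disk_usage, interval_hours=24, threshold=90.0)
if remaining is not None:
    print(f"Projected to reach 90% in about {remaining / 24:.0f} days")
```

In practice, a projection like this would feed an alert that fires days before capacity runs out, giving the team time to act while the problem is still a trend rather than an outage.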

While LogicMonitor’s 2019 Outage Impact Study revealed that downtime is surprisingly common, it also showed that top-performing organisations are able to banish downtime from their day-to-day operations through advance planning and comprehensive monitoring software. In the end, it is possible to win the IT availability war with the right combination of skilled team members and powerful SaaS monitoring technology. But every minute of downtime is pricey – so there’s no time to waste.
