How to minimise the risk of outages – with better software testing

The software-as-a-service model has been widely embraced as digital transformation becomes the norm. But with it comes the risk of network outages. IDC has estimated that for the Fortune 1000, the average total cost of unplanned application downtime can range from $1.25 billion to $2.25 billion per year. This risk arises primarily from the rapid iteration of the DevOps methodology and the testing shortfalls that come with it.

To keep such errors and bugs out of production, a new, streamlined approach to software testing is in order.

The DevOps/downtime connection

Development and testing cycles look very different than they used to, thanks to the adoption of the DevOps methodology. To remain competitive, software developers must continually release new application features, sometimes pushing out code updates almost as fast as they are written. This is a significant change from how software and dev teams traditionally operated: where teams once had months to test, today's accelerated development cycles demand testing in days or even hours. That shortened timeframe means bugs and problems are sometimes pushed through without the testing they require, potentially leading to network downtime.

Adding to these challenges, a variety of third-party components must be maintained in a way that balances two opposing forces: changes to a software component may introduce unexplained changes in the behavior of a network service, but failing to update components regularly can expose the software to flaws that could impact security or availability.
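As a small illustration of that balancing act, the sketch below (assuming a Python service and a hypothetical pins file named pins.txt) reports drift between the component versions a team has pinned and what is actually installed, so that each upgrade is a deliberate, reviewable decision rather than a silent behavior change.

```python
# Sketch: compare pinned third-party components against what is actually installed.
# Assumes a pins file (hypothetical name "pins.txt") with lines like "requests==2.31.0".
from importlib.metadata import PackageNotFoundError, version


def read_pins(path: str = "pins.txt") -> dict[str, str]:
    """Parse name==version pins, ignoring blanks and comments."""
    pins = {}
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "==" not in line:
                continue
            name, pinned = line.split("==", 1)
            pins[name.strip()] = pinned.strip()
    return pins


def check_pins(pins: dict[str, str]) -> None:
    """Report missing packages and version drift without changing anything."""
    for name, pinned in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            print(f"{name}: pinned to {pinned} but not installed")
            continue
        status = "OK" if installed == pinned else f"drift: {installed} installed"
        print(f"{name}: pinned {pinned} -> {status}")


if __name__ == "__main__":
    check_pins(read_pins())
```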

Testing shortcomings

It's pricey to deal with rollbacks and downtime caused by bugs. It typically costs four to five times as much to fix a software bug after release as it does to fix it during the design process. And the average cost of network downtime is around $5,600 per minute, according to Gartner analysts.
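To put that figure in perspective, a quick back-of-the-envelope calculation (a sketch using the $5,600-per-minute average quoted above, not a precise model) shows how rapidly even a short outage adds up:

```python
# Rough downtime cost estimate based on the ~$5,600/minute average cited above.
COST_PER_MINUTE = 5_600


def downtime_cost(minutes: float) -> float:
    """Estimated financial impact for an outage of the given length."""
    return minutes * COST_PER_MINUTE


for outage in (15, 60, 240):
    print(f"{outage:>4}-minute outage: ~${downtime_cost(outage):,.0f}")
# prints roughly $84,000, $336,000 and $1,344,000 respectively
```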

Financial losses are a problem, but there’s more to be lost here. There’s also the loss of productivity that occurs when your employees are unable to do their work because of an outage. There are the recovery costs of determining what caused the outage and then fixing it. And on top of all of that, there’s also the risk of brand damage wreaked by angry customers who expect your service to be up and working for them at all times. And why shouldn’t they be angry? You promised them a certain level of service, and this downtime has broken their trust.

And there’s another wrinkle. Software bugs cause issues when they are released, but they can also lead to security issues further down the road. These flaws can be exploited later, particularly if they weren’t detected early on. The massive Equifax breach, in which the personal data of more than 140 million Americans was exposed, and the Heartbleed bug are just two examples. In the case of Heartbleed, a flaw in the OpenSSL library left a huge number of servers open to exploitation by bad actors.

In this environment of continuous integration and continuous delivery, developers make code changes that trigger a pipeline of automated tests. The code then gets approved and pushed into production, and a staged rollout begins. This allows new changes to reach users quickly, but it also relies heavily on the automated test infrastructure.
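In simplified form, the gating logic of such a pipeline might look like the sketch below; the test hooks, health checks and stage percentages are hypothetical placeholders for whatever build and deployment tooling a team actually runs.

```python
# Sketch of the CI/CD flow described above: code change -> automated tests ->
# approval -> staged rollout, with a rollback path if a stage looks unhealthy.
import time

ROLLOUT_STAGES = (1, 10, 50, 100)  # percentage of traffic on the new build


def run_automated_tests(build_id: str) -> bool:
    """Stand-in for the unit/integration test stage of the pipeline."""
    print(f"running automated tests for {build_id}")
    return True


def healthy(build_id: str, percent: int) -> bool:
    """Stand-in for error-rate and latency checks at each rollout stage."""
    print(f"checking health of {build_id} at {percent}% of traffic")
    return True


def deploy(build_id: str) -> None:
    if not run_automated_tests(build_id):
        print("tests failed; build never reaches production")
        return
    for percent in ROLLOUT_STAGES:
        print(f"routing {percent}% of traffic to {build_id}")
        time.sleep(1)  # in practice: soak time while metrics accumulate
        if not healthy(build_id, percent):
            print("regression detected; rolling back to the previous build")
            return
    print(f"{build_id} fully rolled out")


if __name__ == "__main__":
    deploy("build-1234")
```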

That reliance is hazardous: automated tests look for specific, anticipated issues, but they cannot cover everything that could possibly go wrong, and what they miss surfaces in production. The recent Microsoft Azure outage and Cloudflare’s Cloudbleed vulnerability are examples of how this process can go astray, with consequences for both availability and security.

A new way to test

A solution to the shortcomings of current testing methods would find potential bugs and security concerns prior to release, quickly and precisely, without the need to roll back or stage. It would run live user traffic simultaneously against the current software version and the proposed upgrade: users would see only the results generated by the current production software, unaffected by any flaws in the upgrade, while administrators could see how the old and new configurations respond to actual usage.
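A minimal sketch of that idea, assuming a plain HTTP service and hypothetical upstream addresses for the two versions, might mirror each incoming request to both, return only the production answer to the user, and log any divergence for administrators to review:

```python
# Sketch of shadow testing: every GET is sent to the current production version
# and to the candidate upgrade; only the production response is returned to the
# user, and any divergence is logged. Upstream URLs are hypothetical placeholders.
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from http.server import BaseHTTPRequestHandler, HTTPServer

PRODUCTION = "http://localhost:8081"  # current, trusted version
CANDIDATE = "http://localhost:8082"   # proposed upgrade under evaluation
pool = ThreadPoolExecutor(max_workers=8)


def fetch(base: str, path: str) -> tuple[int, bytes]:
    with urllib.request.urlopen(base + path, timeout=5) as resp:
        return resp.status, resp.read()


class ShadowProxy(BaseHTTPRequestHandler):
    def do_GET(self):
        prod_future = pool.submit(fetch, PRODUCTION, self.path)
        cand_future = pool.submit(fetch, CANDIDATE, self.path)

        status, body = prod_future.result()  # users only ever see this answer
        self.send_response(status)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

        try:  # the candidate's answer is observed and compared, never served
            cand_status, cand_body = cand_future.result()
            if (cand_status, cand_body) != (status, body):
                print(f"DIVERGENCE on {self.path}: prod={status}, candidate={cand_status}")
        except Exception as exc:
            print(f"candidate failed on {self.path}: {exc}")


if __name__ == "__main__":
    HTTPServer(("localhost", 8080), ShadowProxy).serve_forever()
```

In practice something like this would sit behind whatever load balancer or proxy a team already operates, so the divergence log, rather than the user, is where a flawed upgrade first shows up.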

Such an approach would allow teams to keep costs down while ensuring quality and security and still meeting delivery deadlines, which ultimately helps boost return on investment. For the development community, building and migrating application stacks to container and virtual environments would become more transparent during development, and more secure and available in production when testing and phasing in new software.

Working with production traffic to test software updates lets teams verify upgrades and patches in a real-world scenario. They can quickly report on differences between software versions, including content, metadata, and application behavior and performance. It becomes possible to investigate and debug issues faster using packet capture and logging, and upgrades of commercial software become easier because risk is reduced.
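The per-request comparison report could be as simple as the sketch below, which contrasts status, headers, body and latency for the same request against the two versions; the record layout and the latency threshold are illustrative assumptions rather than a prescribed format.

```python
# Sketch of a per-request comparison between the production response and the
# candidate response: differences in content, metadata and latency are listed.
from dataclasses import dataclass, field


@dataclass
class Observed:
    status: int
    headers: dict = field(default_factory=dict)
    body: bytes = b""
    latency_ms: float = 0.0


def compare(path: str, prod: Observed, cand: Observed) -> list[str]:
    findings = []
    if prod.status != cand.status:
        findings.append(f"status: {prod.status} -> {cand.status}")
    if prod.body != cand.body:
        findings.append(f"body differs ({len(prod.body)} vs {len(cand.body)} bytes)")
    for key in sorted(set(prod.headers) | set(cand.headers)):
        if prod.headers.get(key) != cand.headers.get(key):
            findings.append(f"header {key}: {prod.headers.get(key)} -> {cand.headers.get(key)}")
    if cand.latency_ms > 1.5 * prod.latency_ms:  # arbitrary regression threshold
        findings.append(f"latency: {prod.latency_ms:.0f}ms -> {cand.latency_ms:.0f}ms")
    return [f"{path}: {finding}" for finding in findings]


if __name__ == "__main__":
    prod = Observed(200, {"Content-Type": "application/json"}, b'{"ok": true}', 42)
    cand = Observed(500, {"Content-Type": "text/html"}, b"error", 130)
    print("\n".join(compare("/api/orders", prod, cand)) or "no differences")
```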

Toward quality releases

Application downtime is expensive, and it’s all the more painful when the source turns out to be an unforeseen bug or security vulnerability. Testing software updates against live production traffic addresses this by surfacing problems as old and new versions are compared side by side. This method will save development teams time, headaches and rework while enabling the release of a quality product.
