Downtime can prove disastrous for any organisation, as can a sudden surge in demand, and this is particularly true for companies wired into the heart of the online gaming scene. From EA Sports’ FIFA to the renowned Call of Duty franchise, millions of gamers across the globe have come to expect 24/7 network availability.
The growing demand for always-on services mirrors the way organisations reliant on cloud-powered applications expect flawless, reliable connections on which to run their operations. Just look at the escalating COVID-19 pandemic, which has put the cloud computing world under unprecedented strain, with a surge in demand for data services, Wi-Fi networks and workplace platforms such as Microsoft Teams. The staggering work that goes into maintaining these networks as userbases swell, whether in the business or gaming worlds, is routinely overlooked; it’s often only when things go wrong that these crucial elements are missed.
i3D.net, a subsidiary of games publisher Ubisoft, runs and maintains the networks that power widely-played AAA multiplayer games such as Tom Clancy’s The Division. While it had managed successfully with just 70 staff and servers spread across 35 sites in 15 countries, by the mid-2010s it became clear extra muscle was needed to continue serving a rapidly swelling user base.
“The big thing in game hosting is the fact you need to be really flexible and very responsive to the fast-changing market,” i3D.net COO, Rick Sloot, tells Cloud Pro. “A game can be popular, or it can be a real flop. But as soon as the game is popular, and a lot of people are playing it, or maybe even more people are going to play it than you’re expecting, you need extra capacity within hours, or maybe, at most, in a matter of days.”
The pressures of an always-on world
In the past, i3D.net treated networking issues as a cost of doing business, but they became too frequent to sustain. The infrastructure was built with redundancy, yet if a router, switch or other piece of equipment went down, i3D.net was pressed to resolve the issue as quickly as possible while game sessions around the world were put on hold. Once these problems became impossible to tolerate, the firm set out in 2015 to onboard a third-party network monitoring company to bolster resilience, a need made all the more pressing by limited staffing levels and exponentially growing demand. Network management firm Opengear was recruited shortly before Ubisoft released its hotly anticipated Tom Clancy’s The Division 2, to improve resilience and failover options should things get hairy.
“The way the 24/7 world is working currently, and everybody wants to be online 24/7, [network failure] was not an acceptable risk anymore,” Sloot continues. “Because the company, and everybody in the world, is demanding a 24/7 service, we needed to look for other solutions, and other ways of maintaining the flexibility but without adding a lot of overhead on us.”
Demand could surge at any time, and in any location across the world, which was impractical to handle while i3D.net relied on its own network engineers flying out to sites whenever work needed doing. Remote hands would be used where possible, but it could take crucial minutes or hours to establish a connection while networks were offline. Expansion at existing locations, or establishing new sites, also posed problems when demand for a game went “sky high”.
Going mobile
Opengear already formed part of i3D.net’s infrastructure, but on a much smaller scale, Sloot says. The implementation phase, which spanned a year, involved heavily ramping up the company’s involvement; thanks to the existing relationship, this was more straightforward than it might otherwise have been. The equipment was shipped to i3D.net, and its engineers spent the following year flying from location to location to install it. As i3D.net has sufficient technical expertise in-house, it leant on Opengear primarily for enhancements. Automatic failover to alternative networks, for example, would ensure games continued running when trouble struck. This operated through cellular failover, with communication running over 4G networks instead of traditional backup lines.
“Before, we would always try to have a backup line; for example, buy a backup line from a data centre and then connect that one. So this was a very good additional feature for us, which brought the service to a higher level,” Sloot continues. The implementation of cellular failover, however, brought its own challenges.
“Maybe sometimes for us, from our side, it’s tricky because for cellular failover you need good quality of signal … which is always a challenge in a data centre, which is always a highly secure facility.”
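For readers curious how this kind of failover works under the hood, the sketch below shows the general idea: a watchdog pings the primary gateway and, after a few consecutive failures, repoints the default route at a 4G modem interface. The interface names, addresses and thresholds here are illustrative assumptions, not details of i3D.net’s or Opengear’s actual setup.

```python
# Hypothetical sketch of automatic failover to a cellular (4G) uplink.
# All names, addresses and thresholds are illustrative, not real configuration.
import subprocess
import time

PRIMARY_GATEWAY = "203.0.113.1"   # gateway on the wired uplink (example address)
CELLULAR_INTERFACE = "wwan0"      # 4G modem interface (example name)
CHECK_INTERVAL_S = 10             # seconds between health checks
FAILURE_THRESHOLD = 3             # consecutive failures before failing over


def primary_link_up() -> bool:
    """Return True if the primary gateway answers a single ping."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "2", PRIMARY_GATEWAY],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def fail_over_to_cellular() -> None:
    """Point the default route at the cellular interface (Linux `ip route`)."""
    subprocess.run(
        ["ip", "route", "replace", "default", "dev", CELLULAR_INTERFACE],
        check=True,
    )


def main() -> None:
    failures = 0
    while True:
        if primary_link_up():
            failures = 0
        else:
            failures += 1
            if failures >= FAILURE_THRESHOLD:
                fail_over_to_cellular()
                failures = 0
        time.sleep(CHECK_INTERVAL_S)


if __name__ == "__main__":
    main()
```

In a production appliance the same logic would typically be handled in firmware, with the cellular path also providing out-of-band access to consoles when the main network is unreachable.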
As for how he’d advise other businesses to handle their networking infrastructure as they look to scale, he reiterated that the most crucial elements powering networks behind the scenes are only missed when things go horribly wrong.
“I always say to my guys here, what could be the worst that can happen?” he explains. “If you look at all those steps that could happen – what can you prevent, and if you can prevent them, what’s the best solution for it?
“If there’s a solution, what are the costs versus the risks? Looking at this particular solution of Opengear, the costs of not having a network is, like, tens of thousands of Euros per hour. Buying the product is a small fraction of that, so, it’s a rather small investment for achieving high availability.”