Why Kubernetes networking is hard – and what you can do about it

History tells us that networking is always the last piece of the puzzle.  From mainframes, to virtual machines, and now to containers, providing compute and storage are the first two steps before the realisation sets in that all these little entities have to communicate with each other.

For most of the last 30 years, that communication has been carried over Ethernet, with the two ends of an application-layer conversation binding to each other using IP addresses and port numbers.  But when those compute pieces shrink to container size, does that necessarily make sense anymore?

If two containers are sitting on the same physical machine, on the same hypervisor, on the same Docker instance, do you really need to jump all the way out to the NIC to facilitate communication between them?  Does the application-layer addressing stay the same?  Is it better to facilitate that communication using an overlay?  Do you do it over L2 or L3?  What about multi-tenancy?

All these questions, and more, are why Kubernetes networking is hard.

Kubernetes networking basics

Before digging into the Kubernetes basics, it is useful to understand the limitations of Docker networking that Kubernetes overcomes.  This is not to say Docker networking is inherently evil; it is just that the scope of a container engine tends to be a single physical or virtual machine, so that perspective naturally runs into issues when considering a cluster of container engines that may or may not be spread across multiple physical or virtual machines.

The “Docker model”, as it is known in Kubernetes circles, uses host-private networking by default: it creates a virtual bridge and a series of port mappings that make it easy for containers on the same machine to talk to each other.  Containers on different machines, however, require port allocations and forwards or proxies in order to communicate with each other.
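To make that concrete, here is a minimal sketch using the Docker SDK for Python; the images, container names, and host port are arbitrary examples, and it assumes a local Docker daemon is running:

```python
# A minimal sketch of the host-private model using the Docker SDK for Python
# (pip install docker).  Images, names, and the host port are examples.
import docker

client = docker.from_env()

# On the same host, containers on the default bridge reach each other directly
# by their bridge IPs -- no port publishing required.
web = client.containers.run("nginx:alpine", detach=True, name="web")
web.reload()  # refresh attributes so the assigned bridge IP is populated
bridge_ip = web.attrs["NetworkSettings"]["Networks"]["bridge"]["IPAddress"]
print(f"Reachable from other containers on this host at {bridge_ip}:80")

# To be reachable from a *different* machine, a host port must be published
# and tracked out-of-band -- the part that does not scale across a cluster.
client.containers.run(
    "nginx:alpine", detach=True, name="web-published",
    ports={"80/tcp": 8080},  # host port 8080 -> container port 80
)
print("Reachable from other machines at <this-host-ip>:8080")
```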

As applications grow in size and utilise a microservices-based architecture that requires many dozens if not many hundreds of containers spread across multiple machines, this does not scale well.  And, to be fair, this networking scheme was intended to run on a single machine, and Docker does support the Container Network Model (CNM), which enables multi-host networking; but given its original intent it should not be surprising that it struggles with clustering.

The “Kubernetes model” not only had to solve the core clustering issue, but do so in a way that allows multiple implementations for different situations and remains backward compatible with the single-node perspective as well.  The fundamentals of this model are that all containers and nodes can communicate with each other without NAT, and that the IP address a container sees itself as is the same IP address others see it at.

The basic definition of a pod in Kubernetes terminology is that it is “a group of one or more containers with shared storage/network, and a specification for how to run the containers.”

So, when containers are within the same pod, they share the same IP and port space and can reach each other using localhost.  This satisfies the backward-compatibility design goal for single container engine perspectives.
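For illustration, here is a hedged sketch that creates such a two-container pod with the official Kubernetes Python client; it assumes a reachable cluster via ~/.kube/config, and all names and images are placeholders:

```python
# A hedged sketch of the shared pod network namespace, using the official
# Kubernetes Python client (pip install kubernetes).
from kubernetes import client, config

config.load_kube_config()

pod = client.V1Pod(
    api_version="v1",
    kind="Pod",
    metadata=client.V1ObjectMeta(name="shared-net-demo"),
    spec=client.V1PodSpec(containers=[
        # First container: serves HTTP on port 80.
        client.V1Container(name="web", image="nginx:alpine"),
        # Second container: reaches its sibling over localhost, because both
        # containers share the pod's single IP and port space.
        client.V1Container(
            name="sidecar",
            image="curlimages/curl",
            command=["sh", "-c",
                     "sleep 5 && curl -s http://localhost:80 && sleep 3600"],
        ),
    ]),
)
client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```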

More commonly, though, microservices within an application run in different pods, so they have to discover and reach each other in more complex ways than simply referring to localhost.  This mechanism is abstracted in Kubernetes so that a variety of implementations are possible, but the most popular ones use overlay, underlay, or native L3 approaches.
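Whichever approach sits underneath, the discovery surface looks the same from the API.  Here is a hedged sketch with the official Python client, again assuming a reachable cluster (the default namespace is just an example):

```python
# A hedged sketch of cross-pod discovery through the Kubernetes API.
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

# Every pod gets its own cluster-routable IP -- the flat, NAT-free view.
for pod in v1.list_namespaced_pod(namespace="default").items:
    print(f"pod {pod.metadata.name}: {pod.status.pod_ip}")

# In practice, pods find each other through Services, which provide a stable
# virtual IP and a DNS name of the form <service>.<namespace>.svc.cluster.local.
for svc in v1.list_namespaced_service(namespace="default").items:
    print(f"service {svc.metadata.name}: {svc.spec.cluster_ip} "
          f"({svc.metadata.name}.default.svc.cluster.local)")
```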

An overlay approach uses a virtual network that is decoupled from the underlying physical network using some sort of tunnel.  Pods on this virtual network can easily find each other and these L2 networks can be isolated from one another, requiring L3 routing between them when necessary.
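To get a feel for what that tunnel looks like on the wire, here is a hedged sketch built with scapy.  VXLAN is just one common tunnel choice among several; the frame is constructed rather than sent, and all addresses and the VNI are made-up examples:

```python
# A hedged sketch of overlay encapsulation using scapy (pip install scapy).
from scapy.layers.l2 import Ether
from scapy.layers.inet import IP, TCP, UDP
from scapy.layers.vxlan import VXLAN

# Inner frame: pod 10.244.1.5 talking to pod 10.244.2.9 on the virtual network.
inner = Ether() / IP(src="10.244.1.5", dst="10.244.2.9") / TCP(dport=8080)

# Outer frame: node 192.168.0.11 tunnels the whole thing to node 192.168.0.12
# inside a UDP packet; the VNI isolates one virtual L2 segment from another.
frame = (
    Ether()
    / IP(src="192.168.0.11", dst="192.168.0.12")
    / UDP(dport=4789)  # the IANA-assigned VXLAN port
    / VXLAN(vni=42)
    / inner
)
frame.show()  # prints the nesting: outer Ether/IP/UDP -> VXLAN -> inner frame
```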

An underlay approach attaches an L2 network to the node’s physical NIC, exposing the pod directly to the underlying physical network without port mapping.  Bridge mode can be used here to enable pods to internally interconnect so that the traffic does not leave the host when it doesn’t have to.

A native L3 approach uses no overlay on the data plane, meaning that pod-to-pod communication happens over IP addresses, leveraging routing decisions made by node hosts and external network routers.  Pod routes can be distributed via BGP peering, traffic between pods on the same host never has to leave it, and NAT can be used for outgoing traffic if necessary.
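In the native L3 case, each node is assigned a pod CIDR, and those per-node CIDRs are exactly what gets advertised as routes (“to reach pods in 10.244.2.0/24, forward to node-b”).  A hedged sketch that inspects them with the Kubernetes Python client, assuming a reachable cluster:

```python
# A hedged sketch: list each node's pod CIDR and internal IP -- the raw
# material for route advertisement in a native L3 setup.
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

for node in v1.list_node().items:
    addresses = {a.type: a.address for a in node.status.addresses}
    print(f"{node.metadata.name}: pods in {node.spec.pod_cidr} "
          f"via {addresses.get('InternalIP')}")
```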

The needs and scale of your applications, including what other resources they might need to consume outside the cluster, will guide which networking approach is right for you, and each approach has a variety of open source and commercial implementation alternatives.

But Kubernetes is not operating in a vacuum

Rarely does a Kubernetes cluster get deployed in a purely greenfield environment.  Instead, it gets deployed in support of the rapid iteration a line-of-business development team is pursuing to inject innovation into a market, alongside existing enterprise services that live on VMs or physical machines.

As an example, consider an overlay approach: should a container on a VM host need to talk to a service elsewhere on the physical network, it now has multiple layers to jump through, each of which may inject different amounts of latency that can degrade performance.  Because these microservices-based applications do not often operate in a vacuum, this needs to be carefully considered when choosing an approach and an implementation, and the choice made for one application may differ from that of another in the same portfolio of managed applications.
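Rather than guessing at those layers, measure them.  A hedged sketch that compares average TCP connect latency to an in-cluster Service versus a service out on the physical network; both endpoint names are hypothetical placeholders, and it should run from inside a pod for a meaningful comparison:

```python
# A hedged sketch: quantify the latency cost of crossing network layers.
import socket
import time

def connect_latency_ms(host: str, port: int, samples: int = 5) -> float:
    """Average TCP handshake time to host:port, in milliseconds."""
    total = 0.0
    for _ in range(samples):
        start = time.perf_counter()
        with socket.create_connection((host, port), timeout=2.0):
            total += time.perf_counter() - start
    return total / samples * 1000

# Placeholder endpoints -- substitute your own in-cluster and external targets.
print("in-cluster :", connect_latency_ms("my-service.default.svc.cluster.local", 80))
print("external VM:", connect_latency_ms("legacy-db.corp.example.com", 5432))
```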

Why policy-based Kubernetes networking management makes sense

Developers love microservices because they enable them to architect solutions with smaller, more isolated components that talk to each other over APIs.  The APIs act as contracts between the components, so as long as those APIs do not change, the components can be deployed independently of one another, making it easier to release more quickly as the search for innovative change iterates over time.

But, just like all other underlying infrastructure management, this creates headaches due to the increased complexity needed to make those Kubernetes clusters all hum along efficiently.  How many nodes should your cluster have?  What happens when you change your mind later?  How do you manage, side by side, one cluster that uses overlay networking and another that uses native L3, because the applications running on them have slightly different needs?  What governance do you put in place to keep it all consistent and secure?

These questions, and more, will confront a team managing Kubernetes clusters and the pathway to the answers comes from the same form of aspirin that helps soothe other infrastructure management headaches: policy.

Administrators discovered, while managing software-defined networks and the virtual machines that sit on top of them, that the sheer number of “things” to be managed manually becomes unsustainable at some point.  With Kubernetes cluster administration, the number of “things” to be managed grows substantially, and manual intervention becomes equally unsustainable in this new container cluster universe.  Automating administration and enforcing best practices through policy-based management becomes the clear choice, regardless of what specific approaches to Kubernetes networking might be made for individual applications.  Nothing else scales to the task.
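As a small example of what that intent looks like in code, here is a hedged sketch of a Kubernetes NetworkPolicy that lets only pods labelled app=frontend reach pods labelled app=backend on port 8080.  The labels, names, and port are illustrative, and enforcement depends on the CNI plug-in in use actually supporting NetworkPolicy:

```python
# A hedged sketch of networking intent expressed as declarative policy.
from kubernetes import client, config

config.load_kube_config()

policy = client.V1NetworkPolicy(
    api_version="networking.k8s.io/v1",
    kind="NetworkPolicy",
    metadata=client.V1ObjectMeta(name="backend-allow-frontend"),
    spec=client.V1NetworkPolicySpec(
        # Apply to the backend pods...
        pod_selector=client.V1LabelSelector(match_labels={"app": "backend"}),
        policy_types=["Ingress"],
        # ...and admit ingress only from frontend pods, on one port.
        ingress=[client.V1NetworkPolicyIngressRule(
            _from=[client.V1NetworkPolicyPeer(
                pod_selector=client.V1LabelSelector(
                    match_labels={"app": "frontend"}),
            )],
            ports=[client.V1NetworkPolicyPort(port=8080, protocol="TCP")],
        )],
    ),
)
client.NetworkingV1Api().create_namespaced_network_policy(
    namespace="default", body=policy,
)
```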

So, for the growing list of microservices-based applications you are probably managing, and regardless of whether those applications need overlay, underlay, or native L3 networks, be sure whatever implementation you choose provides you the option of managing your Kubernetes cluster networking via policy using the appropriate plug-in.  Otherwise, implementing changes and maintaining consistency among clusters will quickly become impossible.  But by managing intent with policy automation, you’ll be ready for whatever your applications need.

Read more: Kubernetes takes step up as it ‘graduates’ from Cloud Native Computing Foundation