
Why Kubernetes networking is hard – and what you can do about it

History tells us that networking is always the last piece of the puzzle. From mainframes to virtual machines and now containers, providing compute and storage are the first two steps before the realisation sets in that all of these little entities have to communicate with each other.

For most of the last 30 years, that communication has been facilitated over Ethernet, where the two ends of a conversation at the application layer bind to each other using IP addresses and port numbers. But when those compute pieces shrink to container size, does that necessarily make sense anymore?

If two containers are sitting on the same physical machine, on the same hypervisor, on the same Docker instance, do you really need to jump all the way out to the NIC to facilitate communication between them?  Does the application layer addressing stay the same?  Is it better to facilitate that communication using an overlay?  Do you do it over L2 or L3?  What about multi-tenancy?

All these questions, and more, are why Kubernetes networking is hard.

Kubernetes networking basics

Before digging into the Kubernetes networking basics, it is useful to understand the limitations of Docker networking that Kubernetes overcomes. This is not to say Docker networking is inherently evil; it is just that the scope of the container engine tends to be a single physical or virtual machine, so naturally that perspective runs into issues when considering a cluster of container engines that may or may not be spread across multiple physical or virtual machines.

The “Docker model”, as it is known in Kubernetes circles, uses host-private networking by default, creating a virtual bridge and a series of mappings that make it easy for containers on the same machine to talk to each other. However, containers on different machines require port allocations and forwards or proxies in order to communicate with each other.

As applications grow in size and utilise a microservices-based architecture that requires many dozens, if not many hundreds, of containers spread across multiple machines, this does not scale well. And, again, to be fair, this networking scheme was intended to run on a single machine; it does support the Container Network Model (CNM), which enables multi-host networking, but given its original intent it should not be surprising that it struggles with clustering.

The “Kubernetes model” had to not only solve the core clustering issue, but do so in a way that allowed for multiple implementations in different situations and remained backward compatible with single-node perspectives as well. The fundamentals of this model are that all containers and nodes can communicate with each other without NAT, and that the IP address a container sees for itself is the same IP address that others see for it.

The basic definition of a pod in Kubernetes terminology is that it is “a group of one or more containers with shared storage/network, and a specification for how to run the containers.”

So, when containers are within the same pod, they share the same IP and port space and can reach each other using localhost. This satisfies the backward compatibility design goal for single container engine perspectives.
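As an illustration, a minimal pod specification with two containers sharing the pod’s network namespace might look like the sketch below (the image names are placeholders). The sidecar reaches the web container simply at localhost, with no port mapping involved:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: shared-network-demo
spec:
  containers:
  - name: web
    image: nginx:1.25             # placeholder image listening on port 80
  - name: sidecar
    image: curlimages/curl:8.5.0  # placeholder image
    # Both containers share the pod's IP and port space, so the sidecar can
    # reach the web container at localhost without any port mapping.
    command: ["sh", "-c", "while true; do curl -s http://localhost:80 > /dev/null; sleep 5; done"]
```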

More commonly, though, microservices within an application run in different pods, so they have to discover and reach each other in more complex ways than simply referring to localhost.  This mechanism is abstracted in Kubernetes so that a variety of implementations are possible, but the most popular ones use overlay, underlay, or native L3 approaches.

An overlay approach uses a virtual network that is decoupled from the underlying physical network using some sort of tunnel.  Pods on this virtual network can easily find each other and these L2 networks can be isolated from one another, requiring L3 routing between them when necessary.

An underlay approach attaches an L2 network to the node’s physical NIC, exposing the pod directly to the underlying physical network without port mapping.  Bridge mode can be used here to enable pods to internally interconnect so that the traffic does not leave the host when it doesn’t have to.

A native L3 approach uses no overlays on the data plane, meaning that pod-to-pod communication happens over IP addresses, leveraging routing decisions made by node hosts and external network routers. Pod-to-pod communication can utilise BGP peering to share routes between hosts, and NAT can be used for outgoing traffic if necessary.
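Whichever of these approaches carries the packets, discovery between pods is typically handled one layer up by a Kubernetes Service, which gives a set of pods a stable name and virtual IP. A minimal sketch, with placeholder labels and ports:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: backend
spec:
  selector:
    app: backend      # matches the pods that implement the microservice
  ports:
  - port: 80          # stable port exposed by the Service
    targetPort: 8080  # port the backend containers actually listen on
```

Other pods can then reach the microservice at http://backend via cluster DNS, regardless of which overlay, underlay, or native L3 plug-in is moving the traffic underneath.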

The needs and scale of your applications, including what other resources they might need to consume outside the cluster, will guide which networking approach is right for you, and each approach has a variety of open source and commercial implementation alternatives.

But Kubernetes is not operating in a vacuum

Rarely does a Kubernetes cluster get deployed in a purely greenfield environment. Instead, it gets deployed in support of the rapid iteration efforts of a line-of-business development team trying to inject innovation into a market, alongside existing enterprise services that live on VMs or physical machines.

As an example, when choosing an overlay approach, should a container on a VM host need to talk to a service elsewhere on the physical network, it now has multiple layers to jump through, each of which may inject different amounts of latency that can degrade performance. Because these microservices-based applications do not often operate in a vacuum, this needs to be carefully considered when choosing an approach and an implementation, and the choice made for one application may differ from that of another in the same portfolio of managed applications.

Why policy-based Kubernetes networking management makes sense

Developers love microservices because the approach enables them to architect solutions with smaller, more isolated components that talk to each other over APIs. The APIs act as contracts between the components, so as long as those APIs do not change, the components can be deployed independently of one another, making it easier to release more quickly as the search for innovative change iterates over time.

But just like any other underlying infrastructure, this creates management headaches due to the increased complexity required to keep those Kubernetes clusters humming along efficiently. How many nodes should your cluster have? What happens when you change your mind later? How can you manage one cluster that uses overlay networking side by side with another that uses native L3, because the applications running on them have slightly different needs? What governance do you put in place to keep it all consistent and secure?

These questions, and more, will confront a team managing Kubernetes clusters and the pathway to the answers comes from the same form of aspirin that helps soothe other infrastructure management headaches: policy.

Administrators discovered while managing software-defined networks, and the virtual machines that sit on top of them, that the sheer number of “things” to be managed manually becomes unsustainable at some point. With Kubernetes cluster administration, the number of “things” to be managed grows substantially, and manual intervention becomes equally unsustainable in this new container cluster universe. Automating administration and enforcing best practices through policy-based management becomes a clear choice regardless of what specific approaches to Kubernetes networking might be made for individual applications. Nothing else scales to the task.
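As a concrete illustration of expressing that intent as policy, here is a minimal Kubernetes NetworkPolicy (labels and ports are placeholders) that only allows pods labelled app: frontend to reach pods labelled app: backend on port 8080. The networking plug-in you choose is what actually enforces it:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: backend-allow-frontend
spec:
  podSelector:
    matchLabels:
      app: backend        # the pods this policy protects
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend   # only frontend pods may connect
    ports:
    - protocol: TCP
      port: 8080
```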

So, for the growing list of microservices-based applications you are probably managing, and regardless of whether those applications need overlay, underlay, or native L3 networks, be sure that whatever implementation you choose gives you the option of managing your Kubernetes cluster networking via policy using the appropriate plug-in. Otherwise, implementing changes and maintaining consistency among clusters will quickly become impossible. But by managing intent with policy automation, you’ll be ready for whatever your applications need.


Consuming public cloud services on-premise: A guide

As the public cloud enters the second decade of its existence, its role is changing. Infrastructure as a service altered the way that virtualised resources are consumed, but what has emerged is far more powerful than allocating compute, storage, and networking on demand.

The derivative services that the public cloud providers now offer include speech-to-text, sentiment analysis, and machine learning functionality that are constantly being improved. While it is often prudent to run an application on virtual machines or a container cluster on-premises for cost, security, or data gravity reasons, this new breed of public cloud services can often be used in a stateless manner that enables them to be utilised no matter where the business logic for an application resides.

How are on-prem applications utilising these services today and how can that usage evolve over time to work at scale?

Common usage today

Today, application code has to be bound to a specific instance of a public cloud service in order for the interaction between the code and the service to work correctly. Typically, that binding involves standing up an instance of the service using the public cloud console, granting access to a particular user with a particular set of security authorisations, and making access keys for that user available to the developer, who then has to embed references to both the access keys and the service instance those keys grant access to.

Here’s an example of that from the developer perspective, using a Kubernetes-based application on-prem to connect to the Google Natural Language API. First, consider the deployment.yaml file that describes how the front-end component of our application should be deployed.
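A minimal sketch of such a manifest, with placeholder image, paths, and project ID, showing the pattern of mounting the access keys from a volume and injecting GOOGLE_APPLICATION_CREDENTIALS and GOOGLE_PROJECT_ID as environment variables:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: frontend
spec:
  replicas: 1
  selector:
    matchLabels:
      app: frontend
  template:
    metadata:
      labels:
        app: frontend
    spec:
      containers:
      - name: frontend
        image: example/frontend:latest         # placeholder image
        env:
        - name: GOOGLE_APPLICATION_CREDENTIALS
          value: /secrets/gcp/key.json         # where the mounted key file lands
        - name: GOOGLE_PROJECT_ID
          value: my-gcp-project-id             # placeholder project ID
        volumeMounts:
        - name: gcp-credentials
          mountPath: /secrets/gcp
          readOnly: true
      volumes:
      - name: gcp-credentials
        secret:
          secretName: gcp-service-account-key  # placeholder secret holding the access keys
```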

The key portion for this discussion is where a volume is mounted so that the launched containers can access the access keys on local disk, and where both the access key location (GOOGLE_APPLICATION_CREDENTIALS) and the project ID pointing to the correct instance of the service (GOOGLE_PROJECT_ID) are injected into the container as environment variables.

In the front-end Python code put into this container, the first step is to create an instance of the natural language client that is part of the Google Cloud Python client library.
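A minimal sketch of that step, assuming the current google-cloud-language client library (exact module and class names vary by library version):

```python
import os

from google.cloud import language_v1

# The project ID arrives via the environment variable injected by the deployment;
# the client library locates the access keys itself via GOOGLE_APPLICATION_CREDENTIALS.
project_id = os.environ["GOOGLE_PROJECT_ID"]

client = language_v1.LanguageServiceClient()
```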

Here, a specific reference is made to that project ID, and the client library is smart enough to look for the access key location in the aforementioned environment variable. At this point, the client library can be used to do things like measure the sentiment of an input string.
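Continuing the sketch, a sentiment call with that client could look something like this:

```python
def measure_sentiment(text: str) -> float:
    """Return the overall sentiment score (-1.0 negative to 1.0 positive) for an input string."""
    document = language_v1.Document(
        content=text,
        type_=language_v1.Document.Type.PLAIN_TEXT,
    )
    response = client.analyze_sentiment(request={"document": document})
    return response.document_sentiment.score
```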

Needless to say, this process is both cumbersome and fragile. What if the volume breaks and the code cannot get to the access keys? What about typos in the project ID? What happens if you want to change either one?

This is compounded in aggregate across an application portfolio, where every application would otherwise have to do this individually for every public cloud service it consumes. Hard-coding project IDs is subject to human error, and rotating access keys – to ensure better security of the public cloud service consumption – forces a new deployment. Usage metrics are locked inside the individual accounts from which the project IDs are generated, making it difficult for anyone to get a real sense of public cloud service usage across multiple applications.

A better future

What is a better way to tackle this problem so that developers can create applications that get deployed on-prem, but can still take advantage of public cloud services that would be difficult to replicate? Catalog and brokering tools are emerging that remove many of the steps described above by consolidating public cloud service access into a single interface that is orthogonal to the developer view of the world. Instead of a developer baking access keys and project IDs into the deployment process, the IT ops staff is able to provide a container cluster environment that injects the necessary information. This simplifies deployments for the developer and provides a single place to collect aggregate metrics.

For example, in a typical catalog tool an IT ops admin can create an instance of a pub/sub service and then create a binding for that service to be used by an individual application.

The code required to complete the binding, shown here in Node.js, is simpler than the previous example.
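A minimal sketch, assuming the binding injects the Pub/Sub project ID (and credentials) into the environment and using the @google-cloud/pubsub client library; the variable and topic names are hypothetical:

```javascript
// Minimal sketch: the binding created by the catalog tool is assumed to inject
// the project ID and credentials into the environment; names are illustrative.
const { PubSub } = require('@google-cloud/pubsub');

const pubsub = new PubSub({ projectId: process.env.PUBSUB_PROJECT_ID });

async function publishGreeting() {
  // Publish a message to a topic provisioned through the service instance.
  const messageId = await pubsub
    .topic('demo-topic')
    .publishMessage({ data: Buffer.from('Hello from an on-prem cluster') });
  console.log(`Published message ${messageId}`);
}

publishGreeting().catch(console.error);
```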

By removing the need to inject binding information during the deployment process and instead having it handled by the environment itself, public cloud services can be reused by providing multiple application bindings to the same service. Access keys can be rotated in memory so that security can be improved without forcing a new deployment. Usage flows through a single point, making metrics collection much easier.

In summary

Certain public cloud services, especially those involving large AI datasets like natural language processing or image analysis, are difficult if not impossible to replicate on-prem. Increasingly, though, users expect applications to contain features based on these services. The trick for any developer or enterprise is to find a way to streamline access to these services across an application portfolio in a way that makes the individual applications more secure and more resilient, and that provides more useful usage metrics.

Current techniques of binding applications to these public cloud services prevent this – but a set of catalog and brokering tools is emerging that makes it far easier to deliver on the promises that customers demand.