The HitchHacker's Guide to the Kubernetes Galaxy
Kubernetes is many things, but one of the main things it is, especially to newcomers, is confusing. There have been many attempts to explain what exactly Kubernetes is, ranging from “It’s docker, essentially” to “an open-source system for automating deployment, scaling, and management of containerized applications”, but nothing I’ve read has really captured the essence of what Kubernetes does, why it exists, or what the whole point of this thing is, anyway.
The Kubernetes Documentation can be similarly inscrutable - is what I’m looking for in the Setup tab? Concepts? Tasks? Do I really need to learn everything via the automatically generated API documentation?
This is my attempt to explain Kubernetes from the perspective of a hacker: What is it, what can it do for me, and why should it be what I use to do the things that it can do?
Join me for a series of posts to help you understand Kubernetes and build (and rebuild) your own cluster on bare metal. In contrast to Kubernetes the Hard Way, which is a step-by-step checklist for configuring everything about Kubernetes, and in contrast to Minikube, which is primarily intended for development, we’ll be using kubeadm to bootstrap a “hacker homelab” cluster.
What Kubernetes is not
Kubernetes is not a magic tool to solve your scaling problems. Kubernetes itself brings zero “new features” to the world of containers and computing at large. Everything Kubernetes can do is something you’ve been able to do elsewhere and with other solutions.
Kubernetes allows you to achieve application autoscaling, container orchestration, service discovery, machine maintenance, and more. But these are not what Kubernetes is. These are applications of Kubernetes.
That said, Kubernetes does make a lot of these things easy - but by first making everything very, very complicated.
What Kubernetes is
Kubernetes is a platform and ecosystem all its own. That on its own is not unique; there are many platforms and ecosystems. The unique thing about Kubernetes, to me, is that it is built to abstract away the details of individual machines while providing uniform primitives that allow nearly all applications to run in a cluster in a seamless way. In essence Kubernetes is, from the perspective of applications running on it, either just a standard Linux environment or a full cluster operating system, depending on their level of integration.
From the perspective of an ordinary application running in a Linux container, Kubernetes is not that unique and just seems like yet another container runtime wrapper (such as Docker Compose). From the perspective of the cluster administrator, however, Kubernetes provides a uniform way to configure many disparate features that have typically been the purview of both operating systems and supporting applications: networking, storage, service discovery, load balancing, and more.
In short, from a Hacker’s perspective, Kubernetes is really a cluster1 API - or CPI if you will (but please, I hope you won’t). The API is built around various objects to represent both desired and actual cluster state. Kubernetes assumes that there will be controllers to manage the in-cluster object lifecycles, and provides a few “out of the box”.2
To make this a bit more concrete, Kubernetes is several things:
- A collection of resource types
- A collection of Linux processes implementing control loops for those resources
- A standardized API for accessing and mutating those resources
- A standardized CLI for accessing and mutating those resources
- A standardized textual file format for describing those resources.
The desired and actual state of a running cluster is stored within etcd, a fault-tolerant key-value store, while the operator managed ‘source of truth’ of the desired state of the cluster is typically stored as a series of text files in a version control system.
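To make the file format and CLI concrete, here is a minimal sketch of a manifest for one of the simplest resource types, a pod. The name and image are placeholder choices for illustration, not anything you need to use:

```yaml
# A minimal sketch of a resource described in the standard textual file format.
# "hello" and the nginx image are placeholders for illustration.
apiVersion: v1
kind: Pod
metadata:
  name: hello
  labels:
    app: hello
spec:
  containers:
    - name: web
      image: nginx:1.25
      ports:
        - containerPort: 80
```

Checked into version control, a file like this serves as the desired-state “source of truth”; running kubectl apply -f on it asks the API server to record that state, and the relevant control loop makes reality match.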
To maintain consistency within the Kubernetes ecosystem at large, Kubernetes also allows you to define new custom resources via custom resource definitions (CRDs), though this is typically not of interest to most Kubernetes operators, who are likely to only use this feature indirectly by installing cluster addons.
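For the curious, a custom resource definition is itself just another resource described in the same file format. Here is a hedged sketch; the example.com group and Backup kind are invented purely for illustration:

```yaml
# Sketch of a CRD; the group, kind, and schema below are made up for illustration.
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: backups.example.com   # must be <plural>.<group>
spec:
  group: example.com
  names:
    kind: Backup
    plural: backups
    singular: backup
  scope: Namespaced
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                schedule:
                  type: string
```

Once such a definition is applied, the API server serves Backup objects exactly as it serves built-in resources; it is then up to some controller (a cluster addon) to actually do something with them.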
Glossary
As a large and complex project, Kubernetes has many nouns. Most of these are things you’ll generally be familiar with, but it’s worth a quick check-in.
Generic terms
- container
- A group of technologies that work together to provide an isolated or semi-isolated environment for running a group of processes. Notably this may not include privilege isolation.
- application container
- A container meant to run a single application, not a full Linux system. This is the intended use for Kubernetes.
- system container
- A container meant to run an operating system. This is not the intended use for Kubernetes.
- LXC / LXD containers
- A confusingly named project offering ‘linux containers’, which is one flavor of system containers. These are generally irrelevant when talking about Kubernetes.
- kernel namespace
- A kernel division between resources. For example there are network namespaces which segment network links, user namespaces which segment user mappings, and mount namespaces which segment mount points.
- cgroups
- The Linux kernel ‘control groups’ interface, which limits and accounts for the resources (CPU, memory, I/O, and so on) used by groups of processes. Together with kernel namespaces, cgroups are the main kernel building blocks behind container isolation.
- control plane
- The conceptual layer of services that configure and manage the data plane.
- data plane
- The conceptual layer containing the “main” parts of your system, such as public-facing web servers.
Kubernetes terms
- node
- A machine that is participating in the Kubernetes cluster. May be virtual, especially if you’re using a cloud service.
- master node
- A machine that is running part of the Kubernetes control plane.
- pod
- A collection of application containers that are managed together and share kernel namespaces. Pods are ephemeral; once a pod is gone it will not be recreated unless a higher-level controller manages it.
- kubelet
- The main worker process on each node; it talks to the kernel and container runtime to create pods on behalf of the rest of Kubernetes.
- scheduling
- The process of selecting a node to run a pod.
- taint
- A node level attribute that repels pods from scheduling on a node.
- master taint
- A Kubernetes taint that prevents non-control-plane pods from scheduling on a node. If you only have one node, you’ll need to remove this before your pods will schedule.
- Deployment
- A Deployment manages stateless applications by monitoring the pods associated with it and spawning replacement pods as needed to maintain the desired number of replicas.
- StatefulSet
- A StatefulSet handles stateful applications by ensuring that each replica has a stable, unique pod identity.
- DaemonSet
- A DaemonSet ensures there is one instance of a pod running on each node. This is useful for system daemons that provide cluster-wide services such as DNS or overlay networks (see the sketch just after this list).
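To tie a few of these nouns together (pods, scheduling, taints, DaemonSets), here is a hedged sketch of a DaemonSet manifest. The node-agent name and busybox image are placeholders rather than a real cluster addon, and the exact taint key tolerated below varies by Kubernetes version:

```yaml
# Sketch of a DaemonSet: one copy of this pod runs on each node.
# Name and image are placeholders for illustration only.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-agent
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: node-agent
  template:
    metadata:
      labels:
        app: node-agent
    spec:
      tolerations:
        # Allow scheduling on control-plane nodes despite the master taint.
        # The taint key differs between Kubernetes versions.
        - key: node-role.kubernetes.io/master
          operator: Exists
          effect: NoSchedule
      containers:
        - name: agent
          image: busybox:1.36
          command: ["sh", "-c", "while true; do sleep 3600; done"]
```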
Networking
The biggest opinion Kubernetes holds is that there should be a flat networking namespace shared between nodes, pods, and services. What this means is that all three of these objects have L3 IP connectivity: each node has an IP address, each pod has an IP address, and each service has an IP address. Since pods are dynamically created and destroyed, it follows that Kubernetes is in charge of IP address assignment.
But before we dig into the nitty gritty here, let’s look at Kubernetes’s parent organization, the Cloud Native Computing Foundation. In contrast to the Docker networking model, Kubernetes is designed to work in many network deployment scenarios. The CNCF therefore adopted the Container Network Interface (CNI) as the standard interface for attaching network interfaces and IP addresses to pods. It’s actually the CNI provider that is responsible for providing pod networking, not Kubernetes directly. This allows arbitrarily complex networking solutions to exist.
There are many, many options for CNI, far too many to list. In my case, since
I’m hosting a cluster at home, I’d really like the cluster to be part of my
regular L2 LAN. Therefore I use the bridge
CNI plugin, included by default
with Kubernetes, to transparently bridge my cluster’s traffic with my existing
LAN. I’ll talk more about this in a future post.
This is another great strength of Kubernetes: it doesn’t care how you set up your network because at the end of the day you’re just running Linux processes connected to a standard network interface provisioned by CNI.
There are a few important network ranges and nouns to be aware of in the land of Kubernetes:
- Pod Network CIDR
- An IP address range in CIDR notation that pod IP addresses will be allocated from. For example, 192.168.220.0/24. These belong to pods and are ephemeral.
- Service CIDR
- An IP address range in CIDR notation that service IP addresses will be allocated from. For example, 10.96.0.0/12. These belong to services and are unique per cluster unless manually reused.
- ClusterIP
- An IP address within the Service CIDR. For example, 10.96.0.1.
If you’re setting up networking on your own you’ll need to ensure these addresses are routable. By default kube-proxy installs iptables rules that transparently rewrite traffic destined for service IPs in the service CIDR, masquerading it as the hosting node.
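With kubeadm, these ranges are chosen when the cluster is bootstrapped. The fragment below is a sketch of the relevant part of a kubeadm configuration file; the apiVersion depends on your kubeadm release, and the CIDRs are just the example values from above:

```yaml
# Sketch of a kubeadm ClusterConfiguration fragment setting the two ranges.
# apiVersion varies by kubeadm release; the CIDRs are example values only.
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
networking:
  podSubnet: 192.168.220.0/24   # Pod Network CIDR
  serviceSubnet: 10.96.0.0/12   # Service CIDR
```

A file like this can be passed to kubeadm init --config when the cluster is created.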
Storage
By default all storage in Kubernetes is ephemeral - as part of removing a
running pod
the pod’s storage is completely deleted. This isn’t great
for applications that need to access persistent state.
Kubernetes is similar to Docker in that it has a provision for making volumes available to pods. While the purpose is the same, the scope of Kubernetes volume management is much different.
As a cluster operating system, Kubernetes is designed to schedule pods wherever free compute is available. Naturally this implies that using local storage is not at all the default.
Volumes in Kubernetes are relatively complicated and will be covered in a future post. If you’d like to run ahead a bit, here are some pointers:
- volumeMounts
- A container specifies where within the container a particular volume is mounted via the volumeMounts container attribute.
- volumes
- A pod specifies the names and types of the volumes that its containers may mount via the volumes pod attribute (see the sketch after this list).
- Volume
- A Kubernetes Volume is ephemeral. It essentially represents a mount point within a container. The semantics of the data stored there depend on the particular volume type.
- emptyDir volume
- A type of volume that is empty at container start time. When a pod is deleted the contents are removed.
- hostPath volume
- A type of volume that more or less bind mounts a path from the hosting node into a pod’s container. Because pods can be scheduled on any node this isn’t a great choice. That said, you can limit a pod to be scheduled on a particular node via the nodeAffinity pod attribute, which may suffice.
- PersistentVolume
- The Kubernetes way of guaranteeing that data will not be lost between pod restarts. Just as with Volumes, there are many types of PersistentVolume.
- PersistentVolumeClaim
- When a pod binds to a PersistentVolume it does so by using a PVC object. The Kubernetes controllers match PVCs with PVs to provide storage.
- local volume
- Effectively a hostPath volume, but with additional support for automatic node affinity binding for pods that use it via a PersistentVolume.
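Here is a minimal sketch of how the pod-level volumes list and the container-level volumeMounts list fit together, using an emptyDir volume. The names and image are placeholders:

```yaml
# Sketch: a pod-level "volumes" entry paired with a container-level "volumeMounts" entry.
# The scratch space lives only as long as the pod does (emptyDir).
apiVersion: v1
kind: Pod
metadata:
  name: scratch-demo
spec:
  containers:
    - name: worker
      image: busybox:1.36
      command: ["sh", "-c", "date > /scratch/started && while true; do sleep 3600; done"]
      volumeMounts:
        - name: scratch        # must match a volume name declared below
          mountPath: /scratch
  volumes:
    - name: scratch
      emptyDir: {}
```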
The lifecycle of PVs and PVCs is complicated and depends on several factors, but the key thing to remember is that PVs are created by cluster administrators, either dynamically or statically, while PVCs are created by the application / pod authors who wish to use persistent storage. This separation of concerns also makes it easy to switch out storage layers in the future. See Kubernetes’s documentation for advice on PVs and PVCs.
In my home cluster I write my applications to use PersistentVolumeClaims with a particular naming pattern and run a script to statically create local PersistentVolume objects that will only bind to particularly named PVCs via the PV claimRef field. This is a supported case but isn’t explicitly called out in the docs, as this is usually used for dynamic provisioning of PVs rather than for static pre-provisioning of PVs.
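As a hedged sketch of what that pattern might look like in practice (the names, node name, path, capacity, and storage class below are all placeholders, not taken from my actual setup), a statically created local PV can be reserved for a specific PVC via claimRef:

```yaml
# Sketch of a statically provisioned "local" PV pre-bound to a particular PVC.
# Names, path, node name, capacity, and storage class are placeholders.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: data-myapp-0
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage   # assumes a matching no-provisioner StorageClass
  claimRef:                         # only the PVC named here may bind this PV
    namespace: default
    name: data-myapp-0
  local:
    path: /mnt/disks/myapp-0
  nodeAffinity:                     # required for "local" volumes
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - node-1
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-myapp-0
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: local-storage
  resources:
    requests:
      storage: 10Gi
```

Pods that use this PVC are automatically scheduled onto node-1, because the PV’s node affinity follows the claim.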
Next time…
Next time I’ll share some details about setting up a master node. Until then, happy hacking.