Although Kubernetes is principally designed for production environments, a hacker’s homelab has different requirements. Scalability and high availability, while worthwhile goals, ultimately take a back seat in a home environment to more practical concerns such as cost and maintenance complexity.

You may want to read the previous entry to learn what Kubernetes is.

As with most technical designs, there are various tradeoffs to consider:

  • Money vs Time
    • Which is more valuable to you?
    • What are your resource limits and preferences?
  • Business vs Personal
    • Are you making money from this?
    • What kind of SLA is needed?
    • What happens if this system is down for a minute? An hour? A day? A week?
  • Consistency vs Availability
    • Is it better to get an outdated answer now or the right answer eventually?
    • Is it better to have all your files temporarily inaccessible or always have some files available and risk losing others?
  • Deployment vs Development
    • Do you want to get components that already work together or are you comfortable writing new components?
  • Toil vs Automation
    • Are you okay with spending some time on administrative tasks, or would you prefer to spend a lot of resources (time or money) to automate them?

As this isn’t an exhaustive article intending to answer all those, here are my specific goals with a Kubernetes homelab:

  1. Money is more valuable than time to me because this is about learning and building.
  • Though this is also a hobby so I’ll spend money if I think it is the best value.
  2. I am not making money from this and am not giving my users SLAs.
  • If the system is down for a week, eventually I’ll get annoyed and poke it until it’s not down anymore.
  3. Consistency is more important to me than availability.
  • As I said, I’m the only stakeholder, so if the system is down it’s probably my fault :)
  4. I’d prefer to use mostly standard components.
  • But I am willing to write my own to integrate things how I’d like.
  5. I’d prefer to have /everything/ be automated, but there is a limit.
  • Occasional toil, such as when adding or removing nodes or disks, is fine.

Cluster Planning and Design

Abstract

There are a lot of things to plan out when building a cluster of any kind. In general I’ve found it best to start with some initial questions about your homelab.

  1. How will you and your network interact with this cluster? Is Layer 3 IP routing ok?
  • Kubernetes by default assumes a flat L3 IP network. This works very well in production environments but may limit the kinds of applications you can run if those applications assume they can send L2 broadcasts.
  • I want to host services that advertise their presence using L2 broadcasts, such as DHCP / PXE servers.
  • I want to host services that interact with devices outside the cluster, such as home automation utilities.
  • I want to host services that let me automatically provision new machines in a repeatable way, possibly outside of the cluster.
  2. How does the cluster treat storage? Is everything ephemeral? Is it all stored on a single NAS host? Is the NAS part of the Kubernetes cluster? Should it be?
  • I want to ignore the concept of storage and isolation as much as I can.
  • There are four storage classes I’d like to treat differently:
    • Application Configuration Files
      • eg. Configuration for Home Assistant
    • Bulk Application Data
      • eg. Plex or Kodi library media metadata
    • Shared Storage
      • eg. media library, general NAS
    • Bulk Application Transient Data
      • eg. a cache of Ubuntu packages for fast installs
  3. What nodes will be part of this cluster? How will they be installed?
  • Bare Metal Provisioning is outside the scope of Kubernetes. I’d like to start from a bare metal Ubuntu installation.
  • I am only going to use one node for right now.

Concrete

There are many ways to install a Kubernetes cluster, such as minikube, kubeadm, kubespray, etc. Because I like to keep my configuration as close to standard as possible (to reduce friction when sharing Kubernetes configurations), I chose to deploy using kubeadm. However, before we can use it, we’ll need to set up the basic requirements, namely networking and (local) storage.

Local Storage & Networking

Local storage is the easy case: for my master node I’m going to just use the defaults and install to the system disk. I’m planning to reproducibly create and destroy this cluster as much as I want, since all state I care about will be persisted in the git repository that holds this configuration.

Networking is a bit harder. Instead of creating a separate network for my cluster, I’m going to define a subnetwork of my main LAN and allow full reachability between my pods (which will have their own IPs) and my non-cluster nodes (such as my NAS, desktop PC, etc). This is kind of strange but really handy: it lets me easily interact with a pod remotely just as if it were a normal machine.

My home network uses the IP Range 192.168.0.0/16. My main gateway router is at 192.168.1.1 and runs a DNS and DHCP server. The DHCP server is configured to hand out the range 192.168.1.100 - 192.168.1.200 with a /16 subnet mask.

For Kubernetes we need two IP address ranges: one for our pods and one for our services. A pod IP is a “real IP” on a network interface; the kernel’s IP stack will handle ARP, ICMP replies, and the like. A service IP is generally not a real IP - it’s a virtual address implemented by iptables rules. As such, a service IP will not respond to ARP or ICMP.
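To make the distinction concrete, here is a quick check you can run once the cluster is up (a rough sketch: the addresses are placeholders from the ranges below, and the KUBE-SERVICES chain assumes kube-proxy in its default iptables mode):

# A pod IP is a real interface address, so ARP and ICMP behave normally.
$ ping -c1 192.168.220.10    # placeholder pod IP - replies as usual

# A service IP only exists as NAT rules written by kube-proxy, so there is
# no ICMP reply even though TCP connections to its ports reach the pods.
$ ping -c1 10.96.0.1         # placeholder service IP - no reply expected

# The rules that implement service IPs live in the nat table:
$ iptables -t nat -L KUBE-SERVICES | head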

In my case I’d like the following configuration:

  • --service-cidr=10.96.0.0/12
    A set of IP addresses that are unused and need L3 routing
  • --pod-network-cidr=192.168.220.0/24
    A set of IP addresses that are unused and can send L2 ethernet frames on the network
  • --service-dns-domain=your.full.domain
    The DNS domain name of the services

Because I’d like to be able to reach these from any machine on my network, including outside the cluster, I also added two static routes on my default gateway to route traffic destined for these ranges to my master Kubernetes node. This allows any machine on the network to direct IP traffic to the router (using the router’s L2 ethernet address) and have the router correctly route it to the Kubernetes node (using the node’s L2 address). If you’re setting up multiple nodes you’ll need to ensure each node only assigns a subset of the pod network CIDR. In effect this causes my entire network, including the cluster, to appear “within” the cluster for the purposes of L2 traffic.
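For reference, on a Linux-based gateway those two routes would look roughly like the following (my router exposes the same thing through its static-route UI; 192.168.1.50 is the master node’s LAN address):

# Send the service VIP range and the pod range to the Kubernetes master node.
ip route add 10.96.0.0/12 via 192.168.1.50
ip route add 192.168.220.0/24 via 192.168.1.50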

In my case I’d like to bridge my pods with the LAN outside the cluster. To do that I have already created a bridge interface named virt0 which holds the node’s main IP address (in my case 192.168.1.50).

$ ip addr
5: virt0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 0a:58:c0:aa:11:40 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.50/16 brd 192.168.255.255 scope global virt0
       valid_lft forever preferred_lft forever
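If you haven’t set up a bridge like this before, a minimal sketch with iproute2 looks something like the following (eth0 is a placeholder for the physical NIC; on Ubuntu you would normally persist this via netplan rather than running it by hand):

#!/bin/sh
# Create the bridge and enslave the physical NIC (placeholder name eth0).
ip link add virt0 type bridge
ip link set eth0 master virt0
ip link set virt0 up
# Move the node's LAN address and default route from the NIC to the bridge.
ip addr flush dev eth0
ip addr add 192.168.1.50/16 dev virt0
ip route add default via 192.168.1.1 dev virt0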

The previously mentioned kubeadm command line arguments will be used during cluster creation, but how does the node know to allocate the IP addresses for pods during operation?

This brings us to the Container Network Interface (CNI) specification. CNI expects files representing networks to exist at /etc/cni/net.d/. In my case I use the below configuration, named 20-cni-bridge.conf, to instruct the kubelet to call the bridge CNI plugin. The bridge CNI plugin attaches pods to the interface named virt0, which carries the node’s main network connection. You can see the CNI configuration also sets a few routes below. The DNS setting is irrelevant here due to the default Kubernetes DNS policy of ClusterFirst.

{
  "cniVersion": "0.3.1",
  "name": "k8s_bridge",
  "type": "bridge",
  "bridge": "virt0",
  "isGateway": true,
  "isDefaultGateway": false,
  "ipMasq": false,
  "hairpinMode": true,
  "promiscMode": false,
  "ipam": {
    "type": "host-local",
    "subnet": "192.168.0.0/16",
    "rangeStart": "192.168.220.2",
    "rangeEnd": "192.168.220.254",
    "gateway": "192.168.1.50",
    "routes": [
      { "dst": "0.0.0.0/0", "gw": "192.168.1.1" },
      { "dst": "10.96.0.0/12", "gw": "192.168.1.50" },
      { "dst": "192.168.0.0/16" }
    ]
  },
  "dns": {
    "nameservers": [
      "192.168.1.1"
    ]
  }
}
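Later, once pods are running, you can sanity-check what the bridge plugin has handed out; the host-local IPAM plugin records its allocations on disk, by default under /var/lib/cni/networks/<network name>:

# host-local stores one file per allocated address, named after the IP,
# so this directory should list the pod IPs currently in use.
$ ls /var/lib/cni/networks/k8s_bridge/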

Now that we have the proper information it’s time to set up the initial master and the cluster.

Master Node Setup

The initial install of Kubernetes onto nodes requires at least kubelet and kubeadm. I also recommend installing kubectl to interact with the cluster.

Additionally, stern is a useful log-tailing tool that I like to use.

#!/bin/sh
apt install kubelet kubeadm kubectl
# Install stern
wget https://github.com/wercker/stern/releases/download/1.8.0/stern_linux_amd64
mv stern* /usr/local/bin/stern
chmod +x /usr/local/bin/stern
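Note that these packages come from the Kubernetes apt repository rather than the stock Ubuntu archives; if you haven’t added it yet, the setup documented at the time (the kubernetes-xenial channel) looked roughly like this:

#!/bin/sh
# Add the Kubernetes apt repository and its signing key (as documented circa 1.11).
apt-get update && apt-get install -y apt-transport-https curl
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
echo "deb http://apt.kubernetes.io/ kubernetes-xenial main" > /etc/apt/sources.list.d/kubernetes.list
apt-get update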

Cluster Setup

kubelet does not officially support running on machines with swap enabled. Your choices are either to disable swap with swapoff -a (and persist that across reboots) or to pass --fail-swap-on=false to kubelet via KUBELET_EXTRA_ARGS in /etc/default/kubelet.

I set --cluster-domain to instruct kubelet to add domain search paths to /etc/resolv.conf for ease of service resolution.

I save the admin kubeconfig as my current kubectl credentials.

I copy the CNI configuration as a previous kubeadm reset may have removed it.

Since I only have the one node I remove the master taint to allow user pods to schedule.

#!/bin/sh
# Swap is okay; set the cluster domain
echo "KUBELET_EXTRA_ARGS=--fail-swap-on=false --cluster-domain=your.local.cluster" > /etc/default/kubelet
# Service CIDR is the VIP range for kubernetes services
# Pod Network CIDR is the IP range that the CNI plugin will assign to pods
kubeadm init \
  --kubernetes-version=1.11.1 \
  --ignore-preflight-errors=Swap \
  --service-cidr=10.96.0.0/12 \
  --pod-network-cidr=192.168.220.0/24 \
  --service-dns-domain=your.local.cluster
if [ $? -eq 0 ]; then
  mkdir -p ~/.kube
  cp /etc/kubernetes/admin.conf ~/.kube/config
  cp ./cni-bridge.conf /etc/cni/net.d/20-cni-bridge.conf

  # Remove master taint
  kubectl taint nodes --all node-role.kubernetes.io/master-
fi

Cluster Destruction

kubeadm reset will undo changes made by kubeadm if you’d like to retry creation.
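In practice I wrap it in a small teardown script; a rough sketch (the iptables flush mirrors the manual cleanup that kubeadm reset itself suggests):

#!/bin/sh
# Tear the cluster down so it can be recreated from scratch.
kubeadm reset
# kubeadm reset does not flush the iptables rules kube-proxy created;
# clear them manually for a truly clean slate.
iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X
# The admin kubeconfig copied earlier is also left behind.
rm -f ~/.kube/config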

Wrapping up

At this point you should have a functioning Kubernetes cluster that can run pods on a single node. The node likely already has the system pods running; kubectl get pods --all-namespaces and kubectl describe pods can be used to get information.

$ kubectl get pods --all-namespaces
NAMESPACE     NAME                                   READY   STATUS    RESTARTS   AGE
kube-system   coredns-78fcdf6894-f5z8d               1/1     Running   14         53d
kube-system   coredns-78fcdf6894-skq9q               1/1     Running   14         53d
kube-system   etcd-ebsrv                             1/1     Running   7          27d
kube-system   kube-apiserver-ebsrv                   1/1     Running   23         27d
kube-system   kube-controller-manager-ebsrv          1/1     Running   26         27d
kube-system   kube-proxy-mtxz7                       1/1     Running   7          90d
kube-system   kube-scheduler-ebsrv                   1/1     Running   24         27d
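As a final smoke test, you can launch a throwaway pod and confirm it receives an address from the pod range and is reachable from elsewhere on the LAN (test-nginx and the nginx image are just convenient placeholders):

# Create a single test pod and look up the IP the bridge plugin assigned.
$ kubectl run test-nginx --image=nginx --restart=Never
$ POD_IP=$(kubectl get pod test-nginx -o jsonpath='{.status.podIP}')
# Thanks to the bridged pod network and the static routes on the gateway,
# the pod should answer from any machine on the LAN, not just this node.
$ ping -c1 "$POD_IP"
$ curl -s "http://$POD_IP/" | head
$ kubectl delete pod test-nginx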