Working on IT problems often requires intense focus and research to find a solution. I’ve previously written about Rabbit holes and Time sinks; this axiom is an extension of those. Sometimes you just have to know when to quit and regroup, rather than continuing to bang your head against the wall.
I’ve become familiar with Docker over the last year, using it for testing and educating myself on current technologies. My day job is working as a Principal Technical Support Engineer for MySQL, so I encounter every type of deployment you can imagine. We also have new product releases from time to time and I decided to dive into Kubernetes so that I can be knowledgeable in that domain.
I have a home lab that consists of several computers running Fedora 32 on bare metal. These machines serve a variety of purposes, including running VMs and containers. I find containers to be a cleaner way of deploying software, running multiple versions side by side, and deploying services that would otherwise be messy in a bare metal environment.
I decided to deploy a k3s Kubernetes cluster using 4 VMs (after realizing that k3s would be messy if deployed on the bare metal OSes), so naturally I turned to Fedora 33 for this task, because 33 is 1 better than 32, right? After installing k3s on the VMs I set up Rancher, then attempted to install Longhorn. That’s when I ran into my insurmountable problem: DNS refused to work on the agent nodes.
I spent some time researching the problem and found a thread with many other people who had experienced the same issue. Then suddenly 1 worker node had working DNS, but the other 2 did not. I watched tcpdumps, stared at iptables counters, and scrutinized NAT and filter rules for hours, to no avail.
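Before reaching for tcpdump, a quick way to confirm which nodes can resolve names at all is a small script run on each one. The following is a minimal sketch, assuming Python 3 is available wherever it runs; `check_dns` is a hypothetical helper, not part of k3s or Rancher. It uses the system resolver, so it only tests in-cluster names when run from inside a pod.

```python
import socket


def check_dns(names, resolve=socket.getaddrinfo):
    """Try to resolve each name.

    Returns {name: sorted list of addresses}, or None for a name that
    failed to resolve. `resolve` is injectable so the function can be
    exercised without a network.
    """
    results = {}
    for name in names:
        try:
            infos = resolve(name, None)
        except OSError:
            # socket.gaierror (resolution failure) is a subclass of OSError.
            results[name] = None
            continue
        # Each entry is (family, type, proto, canonname, sockaddr);
        # sockaddr[0] is the address string for both IPv4 and IPv6.
        results[name] = sorted({info[4][0] for info in infos})
    return results
```

Calling `check_dns(["kubernetes.default.svc.cluster.local", "example.com"])` from a pod scheduled on each agent gives a per-node pass/fail picture. The first name assumes the default `cluster.local` cluster domain.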
It was then I decided: I don’t need to run Fedora, I don’t need to make this work, I just need a working k3s Kubernetes cluster.
This is a very important lesson that some people in the IT field don’t learn: Know when things are solvable and when they are entropy! A more colloquial version is “fish or cut bait”. You may be asking why I used the word entropy. One definition of entropy is: lack of order or predictability. I’d say that chasing “why does one VM work and not the other” qualifies!
We like to think that computers behave deterministically, yet I observed VMs that were otherwise identical behaving in non-deterministic ways. I don’t like to let problems lie; I like to solve them and understand why they happen. In this case, though, I needed to recognize that the problem wasn’t deterministic and that I wasn’t going to find a solution.
The simple solution came in the form of k3os, a Linux distro from Rancher that is the enterprise equivalent of LibreELEC for Kodi: just enough OS for k3s. k3os is a little obtuse up front, but once you embrace the YAML and just write a simple config file for each node, deployment couldn’t be simpler or easier. It really makes deploying Kubernetes nodes turnkey.
Since rebuilding the cluster with k3os, it has been trivially easy to set up most things; I just need more Internet bandwidth 😉 Below are the sample server and agent node configurations I used. The format can be more expressive, so refer to the k3os site for more details:
    # Server node
    ssh_authorized_keys:
    - ssh-rsa ... email@example.com
    write_files:
    - path: /var/lib/connman/default.config
      content: |-
        [service_eth0]
        Type=ethernet
        IPv4=192.168.4.99/255.255.255.0/192.168.4.1
        IPv6=off
        Nameservers=192.168.4.1
    hostname: master-node.example.com
    k3os:
      modules:
      - kvm
      dns_nameservers:
      - 192.168.4.1
      ntp_servers:
      - 0.us.pool.ntp.org
      password: password

    # Agent node
    ssh_authorized_keys:
    - ssh-rsa ... firstname.lastname@example.org
    write_files:
    - path: /var/lib/connman/default.config
      content: |-
        [service_eth0]
        Type=ethernet
        IPv4=192.168.4.100/255.255.255.0/192.168.4.1
        IPv6=off
        Nameservers=192.168.4.1
    hostname: worker-node.example.com
    k3os:
      modules:
      - kvm
      dns_nameservers:
      - 192.168.4.1
      ntp_servers:
      - 0.us.pool.ntp.org
      password: password
      server_url: https://192.168.4.99:6443
      token: <node_token>
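Since the two files differ only in hostname, IP address, and the agent-only `server_url` and `token` keys, the per-node configs could be generated rather than hand-edited. Here is a rough sketch of that idea, assuming the same network layout as the samples above. `render_node` and both templates are hypothetical helpers written for illustration, and the SSH key, password, and token are placeholders you would supply yourself.

```python
from string import Template

# Mirrors the sample configs above; every $name is filled in per node.
NODE_TEMPLATE = Template("""\
ssh_authorized_keys:
- $ssh_key
write_files:
- path: /var/lib/connman/default.config
  content: |-
    [service_eth0]
    Type=ethernet
    IPv4=$ip/$netmask/$gateway
    IPv6=off
    Nameservers=$gateway
hostname: $hostname
k3os:
  modules:
  - kvm
  dns_nameservers:
  - $gateway
  ntp_servers:
  - 0.us.pool.ntp.org
  password: $password
""")

# Extra keys under `k3os:` that only agent nodes carry.
AGENT_SUFFIX = Template("""\
  server_url: https://$server_ip:6443
  token: $token
""")


def render_node(hostname, ip, ssh_key, password, server_ip=None, token=None,
                gateway="192.168.4.1", netmask="255.255.255.0"):
    """Return config text for one node; pass server_ip and token for an agent."""
    if (server_ip is None) != (token is None):
        raise ValueError("an agent needs both server_ip and token")
    text = NODE_TEMPLATE.substitute(
        ssh_key=ssh_key, ip=ip, netmask=netmask, gateway=gateway,
        hostname=hostname, password=password)
    if server_ip is not None:
        text += AGENT_SUFFIX.substitute(server_ip=server_ip, token=token)
    return text
```

For example, `render_node("worker-node.example.com", "192.168.4.100", key, pw, server_ip="192.168.4.99", token=node_token)` reproduces the agent file, and adding another worker becomes a one-line change. Plain string templates keep the sketch dependency-free; a YAML library would be the sturdier choice if any values could contain characters that need quoting.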