How I automated Kubernetes cluster upgrades using Ansible, turning a tedious multi-hour process into a single command.

The Problem

My homelab Kubernetes cluster was running v1.28, several versions behind the current stable release. Kubernetes only supports upgrading one minor version at a time, meaning I’d need to go through:

1.28 → 1.29 → 1.30 → 1.31 → 1.32

Each upgrade involves:

  • Upgrading the control plane (kubeadm, then kubelet/kubectl)
  • Draining each worker node
  • Upgrading packages on each worker
  • Uncordoning and waiting for Ready state
  • Verifying cluster health

Doing this manually across 4 nodes, 4 times, is tedious and error-prone. Enter Ansible.

Cluster Setup

Node           Role           OS
k8s-master01   Control Plane  CentOS Stream 9
k8s-worker01   Worker         CentOS Stream 9
k8s-worker02   Worker         CentOS Stream 9
polycephala    Worker         Rocky Linux 10

All nodes use kubeadm for cluster management and containerd as the container runtime.

The Playbooks

I created two playbooks:

  1. k8s-upgrade-check.yml - Pre-flight checks
  2. k8s-upgrade.yml - Performs the actual upgrade

Directory Structure

ansible/
├── inventory/
│   └── hosts.yml
├── playbooks/
│   ├── k8s-upgrade-check.yml
│   └── k8s-upgrade.yml
└── README.md

Inventory

Simple YAML inventory using hostnames (assuming DNS/hosts resolution works):

all:
  children:
    control_plane:
      hosts:
        k8s-master01:
    workers:
      hosts:
        k8s-worker01:
        k8s-worker02:
        polycephala:
    k8s_cluster:
      children:
        control_plane:
        workers:
  vars:
    ansible_user: will
    ansible_become: true

Pre-flight Check Playbook

Before upgrading, I want to verify:

  • All nodes are Ready
  • The target version packages are available
  • etcd is healthy
  • No pending pods that would block draining

The key check tasks look like this:

- name: Check available kubeadm versions (RHEL-based)
  shell: |
    dnf list available kubeadm --showduplicates 2>/dev/null | grep -E "{{ next_minor_version }}" | tail -5
  register: available_kubeadm

- name: Check all nodes are Ready
  command: kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="Ready")].status}{"\n"}{end}'
  register: node_status

- name: Check etcd cluster health
  shell: |
    ETCDCTL_API=3 etcdctl \
      --endpoints=https://127.0.0.1:2379 \
      --cacert=/etc/kubernetes/pki/etcd/ca.crt \
      --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
      --key=/etc/kubernetes/pki/etcd/healthcheck-client.key \
      endpoint health

Usage:

ansible-playbook -i inventory/hosts.yml playbooks/k8s-upgrade-check.yml

Output shows version availability, node status, and the kubeadm upgrade plan.
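
The "upgrade plan" part comes straight from kubeadm; a task along these lines produces it (a sketch, run on the control plane node, and the register name is mine):

- name: Show kubeadm upgrade plan
  command: kubeadm upgrade plan
  register: upgrade_plan   # assumed variable name
  changed_when: false

- name: Display upgrade plan
  debug:
    msg: "{{ upgrade_plan.stdout_lines }}"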

Upgrade Playbook

The main upgrade playbook handles the entire process:

1. Backup etcd

Never upgrade without a backup:

- name: Backup etcd
  shell: |
    ETCDCTL_API=3 etcdctl snapshot save {{ backup_file }} \
      --endpoints=https://127.0.0.1:2379 \
      --cacert=/etc/kubernetes/pki/etcd/ca.crt \
      --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
      --key=/etc/kubernetes/pki/etcd/healthcheck-client.key
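
The backup_file variable isn't defined in the snippet above; a timestamped path works well (a sketch, the exact naming is an assumption, and ansible_date_time needs fact gathering enabled):

vars:
  # Assumed naming; e.g. /root/etcd-backup-20250101T120000.db
  backup_file: "/root/etcd-backup-{{ ansible_date_time.iso8601_basic_short }}.db"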

2. Upgrade Control Plane

- name: Upgrade kubeadm
  dnf:
    name: "kubeadm-{{ full_target_version }}"
    state: present
    disable_excludes: kubernetes

- name: Apply kubeadm upgrade
  command: "kubeadm upgrade apply v{{ full_target_version }} --yes"

- name: Upgrade kubelet and kubectl
  dnf:
    name:
      - "kubelet-{{ full_target_version }}"
      - "kubectl-{{ full_target_version }}"
    state: present
    disable_excludes: kubernetes

- name: Restart kubelet
  systemd:
    name: kubelet
    state: restarted
    daemon_reload: true
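
Before moving on, it's worth confirming the API server is answering again after the restart. A retry loop like this does the job (a sketch; tune retries and delay to taste):

- name: Wait for the API server to respond
  command: kubectl get --raw /readyz
  register: apiserver_ready
  until: apiserver_ready is succeeded
  retries: 30
  delay: 10
  changed_when: false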

3. Upgrade Workers (Serial)

Workers are upgraded one at a time (serial: 1) to maintain availability:

- name: Drain node
  command: "kubectl drain {{ inventory_hostname }} --ignore-daemonsets --delete-emptydir-data --force"
  delegate_to: "{{ groups['control_plane'][0] }}"

- name: Upgrade kubeadm
  dnf:
    name: "kubeadm-{{ full_target_version }}"
    state: present
    disable_excludes: kubernetes

- name: Upgrade node config
  command: kubeadm upgrade node

- name: Upgrade kubelet and kubectl
  dnf:
    name:
      - "kubelet-{{ full_target_version }}"
      - "kubectl-{{ full_target_version }}"
    state: present
    disable_excludes: kubernetes

- name: Restart kubelet
  systemd:
    name: kubelet
    state: restarted
    daemon_reload: true

- name: Uncordon node
  command: "kubectl uncordon {{ inventory_hostname }}"
  delegate_to: "{{ groups['control_plane'][0] }}"
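
These tasks live in a play scoped to the workers group; the play header that enforces the one-node-at-a-time behavior looks roughly like this (a sketch, the play name is arbitrary):

- name: Upgrade worker nodes
  hosts: workers
  serial: 1          # one worker at a time
  tasks:
    - name: Drain node
      command: "kubectl drain {{ inventory_hostname }} --ignore-daemonsets --delete-emptydir-data --force"
      delegate_to: "{{ groups['control_plane'][0] }}"
    # ...kubeadm/kubelet upgrade, restart, uncordon as shown above...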

4. Verification

After all nodes are upgraded:

- name: Get cluster version info
  command: kubectl get nodes -o wide

- name: Check all system pods
  shell: |
    # grep exits non-zero when nothing matches, so don't fail when every pod is healthy
    kubectl get pods -n kube-system -o wide | grep -v Running | grep -v Completed || true
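
To make the play fail loudly instead of just printing leftovers, the same check can be registered and asserted on (a sketch; the variable name is mine):

- name: Check for unhealthy system pods
  shell: |
    kubectl get pods -n kube-system --no-headers | grep -v Running | grep -v Completed || true
  register: unhealthy_pods
  changed_when: false

- name: Fail if any kube-system pods are unhealthy
  fail:
    msg: "Unhealthy kube-system pods:\n{{ unhealthy_pods.stdout }}"
  when: unhealthy_pods.stdout | length > 0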

Safety Features

The playbook includes several safety mechanisms:

Version Jump Validation

Prevents accidentally skipping minor versions:

- name: Validate version jump
  fail:
    msg: "Cannot skip minor versions. Current: {{ current_minor }}, Target: {{ upgrade_minor }}"
  when: (upgrade_minor.split('.')[1] | int) - (current_minor.split('.')[1] | int) > 1
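
The current_minor and upgrade_minor facts aren't shown above; deriving them takes a couple of tasks (a sketch, assuming target_version is passed in or auto-detected as described later under Usage):

- name: Get current kubeadm version
  command: kubeadm version -o short
  register: kubeadm_version
  changed_when: false

- name: Set version facts
  set_fact:
    # "v1.28.15" -> "1.28"
    current_minor: "{{ (kubeadm_version.stdout | regex_replace('^v', '')).split('.')[0:2] | join('.') }}"
    # "1.29.15" -> "1.29"
    upgrade_minor: "{{ target_version.split('.')[0:2] | join('.') }}"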

Confirmation Prompt

Requires explicit confirmation before proceeding:

- name: Confirm upgrade
  pause:
    prompt: "Press ENTER to continue with upgrade to {{ full_target_version }}, or Ctrl+C to abort"

Wait for Ready State

Doesn’t proceed until each node is Ready:

- name: Wait for node to be Ready
  command: kubectl get node {{ inventory_hostname }} -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'
  delegate_to: "{{ groups['control_plane'][0] }}"
  register: node_ready
  until: node_ready.stdout == "True"
  retries: 30
  delay: 10

Usage

Check Readiness

ansible-playbook -i inventory/hosts.yml playbooks/k8s-upgrade-check.yml

Perform Upgrade

# Auto-detect next version
ansible-playbook -i inventory/hosts.yml playbooks/k8s-upgrade.yml

# Specific version
ansible-playbook -i inventory/hosts.yml playbooks/k8s-upgrade.yml -e target_version=1.29.15

Full Upgrade Path

To go from 1.28 to 1.32:

for version in 1.29 1.30 1.31 1.32; do
  ansible-playbook -i inventory/hosts.yml playbooks/k8s-upgrade-check.yml -e target_version=$version
  ansible-playbook -i inventory/hosts.yml playbooks/k8s-upgrade.yml -e target_version=$version
done

Lessons Learned

1. Serial Execution is Essential

Initially I tried upgrading workers in parallel. Bad idea - if something goes wrong, you’ve potentially broken multiple nodes. serial: 1 ensures we upgrade one node at a time.

2. Always Backup etcd

Even with Ansible automation, things can go wrong. The etcd backup saved me once when a network issue mid-upgrade left a node in a weird state.

3. Check Package Availability First

The pre-flight check playbook isn’t optional. I’ve been burned by assuming packages were available when the Kubernetes repo wasn’t properly configured on a node.

4. DNS Resolution Matters

Using hostnames instead of IP addresses in the inventory makes the playbooks cleaner and works better with kubectl drain/uncordon commands.

5. disable_excludes is Required

On RHEL-based systems, the kubernetes repo uses exclude=kubelet kubeadm kubectl to prevent accidental upgrades. The disable_excludes: kubernetes option is required to actually install the new versions.
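
Each minor release also has its own repo on pkgs.k8s.io, so the repo definition has to be bumped to the target minor before the new packages even show up. Managed from Ansible, it looks roughly like this (a sketch; the exclude list and variables are assumptions):

- name: Configure Kubernetes repository for the target minor
  yum_repository:
    name: kubernetes
    description: Kubernetes
    baseurl: "https://pkgs.k8s.io/core:/stable:/v{{ upgrade_minor }}/rpm/"
    gpgcheck: true
    gpgkey: "https://pkgs.k8s.io/core:/stable:/v{{ upgrade_minor }}/rpm/repodata/repomd.xml.key"
    exclude:
      - kubelet
      - kubeadm
      - kubectl
      - cri-tools
      - kubernetes-cni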

What’s Not Covered

This playbook assumes:

  • Single control plane node (no HA control plane)
  • RHEL-based systems (dnf package manager)
  • kubeadm-based cluster
  • containerd runtime (no Docker)

For HA control planes, you’d need to upgrade control plane nodes serially as well.
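
Roughly, that means kubeadm upgrade apply on the first control plane node and kubeadm upgrade node on the rest, one node at a time (a sketch of the shape, not a tested playbook):

- name: Upgrade additional control plane nodes
  hosts: control_plane[1:]
  serial: 1
  tasks:
    - name: Upgrade node config
      command: kubeadm upgrade node
    # ...then upgrade kubelet/kubectl and restart kubelet, as above...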

Conclusion

What used to be a multi-hour, error-prone manual process is now:

ansible-playbook -i inventory/hosts.yml playbooks/k8s-upgrade.yml

The playbook handles the entire upgrade safely:

  • Backs up etcd first
  • Validates the version jump
  • Upgrades control plane
  • Drains, upgrades, and uncordons each worker
  • Verifies cluster health

For a homelab with 4 nodes, this saves about 2-3 hours per version upgrade - and more importantly, eliminates the risk of forgetting a step or making a typo.