By having a Highly Available Kubernetes Pi Cluster, you will have full control over your production-grade on-premises environment.
HA Kubernetes Pi Cluster I
(Total Setup Time: 25 mins)
On this special day, I would like to wish all Singaporeans and Singapore a Happy 55th National Day!
With the newly purchased 2x Raspberry Pi Model B 8GB and 64GB SD card added to my collection, I will set up a Highly Available Kubernetes Pi Cluster. In Part 1, I will set up an external etcd key-value store. In Part 2, I will configure the HA Kubernetes cluster using the external etcd.
Preparing OS
(10 mins)
First, I am using Ubuntu Server (64-bit) as my OS. After burning the image onto my 64GB SD card, create an empty file named ssh in d:/boot. The steps in this section are required on each master node.
Second, change the default password (ubuntu) of the default ubuntu user, then upgrade the OS:
sudo apt update
sudo apt upgrade
Third, change the hostname by running:
sudo vi /etc/hostname
sudo vi /etc/hosts
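The host names used later in this guide (master1, master2, master3 and cluster-endpoint) need to resolve on every node. Below is a minimal sketch of matching /etc/hosts entries. It assumes master1 to master3 map to 192.168.100.119, 192.168.100.173 and 192.168.100.100 (as in the etcd configuration further down) and that cluster-endpoint points to the virtual IP 192.168.100.200. That last mapping is an assumption, and the function name hosts_entries is only for illustration.

```shell
#!/bin/sh
# Prints /etc/hosts lines for the cluster.
# The cluster-endpoint -> 192.168.100.200 mapping is an assumption; adjust to your network.
hosts_entries() {
  cat <<'EOF'
192.168.100.119 master1
192.168.100.173 master2
192.168.100.100 master3
192.168.100.200 cluster-endpoint
EOF
}

# Preview the lines only; nothing is written to /etc/hosts here
hosts_entries
```

Once the preview looks right, the lines could be appended with hosts_entries | sudo tee -a /etc/hosts.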
Fourth, let iptables see bridged traffic:
# Checks if br_netfilter is loaded
lsmod | grep br_netfilter
# Loads it explicitly
sudo modprobe br_netfilter
# Lets iptables see bridged traffic
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF
sudo sysctl --system
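To double-check that the sysctl file was written as intended, a small helper along these lines may be handy. It is only a sketch: check_bridge_conf is a hypothetical name, and it inspects the file content rather than the live kernel settings.

```shell
#!/bin/sh
# check_bridge_conf FILE: succeeds only if both bridge settings are set to 1 in FILE.
check_bridge_conf() {
  conf="$1"
  for key in net.bridge.bridge-nf-call-ip6tables net.bridge.bridge-nf-call-iptables; do
    # Accepts optional spaces around "=", as written by the heredoc above
    grep -Eq "^[[:space:]]*${key}[[:space:]]*=[[:space:]]*1[[:space:]]*$" "$conf" || {
      echo "missing or disabled: ${key}" >&2
      return 1
    }
  done
  echo "bridge settings OK in ${conf}"
}

# Runs the check only if the file exists on this machine
if [ -f /etc/sysctl.d/k8s.conf ]; then
  check_bridge_conf /etc/sysctl.d/k8s.conf || true
fi
```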
Fifth, enable the memory cgroup by adding the following to the existing line in /boot/firmware/cmdline.txt:
cgroup_enable=cpuset cgroup_enable=memory cgroup_memory=1 swapaccount=1
Finally, add the following to /boot/firmware/usercfg.txt to disable WiFi and Bluetooth:
dtoverlay=disable-wifi
dtoverlay=disable-bt
Then reboot and check the memory cgroup:
# Memory group should be 1 after reboot
sudo reboot
grep mem /proc/cgroups | awk '{ print $4 }'
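The grep check above can be wrapped into a yes/no test. This is a sketch under the assumption that /proc/cgroups uses its usual four columns (subsys_name, hierarchy, num_cgroups, enabled); memory_cgroup_enabled is a hypothetical helper name.

```shell
#!/bin/sh
# memory_cgroup_enabled: reads /proc/cgroups-style text on stdin and succeeds
# only if the "memory" controller has 1 in its fourth (enabled) column.
memory_cgroup_enabled() {
  awk '$1 == "memory" { found = 1; ok = ($4 == 1) } END { exit !(found && ok) }'
}

if [ -r /proc/cgroups ] && memory_cgroup_enabled < /proc/cgroups; then
  echo "memory cgroup is enabled"
else
  echo "memory cgroup is NOT enabled; check /boot/firmware/cmdline.txt"
fi
```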
Creating Virtual IP
(5 mins)
First, install keepalived on all master nodes, following the LVS-NAT-Keepalived guide.
# Installs keepalived
sudo apt install keepalived

# Configures keepalived
sudo vi /etc/keepalived/keepalived.conf

# VRRP instance definitions
# state MASTER for first master, BACKUP for other master nodes
vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 150
    authentication {
        auth_type PASS
        auth_pass DOJOCUBE
    }
    virtual_ipaddress {
        192.168.100.200
    }
}

# Virtual server definitions
virtual_server 192.168.100.200 6443 {
    delay_loop 6
    lb_algo rr
    lb_kind NAT
    protocol TCP
    real_server 192.168.100.119 6443 {
        weight 1
        TCP_CHECK {
            connect_timeout 3
            connect_port 6443
        }
    }
    real_server 192.168.100.173 6443 {
        weight 1
        TCP_CHECK {
            connect_timeout 3
            connect_port 6443
        }
    }
    real_server 192.168.100.100 6443 {
        weight 1
        TCP_CHECK {
            connect_timeout 3
            connect_port 6443
        }
    }
}

# Restarts keepalived
sudo systemctl restart keepalived
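Since only the state differs between the first master and the others, the vrrp_instance block can be generated per node. The sketch below copies the interface, router ID, password and virtual IP from the configuration above. The function name vrrp_block and the lower backup priority of 100 are assumptions for illustration, not values from my setup.

```shell
#!/bin/sh
# vrrp_block STATE PRIORITY: prints a vrrp_instance stanza for one node.
vrrp_block() {
  state="$1"
  priority="$2"
  cat <<EOF
vrrp_instance VI_1 {
    state ${state}
    interface eth0
    virtual_router_id 51
    priority ${priority}
    authentication {
        auth_type PASS
        auth_pass DOJOCUBE
    }
    virtual_ipaddress {
        192.168.100.200
    }
}
EOF
}

# First master, as configured above
vrrp_block MASTER 150
# Other masters would use something like: vrrp_block BACKUP 100 (assumed priority)
```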
Second, test the connection. It will fail at this point because nothing is listening on port 6443 yet:
nc -v 192.168.100.200 6443
# Expected result
nc: connect to 192.168.100.200 port 6443 (tcp) failed: Connection refused
Preparing certs for etcd
(5 mins)
First, following the openssl CA guide, configure openssl and create the root cert:
sudo su

# Openssl configuration: set dir under [ CA_default ]
vi /usr/lib/ssl/openssl.cnf
[ CA_default ]
dir = /root/ca

mkdir /root/ca
cd /root/ca
mkdir newcerts certs crl private requests
touch index.txt
echo '1234' > serial

# Root certificate
openssl genrsa -aes256 -out private/cakey.pem 4096
openssl req -new -x509 -key private/cakey.pem -out cacert.pem -days 3650 -set_serial 0 -subj '/C=SG/ST=SG/O=Dojocube/CN=master1'
Second, create certs for all master nodes:
# Create master nodes' certificate
cd /root/ca/requests/

openssl genrsa -out etcd-key.pem
openssl req -new -key etcd-key.pem -out etcd.csr -subj '/C=SG/ST=SG/O=Dojocube/CN=master1,master2,master3,localhost,cluster-endpoint'
openssl ca -in etcd.csr -out etcd.pem \
  -extfile <(printf "subjectAltName=IP:192.168.100.119,IP:192.168.100.173,IP:192.168.100.100,IP:192.168.100.200,IP:127.0.0.1,\
DNS:master1,DNS:master2,DNS:master3,DNS:localhost,DNS:cluster-endpoint")

openssl genrsa -out peer-etcd-key.pem
openssl req -new -key peer-etcd-key.pem -out peer-etcd.csr -subj '/C=SG/ST=SG/O=Dojocube/CN=192.168.100.119,192.168.100.173,192.168.100.100,192.168.100.200'
openssl ca -in peer-etcd.csr -out peer-etcd.pem \
  -extfile <(printf "subjectAltName=IP:192.168.100.119,IP:192.168.100.173,IP:192.168.100.100,IP:192.168.100.200,IP:127.0.0.1,\
DNS:master1,DNS:master2,DNS:master3,DNS:localhost,DNS:cluster-endpoint")

rm etcd.csr peer-etcd.csr
mv etcd-key.pem peer-etcd-key.pem /root/ca/private/
mv etcd.pem peer-etcd.pem /root/ca/certs/

# Protects /root/ca folder
chmod -R 600 /root/ca
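The long subjectAltName string is easy to mistype, so it may help to build it from two short lists. This is only a sketch; build_san is a hypothetical helper name, and it expects space-separated IPs and DNS names.

```shell
#!/bin/sh
# build_san "IP1 IP2 ..." "NAME1 NAME2 ...": prints one subjectAltName line
# with every IP prefixed by IP: and every name prefixed by DNS:.
build_san() {
  san=""
  for ip in $1; do
    san="${san:+${san},}IP:${ip}"
  done
  for name in $2; do
    san="${san:+${san},}DNS:${name}"
  done
  printf 'subjectAltName=%s\n' "$san"
}

build_san "192.168.100.119 192.168.100.173 192.168.100.100 192.168.100.200 127.0.0.1" \
          "master1 master2 master3 localhost cluster-endpoint"
```

In bash, the result could replace the printf inside the process substitution, for example -extfile <(build_san "..." "...").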
Third, copy all certs to /srv/etcd-certs/ on all master nodes.
# Copies certs to master1 (creates the target folder first if it does not exist)
mkdir -p /srv/etcd-certs/
cp /root/ca/cacert.pem /root/ca/private/etcd-key.pem /root/ca/private/peer-etcd-key.pem \
  /root/ca/certs/etcd.pem /root/ca/certs/peer-etcd.pem /srv/etcd-certs/

# Copies certs to other master nodes
scp /root/ca/cacert.pem /root/ca/private/etcd-key.pem /root/ca/private/peer-etcd-key.pem \
  /root/ca/certs/etcd.pem /root/ca/certs/peer-etcd.pem ubuntu@master2:/tmp
scp /root/ca/cacert.pem /root/ca/private/etcd-key.pem /root/ca/private/peer-etcd-key.pem \
  /root/ca/certs/etcd.pem /root/ca/certs/peer-etcd.pem ubuntu@master3:/tmp

# ssh into other master nodes, perform these
sudo mv /tmp/*.pem /srv/etcd-certs/
sudo chown -R etcd:etcd /srv/etcd-certs/
Lastly, update the CA certificates on all nodes:
# update-ca-certificates only picks up files ending in .crt
sudo cp /srv/etcd-certs/cacert.pem /usr/local/share/ca-certificates/cacert.crt
sudo update-ca-certificates --fresh
Setting up etcd
(5 mins)
First, following the v3.4.10 release, set up etcd on each master as follows:
ETCD_VER=v3.4.10
GITHUB_URL=https://github.com/etcd-io/etcd/releases/download
DOWNLOAD_URL=${GITHUB_URL}

# Downloads the arm64 architecture for Raspberry Pi
rm -f /tmp/etcd-${ETCD_VER}-linux-arm64.tar.gz
rm -rf /tmp/etcd-download-test && mkdir -p /tmp/etcd-download-test
curl -L ${DOWNLOAD_URL}/${ETCD_VER}/etcd-${ETCD_VER}-linux-arm64.tar.gz -o /tmp/etcd-${ETCD_VER}-linux-arm64.tar.gz

# Extracts etcd
tar xzvf /tmp/etcd-${ETCD_VER}-linux-arm64.tar.gz -C /tmp/etcd-download-test --strip-components=1
rm -f /tmp/etcd-${ETCD_VER}-linux-arm64.tar.gz

# Checks version
/tmp/etcd-download-test/etcd --version
# (Error: etcd on unsupported platform without ETCD_UNSUPPORTED_ARCH=arm64 set)
/tmp/etcd-download-test/etcdctl version
# (Success: etcdctl version: 3.4.10, API version: 3.4)

# Moves to /usr/local/bin
sudo cp /tmp/etcd-download-test/etcd /usr/local/bin/
sudo cp /tmp/etcd-download-test/etcdctl /usr/local/bin/
export ETCD_UNSUPPORTED_ARCH=arm64

# Checks version again
etcd --version
# (Success: running etcd on unsupported architecture "arm64" since ETCD_UNSUPPORTED_ARCH is set)
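If the same steps are reused on a non-Pi machine, the architecture suffix in the download URL has to change. The sketch below derives it from uname -m and builds the tarball URL following the pattern used above. The helper names etcd_arch and etcd_url are hypothetical, and only the arm64 and amd64 mappings are covered.

```shell
#!/bin/sh
# etcd_arch MACHINE: maps `uname -m` output to the suffix used in etcd release file names.
etcd_arch() {
  case "$1" in
    aarch64|arm64) echo arm64 ;;
    x86_64|amd64)  echo amd64 ;;
    *) echo "unsupported machine type: $1" >&2; return 1 ;;
  esac
}

# etcd_url VERSION ARCH: prints the release tarball URL for that version and architecture.
etcd_url() {
  echo "https://github.com/etcd-io/etcd/releases/download/$1/etcd-$1-linux-$2.tar.gz"
}

# Prints the URL for this machine; nothing is downloaded here
if arch=$(etcd_arch "$(uname -m)"); then
  etcd_url v3.4.10 "$arch"
fi
```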
Second, prepare etcd as a service on master1.
sudo vi /lib/systemd/system/etcd.service

# Inserts the following into etcd.service for master1
[Unit]
Description=etcd key-value store
Documentation=https://etcd.io/docs/v3.4.0/

[Service]
User=etcd
Type=notify
Environment=ETCD_UNSUPPORTED_ARCH=arm64
# Logging flags
Environment=ETCD_LOGGER=zap
# Member flags
Environment=ETCD_NAME=infra1
Environment=ETCD_DATA_DIR=/var/lib/etcd
Environment=ETCD_LISTEN_PEER_URLS=https://192.168.100.119:2380
Environment=ETCD_LISTEN_CLIENT_URLS=https://192.168.100.119:2379,https://127.0.0.1:2379
Environment=ETCD_HEARTBEAT_INTERVAL=1000
Environment=ETCD_ELECTION_TIMEOUT=5000
# Clustering flags
Environment=ETCD_INITIAL_ADVERTISE_PEER_URLS=https://192.168.100.119:2380
Environment=ETCD_INITIAL_CLUSTER_TOKEN=etcd-cluster-1
Environment=ETCD_INITIAL_CLUSTER=infra1=https://192.168.100.119:2380,infra2=https://192.168.100.173:2380,infra3=https://192.168.100.100:2380
Environment=ETCD_INITIAL_CLUSTER_STATE=new
Environment=ETCD_ADVERTISE_CLIENT_URLS=https://192.168.100.119:2379
# Security flags
Environment=ETCD_CLIENT_CERT_AUTH=true
Environment=ETCD_TRUSTED_CA_FILE=/srv/etcd-certs/cacert.pem
Environment=ETCD_CERT_FILE=/srv/etcd-certs/etcd.pem
Environment=ETCD_KEY_FILE=/srv/etcd-certs/etcd-key.pem
Environment=ETCD_PEER_CLIENT_CERT_AUTH=true
Environment=ETCD_PEER_TRUSTED_CA_FILE=/srv/etcd-certs/cacert.pem
Environment=ETCD_PEER_CERT_FILE=/srv/etcd-certs/peer-etcd.pem
Environment=ETCD_PEER_KEY_FILE=/srv/etcd-certs/peer-etcd-key.pem
ExecStart=/usr/local/bin/etcd
Restart=always
RestartSec=10s
LimitNOFILE=40000

[Install]
WantedBy=multi-user.target
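The ETCD_INITIAL_CLUSTER value must be identical on every member, so it can be worth generating it once from name=IP pairs instead of typing it three times. A minimal sketch, where initial_cluster is a hypothetical helper and the peer port 2380 is taken from the unit file above:

```shell
#!/bin/sh
# initial_cluster NAME=IP ...: prints the comma-separated ETCD_INITIAL_CLUSTER value,
# turning each NAME=IP pair into NAME=https://IP:2380.
initial_cluster() {
  out=""
  for pair in "$@"; do
    name=${pair%%=*}   # text before the first "="
    ip=${pair#*=}      # text after the first "="
    out="${out:+${out},}${name}=https://${ip}:2380"
  done
  echo "$out"
}

initial_cluster infra1=192.168.100.119 infra2=192.168.100.173 infra3=192.168.100.100
```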
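Third, create the etcd data folder and system account on master1:
sudo mkdir -p /var/lib/etcd
# etcd fails if file permissions are not set correctly
sudo chmod -R 700 /var/lib/etcd

# Creates system user
sudo adduser --system etcd
sudo addgroup etcd
sudo usermod -aG etcd etcd

# Hands the data and certs folders to the etcd user, as done for the other masters
sudo chown -R etcd:etcd /var/lib/etcd /srv/etcd-certs/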
Fourth, install etcd as a service:
sudo systemctl daemon-reload
sudo systemctl enable etcd
sudo systemctl stop etcd
sudo systemctl start etcd.service
systemctl status etcd.service

# Check for logs
journalctl -xeu etcd
Fifth, ssh into the other master nodes (verify that step 1 is done by running etcd --version) and perform the following:
ssh ubuntu@master2
sudo vi /lib/systemd/system/etcd.service

# Variations for step 2 for master2
# Inserts the following into etcd.service
[Unit]
Description=etcd key-value store
Documentation=https://etcd.io/docs/v3.4.0/

[Service]
User=etcd
Type=notify
Environment=ETCD_UNSUPPORTED_ARCH=arm64
# Logging flags
Environment=ETCD_LOGGER=zap
# Member flags
Environment=ETCD_NAME=infra2
Environment=ETCD_DATA_DIR=/var/lib/etcd
Environment=ETCD_LISTEN_PEER_URLS=https://192.168.100.173:2380
Environment=ETCD_LISTEN_CLIENT_URLS=https://192.168.100.173:2379,https://127.0.0.1:2379
Environment=ETCD_HEARTBEAT_INTERVAL=1000
Environment=ETCD_ELECTION_TIMEOUT=5000
# Clustering flags
Environment=ETCD_INITIAL_ADVERTISE_PEER_URLS=https://192.168.100.173:2380
Environment=ETCD_INITIAL_CLUSTER_TOKEN=etcd-cluster-1
Environment=ETCD_INITIAL_CLUSTER=infra1=https://192.168.100.119:2380,infra2=https://192.168.100.173:2380,infra3=https://192.168.100.100:2380
Environment=ETCD_INITIAL_CLUSTER_STATE=new
Environment=ETCD_ADVERTISE_CLIENT_URLS=https://192.168.100.173:2379
# Security flags
Environment=ETCD_CLIENT_CERT_AUTH=true
Environment=ETCD_TRUSTED_CA_FILE=/srv/etcd-certs/cacert.pem
Environment=ETCD_CERT_FILE=/srv/etcd-certs/etcd.pem
Environment=ETCD_KEY_FILE=/srv/etcd-certs/etcd-key.pem
Environment=ETCD_PEER_CLIENT_CERT_AUTH=true
Environment=ETCD_PEER_TRUSTED_CA_FILE=/srv/etcd-certs/cacert.pem
Environment=ETCD_PEER_CERT_FILE=/srv/etcd-certs/peer-etcd.pem
Environment=ETCD_PEER_KEY_FILE=/srv/etcd-certs/peer-etcd-key.pem
ExecStart=/usr/local/bin/etcd
Restart=always
RestartSec=10s
LimitNOFILE=40000

[Install]
WantedBy=multi-user.target

ssh ubuntu@master3
sudo vi /lib/systemd/system/etcd.service

# Variations for step 2 for master3
# Inserts the following into etcd.service
[Unit]
Description=etcd key-value store
Documentation=https://etcd.io/docs/v3.4.0/

[Service]
User=etcd
Type=notify
Environment=ETCD_UNSUPPORTED_ARCH=arm64
# Logging flags
Environment=ETCD_LOGGER=zap
# Member flags
Environment=ETCD_NAME=infra3
Environment=ETCD_DATA_DIR=/var/lib/etcd
Environment=ETCD_LISTEN_PEER_URLS=https://192.168.100.100:2380
Environment=ETCD_LISTEN_CLIENT_URLS=https://192.168.100.100:2379,https://127.0.0.1:2379
Environment=ETCD_HEARTBEAT_INTERVAL=1000
Environment=ETCD_ELECTION_TIMEOUT=5000
# Clustering flags
Environment=ETCD_INITIAL_ADVERTISE_PEER_URLS=https://192.168.100.100:2380
Environment=ETCD_INITIAL_CLUSTER_TOKEN=etcd-cluster-1
Environment=ETCD_INITIAL_CLUSTER=infra1=https://192.168.100.119:2380,infra2=https://192.168.100.173:2380,infra3=https://192.168.100.100:2380
Environment=ETCD_INITIAL_CLUSTER_STATE=new
Environment=ETCD_ADVERTISE_CLIENT_URLS=https://192.168.100.100:2379
# Security flags
Environment=ETCD_CLIENT_CERT_AUTH=true
Environment=ETCD_TRUSTED_CA_FILE=/srv/etcd-certs/cacert.pem
Environment=ETCD_CERT_FILE=/srv/etcd-certs/etcd.pem
Environment=ETCD_KEY_FILE=/srv/etcd-certs/etcd-key.pem
Environment=ETCD_PEER_CLIENT_CERT_AUTH=true
Environment=ETCD_PEER_TRUSTED_CA_FILE=/srv/etcd-certs/cacert.pem
Environment=ETCD_PEER_CERT_FILE=/srv/etcd-certs/peer-etcd.pem
Environment=ETCD_PEER_KEY_FILE=/srv/etcd-certs/peer-etcd-key.pem
ExecStart=/usr/local/bin/etcd
Restart=always
RestartSec=10s
LimitNOFILE=40000

[Install]
WantedBy=multi-user.target

# Follows step 3
sudo mkdir -p /var/lib/etcd
sudo chmod -R 700 /var/lib/etcd
sudo adduser --system etcd
sudo addgroup etcd
sudo usermod -aG etcd etcd
sudo mkdir -p /srv/etcd-certs/
sudo mv ~/*.csr ~/*.pem /srv/etcd-certs/
sudo chown -R etcd:etcd /srv/etcd-certs/ /var/lib/etcd /usr/local/bin/etcd /usr/local/bin/etcdctl

# Follows step 4 (stop first, then start, so that etcd is left running)
sudo systemctl daemon-reload
sudo systemctl enable etcd
sudo systemctl stop etcd
sudo systemctl start etcd
systemctl status etcd
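Only five lines differ between the three etcd.service files above: the member name and the four URLs that carry the node's own IP. The sketch below prints just those lines for a given node, which makes it easier to spot a copy-paste slip. The function name member_env is hypothetical.

```shell
#!/bin/sh
# member_env NAME IP: prints the Environment= lines that are specific to one node.
member_env() {
  name="$1"
  ip="$2"
  cat <<EOF
Environment=ETCD_NAME=${name}
Environment=ETCD_LISTEN_PEER_URLS=https://${ip}:2380
Environment=ETCD_LISTEN_CLIENT_URLS=https://${ip}:2379,https://127.0.0.1:2379
Environment=ETCD_INITIAL_ADVERTISE_PEER_URLS=https://${ip}:2380
Environment=ETCD_ADVERTISE_CLIENT_URLS=https://${ip}:2379
EOF
}

# Node-specific lines for master2
member_env infra2 192.168.100.173
```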
Finally, following the etcd security guide, I test my etcd setup using:
sudo curl -k -L https://localhost:2379/metrics \
  --cacert /srv/etcd-certs/cacert.pem \
  --cert /srv/etcd-certs/etcd.pem \
  --key /srv/etcd-certs/etcd-key.pem | grep -v debugging

sudo etcdctl --cacert /srv/etcd-certs/cacert.pem \
  --cert /srv/etcd-certs/etcd.pem \
  --key /srv/etcd-certs/etcd-key.pem member list
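To turn the member list output into a quick pass/fail check, the started members can be counted. This sketch assumes the default comma-separated output of etcdctl member list (ID, status, name, peer URLs, client URLs, learner flag). The helper name count_started and the member IDs in the sample input are made up for illustration.

```shell
#!/bin/sh
# count_started: reads `etcdctl member list` output on stdin and prints
# how many members report the status "started".
count_started() {
  awk -F', ' '$2 == "started" { n++ } END { print n + 0 }'
}

# Sample input with made-up member IDs, for illustration only
count_started <<'EOF'
1111111111111111, started, infra1, https://192.168.100.119:2380, https://192.168.100.119:2379, false
2222222222222222, started, infra2, https://192.168.100.173:2380, https://192.168.100.173:2379, false
3333333333333333, unstarted, , https://192.168.100.100:2380, , false
EOF
```

On a node, the real output could be piped in by appending | count_started to the etcdctl member list command above.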
Troubleshooting
Request sent was ignored by remote peer due to cluster ID mismatch
I solved mine by changing ETCD_INITIAL_CLUSTER_TOKEN=[something else] on every member. You can then check the endpoint health:
sudo etcdctl --endpoints=https://cluster-endpoint:2379 --cacert=/srv/etcd-certs/cacert.pem --cert=/srv/etcd-certs/etcd.pem --key=/srv/etcd-certs/etcd-key.pem endpoint health
If it still fails, you may try to re-create the folder:
sudo rm -rf /var/lib/etcd && sudo mkdir -p /var/lib/etcd
sudo chown -R etcd:etcd /var/lib/etcd && sudo chmod -R 700 /var/lib/etcd
ERROR:There is already a certificate for /C=SG/ST=SG/O=Dojocube/CN=192.168.100.119
You may try to revoke the old certificate and sign the CSR again:
openssl ca -revoke /root/ca/newcerts/1240.pem
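The file name under newcerts is the certificate's serial number, which can be looked up in /root/ca/index.txt. The sketch below assumes the usual tab-separated openssl index layout (status, expiry, revocation date, serial, file name, subject) and prints the serial of each still-valid entry for a given CN. The helper name serial_for_cn is hypothetical.

```shell
#!/bin/sh
# serial_for_cn INDEX CN: prints the serial of every valid ("V") entry in an
# openssl index.txt whose subject ends with /CN=<CN>.
serial_for_cn() {
  awk -F'\t' -v cn="$2" '
    $1 == "V" {
      suffix = "/CN=" cn
      start = length($6) - length(suffix) + 1
      if (start > 0 && substr($6, start) == suffix) print $4
    }' "$1"
}

# Runs the lookup only where the CA index is readable (as root on master1)
if [ -r /root/ca/index.txt ]; then
  serial_for_cn /root/ca/index.txt 192.168.100.119
fi
```

Each printed serial corresponds to a file /root/ca/newcerts/<serial>.pem that can be passed to openssl ca -revoke.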
Replacing a faulty member
You may have to re-configure your etcd cluster when an SD card fails:
# Get member list
sudo etcdctl --cacert /srv/etcd-certs/cacert.pem \
  --cert /srv/etcd-certs/etcd.pem \
  --key /srv/etcd-certs/etcd-key.pem member list

# Delete the member based on the ID
sudo etcdctl --cacert /srv/etcd-certs/cacert.pem \
  --cert /srv/etcd-certs/etcd.pem \
  --key /srv/etcd-certs/etcd-key.pem member remove 34ef554257cff34e

# Add the previous member (previous settings remain the same, e.g. IP address)
sudo etcdctl --cacert /srv/etcd-certs/cacert.pem \
  --cert /srv/etcd-certs/etcd.pem \
  --key /srv/etcd-certs/etcd-key.pem member add infra3 --peer-urls=https://192.168.100.182:2380
etcdctl then prints the settings for the new member, including ETCD_INITIAL_CLUSTER and ETCD_INITIAL_CLUSTER_STATE="existing". Apply them to the etcd service on the replacement node:
sudo vi /lib/systemd/system/etcd.service

# Make the following changes
Environment=ETCD_INITIAL_CLUSTER_STATE=existing
# Add new configuration
Environment=ETCD_INITIAL_CLUSTER="infra2=https://192.168.100.181:2380,infra3=https://192.168.100.182:2380,infra1=https://192.168.100.180:2380"

# Restart etcd
sudo systemctl daemon-reload
sudo systemctl start etcd
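The member ID passed to member remove can be picked out of the member list output by name rather than copied by hand. A small sketch, assuming the default comma-separated member list format; member_id is a hypothetical helper, and the sample line reuses the ID from the remove command above purely as an illustration.

```shell
#!/bin/sh
# member_id NAME: reads `etcdctl member list` output on stdin and prints
# the ID of the member whose name (third column) equals NAME.
member_id() {
  awk -F', ' -v name="$1" '$3 == name { print $1 }'
}

# Sample input for illustration only
member_id infra3 <<'EOF'
34ef554257cff34e, started, infra3, https://192.168.100.182:2380, https://192.168.100.182:2379, false
EOF
```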