
Initializing a cluster with rke2 on Hetzner

· 12 min read
Hreniuc Cristian-Alexandru

This document describes all the steps we need to take when starting the production cluster on Hetzner. It covers:

  • server installation
  • database
  • frontend apps
  • backend apps
  • ssl
  • grafana + loki

1 Install servers

We buy the servers from the Hetzner cloud web interface. For each server, we need to do the following when buying:

  • Add it to the brandName-net-01 private network (used to access the NFS storage). In the future, we may start the cluster on this network.

  • Add it to the brandName-firewall-01 firewall.

  • Add it to the brandName-01 placement group (this way they won't end up on the same physical server, so if one fails the others stay up).

  • Add the public IP to the brandName-firewall-01 firewall; we have two rules that allow traffic between those servers. This is because we couldn't make it (the rke2 cluster, here's something similar) work on the private addresses.

1.1 Change root pass

After buying a new server, we receive an email with the root password; we connect to the server manually and change it.
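
For example (the IP comes from the Hetzner email; a minimal sketch):

ssh root@SERVER-IP
passwd   # enter the new root password when prompted
exit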

We also need to add it to the inventory of rke2-ansible.

1.2 Local utilities and preparations

We need to add the users to the new servers and install the requirements. First, install Ansible on the local machine:

sudo apt update
sudo apt install software-properties-common
sudo add-apt-repository --yes --update ppa:ansible/ansible
sudo apt install ansible

Prepare the key for the ansible_noob user, which will be used to install everything on the nodes.

Generate the key:

ssh-keygen -t rsa -b 4096 -C "ansible_noob"

1.3 Add ansible_noob user

This adds the ansible_noob user to all servers, copies the key, and makes the user a sudoer.

To run this, you need sshpass installed on your PC:

sudo apt-get install sshpass
ansible-playbook -v -i hosts/hetzner/hosts_ansible_noobs ansible_noob.yml

Note: When you want to install a new node, add it to the ansible_noobs group and run ansible_noob.yml, then comment out or remove the hosts from that group.
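
For illustration only, the hosts_ansible_noobs inventory might look roughly like this (the group name comes from the note above; the host entries and the ansible_user variable are placeholders, adapt them to the real file):

[ansible_noobs]
# SERVER-IP-1 ansible_user=root   # already provisioned, kept commented out
SERVER-IP-3 ansible_user=root     # newly bought node; comment it out again after the playbook has run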

1.4 Init server - install utilities for rke2

  • Update + upgrade
  • Add developer users
  • NFS server on nfs_servers
ansible-playbook -v -i hosts/hetzner/hosts init_rke2_hetzner.yml

# Or
ansible-playbook -v -i hosts/hetzner/hosts --key-file "~/.ssh/ansible_noob_id_rsa" init_rke2_hetzner.yml

Note: When you want to install a new node, add it to the new_nodes group and run the init.yml, then remove the hosts from that group.

You can test whether the NFS works by mounting it on another server:

ssh ansible_noob@SERVER-IP-1

sudo mkdir test
sudo mount 10.112.0.2:/var/nfs/general $(pwd)/test
cd test
touch file
cd ..
sudo umount $(pwd)/test

exit

ssh ansible_noob@SERVER-IP-2

cd /var/nfs/general
ls
# file should be there

1.5 Install RKE2

git clone git@github.com:rancherfederal/rke2-ansible.git

cd rke2-ansible/

ansible-galaxy collection install -r requirements.yml

cd inventory/
ln -s ../../rke2_inventory/hetzner/ hetzner

ansible-playbook site.yml -i inventory/hetzner/hosts.ini

To get the kubeconfig (we can skip this, because we can also get it from rancher):

ssh ansible_noob@SERVER-IP-2
sudo cp /etc/rancher/rke2/rke2.yaml .
sudo chown ansible_noob: rke2.yaml
exit

scp ansible_noob@SERVER-IP-2:/home/ansible_noob/rke2.yaml $(pwd)/inventory/hetzner/credentials/

# Edit the server IP in rke2.yaml (it points to 127.0.0.1 by default)
export KUBECONFIG=/path/rke2_inventory/hetzner/credentials/rke2.yaml

kubectl get nodes
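
The "edit the server IP" step above means replacing the default 127.0.0.1 address in the downloaded rke2.yaml with the node's public IP. For example (a sketch, assuming the kubeconfig sits in the credentials folder used above):

sed -i 's/127.0.0.1/SERVER-IP-2/' inventory/hetzner/credentials/rke2.yaml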

1.6 Post RKE2 install

Things we need to do after RKE2 is installed. This is needed for rancher:

cd .. # get back in the ansible folder
# Make sure that the master node is not commented in the new_nodes section
ansible-playbook -v -i hosts/hetzner/hosts post_rke2.yml

1.7 Install rancher

Source

Install helm on your PC, add the repository, and create the namespace for rancher:

# Helm install
curl https://baltocdn.com/helm/signing.asc | sudo apt-key add -
sudo apt-get install apt-transport-https --yes
echo "deb https://baltocdn.com/helm/stable/debian/ all main" | sudo tee /etc/apt/sources.list.d/helm-stable-debian.list
sudo apt-get update
sudo apt-get install helm

helm repo add rancher-stable https://releases.rancher.com/server-charts/stable

kubectl create namespace cattle-system

Install cert-manager:

# If you have installed the CRDs manually instead of with the `--set installCRDs=true` option added to your Helm install command, you should upgrade your CRD resources before upgrading the Helm chart:
kubectl apply -f https://github.com/jetstack/cert-manager/releases/download/v1.6.1/cert-manager.crds.yaml

# Add the Jetstack Helm repository
helm repo add jetstack https://charts.jetstack.io

# Update your local Helm chart repository cache
helm repo update

# Install the cert-manager Helm chart
helm install cert-manager jetstack/cert-manager \
--namespace cert-manager \
--create-namespace \
--version v1.6.1

# See the cert manager pods
kubectl get pods --namespace cert-manager

Install rancher with rancher-generated certificates; the external certificates will be provided by Cloudflare:

helm install rancher rancher-stable/rancher \
--namespace cattle-system \
--set hostname=rancher-hetzner.brandName.com \
--set replicas=3

# To uninstall
helm uninstall rancher

# Wait for it to finish installing:
kubectl -n cattle-system rollout status deploy/rancher

kubectl -n cattle-system get deploy rancher

Get the link for the first setup:

echo https://rancher-hetzner.brandName.com/dashboard/?setup=$(kubectl get secret --namespace cattle-system bootstrap-secret -o go-template='{{.data.bootstrapPassword|base64decode}}')

Open that in a browser and set the password.

If you forget the password, use this.
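
In short, the Rancher docs reset the admin password by exec-ing into one of the rancher pods, roughly like this (a sketch; double-check against the docs for the installed Rancher version):

kubectl -n cattle-system exec -it $(kubectl -n cattle-system get pods -l app=rancher --no-headers | head -n1 | awk '{ print $1 }') -- reset-password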

2 Post install rancher

2.1 Add helm repositories in rancher

In Apps & Marketplace > Repositories > Create

  • https://kubernetes-sigs.github.io/nfs-subdir-external-provisioner/

The rest can be postponed:

https://charts.helm.sh/stable
https://charts.helm.sh/incubator
https://charts.jetstack.io

2.2 Prepare secrets

From Storage > Secrets > Create > Opaque

For rancher backups: backblaze-brandName-hetzner-rancher, source.

accessKey: KEYID
secretKey: SECRET

For vitess backup: backblaze-brandName-hetzner-vitess, the key should be brandName-hetzner-vitess-key and the value should be this (it looks silly, I know):

[default]
aws_access_key_id=KEYID
aws_secret_access_key=SECRET

Note: It must be in ~/.aws/credentials format as stated in the docs.
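
If you prefer the CLI over the Rancher UI, an equivalent secret can be created roughly like this (a sketch; the key name and value follow the description above):

# Write the credentials file in the ~/.aws/credentials format expected by vitess
cat > /tmp/brandName-hetzner-vitess-key <<'EOF'
[default]
aws_access_key_id=KEYID
aws_secret_access_key=SECRET
EOF

kubectl create secret generic backblaze-brandName-hetzner-vitess \
  --from-file=brandName-hetzner-vitess-key=/tmp/brandName-hetzner-vitess-key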

For mailjet: mailjet-api, it should contain two keys:

  • MAILJET_API_KEY - value
  • MAILJET_API_SECRET - value

2.3 Install cluster tools

  • nfs-subdir-external-provisioner (these options pop up when installing the chart: set it as the default class and set archive to true)
  • Rancher Backups

These can be installed later, when we really need them:

  • Monitoring: 10Gb - 10d
  • Alerting Drivers - I'm not sure if we should install this.

NFS install.

Go to Apps & Marketplace > Charts and search for nfs:

Name: nfs-master1-storage

path: /var/nfs/general
server: 10.112.0.2 # Master1 private IP
allowVolumeExpansion: true
archiveOnDelete: true
defaultClass: true
name: nfs-master1-storage

For Rancher backups use the following: Cluster Tools > Rancher Backups

secret: backblaze-brandName-hetzner-rancher
region: eu-central-003
endpoint: s3.eu-central-003.backblazeb2.com
bucket name: brandName-hetzner-rancher

Then go to the Rancher Backups > Backup > Create section and create a recurring backup, every day at 12 AM: 0 0 * * * (UTC, which is 03:00 in Romania). The name should be backup-rancher-to-backblaze.

Retention: 30
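
The same recurring backup can also be declared as a Backup resource of the rancher-backup operator instead of clicking through the UI. A minimal sketch, assuming the secret lives in the default namespace (field names per the rancher-backup chart; double-check against the installed chart version):

apiVersion: resources.cattle.io/v1
kind: Backup
metadata:
  name: backup-rancher-to-backblaze
spec:
  schedule: "0 0 * * *"          # every day at 12 AM UTC
  retentionCount: 30
  resourceSetName: rancher-resource-set
  storageLocation:
    s3:
      credentialSecretName: backblaze-brandName-hetzner-rancher
      credentialSecretNamespace: default   # assumption: adjust to wherever the secret was created
      bucketName: brandName-hetzner-rancher
      region: eu-central-003
      endpoint: s3.eu-central-003.backblazeb2.com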

3 Prepare database

Everything needed to prepare the database environment for our apps.

3.1 Vitess

Install the operator for vitess:

kubectl apply -f https://raw.githubusercontent.com/vitessio/vitess/main/examples/operator/operator.yaml

Install vitess:

Before this, you should add a backup inside the bucket so the cluster can initialize, or comment out initializeBackup: true in the vitess config. Check this for more info on the initial import:

1. Initial schema import - this should be needed only once; we shouldn't need it anymore

TypeORM doesn't work with vitess at the moment, I've opened an issue here, so to initialize the database I did the following:

  • Created an empty database locally: domain-com-prod-schema
  • Started the backend and connected to it
  • Ran mysqldump -d -u root -p domain-com-prod-schema > domain-com-prod.sql
  • Commented the initializeBackup: true from the vitess cluster, because there is no backup for it.
  • Started the vitess cluster - todo link - It will auto upload a backup to backblaze.
  • Uncommented that line and applied the vitess yaml again. To be sure, I deleted the cluster and re-deployed it with the line uncommented.
  • Ran the pf.sh script from vitess: bash pf.sh
  • Created alias for mysql: alias mysql="mysql -h 127.0.0.1 -P 15306 -u domain-com_admin" - You need to use the admin user.
  • Imported the schema: mysql -pdomain-com_admin < domain-com-prod.sql.

2. Update database

  • If you need to add a new table:
  • Run mysqldump -d -u root -p domain-com-prod-schema > domain-com-prod.sql
  • Get the SQL for that specific table
  • Run the pf.sh script from vitess: bash pf.sh in the specific cluster
  • Create an alias for mysql: alias mysql="mysql -h 127.0.0.1 -P 15306 -u domain-com_admin" - You need to use the admin user.
  • Open the mysql client: mysql -pdomain-com_admin
  • Run the query

Where pf.sh is this:

#!/bin/sh

# Forward the vtctld web UI (15000) and grpc (15999) ports
kubectl port-forward --address localhost "$(kubectl get service --selector="planetscale.com/component=vtctld" -o name | head -n1)" 15000 15999 &
process_id1=$!
# Forward the vtgate MySQL port (3306) to local port 15306
kubectl port-forward --address localhost "$(kubectl get service --selector="planetscale.com/component=vtgate,!planetscale.com/cell" -o name | head -n1)" 15306:3306 &
process_id2=$!
sleep 2
echo "You may point your browser to http://localhost:15000, use the following aliases as shortcuts:"
echo 'alias vtctlclient="vtctlclient -server=localhost:15999 -logtostderr"'
echo 'alias mysql="mysql -h 127.0.0.1 -P 15306 -u user"'
echo "Hit Ctrl-C to stop the port forwards"
wait $process_id1
wait $process_id2


Go into the backblaze account and download the last snapshot from contabo, then upload it to the hetzner bucket. Make sure you have the correct folder path: Buckets/brandName-hetzner-vitess/vt/domain-com/-/2021-11-19.000002.dehetznernuremberg-1009888160/. The 2021-11-19.000002.dehetznernuremberg-1009888160 part is important; it should contain the same cell name as the cluster: dehetznernuremberg. I think..

cd vitess

kubectl apply -f hetzner/vitess-cluster.yaml

Notes:

  • If no backups are found in the bucket, it won't start, so we need to set initializeBackup to false.
  • Sometimes kubectl doesn't start the vttablet pod; this can be fixed by copying the yaml to another file and re-running it.

Install the vitess client locally (if you don't have it):

wget https://github.com/vitessio/vitess/releases/download/v11.0.1/vitess_11.0.1-92ac1ff_amd64.deb

sudo dpkg -i vitess_11.0.1-92ac1ff_amd64.deb

Check database:

# Port-forward vtctld and vtgate and apply schema and vschema
bash pf.sh &
alias mysql="mysql -h 127.0.0.1 -P 15306 -u domain-com_admin"
alias vtctlclient="vtctlclient -server localhost:15999 -alsologtostderr"

# Pass: domain-com_admin_brandName2

# Go to `http://localhost:15000/app/dashboard` to see the dashboard.

mysql -pdomain-com_admin_brandName2

vtctlclient BackupShard -allow_primary domain-com/-

At the moment, TypeORM doesn't initialize the db, so we need to do it manually: first create it locally, then import it into vitess:

mysqldump -u root -proot test_typeorm > domain-com.sql

mysql -pdomain-com_admin_brandName2 < domain-com.sql

This should be done whenever we update something, before going to production. We're still waiting for this.

3.2 Backup database

We will have to create a recurring CronJob that creates a backup of the vitess database.

The CronJob should have the following: Workload > CronJobs > Create

  • name: backup-vitess-domain-com
  • schedule: 0 0 * * *
  • container-image: vitess/lite:v12.0.2-mysql80
  • pull policy: IfNotPresent
  • command: /vt/bin/vtctlclient
  • args: -logtostderr --server vt-vtctld-f26eb0bb:15999 BackupShard -allow_primary domain-com/-

Note: When we have multiple replicas, we can remove allow_primary.

Be sure to check that --server vt-vtctld-f26eb0bb matches the current name of that vtctld service. To do this, run kubectl get svc and look for the vt-vtctld name.
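
For reference, a minimal sketch of such a CronJob manifest (batch/v1 needs Kubernetes 1.21+; the vtctld service name below is just the example from above and must match the output of kubectl get svc):

apiVersion: batch/v1
kind: CronJob
metadata:
  name: backup-vitess-domain-com
spec:
  schedule: "0 0 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: backup-vitess-domain-com
              image: vitess/lite:v12.0.2-mysql80
              imagePullPolicy: IfNotPresent
              command: ["/vt/bin/vtctlclient"]
              args: ["-logtostderr", "--server", "vt-vtctld-f26eb0bb:15999", "BackupShard", "-allow_primary", "domain-com/-"]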

4. Gitlab registry

Add it to Storage > Secrets > Create > Registry

Source

Registry:

  • name: registry-gitlab-com
  • url: registry.gitlab.com
  • user: DEPLOY_TOKEN_USER
  • Token: secret

The token was created here with only read_registry access.
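
The same registry secret can also be created from the CLI, roughly like this (DEPLOY_TOKEN_USER and the token value are the ones created above):

kubectl create secret docker-registry registry-gitlab-com \
  --docker-server=registry.gitlab.com \
  --docker-username=DEPLOY_TOKEN_USER \
  --docker-password=DEPLOY_TOKEN_SECRET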

5. Install other utilities

  • Loki stack - next post

6. Deploy apps

6.1 Deploy backend(+admin) & frontend

cd deployment_domain-com

kubectl apply -f hetzner/domain.com-backend.yaml
kubectl apply -f hetzner/domain.com-frontend.yaml
kubectl apply -f contabo/domain.com-backend-admin.yaml

6.2 Deploy certificates

We will have to deploy a ClusterIssuer. We use the staging certificates from Let's Encrypt for testing and the production ones in production.

Search for ClusterIssuer in rancher > Create from YAML and add the yaml like we have it here:

# Example for production
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    email: mail@gmail.com
    preferredChain: ""
    privateKeySecretRef:
      name: letsencrypt-prod
    server: https://acme-v02.api.letsencrypt.org/directory
    solvers:
      - http01:
          ingress:
            class: nginx
        selector: {}
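
The ingress in the next section references letsencrypt-staging; a staging issuer is the same manifest pointed at the Let's Encrypt staging endpoint, roughly:

# Staging variant, assuming the letsencrypt-staging name used by the ingress below
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging
spec:
  acme:
    email: mail@gmail.com
    preferredChain: ""
    privateKeySecretRef:
      name: letsencrypt-staging
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    solvers:
      - http01:
          ingress:
            class: nginx
        selector: {}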

6.3 Deploy ingress for frontend & backend

Add the DNS record in cloudflare first, otherwise the certificate won't be generated.

From the rancher UI, Service Discovery > Ingress:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-staging
    kubernetes.io/ingress.class: nginx
  name: domain-com-frontend-ingress
  namespace: default
spec:
  rules:
    - host: 'dev.domain.com'
      http:
        paths:
          - backend:
              service:
                name: domain-com-frontend-service
                port:
                  number: 80
            path: /app/
            pathType: Prefix
  tls:
    - hosts:
        - dev.domain.com
      secretName: domain.com-cert # Autogenerated
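
Once the DNS record exists and the ingress is applied, cert-manager creates a Certificate resource for the tls secret; its status can be checked with something like this (assuming ingress-shim names the Certificate after the secretName above):

kubectl get certificate -n default
kubectl describe certificate domain.com-cert -n default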

6.4 Deploy a service

apiVersion: apps/v1
kind: Deployment
metadata:
  name: domain-com-backend
  labels:
    app: domain-com-backend
spec:
  replicas: 1
  selector:
    matchLabels:
      app: domain-com-backend
  template:
    metadata:
      labels:
        app: domain-com-backend
    spec:
      imagePullSecrets:
        - name: registry-gitlab-com
      containers:
        - name: domain-com-backend
          image: registry.gitlab.com/backend:1.3.0_master_111111
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 6060
          env:
            - name: DB_USER
              valueFrom:
                secretKeyRef:
                  name: domain-com-backend-secret
                  key: db_user
            - name: DB_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: domain-com-backend-secret
                  key: db_password
            - name: DB_HOST
              valueFrom:
                configMapKeyRef:
                  name: domain-com-backend-configmap
                  key: db_host
            - name: DB_NAME
              valueFrom:
                configMapKeyRef:
                  name: domain-com-backend-configmap
                  key: db_name
            - name: DB_LOGGING
              valueFrom:
                configMapKeyRef:
                  name: domain-com-backend-configmap
                  key: db_logging
            - name: DB_SYNCHRONIZE
              valueFrom:
                configMapKeyRef:
                  name: domain-com-backend-configmap
                  key: db_synchronize
            - name: LOG_LEVEL
              valueFrom:
                configMapKeyRef:
                  name: domain-com-backend-configmap
                  key: log_level
            - name: JWT_EXPIRES_IN
              valueFrom:
                configMapKeyRef:
                  name: domain-com-backend-configmap
                  key: jwt_expires_in
            - name: JWT_ALGORITHM
              valueFrom:
                configMapKeyRef:
                  name: domain-com-backend-configmap
                  key: jwt_algorithm
            - name: JWT_SECRET
              valueFrom:
                secretKeyRef:
                  name: domain-com-backend-secret
                  key: jwt_secret
            - name: MAILJET_API_KEY
              valueFrom:
                secretKeyRef:
                  name: mailjet-api
                  key: MAILJET_API_KEY
            - name: MAILJET_API_SECRET
              valueFrom:
                secretKeyRef:
                  name: mailjet-api
                  key: MAILJET_API_SECRET
            - name: FRONTEND_URL
              valueFrom:
                configMapKeyRef:
                  name: domain-com-backend-configmap
                  key: frontend_url
---
apiVersion: v1
kind: Service
metadata:
  name: domain-com-backend-service
spec:
  selector:
    app: domain-com-backend
  ports:
    - protocol: TCP
      port: 6060
      targetPort: 6060
---
# Create this first, before the deployment
apiVersion: v1
kind: ConfigMap
metadata:
  name: domain-com-backend-configmap
data:
  db_host: vt-vtgate-41864810 # the service name
  db_name: "domain-com"
  db_synchronize: "false"
  db_logging: "all"
  jwt_expires_in: "200d"
  jwt_algorithm: "HS256"
  log_level: "debug"
  frontend_url: "domain.com"
---
# kubectl apply -f mongo-secret.yaml
# Run this first, before the deployment
apiVersion: v1
kind: Secret
metadata:
  name: domain-com-backend-secret
type: Opaque
# data: works just like stringData, only base64 encoded
#   db_user: sXumcm3laZU=
#   db_password: eGezc2dvcuZ=
#   jwt_secret: sdadas
stringData:
  db_user: domain-com_backend
  db_password: "domain-com_backend_domain-com"
  jwt_secret: "jwt"
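
A rough way to apply and verify the whole manifest (the file name is the one used in 6.1; adjust if it differs):

kubectl apply -f hetzner/domain.com-backend.yaml
kubectl rollout status deploy/domain-com-backend
kubectl logs deploy/domain-com-backend --tail=50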