
Initializing a cluster with rke2 on Hetzner

· 12 min read
Hreniuc Cristian-Alexandru

This document describes all the steps we need to take when starting the production cluster on Hetzner. It covers:

  • server installation
  • database
  • frontend apps
  • backend apps
  • ssl
  • grafana + loki

1 Install servers

We buy the servers from the Hetzner cloud web interface. For each server, we need to do the following when buying:

  • Add it to the brandName-net-01 private network (used to access the NFS storage). In the future, we may start the cluster on this network.

  • Add it to the brandName-firewall-01 firewall.

  • Add it to the brandName-01 placement group (this way they won't end up on the same physical server, so if one fails the others stay up).

  • Add the public IP to the brandName-firewall-01 firewall; we have two rules that allow traffic between those servers. This is because we couldn't make it (the rke2 cluster, here's something similar) work on the private addresses.

1.1 Change root pass

After buying a new server, we receive an email with the root password; we connect to the server manually and change it.
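
For example (the IP comes from the Hetzner email; a minimal sketch):

ssh root@SERVER-IP
passwd   # enter the new root password when prompted
exit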

We also need to add it to the inventory of rke2-ansible.

1.2 Local utilities and preparations

We need to add the users to the new servers and install the requirements. First, install Ansible on the local machine:

sudo apt update
sudo apt install software-properties-common
sudo add-apt-repository --yes --update ppa:ansible/ansible
sudo apt install ansible

Prepare the key for the ansible_noob user, which will be used to install everything on the nodes.

Generate the key:

ssh-keygen -t rsa -b 4096 -C "ansible_noob"

1.3 Add ansible_noob user

This adds the ansible_noob user to all servers, copies the key, and makes the user a sudoer.

To run this, you need sshpass installed on your PC:

sudo apt-get install sshpass
ansible-playbook -v -i hosts/hetzner/hosts_ansible_noobs ansible_noob.yml

Note: When you want to install a new node, add it to the ansible_noobs group and run ansible_noob.yml, then comment out or remove the hosts from that group.
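
For illustration only, the hosts_ansible_noobs inventory might look roughly like this (the group name comes from the note above; the host entries and the ansible_user variable are placeholders, adapt them to the real file):

[ansible_noobs]
# SERVER-IP-1 ansible_user=root   # already provisioned, kept commented out
SERVER-IP-3 ansible_user=root     # newly bought node; comment it out again after the playbook has run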

1.4 Init server - install utilities for rke2

  • Update + upgrade
  • Add developer users
  • NFS server on nfs_servers
ansible-playbook -v -i hosts/hetzner/hosts init_rke2_hetzner.yml

# Or
ansible-playbook -v -i hosts/hetzner/hosts --key-file "~/.ssh/ansible_noob_id_rsa" init_rke2_hetzner.yml

Note: When you want to install a new node, add it to the new_nodes group and run the init.yml, then remove the hosts from that group.

You can test whether the NFS works by mounting it on another server:

ssh ansible_noob@SERVER-IP-1

sudo mkdir test
sudo mount 10.112.0.2:/var/nfs/general $(pwd)/test
cd test
touch file
cd ..
sudo umount $(pwd)/test

exit

ssh ansible_noob@SERVER-IP-2

cd /var/nfs/general
ls
# file should be there

1.5 Install RKE2

git clone git@github.com:rancherfederal/rke2-ansible.git

cd rke2-ansible/

ansible-galaxy collection install -r requirements.yml

cd inventory/
ln -s ../../rke2_inventory/hetzner/ hetzner

ansible-playbook site.yml -i inventory/hetzner/hosts.ini

To get the kubeconfig (we can skip this, because we can also get it from rancher):

ssh ansible_noob@SERVER-IP-2
sudo cp /etc/rancher/rke2/rke2.yaml .
sudo chown ansible_noob: rke2.yaml
exit

scp ansible_noob@SERVER-IP-2:/home/ansible_noob/rke2.yaml $(pwd)/inventory/hetzner/credentials/

# Edit the server IP in rke2.yaml (it points to 127.0.0.1 by default)
export KUBECONFIG=/path/rke2_inventory/hetzner/credentials/rke2.yaml

kubectl get nodes
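
The "edit the server IP" step above means replacing the default 127.0.0.1 address in the downloaded rke2.yaml with the node's public IP. For example (a sketch, assuming the kubeconfig sits in the credentials folder used above):

sed -i 's/127.0.0.1/SERVER-IP-2/' inventory/hetzner/credentials/rke2.yaml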

1.6 Post RKE2 install

Things we need to do after RKE2 is installed. This is needed for rancher:

cd .. # get back in the ansible folder
# Make sure that the master node is not commented in the new_nodes section
ansible-playbook -v -i hosts/hetzner/hosts post_rke2.yml

1.7 Install rancher

Source

Install helm on your PC, add the repository, and create the namespace for rancher:

# Helm install
curl https://baltocdn.com/helm/signing.asc | sudo apt-key add -
sudo apt-get install apt-transport-https --yes
echo "deb https://baltocdn.com/helm/stable/debian/ all main" | sudo tee /etc/apt/sources.list.d/helm-stable-debian.list
sudo apt-get update
sudo apt-get install helm

helm repo add rancher-stable https://releases.rancher.com/server-charts/stable

kubectl create namespace cattle-system

Install cert-manager:

# If you have installed the CRDs manually instead of with the `--set installCRDs=true` option added to your Helm install command, you should upgrade your CRD resources before upgrading the Helm chart:
kubectl apply -f https://github.com/jetstack/cert-manager/releases/download/v1.6.1/cert-manager.crds.yaml

# Add the Jetstack Helm repository
helm repo add jetstack https://charts.jetstack.io

# Update your local Helm chart repository cache
helm repo update

# Install the cert-manager Helm chart
helm install cert-manager jetstack/cert-manager \
--namespace cert-manager \
--create-namespace \
--version v1.6.1

# See the cert manager pods
kubectl get pods --namespace cert-manager

Install rancher with rancher-generated certificates; the external certificates will be provided by Cloudflare:

helm install rancher rancher-stable/rancher \
--namespace cattle-system \
--set hostname=rancher-hetzner.brandName.com \
--set replicas=3

# To uninstall
helm uninstall rancher

# Wait for it to finish installing:
kubectl -n cattle-system rollout status deploy/rancher

kubectl -n cattle-system get deploy rancher

Get the link for the first setup:

echo https://rancher-hetzner.brandName.com/dashboard/?setup=$(kubectl get secret --namespace cattle-system bootstrap-secret -o go-template='{{.data.bootstrapPassword|base64decode}}')

Open that in a browser and set the password.

If you forget the password, use this.
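
In short, the Rancher docs reset the admin password by exec-ing into one of the rancher pods, roughly like this (a sketch; double-check against the docs for the installed Rancher version):

kubectl -n cattle-system exec -it $(kubectl -n cattle-system get pods -l app=rancher --no-headers | head -n1 | awk '{ print $1 }') -- reset-password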

2 Post install rancher

2.1 Add helm repositories in rancher

In Apps & Marketplace > Repositories > Create

  • https://kubernetes-sigs.github.io/nfs-subdir-external-provisioner/

The rest can be postponed:

https://charts.helm.sh/stable
https://charts.helm.sh/incubator
https://charts.jetstack.io

2.2 Prepare secrets

From Storage > Secrets > Create > Opaque

For rancher backups: backblaze-brandName-hetzner-rancher, source.

accessKey: KEYID
secretKey: SECRET

For vitess backup: backblaze-brandName-hetzner-vitess, the key should be brandName-hetzner-vitess-key and the value should be this (it looks silly, I know):

[default]
aws_access_key_id=KEYID
aws_secret_access_key=SECRET

Note: It must be in ~/.aws/credentials format as stated in the docs.
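
If you prefer the CLI over the Rancher UI, an equivalent secret can be created roughly like this (a sketch; the key name and value follow the description above):

# Write the credentials file in the ~/.aws/credentials format expected by vitess
cat > /tmp/brandName-hetzner-vitess-key <<'EOF'
[default]
aws_access_key_id=KEYID
aws_secret_access_key=SECRET
EOF

kubectl create secret generic backblaze-brandName-hetzner-vitess \
  --from-file=brandName-hetzner-vitess-key=/tmp/brandName-hetzner-vitess-key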

For mailjet: mailjet-api, it should contain two keys:

  • MAILJET_API_KEY - value
  • MAILJET_API_SECRET - value

2.3 Install cluster tools

  • nfs-subdir-external-provisioner (these options pop up when installing the chart: set it as the default class and set archive to true)
  • Rancher Backups

These can be installed later, when we really need them:

  • Monitoring: 10Gb - 10d
  • Alerting Drivers - I'm not sure if we should install this.

NFS install.

Go to Apps & Marketplace > Charts and search for nfs:

Name: nfs-master1-storage

path: /var/nfs/general
server: 10.112.0.2 # Master1 private IP
allowVolumeExpansion: true
archiveOnDelete: true
defaultClass: true
name: nfs-master1-storage

For Rancher backups use the following: Cluster Tools > Rancher Backups

secret: backblaze-brandName-hetzner-rancher
region: eu-central-003
endpoint: s3.eu-central-003.backblazeb2.com
bucket name: brandName-hetzner-rancher

Then go to the Rancher Backups > Backup > Create section and create a recurring backup, every day at 12 AM: 0 0 * * * (UTC, which is 03:00 in Romania). The name should be backup-rancher-to-backblaze.

Retention: 30
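
The same recurring backup can also be declared as a Backup resource of the rancher-backup operator instead of clicking through the UI. A minimal sketch, assuming the secret lives in the default namespace (field names per the rancher-backup chart; double-check against the installed chart version):

apiVersion: resources.cattle.io/v1
kind: Backup
metadata:
  name: backup-rancher-to-backblaze
spec:
  schedule: "0 0 * * *"          # every day at 12 AM UTC
  retentionCount: 30
  resourceSetName: rancher-resource-set
  storageLocation:
    s3:
      credentialSecretName: backblaze-brandName-hetzner-rancher
      credentialSecretNamespace: default   # assumption: adjust to wherever the secret was created
      bucketName: brandName-hetzner-rancher
      region: eu-central-003
      endpoint: s3.eu-central-003.backblazeb2.com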

3 Prepare database

Everything needed to prepare the database environment for our apps.

3.1 Vitess

Install the operator for vitess:

kubectl apply -f https://raw.githubusercontent.com/vitessio/vitess/main/examples/operator/operator.yaml

Install vitess:

Before this, you should add a backup inside the bucket so the cluster can initialize, or comment out initializeBackup: true in the vitess config. Check this for more info on the initial import:

1. Initial schema import - this should be needed only once; we shouldn't need it anymore

TypeORM doesn't work with vitess at the moment, I've opened an issue here, so to initialize the database I did the following:

  • Created an empty database locally: domain-com-prod-schema
  • Started the backend and connected to it
  • Ran mysqldump -d -u root -p domain-com-prod-schema > domain-com-prod.sql
  • Commented the initializeBackup: true from the vitess cluster, because there is no backup for it.
  • Started the vitess cluster - todo link - It will auto upload a backup to backblaze.
  • Uncommented that line and applied the vitess yaml again. To be sure, I deleted the cluster and re-deployed it with the line uncommented.
  • Ran the pf.sh script from vitess: bash pf.sh
  • Created alias for mysql: alias mysql="mysql -h 127.0.0.1 -P 15306 -u domain-com_admin" - You need to use the admin user.
  • Imported the schema: mysql -pdomain-com_admin < domain-com-prod.sql.

2. Update database

  • If you need to add a new table:
  • Run mysqldump -d -u root -p domain-com-prod-schema > domain-com-prod.sql
  • Get the SQL for that specific table
  • Run the pf.sh script from vitess: bash pf.sh in the specific cluster
  • Create an alias for mysql: alias mysql="mysql -h 127.0.0.1 -P 15306 -u domain-com_admin" - You need to use the admin user.
  • Open the mysql client: mysql -pdomain-com_admin
  • Run the query

Where pf.sh is this:

#!/bin/sh

# Forward the vtctld web UI (15000) and grpc (15999) ports
kubectl port-forward --address localhost "$(kubectl get service --selector="planetscale.com/component=vtctld" -o name | head -n1)" 15000 15999 &
process_id1=$!
# Forward the vtgate MySQL port (3306) to local port 15306
kubectl port-forward --address localhost "$(kubectl get service --selector="planetscale.com/component=vtgate,!planetscale.com/cell" -o name | head -n1)" 15306:3306 &
process_id2=$!
sleep 2
echo "You may point your browser to http://localhost:15000, use the following aliases as shortcuts:"
echo 'alias vtctlclient="vtctlclient -server=localhost:15999 -logtostderr"'
echo 'alias mysql="mysql -h 127.0.0.1 -P 15306 -u user"'
echo "Hit Ctrl-C to stop the port forwards"
wait $process_id1
wait $process_id2


Go into the backblaze account and download the last snapshot from contabo, then upload it to the hetzner bucket. Make sure you have the correct folder path: Buckets/brandName-hetzner-vitess/vt/domain-com/-/2021-11-19.000002.dehetznernuremberg-1009888160/. The 2021-11-19.000002.dehetznernuremberg-1009888160 part is important; it should contain the same cell name as the cluster: dehetznernuremberg. I think..

cd vitess

kubectl apply -f hetzner/vitess-cluster.yaml

Notes:

  • If no backups are found in the bucket, it won't start, so we need to set initializeBackup to false.
  • Sometimes kubectl doesn't start the vttablet pod; this can be fixed by copying the yaml to another file and re-running it.

Install the vitess client locally (if you don't have it):

wget https://github.com/vitessio/vitess/releases/download/v11.0.1/vitess_11.0.1-92ac1ff_amd64.deb

sudo dpkg -i vitess_11.0.1-92ac1ff_amd64.deb

Check database:

# Port-forward vtctld and vtgate and apply schema and vschema
bash pf.sh &
alias mysql="mysql -h 127.0.0.1 -P 15306 -u domain-com_admin"
alias vtctlclient="vtctlclient -server localhost:15999 -alsologtostderr"

# Pass: domain-com_admin_brandName2

# Go to `http://localhost:15000/app/dashboard` to see the dashboard.

mysql -pdomain-com_admin_brandName2

vtctlclient BackupShard -allow_primary domain-com/-

At the moment, TypeORM doesn't initialize the db, so we need to do it manually: first create it locally, then import it into vitess:

mysqldump -u root -proot test_typeorm > domain-com.sql

mysql -pdomain-com_admin_brandName2 < domain-com.sql

This should be done whenever we update something, before going to production. We're still waiting for this.

3.2 Backup database

We will have to create a recurring CronJob that creates a backup of the vitess database.

The CronJob should have the following: Workload > CronJobs > Create

  • name: backup-vitess-domain-com
  • schedule: 0 0 * * *
  • container-image: vitess/lite:v12.0.2-mysql80
  • pull policy: IfNotPresent
  • command: /vt/bin/vtctlclient
  • args: -logtostderr --server vt-vtctld-f26eb0bb:15999 BackupShard -allow_primary domain-com/-

Note: When we have multiple replicas, we can remove allow_primary.

Be sure to check that --server vt-vtctld-f26eb0bb matches the current name of that vtctld service. To do this, run kubectl get svc and look for the vt-vtctld name.
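
For reference, a minimal sketch of such a CronJob manifest (batch/v1 needs Kubernetes 1.21+; the vtctld service name below is just the example from above and must match the output of kubectl get svc):

apiVersion: batch/v1
kind: CronJob
metadata:
  name: backup-vitess-domain-com
spec:
  schedule: "0 0 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: backup-vitess-domain-com
              image: vitess/lite:v12.0.2-mysql80
              imagePullPolicy: IfNotPresent
              command: ["/vt/bin/vtctlclient"]
              args: ["-logtostderr", "--server", "vt-vtctld-f26eb0bb:15999", "BackupShard", "-allow_primary", "domain-com/-"]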

4. Gitlab registry

Add it to Storage > Secrets > Create > Registry

Source

Registry:

  • name: registry-gitlab-com
  • url: registry.gitlab.com
  • user: DEPLOY_TOKEN_USER
  • Token: secret

The token was created here with only read_registry access.
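
The same registry secret can also be created from the CLI, roughly like this (DEPLOY_TOKEN_USER and the token value are the ones created above):

kubectl create secret docker-registry registry-gitlab-com \
  --docker-server=registry.gitlab.com \
  --docker-username=DEPLOY_TOKEN_USER \
  --docker-password=DEPLOY_TOKEN_SECRET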

5. Install other utilities

  • Loki stack - next post

6. Deploy apps

6.1 Deploy backend(+admin) & frontend

cd deployment_domain-com

kubectl apply -f hetzner/domain.com-backend.yaml
kubectl apply -f hetzner/domain.com-frontend.yaml
kubectl apply -f contabo/domain.com-backend-admin.yaml

6.2 Deploy certificates

We will have to deploy a ClusterIssuer. We use the staging certificates from Let's Encrypt for testing and the production ones in production.

Search for ClusterIssuer in rancher > Create from YAML and add the yaml like we have it here:

# Example for production
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    email: mail@gmail.com
    preferredChain: ""
    privateKeySecretRef:
      name: letsencrypt-prod
    server: https://acme-v02.api.letsencrypt.org/directory
    solvers:
      - http01:
          ingress:
            class: nginx
        selector: {}
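
The ingress in the next section references letsencrypt-staging; a staging issuer is the same manifest pointed at the Let's Encrypt staging endpoint, roughly:

# Staging variant, assuming the letsencrypt-staging name used by the ingress below
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging
spec:
  acme:
    email: mail@gmail.com
    preferredChain: ""
    privateKeySecretRef:
      name: letsencrypt-staging
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    solvers:
      - http01:
          ingress:
            class: nginx
        selector: {}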

6.3 Deploy ingress for frontend & backend

Add the DNS record in cloudflare first, otherwise the certificate won't be generated.

From the rancher UI, Service Discovery > Ingress:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-staging
    kubernetes.io/ingress.class: nginx
  name: domain-com-frontend-ingress
  namespace: default
spec:
  rules:
    - host: 'dev.domain.com'
      http:
        paths:
          - backend:
              service:
                name: domain-com-frontend-service
                port:
                  number: 80
            path: /app/
            pathType: Prefix
  tls:
    - hosts:
        - dev.domain.com
      secretName: domain.com-cert # Autogenerated
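
Once the DNS record exists and the ingress is applied, cert-manager creates a Certificate resource for the tls secret; its status can be checked with something like this (assuming ingress-shim names the Certificate after the secretName above):

kubectl get certificate -n default
kubectl describe certificate domain.com-cert -n default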

6.4 Deploy a service

apiVersion: apps/v1
kind: Deployment
metadata:
  name: domain-com-backend
  labels:
    app: domain-com-backend
spec:
  replicas: 1
  selector:
    matchLabels:
      app: domain-com-backend
  template:
    metadata:
      labels:
        app: domain-com-backend
    spec:
      imagePullSecrets:
        - name: registry-gitlab-com
      containers:
        - name: domain-com-backend
          image: registry.gitlab.com/backend:1.3.0_master_111111
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 6060
          env:
            - name: DB_USER
              valueFrom:
                secretKeyRef:
                  name: domain-com-backend-secret
                  key: db_user
            - name: DB_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: domain-com-backend-secret
                  key: db_password
            - name: DB_HOST
              valueFrom:
                configMapKeyRef:
                  name: domain-com-backend-configmap
                  key: db_host
            - name: DB_NAME
              valueFrom:
                configMapKeyRef:
                  name: domain-com-backend-configmap
                  key: db_name
            - name: DB_LOGGING
              valueFrom:
                configMapKeyRef:
                  name: domain-com-backend-configmap
                  key: db_logging
            - name: DB_SYNCHRONIZE
              valueFrom:
                configMapKeyRef:
                  name: domain-com-backend-configmap
                  key: db_synchronize
            - name: LOG_LEVEL
              valueFrom:
                configMapKeyRef:
                  name: domain-com-backend-configmap
                  key: log_level
            - name: JWT_EXPIRES_IN
              valueFrom:
                configMapKeyRef:
                  name: domain-com-backend-configmap
                  key: jwt_expires_in
            - name: JWT_ALGORITHM
              valueFrom:
                configMapKeyRef:
                  name: domain-com-backend-configmap
                  key: jwt_algorithm
            - name: JWT_SECRET
              valueFrom:
                secretKeyRef:
                  name: domain-com-backend-secret
                  key: jwt_secret
            - name: MAILJET_API_KEY
              valueFrom:
                secretKeyRef:
                  name: mailjet-api
                  key: MAILJET_API_KEY
            - name: MAILJET_API_SECRET
              valueFrom:
                secretKeyRef:
                  name: mailjet-api
                  key: MAILJET_API_SECRET
            - name: FRONTEND_URL
              valueFrom:
                configMapKeyRef:
                  name: domain-com-backend-configmap
                  key: frontend_url
---
apiVersion: v1
kind: Service
metadata:
  name: domain-com-backend-service
spec:
  selector:
    app: domain-com-backend
  ports:
    - protocol: TCP
      port: 6060
      targetPort: 6060
---
# Create this first, before the deployment
apiVersion: v1
kind: ConfigMap
metadata:
  name: domain-com-backend-configmap
data:
  db_host: vt-vtgate-41864810 # the service name
  db_name: "domain-com"
  db_synchronize: "false"
  db_logging: "all"
  jwt_expires_in: "200d"
  jwt_algorithm: "HS256"
  log_level: "debug"
  frontend_url: "domain.com"
---
# kubectl apply -f mongo-secret.yaml
# Run this first, before the deployment
apiVersion: v1
kind: Secret
metadata:
  name: domain-com-backend-secret
type: Opaque
# data: works just like stringData, only base64 encoded
#   db_user: sXumcm3laZU=
#   db_password: eGezc2dvcuZ=
#   jwt_secret: sdadas
stringData:
  db_user: domain-com_backend
  db_password: "domain-com_backend_domain-com"
  jwt_secret: "jwt"
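
A rough way to apply and verify the whole manifest (the file name is the one used in 6.1; adjust if it differs):

kubectl apply -f hetzner/domain.com-backend.yaml
kubectl rollout status deploy/domain-com-backend
kubectl logs deploy/domain-com-backend --tail=50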