Configuring kubriX for High Availability
This document explains how to configure kubriX for high availability (HA).
It outlines which Helm chart values must be adjusted and identifies components that are not designed for high availability.
High Availability vs. Restartability
High Availability (HA) ensures that your service continues to operate during common failure scenarios such as:
- Node drains or crashes
- Rolling updates
- Availability Zone (AZ) outages
With only a single replica, there will always be downtime when that pod is unavailable - for example, during rescheduling, image pulling, initialization, or when the underlying node fails.
However, depending on your service level agreements (SLAs), a single replica might still be sufficient for some components, especially if:
- The service is not required to be continuously available, but only when users actively access it.
- The component performs background or asynchronous processing, where temporary downtime does not impact the user experience.
Taking these considerations into account, the following sections describe the recommended configuration for a highly available kubriX control plane and kubriX data plane.
Three or two replicas
There is an excellent blog article explaining why three replicas are better than two replicas: https://sookocheff.com/post/kubernetes/why-three-replicas-are-better-than-two/
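To make this concrete, here is a minimal PodDisruptionBudget sketch for a hypothetical Deployment named my-service: with three replicas and minAvailable: 2, a node drain can evict one pod while the service keeps a majority running, whereas with only two replicas the same budget blocks voluntary disruptions as soon as one pod is already down.

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-service          # hypothetical example workload
spec:
  minAvailable: 2           # with 3 replicas, one pod may be disrupted at a time
  selector:
    matchLabels:
      app: my-service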
Observability
Grafana
Grafana supports scaling, as long as you use an external database and configure alerting to use the unified_alerting feature.
We use CNPG to create the external database. This assumes you have the correct secrets in Vault so that the external secrets can fetch the required secrets (a sketch of such an ExternalSecret follows the configuration below).
Additional docs:
- https://grafana.com/docs/grafana/latest/setup-grafana/set-up-for-high-availability/
- https://grafana.com/docs/grafana/latest/alerting/set-up/configure-high-availability/
- https://github.com/grafana/helm-charts/tree/main/charts/grafana#high-availability-for-unified-alerting
To summarize, this is a valid configuration for high availability:
grafana:
  replicas: 2
  # headless service for https://github.com/grafana/helm-charts/tree/main/charts/grafana#high-availability-for-unified-alerting
  headlessService: true
  grafana.ini:
    unified_alerting:
      enabled: true
      ha_peers: "{{ .Release.Name }}-headless:9094"
      ha_listen_address: ${POD_IP}:9094
      ha_advertise_address: ${POD_IP}:9094
      rule_version_record_limit: "5"
    alerting:
      enabled: false
  # use shared database for persistence instead of a volume
  persistence:
    enabled: false
  env:
    GF_DATABASE_TYPE: postgres
  # Tell the chart to load env vars from our Secret for the grafana db
  # see https://grafana.com/docs/grafana/latest/setup-grafana/configure-grafana/#database
  # this will be created by an external-secret and contains:
  # GF_DATABASE_PASSWORD
  # GF_DATABASE_HOST
  # GF_DATABASE_NAME
  # GF_DATABASE_USER
  envFromSecrets:
    - name: grafana-env-secret
      optional: true
    - name: grafana-db
      optional: true
# create shared postgresql db
cluster:
  type: postgresql
  mode: standalone
  version:
    postgresql: "16"
  cluster:
    instances: 3
    monitoring:
      enabled: true
    superuserSecret: cnpg-superuser-secret
    initdb:
      database: grafana
      secret:
        name: cnpg-grafana-secret
    roles:
      - name: grafana
        ensure: present
        comment: grafana-admin-user
        login: true
        inherit: true
        superuser: true
        createdb: true
        passwordSecret:
          name: cnpg-grafana-secret
    annotations:
      argocd.argoproj.io/sync-options: SkipDryRunOnMissingResource=true
      argocd.argoproj.io/sync-wave: "-1"
  backups:
    enabled: false
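For reference, here is a minimal sketch of an ExternalSecret that could provide the grafana-db secret referenced in envFromSecrets above. The ClusterSecretStore name (vault), the Vault key and the property names are assumptions and must match your actual setup.

apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: grafana-db
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: ClusterSecretStore
    name: vault                    # assumed store name, adjust to your environment
  target:
    name: grafana-db               # consumed by grafana via envFromSecrets
  data:
    - secretKey: GF_DATABASE_HOST
      remoteRef:
        key: grafana-db            # assumed Vault path
        property: host
    - secretKey: GF_DATABASE_NAME
      remoteRef:
        key: grafana-db
        property: database
    - secretKey: GF_DATABASE_USER
      remoteRef:
        key: grafana-db
        property: username
    - secretKey: GF_DATABASE_PASSWORD
      remoteRef:
        key: grafana-db
        property: password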
K8s-monitoring
- alloy-operator supports scaling, as long as leadElection is set to true (default)
- alloy-metrics supports scaling, see Grafana docs
- kube-state-metrics supports scaling, as long as you set discoveryType: service. See k8s-monitoring docs
To summarize, this is a valid high availability configuration:
k8s-monitoring:
  alloy-operator:
    replicaCount: 2
  alloy-metrics:
    controller:
      replicas: 2
  clusterMetrics:
    kube-state-metrics:
      discoveryType: service
      replicas: 2
Components not designed for multiple replicas
- alloy-singleton: does not support scaling, because otherwise ClusterEvents would be retrieved multiple times.
Loki
With Loki in simple scalable mode you can easily scale out all Loki components to be highly available.
loki:
  commonConfig:
    replication_factor: 3 # needs to be the same number as ingesters to write to and read from
backend:
  replicas: 3
read:
  replicas: 3
write:
  replicas: 3
gateway:
  replicas: 2
resultsCache:
  replicas: 2
Mimir
- nginx supports scaling, see example
- distributor supports scaling; it is completely stateless. See example and documentation
- query-frontend supports scaling. Scalability is limited by the configured number of workers per querier when you don't use the query-scheduler, which is active in our configuration. See also the official documentation and example.
- ruler supports scaling, see example
- compactor should be able to scale up according to the documentation, but it is not scaled up in the example either, so we leave it at a single replica for now.
- querier supports scaling and has two replicas by default, see values
- query_scheduler supports scaling and has two replicas by default, see values
- ingester and store-gateway use zone-aware replication by default, see documentation
Components not designed for multiple replicas
- Overrides-exporter: don't scale the overrides-exporter! The metrics emitted by the overrides-exporter have high cardinality. It's recommended to run only a single replica of the overrides-exporter to limit that cardinality. See documentation
- rollout-operator is fixed to 1 replica, because it doesn't make any sense to scale it. See deployment spec
- Alertmanager: scaling makes no sense with just one tenant, because tenant sharding is implemented. See documentation
To summarize, this is a valid high availability configuration in addition to the default values:
nginx:
  replicas: 2
distributor:
  replicas: 2
query_frontend:
  replicas: 2
ruler:
  replicas: 2
Tempo
Currently there is no HA setup defined for Tempo. We are continuously working to extend our HA setup documentation.
Delivery
Kargo
Kargo supports scaling most of its components out of the box. You just need to include the Kargo values-ha-enabled-prime.yaml.
However, the controller and manager-controller are hard coded to replicas: 1. You need to switch to a Distributed Architecture to achieve overall scalability.
Crossplane
It is possible to run multiple replicas of the crossplane core pods and rbac manager pods, as long as leader election is turned on (by default it is turned on). Details in Crossplane documentation.
Unfortunately PodDisruptionBudgets are not implemented yet, and currently not planned.
It is also possible to scale out Crossplane providers (out of the box we integrate Keycloak, Grafana and Vault) as long as leader election is implemented in the provider and enabled! If leader election is not implemented in the provider or not enabled, the provider pod will consume 100% CPU!
While Crossplane architects emphasize that additional replicas are rarely needed and that leader election often takes longer than simply restarting the pod, there are definitely also arguments for implementing HA with leader election: https://github.com/gofogo/k8s-sigs-external-dns-fork/blob/4a039d1edc2cb2b29ffd48d137ec2d53bda4e0ae/docs/proposal/001-leader-election#use-cases
So it should be carefully decided whether multiple replicas of Crossplane and the Crossplane providers really make sense in your environment.
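If you decide that multiple replicas do make sense, a minimal sketch of the Helm values could look like this (assuming the upstream Crossplane chart keys replicas, leaderElection and rbacManager.*; verify them against the chart version you deploy):

crossplane:
  replicas: 2
  leaderElection: true        # enabled by default, must stay on when running multiple replicas
  rbacManager:
    replicas: 2
    leaderElection: true      # enabled by default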
KubeVirt
The virt-operator runs with 2 replicas by default and uses leader election (see https://kubevirt.io/monitoring/runbooks/NoLeadingVirtOperator.html).
The KubeVirt CR also defaults to two replicas, which means it creates 2 instances of the virt-controller. The virt-api deployment is scaled based on the number of available nodes.
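If the defaults are not enough, the replica count of the managed control-plane components can be raised through the KubeVirt CR. A sketch, assuming the spec.infra.replicas field; verify it against your KubeVirt version:

apiVersion: kubevirt.io/v1
kind: KubeVirt
metadata:
  name: kubevirt
  namespace: kubevirt
spec:
  infra:
    replicas: 3    # assumed field: replica count for the components managed by virt-operator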
CDI (Containerized-Data-Importer)
Technically it is possible to scale the CDI resources via the CDI CustomResource properties uploadProxyReplicas, apiServerReplicas and deploymentReplicas. However, currently we do not see the benefit of having multiple replicas.
The cdi-operator itself ships with one replica out of the box from the upstream project.
see https://github.com/kubevirt/containerized-data-importer/issues/2560
KubeVirt-Manager
KubeVirt-Manager ships out of the box with a hardcoded replica count of one. Unfortunately there is currently no evidence that it can be scaled out.
Security
External-Secrets
External-Secrets implements HA via leader election (same as Crossplane). That means just one replica does the work; the others are on hot standby.
The webhook can be scaled out without leader election.
The cert-controller also has a leader-election feature flag, but there is an open issue because this flag cannot be enabled through an explicit attribute in the official Helm chart. However, it can be set via extraArgs.
Attention: if you need an active-active setup, it is still possible, but things will get complicated really fast. You will need to set up controller classes and make sure each secret store gets a controller class assigned in a round-robin manner via a webhook. And if you do, bear in mind that any misconfiguration will cause External-Secrets as a whole to stop operating.
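A minimal sketch of the Helm values for the leader-elected setup, assuming the upstream external-secrets chart keys replicaCount, leaderElect, webhook.replicaCount and certController.*, nested under an external-secrets key as in the other examples:

external-secrets:
  replicaCount: 2        # leader-elected: one active replica, one hot standby
  leaderElect: true
  webhook:
    replicaCount: 2      # stateless, scales without leader election
  certController:
    replicaCount: 2
    # leader election for the cert-controller is not exposed as a dedicated chart value yet;
    # pass the corresponding flag via extraArgs (see the open upstream issue for the exact flag)
    extraArgs: {}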
Kyverno
All components of Kyverno can be scaled out. However, some of them, such as the reports-controller and the background-controller, are stateful and implement leader election (as in External-Secrets or Crossplane), while in the admission-controller and cleanup-controller only certain functionality, such as certificate and webhook management, uses leader election and the rest does not.
Additional Docs: https://kyverno.io/docs/high-availability/
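A minimal sketch of the Helm values, assuming the chart values are nested under a kyverno key; the Kyverno HA docs call for at least three replicas of the admission-controller, while two leader-elected replicas are a common choice for the other controllers:

kyverno:
  admissionController:
    replicas: 3          # Kyverno's HA docs require at least three replicas here
  backgroundController:
    replicas: 2          # leader-elected
  cleanupController:
    replicas: 2
  reportsController:
    replicas: 2          # leader-elected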
Velero
Currently it is not supported to run Velero with multiple replicas. For File System Backup a Node Agent (DaemonSet) is deployed. For the etcd backup there is a Deployment with a hardcoded single replica.
There is an issue where an HA requirement is under discussion, but it is not implemented yet.
General
Ingress-Nginx
Ingress-Nginx Controller supports scaling out without any restrictions. See values-ha-enabled-prime.yaml.
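For illustration, a minimal sketch of the Helm values (assuming the upstream ingress-nginx chart keys controller.replicaCount and controller.topologySpreadConstraints) that spreads the replicas across zones:

ingress-nginx:
  controller:
    replicaCount: 2
    topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: ScheduleAnyway
        labelSelector:
          matchLabels:
            app.kubernetes.io/name: ingress-nginx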
External-DNS
In the community there are currently some concerns and warnings against running multiple replicas of external-dns, so it is hardcoded to 1 replica. Therefore we do not suggest running multiple replicas. Since registering DNS entries is an asynchronous process anyway, it shouldn't hurt if the external-dns deployment is not HA.
Since there are also good arguments for enabling multiple replicas, we will keep an eye on the open discussion and support an external-dns HA configuration as soon as it is available in the upstream project.
CNPG
The CloudNative-PG Controller can be scaled out, but only one instance does the work (leader-election).
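A minimal sketch, assuming the upstream cloudnative-pg operator chart key replicaCount:

cloudnative-pg:
  replicaCount: 2    # only the elected leader reconciles; the second replica is a hot standby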
Open issues:
- pgadmin4 is missing
- topologySpreadConstraints, waiting for https://github.com/rowanruseler/helm-charts/pull/328