Multi-Cluster Management¶
Duration: 45 minutes (20 minutes theory + 25 minutes lab)
Introduction¶
Multi-cluster strategies distribute workloads across multiple Kubernetes clusters for high availability, disaster recovery, geographic distribution, and environment isolation.
Use Cases:
- High availability across regions
- Disaster recovery and backup
- Dev/staging/production separation
- Geographic data residency
- Load distribution
- Blue/green at cluster level
Multi-Cluster Patterns¶
1. Federation Pattern¶
Single control plane manages multiple clusters:
- Central policy management
- Cross-cluster scheduling
- Unified configuration
2. Independent Clusters¶
Separate clusters with service mesh connectivity:
- Cluster independence
- Service-to-service communication
- Traffic routing across clusters
3. Hub and Spoke¶
Central hub cluster manages spoke clusters:
- Central monitoring
- Policy distribution
- Workload orchestration
Kubernetes Context Management¶
View and Switch Contexts¶
# List contexts
kubectl config get-contexts
# Current context
kubectl config current-context
# Switch context
kubectl config use-context cluster-prod
# View full config
kubectl config view
Configure Multiple Clusters¶
# ~/.kube/config
apiVersion: v1
kind: Config
clusters:
- cluster:
    server: https://cluster1.example.com
    certificate-authority-data: <ca-cert-1>
  name: cluster-1
- cluster:
    server: https://cluster2.example.com
    certificate-authority-data: <ca-cert-2>
  name: cluster-2
contexts:
- context:
    cluster: cluster-1
    user: admin-1
    namespace: default
  name: context-1
- context:
    cluster: cluster-2
    user: admin-2
    namespace: default
  name: context-2
current-context: context-1
users:
- name: admin-1
  user:
    client-certificate-data: <cert-1>
    client-key-data: <key-1>
- name: admin-2
  user:
    client-certificate-data: <cert-2>
    client-key-data: <key-2>
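Rather than editing one large file by hand, a combined kubeconfig can also be produced from per-cluster files. The following is a minimal sketch: the function name `merge_kubeconfigs` and the file names are hypothetical, and the `:` separator assumes Linux or macOS.

```shell
# Merge several kubeconfig files into one flattened file.
# Usage: merge_kubeconfigs <output-file> <kubeconfig>...
merge_kubeconfigs() {
  out="$1"; shift
  # KUBECONFIG accepts a colon-separated list of files (Linux/macOS)
  joined=$(IFS=:; printf '%s' "$*")
  # --flatten inlines certificates so the result is self-contained
  KUBECONFIG="$joined" kubectl config view --flatten > "$out"
}
```

For example, `merge_kubeconfigs ~/.kube/merged cluster1.yaml cluster2.yaml` writes a single file containing both clusters; review it before replacing `~/.kube/config`.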
Quick Context Switching¶
# Create aliases
alias k1='kubectl --context=context-1'
alias k2='kubectl --context=context-2'
# Use kubectx tool
kubectx context-1
kubectx context-2
# List contexts
kubectx
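When the same question has to be asked of every cluster, looping over the contexts avoids switching back and forth. This is a small sketch; `kube_all` is a hypothetical helper name, and it assumes every context in the kubeconfig is reachable.

```shell
# Run the same kubectl arguments against every context in the kubeconfig.
# Usage: kube_all get nodes
kube_all() {
  for ctx in $(kubectl config get-contexts -o name); do
    echo "=== $ctx ==="
    # --context targets one cluster without changing current-context
    kubectl --context "$ctx" "$@"
  done
}
```

Because it passes `--context` explicitly, the loop never changes the active context, which makes it safer than repeated `use-context` calls in scripts.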
Cross-Cluster Service Discovery¶
DNS-Based Discovery¶
# Service in cluster-1
apiVersion: v1
kind: Service
metadata:
  name: api
  namespace: production
spec:
  selector:
    app: api
  ports:
  - port: 8080
  type: ClusterIP
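Access from another cluster. An ExternalName Service only creates a DNS alias, so the target hostname must resolve to an endpoint of cluster-1 that is reachable from cluster-2 (for example a LoadBalancer or Ingress address); the ClusterIP above is not routable across clusters by itself:
# ExternalName Service in cluster-2
apiVersion: v1
kind: Service
metadata:
  name: api-cluster1
  namespace: production
spec:
  type: ExternalName
  externalName: api.production.cluster1.example.com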
Multi-Cluster Services with Cilium¶
apiVersion: cilium.io/v2
kind: CiliumClusterwideNetworkPolicy
metadata:
  name: allow-cross-cluster
spec:
  endpointSelector:
    matchLabels:
      app: frontend
  egress:
  - toEndpoints:
    - matchLabels:
        io.cilium.k8s.policy.cluster: cluster-2
        app: api
Enable Cluster Mesh:
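# Install Cilium on each cluster; Cluster Mesh needs a unique cluster name and ID per cluster
# (the --set form is for the Helm-based cilium CLI; check `cilium install --help` for your version)
cilium install --context cluster-1 --set cluster.name=cluster-1 --set cluster.id=1
cilium install --context cluster-2 --set cluster.name=cluster-2 --set cluster.id=2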
# Enable cluster mesh on both clusters
cilium clustermesh enable --context cluster-1
cilium clustermesh enable --context cluster-2
# Connect clusters
cilium clustermesh connect --context cluster-1 --destination-context cluster-2
# Verify
cilium clustermesh status --context cluster-1
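Once the mesh is connected, Cilium load-balances a Service across clusters only if it is marked as global and exists with the same name and namespace in each cluster. The sketch below applies the `service.cilium.io/global` annotation with plain kubectl; `mark_global` is a hypothetical helper, and it assumes the kubectl contexts are named after the clusters.

```shell
# Mark a Service as global in each listed context so Cilium can
# load-balance it across the mesh.
# Usage: mark_global <namespace> <service> <context>...
mark_global() {
  ns="$1"; svc="$2"; shift 2
  for ctx in "$@"; do
    kubectl --context "$ctx" -n "$ns" annotate service "$svc" \
      service.cilium.io/global="true" --overwrite
  done
}
```

For example, `mark_global production api cluster-1 cluster-2` annotates the `api` Service in both clusters.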
Service Export/Import¶
# In cluster-1: Export service
apiVersion: multicluster.x-k8s.io/v1alpha1
kind: ServiceExport
metadata:
  name: api
  namespace: production
---
# In cluster-2: Import service
# (normally created automatically by the MCS controller; shown here for reference)
apiVersion: multicluster.x-k8s.io/v1alpha1
kind: ServiceImport
metadata:
  name: api
  namespace: production
spec:
  type: ClusterSetIP
  ports:
  - port: 8080
    protocol: TCP
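Under the Multi-Cluster Services API, an exported Service is consumed through a name in the `clusterset.local` zone rather than `cluster.local`. The helper below only builds that name; `clusterset_dns` is a hypothetical function for illustration.

```shell
# Build the DNS name under which an exported Service is reachable
# from any cluster in the cluster set (MCS API naming convention).
# Usage: clusterset_dns <service> <namespace>
clusterset_dns() {
  svc="$1"; ns="$2"
  printf '%s.%s.svc.clusterset.local\n' "$svc" "$ns"
}
```

From a pod in cluster-2, a request such as `curl "http://$(clusterset_dns api production):8080"` would then reach the `api` Service exported from cluster-1, provided an MCS implementation is installed in both clusters.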
GitOps for Multi-Cluster¶
Flux Multi-Cluster Setup¶
# clusters/cluster-1/flux-system/kustomization.yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: apps
  namespace: flux-system
spec:
  interval: 5m
  sourceRef:
    kind: GitRepository
    name: fleet-infra
  path: ./apps/cluster-1
  prune: true
---
# clusters/cluster-2/flux-system/kustomization.yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: apps
  namespace: flux-system
spec:
  interval: 5m
  sourceRef:
    kind: GitRepository
    name: fleet-infra
  path: ./apps/cluster-2
  prune: true
Repository structure:
fleet-infra/
├── apps/
│   ├── base/
│   │   ├── deployment.yaml
│   │   └── service.yaml
│   ├── cluster-1/
│   │   ├── kustomization.yaml
│   │   └── values.yaml
│   └── cluster-2/
│       ├── kustomization.yaml
│       └── values.yaml
└── clusters/
    ├── cluster-1/
    │   └── flux-system/
    └── cluster-2/
        └── flux-system/
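Each `clusters/<name>/flux-system/` directory is typically populated by bootstrapping Flux on that cluster. The sketch below loops over clusters with `flux bootstrap github`. It assumes the repository is hosted on GitHub, that the branch is `main`, that a `GITHUB_TOKEN` is exported, and that kubectl context names match the directory names; `bootstrap_fleet` is a hypothetical helper.

```shell
# Bootstrap Flux on each cluster, pointing it at that cluster's directory.
# Usage: bootstrap_fleet <github-owner> <context>...
bootstrap_fleet() {
  owner="$1"; shift
  for ctx in "$@"; do
    flux bootstrap github \
      --context="$ctx" \
      --owner="$owner" \
      --repository=fleet-infra \
      --branch=main \
      --path="clusters/$ctx"
  done
}
```

For example, `bootstrap_fleet my-org cluster-1 cluster-2` installs Flux on both clusters against the same repository, each reconciling only its own path.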
Disaster Recovery¶
Backup with Velero¶
# Install Velero
velero install \
--provider aws \
--plugins velero/velero-plugin-for-aws:v1.8.0 \
--bucket k8s-backups \
--backup-location-config region=us-east-1 \
--snapshot-location-config region=us-east-1 \
--secret-file ./credentials-velero
# Create backup
velero backup create production-backup \
--include-namespaces production \
--default-volumes-to-fs-backup
# Schedule regular backups
velero schedule create daily-backup \
--schedule="0 2 * * *" \
--include-namespaces production
# Restore to another cluster (Velero must be installed there and read the same bucket)
velero restore create --from-backup production-backup \
  --kubecontext cluster-2
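A restore is only useful if the backup actually finished, so scripts usually wait for a terminal phase before continuing. This is a minimal sketch that polls the Backup resource with kubectl; `wait_for_backup` is a hypothetical helper, and the 5-second interval and default of 60 attempts are arbitrary choices.

```shell
# Poll a Velero Backup until it reaches a terminal phase.
# Usage: wait_for_backup <context> <backup-name> [max-attempts]
wait_for_backup() {
  ctx="$1"; name="$2"; tries="${3:-60}"
  while [ "$tries" -gt 0 ]; do
    phase=$(kubectl --context "$ctx" -n velero get backups.velero.io "$name" \
      -o jsonpath='{.status.phase}')
    case "$phase" in
      Completed)
        echo "backup $name completed"; return 0 ;;
      Failed|PartiallyFailed|FailedValidation)
        echo "backup $name ended in phase $phase" >&2; return 1 ;;
      *)
        # Still new or in progress: wait and try again
        tries=$((tries - 1)); sleep 5 ;;
    esac
  done
  echo "timed out waiting for backup $name" >&2
  return 1
}
```

For example, `wait_for_backup cluster-1 production-backup && echo "safe to restore"` gates a restore on a successful backup.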
Backup Strategy¶
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: full-cluster-backup
  namespace: velero
spec:
  includedNamespaces:
  - '*'
  excludedNamespaces:
  - kube-system
  - kube-public
  - velero
  includedResources:
  - '*'
  excludedResources:
  - events
  - events.events.k8s.io
  # Only resources labelled backup=true are captured; remove this selector to back up everything
  labelSelector:
    matchLabels:
      backup: "true"
  ttl: 720h0m0s # 30 days
  storageLocation: default
  volumeSnapshotLocations:
  - default
Traffic Management¶
Global Load Balancing¶
Using external DNS and geo-routing:
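apiVersion: v1
kind: Service
metadata:
  name: frontend
  annotations:
    external-dns.alpha.kubernetes.io/hostname: app.example.com
    # Route 53 latency-based routing through ExternalDNS (AWS provider);
    # other DNS providers use different annotations
    external-dns.alpha.kubernetes.io/aws-region: us-east-1
    external-dns.alpha.kubernetes.io/set-identifier: cluster-1
spec:
  type: LoadBalancer
  selector:
    app: frontend
  ports:
  - port: 80
    targetPort: 8080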
Istio Multi-Cluster¶
# Install Istio with multi-cluster support
istioctl install --set profile=default \
--set values.global.meshID=mesh1 \
--set values.global.multiCluster.clusterName=cluster-1 \
--set values.global.network=network1
# Create remote secret for cluster-2
istioctl create-remote-secret \
--context=cluster-2 \
--name=cluster-2 | \
kubectl apply -f - --context=cluster-1
# Deploy sample app
kubectl label namespace default istio-injection=enabled
kubectl apply -f samples/bookinfo/platform/kube/bookinfo.yaml
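In a multi-primary setup, each control plane needs a remote secret for the other cluster, so the `create-remote-secret` step is usually run in both directions. The sketch below wraps the same command for both sides; `link_clusters` is a hypothetical helper, and it assumes the kubectl context names double as the Istio cluster names.

```shell
# Exchange Istio remote secrets between two clusters (both directions).
# Usage: link_clusters <context-a> <context-b>
link_clusters() {
  a="$1"; b="$2"
  # Give cluster a access to the API server of cluster b
  istioctl create-remote-secret --context="$b" --name="$b" | \
    kubectl apply -f - --context="$a"
  # Give cluster b access to the API server of cluster a
  istioctl create-remote-secret --context="$a" --name="$a" | \
    kubectl apply -f - --context="$b"
}
```

For example, `link_clusters cluster-1 cluster-2` lets each istiod discover endpoints in the other cluster.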
Centralized Monitoring¶
Prometheus Federation¶
# Central Prometheus in hub cluster
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
    scrape_configs:
    - job_name: 'federate-cluster-1'
      honor_labels: true
      metrics_path: '/federate'
      params:
        'match[]':
        - '{job="kubernetes-pods"}'
      static_configs:
      - targets:
        - 'prometheus.cluster-1.example.com:9090'
        labels:
          cluster: 'cluster-1'
    - job_name: 'federate-cluster-2'
      honor_labels: true
      metrics_path: '/federate'
      params:
        'match[]':
        - '{job="kubernetes-pods"}'
      static_configs:
      - targets:
        - 'prometheus.cluster-2.example.com:9090'
        labels:
          cluster: 'cluster-2'
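Before relying on the hub configuration, it helps to confirm that each downstream Prometheus answers on `/federate` with the same selector. This is a small sketch using curl; `check_federate` is a hypothetical helper, and it assumes the endpoints are reachable over plain HTTP from where the command runs.

```shell
# Query a downstream Prometheus /federate endpoint with the selector
# used in the hub scrape config.
# Usage: check_federate <host:port>
check_federate() {
  target="$1"
  # --get sends the URL-encoded match[] selector as a query parameter
  curl --silent --fail --get \
    --data-urlencode 'match[]={job="kubernetes-pods"}' \
    "http://$target/federate"
}
```

For example, `check_federate prometheus.cluster-1.example.com:9090 | head` should print a few metric lines; an empty result suggests the selector matches nothing on that cluster.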
Grafana Multi-Cluster Dashboard¶
apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-datasources
data:
  datasources.yaml: |
    apiVersion: 1
    datasources:
    # Per-cluster Prometheus instances, reached from the hub by their external hostnames
    - name: Cluster-1
      type: prometheus
      url: http://prometheus.cluster-1.example.com:9090
      isDefault: false
      editable: true
    - name: Cluster-2
      type: prometheus
      url: http://prometheus.cluster-2.example.com:9090
      isDefault: false
      editable: true
    # Federated Prometheus in the hub cluster; adjust to <service>.<namespace>.svc for your installation
    - name: All-Clusters
      type: prometheus
      url: http://prometheus-federated.svc:9090
      isDefault: true
      editable: true
Multi-Cluster Deployment Tools¶
Argo CD ApplicationSet¶
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: multi-cluster-app
  namespace: argocd
spec:
  generators:
  - list:
      elements:
      - cluster: cluster-1
        url: https://cluster1.example.com
        region: us-east-1
      - cluster: cluster-2
        url: https://cluster2.example.com
        region: us-west-2
  template:
    metadata:
      name: 'app-{{cluster}}'
    spec:
      project: default
      source:
        repoURL: https://github.com/org/manifests
        targetRevision: HEAD
        path: apps/{{cluster}}
      destination:
        server: '{{url}}'
        namespace: production
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
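The destination servers in the list generator must already be registered with Argo CD, otherwise the generated Applications cannot sync. A sketch of registering them from existing kubectl contexts follows; `register_clusters` is a hypothetical helper, and it assumes the `argocd` CLI is already logged in.

```shell
# Register each kubectl context as a cluster in Argo CD.
# Usage: register_clusters <context>...
register_clusters() {
  for ctx in "$@"; do
    # --yes skips the interactive confirmation prompt
    argocd cluster add "$ctx" --name "$ctx" --yes
  done
}
```

After `register_clusters cluster-1 cluster-2`, `argocd cluster list` should show both API server URLs, which need to match the `url` values in the ApplicationSet.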
Rancher Multi-Cluster Management¶
Rancher provides UI and API for multi-cluster management:
# Deploy Rancher (the default TLS setup requires cert-manager to be installed first)
helm repo add rancher-latest https://releases.rancher.com/server-charts/latest
helm repo update
helm install rancher rancher-latest/rancher \
  --namespace cattle-system \
  --create-namespace \
  --set hostname=rancher.example.com
# Import existing cluster
# UI: Add Cluster > Import Existing > Run generated command on target cluster
Best Practices¶
- Consistent Configuration - Use GitOps for all clusters
- Namespace Naming - Use same namespace names across clusters
- Resource Labels - Label resources with cluster name
- Secrets Management - Use external secrets operator
- Monitoring - Centralized monitoring with federation
- Backup Strategy - Regular automated backups
- Network Connectivity - Ensure cross-cluster networking
- Cost Management - Monitor costs per cluster
- Access Control - Separate RBAC per cluster
- Documentation - Document cluster purposes and configs
Failure Scenarios¶
Cluster Failover¶
# Detect failure (health check)
kubectl --context=cluster-1 get nodes
# Update DNS to point to cluster-2
# This is typically done by the load balancer or DNS provider
# Restore services in cluster-2 from backup (Velero in cluster-2 must read the same backup location)
velero restore create --from-backup production-backup --kubecontext cluster-2
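The manual steps above can be combined into a guarded script so that a restore only runs when the primary is really unreachable. This is a sketch under stated assumptions: `failover_if_down` is a hypothetical helper, the 5-second timeout is arbitrary, and the DNS switch still has to happen separately.

```shell
# Restore onto a standby cluster when the primary API server is not ready.
# Usage: failover_if_down <primary-context> <standby-context> <backup-name>
failover_if_down() {
  primary="$1"; standby="$2"; backup="$3"
  # /readyz is the API server readiness endpoint
  if kubectl --context "$primary" --request-timeout=5s get --raw='/readyz' >/dev/null 2>&1; then
    echo "$primary is healthy; nothing to do"
    return 0
  fi
  echo "$primary is unreachable; restoring $backup on $standby"
  # --wait blocks until the restore finishes
  velero --kubecontext "$standby" restore create --from-backup "$backup" --wait
}
```

For example, `failover_if_down cluster-1 cluster-2 production-backup` could run from a monitoring job; in practice a single failed probe is a weak signal, so several consecutive failures are usually required before failing over.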
Data Replication¶
Use external data replication:
- Databases: Native replication (PostgreSQL streaming, MySQL replication)
- Object Storage: S3 cross-region replication
- Volumes: Cloud provider volume replication
Common Pitfalls¶
- Network Latency - Cross-cluster calls are slower
- Data Consistency - No distributed transactions
- Cost - Multiple clusters = higher costs
- Complexity - More moving parts
- Secret Synchronization - Keep secrets in sync
- Version Skew - Keep Kubernetes versions aligned
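Version skew is easy to check for. The sketch below prints the distinct kubelet versions found in each cluster so that drift between clusters stands out; `kubelet_versions` is a hypothetical helper.

```shell
# Print the distinct kubelet versions per context.
# Usage: kubelet_versions <context>...
kubelet_versions() {
  for ctx in "$@"; do
    versions=$(kubectl --context "$ctx" get nodes \
      -o jsonpath='{range .items[*]}{.status.nodeInfo.kubeletVersion}{"\n"}{end}' \
      | sort -u | paste -sd' ' -)
    echo "$ctx: $versions"
  done
}
```

Running `kubelet_versions cluster-1 cluster-2` gives one line per cluster; more than one version on a line usually means an upgrade is in progress or was left incomplete.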
Multi-Cluster Decision Matrix¶
| Factor | Single Cluster | Multi-Cluster |
|---|---|---|
| Complexity | Low | High |
| Availability | Single point of failure | High availability |
| Cost | Lower | Higher |
| Latency | Low | Higher cross-cluster |
| Isolation | Namespace-based | Cluster-based |
| Management | Simple | Complex |
| Disaster Recovery | Backup/restore only | Failover to another cluster (active-active or active-passive) |
Key takeaways¶
- Multiple clusters provide stronger isolation for teams, environments, and regulatory boundaries than namespaces alone
- kubectl contexts allow you to switch between clusters quickly; tools like kubectx make this even easier
- GitOps with Flux or Argo CD is the recommended way to manage deployments across many clusters consistently
- Disaster recovery requires a tested backup strategy — tools like Velero can back up and restore cluster state
- Centralised monitoring and logging across clusters is essential for visibility and incident response
Check your understanding¶
- What kubectl command switches the active cluster context?
- What are two reasons you might run separate clusters instead of using namespaces?
- Which Velero command initiates a cluster backup?
- How does a GitOps approach simplify managing multiple clusters?
- What is the trade-off between running workloads in multiple clusters vs. a single cluster with multiple namespaces?
Solution
- kubectl config use-context <context-name>
- Stronger security isolation (blast radius containment), compliance/regulatory requirements, separate teams with different release cadences, or geographic distribution
- velero backup create <backup-name>
- Each cluster has its own Flux installation pointing at the same Git repository; updating a manifest in Git automatically reconciles all clusters without needing direct cluster access
- Multiple clusters provide stronger isolation and fault boundaries but increase operational complexity and cost; a single cluster with namespaces is simpler to manage but offers weaker isolation
Hands-on¶
Apply the concepts from this section in the lab exercises.
Next section¶
Congratulations — you've completed Part 2 of the workshop!
Return to: Part 2 Overview