Autoscaling¶
Duration: 45 minutes (20 minutes theory + 25 minutes lab)
Introduction¶
Kubernetes can automatically scale applications based on demand, optimizing resource usage and ensuring performance.
Three Types of Autoscaling:
- Horizontal Pod Autoscaler (HPA) - Scales number of pods
- Vertical Pod Autoscaler (VPA) - Adjusts CPU/memory requests
- Cluster Autoscaler - Adds/removes nodes
Horizontal Pod Autoscaler (HPA)¶
Automatically scales the number of pod replicas in a workload (for example a Deployment) based on observed metrics.
Metrics Server¶
Required for HPA to scale on CPU or memory; it serves the resource metrics that HPA queries:
# Install metrics-server
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
# Verify
kubectl top nodes
kubectl top pods
Basic HPA Example¶
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 50
        periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent
        value: 100
        periodSeconds: 30
      - type: Pods
        value: 2
        periodSeconds: 60
      selectPolicy: Max
How it works:
- HPA checks metrics every 15 seconds (default)
- Calculates desired replicas: ceil(current_replicas * (current_metric / target_metric))
- Scales pods if needed
- Waits for stabilization before next scale event
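The calculation step can be sketched in a few lines of Python. This is a simplified illustration, not the controller's code: the function name and arguments are hypothetical, and the 10% tolerance band is the controller's documented default, assumed here.

```python
import math

def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas, max_replicas, tolerance=0.1):
    """Apply the HPA ratio formula, then clamp to the configured bounds."""
    ratio = current_metric / target_metric
    # Inside the tolerance band the replica count is left unchanged.
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))

print(desired_replicas(3, 105, 70, 2, 10))  # 5: ceil(3 * 1.5)
print(desired_replicas(4, 73, 70, 2, 10))   # 4: ratio is within tolerance
print(desired_replicas(8, 140, 70, 2, 10))  # 10: 16 is capped at maxReplicas
```

The clamp at the end is why `minReplicas` and `maxReplicas` always win over the raw formula.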
CPU-Based Autoscaling¶
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: cpu-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
Memory-Based Autoscaling¶
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: memory-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
Multiple Metrics¶
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: multi-metric-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "1000"
Note: HPA uses the metric that requires the most replicas.
Custom Metrics¶
Scale based on application-specific metrics.
Prometheus Adapter¶
Expose Prometheus metrics to HPA:
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus-adapter prometheus-community/prometheus-adapter \
--set prometheus.url=http://prometheus-server.monitoring.svc \
--namespace monitoring
Configuration:
apiVersion: v1
kind: ConfigMap
metadata:
  name: adapter-config
data:
  config.yaml: |
    rules:
    - seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
      resources:
        overrides:
          namespace: {resource: "namespace"}
          pod: {resource: "pod"}
      name:
        matches: "^(.*)_total$"
        as: "${1}_per_second"
      metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
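To see what the `name` rule does to a series name, the rename can be reproduced with Python's `re` module. This is only an illustration of the pattern: the adapter itself uses Go regular expressions, and the helper name `exposed_name` is hypothetical.

```python
import re

# The `matches` / `as` pair from the rule above, in Python regex syntax.
MATCHES = r"^(.*)_total$"
AS = r"\1_per_second"  # the adapter config writes this as ${1}_per_second

def exposed_name(series_name):
    """Return the custom-metric name for a series, or None if the rule does not match."""
    if re.match(MATCHES, series_name) is None:
        return None
    return re.sub(MATCHES, AS, series_name)

print(exposed_name("http_requests_total"))  # http_requests_per_second
print(exposed_name("http_requests_count"))  # None
```

The resulting name, `http_requests_per_second`, is the one an HPA refers to in its `metric.name` field.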
HPA using custom metric:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: custom-metric-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "100"
KEDA - Event-Driven Autoscaling¶
Scale based on external metrics (queues, databases, etc.).
Installing KEDA¶
helm repo add kedacore https://kedacore.github.io/charts
helm install keda kedacore/keda --namespace keda --create-namespace
Scaling Based on Queue Length¶
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: rabbitmq-scaler
spec:
  scaleTargetRef:
    name: worker-deployment
  minReplicaCount: 1
  maxReplicaCount: 30
  triggers:
  - type: rabbitmq
    metadata:
      queueName: tasks
      mode: QueueLength  # replaces the deprecated queueLength field
      value: "5"         # target messages per replica
      host: amqp://rabbitmq.default.svc.cluster.local:5672
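As a rough mental model of the trigger above, the replica count follows the queue depth divided by the per-replica target of 5 messages, bounded by the min and max replica counts. The sketch below assumes that simplified model; the function name is hypothetical and the real scaling goes through an HPA that KEDA manages.

```python
import math

def replicas_for_queue(messages, target_per_replica=5,
                       min_replicas=1, max_replicas=30):
    """Replicas needed so each one handles at most `target_per_replica` messages."""
    wanted = math.ceil(messages / target_per_replica)
    return max(min_replicas, min(max_replicas, wanted))

print(replicas_for_queue(0))    # 1: never below minReplicaCount
print(replicas_for_queue(12))   # 3: ceil(12 / 5)
print(replicas_for_queue(500))  # 30: capped at maxReplicaCount
```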
Scaling Based on Kafka Lag¶
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: kafka-scaler
spec:
  scaleTargetRef:
    name: consumer-deployment
  triggers:
  - type: kafka
    metadata:
      bootstrapServers: kafka.kafka:9092
      consumerGroup: my-consumer-group
      topic: events
      lagThreshold: "100"
Scaling Based on Cron¶
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: cron-scaler
spec:
  scaleTargetRef:
    name: batch-job
  minReplicaCount: 0
  maxReplicaCount: 10
  triggers:
  - type: cron
    metadata:
      timezone: America/New_York
      start: 0 8 * * *
      end: 0 18 * * *
      desiredReplicas: "10"
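The effect of the cron trigger can be pictured as a daily window: 10 replicas between 08:00 and 18:00, and the minimum of 0 outside it. The sketch below hard-codes that window and assumes `now` is already local time in the configured timezone; treating the start as inclusive and the end as exclusive is also an assumption made for illustration.

```python
from datetime import time

def cron_replicas(now, start=time(8, 0), end=time(18, 0), desired=10, minimum=0):
    """Replica count requested by a daily window that does not cross midnight."""
    return desired if start <= now < end else minimum

print(cron_replicas(time(7, 59)))  # 0
print(cron_replicas(time(12, 0)))  # 10
print(cron_replicas(time(18, 0)))  # 0
```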
KEDA supports 50+ scalers:
- AWS SQS, CloudWatch
- Azure Queue, Service Bus
- GCP Pub/Sub
- Prometheus
- PostgreSQL
- MongoDB
- And many more
Vertical Pod Autoscaler (VPA)¶
Automatically adjusts CPU/memory requests.
Installing VPA¶
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
./hack/vpa-up.sh
VPA Example¶
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: myapp-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  updatePolicy:
    updateMode: "Auto"  # Or "Off", "Initial", "Recreate"
  resourcePolicy:
    containerPolicies:
    - containerName: app
      minAllowed:
        cpu: 100m
        memory: 128Mi
      maxAllowed:
        cpu: 1
        memory: 1Gi
      controlledResources: ["cpu", "memory"]
Update Modes:
- Off - Only recommendations, no changes
- Initial - Set on pod creation only
- Recreate - Delete and recreate pods with new requests
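- Auto - Set on pod creation and update running pods; in VPA releases without in-place resize this evicts and recreates pods, the same as Recreate
Note: VPA and HPA can conflict when both act on the same metric. Do not combine VPA with an HPA that scales on CPU or memory utilization; pair it with an HPA driven by custom or external metrics instead.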
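The `minAllowed` and `maxAllowed` bounds in the VPA example act as a clamp on whatever the recommender proposes. A minimal sketch for the CPU side, assuming quantities are written either in millicores (`100m`) or whole or fractional cores (`1`); both helper names are hypothetical.

```python
def parse_cpu(quantity):
    """Convert a Kubernetes CPU quantity ("100m" or "1") to millicores."""
    q = str(quantity)
    if q.endswith("m"):
        return int(q[:-1])
    return int(float(q) * 1000)

def clamp_cpu(recommended, min_allowed="100m", max_allowed="1"):
    """Keep a recommendation inside the container policy bounds (in millicores)."""
    lo, hi = parse_cpu(min_allowed), parse_cpu(max_allowed)
    return max(lo, min(hi, parse_cpu(recommended)))

print(clamp_cpu("50m"))   # 100: raised to minAllowed
print(clamp_cpu("350m"))  # 350: already within bounds
print(clamp_cpu("2"))     # 1000: lowered to maxAllowed
```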
Cluster Autoscaler¶
Automatically adds or removes nodes based on pod resource requests.
How It Works¶
- Pods pending due to insufficient resources → Add nodes
- Node underutilized for 10+ minutes → Remove node
- Respects PodDisruptionBudgets during scale-down
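The two triggers above can be sketched as simple predicates. This is a deliberately reduced model that looks at CPU requests only; the real autoscaler simulates scheduling and also checks memory, PodDisruptionBudgets, and whether evicted pods fit elsewhere. The `Node` class and function names are hypothetical; the 0.5 utilization threshold and 10-minute wait mirror the autoscaler's default flag values.

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    cpu_allocatable_m: int      # millicores the node can hand out
    cpu_requested_m: int        # sum of pod CPU requests on the node
    underutilized_minutes: int  # time spent below the utilization threshold

def needs_scale_up(pending_pod_cpu_m, nodes):
    """True when no node has enough unrequested CPU for the pending pod."""
    return all(n.cpu_allocatable_m - n.cpu_requested_m < pending_pod_cpu_m
               for n in nodes)

def scale_down_candidates(nodes, threshold=0.5, unneeded_minutes=10):
    """Names of nodes whose requests stayed below the threshold long enough."""
    return [n.name for n in nodes
            if n.cpu_requested_m / n.cpu_allocatable_m < threshold
            and n.underutilized_minutes >= unneeded_minutes]

nodes = [Node("a", 2000, 1800, 0), Node("b", 2000, 400, 12)]
print(needs_scale_up(500, nodes))    # False: node b has 1600m free
print(needs_scale_up(1700, nodes))   # True: no node has 1700m free
print(scale_down_candidates(nodes))  # ['b']
```

Both decisions are driven by requests rather than live usage, which is why pods without requests make node scaling unreliable.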
Cloud Provider Integration¶
# AWS
apiVersion: v1
kind: ConfigMap
metadata:
name: cluster-autoscaler-config
data:
config.yaml: |
autoDiscovery:
clusterName: my-cluster
tags:
- k8s.io/cluster-autoscaler/enabled
- k8s.io/cluster-autoscaler/my-cluster
Node Autoscaler Deployment¶
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler  # must match spec.selector.matchLabels
    spec:
      serviceAccountName: cluster-autoscaler
      containers:
      # registry.k8s.io replaces the frozen k8s.gcr.io registry
      - image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.28.0
        name: cluster-autoscaler
        command:
        - ./cluster-autoscaler
        - --v=4
        - --cloud-provider=aws
        - --skip-nodes-with-local-storage=false
        - --expander=least-waste
        - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/my-cluster
Note: kind clusters don't support Cluster Autoscaler since nodes are containers.
Autoscaling Best Practices¶
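- Set resource requests - HPA utilization targets are percentages of requests, and Cluster Autoscaler makes node decisions from requests
- Readiness probes - HPA sets aside pods that are not yet ready when computing CPU utilization, so accurate probes keep startup noise out of scaling decisions
- PodDisruptionBudgets - Limit how many pods can be evicted at once when nodes are drained or removed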
- Gradual scaling - Use behavior policies to avoid thrashing
- Monitor carefully - Watch for scaling loops
- Cost awareness - Set max replicas/nodes to control costs
- Test scaling - Load test to verify behavior
- Stabilization windows - Allow time between scale events
- Right-size requests - VPA can help find optimal resource requests
- Multiple metrics - Combine resource metrics with custom metrics that track actual load, such as requests per second
PodDisruptionBudget¶
Ensure availability during voluntary disruptions:
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
name: myapp-pdb
spec:
minAvailable: 2
# Or use maxUnavailable: 1
selector:
matchLabels:
app: myapp
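Prevents:
- Cluster Autoscaler from evicting pods during node scale-down when that would violate the budget
- kubectl drain from removing too many pods at once
Note: a PDB only limits evictions. Pods removed by an HPA scale-down are deleted by the workload controller and are not protected by the budget.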
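With `minAvailable: 2`, the number of evictions the budget still permits is the count of healthy pods above that floor. A minimal sketch of that arithmetic, with a hypothetical function name:

```python
def disruptions_allowed(healthy_pods, min_available=2):
    """Evictions the budget still permits: healthy pods above the floor, never negative."""
    return max(0, healthy_pods - min_available)

print(disruptions_allowed(3))  # 1: one pod may be evicted
print(disruptions_allowed(2))  # 0: evictions are refused
print(disruptions_allowed(1))  # 0: already below the floor
```

When the result is 0, a node drain waits until replacement pods become healthy elsewhere.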
Scaling Calculations¶
HPA Formula¶
desired_replicas = ceil(current_replicas * (current_metric / target_metric))
Example:
- Current: 4 replicas
- Current CPU: 90%
- Target CPU: 70%
- Desired = ceil(4 * (90 / 70)) = ceil(5.14) = 6 replicas
Multiple Metrics¶
HPA selects the highest replica count from all metrics:
- CPU suggests 6 replicas
- Memory suggests 4 replicas
- Custom metric suggests 8 replicas
- Result: 8 replicas (maximum)
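The selection above can be reproduced with made-up readings. In this sketch the current values (90% CPU against a 60% target, 70% memory against an 80% target, 7500 requests per second in total against 1000 per pod) are hypothetical numbers chosen to give the 6, 4, and 8 proposals; the function names are illustrative.

```python
import math

def replicas_for_utilization(current_replicas, current_pct, target_pct):
    """Proposal for a Utilization target."""
    return math.ceil(current_replicas * current_pct / target_pct)

def replicas_for_average_value(total_value, target_per_pod):
    """Proposal for an AverageValue target: spread the total across pods."""
    return math.ceil(total_value / target_per_pod)

proposals = {
    "cpu": replicas_for_utilization(4, 90, 60),
    "memory": replicas_for_utilization(4, 70, 80),
    "http_requests_per_second": replicas_for_average_value(7500, 1000),
}
desired = max(proposals.values())

print(proposals)  # {'cpu': 6, 'memory': 4, 'http_requests_per_second': 8}
print(desired)    # 8
```

Taking the maximum means one saturated metric is enough to trigger a scale-up, even if the others would allow scaling down.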
Monitoring Autoscaling¶
# HPA status
kubectl get hpa myapp-hpa
# HPA details
kubectl describe hpa myapp-hpa
# HPA events
kubectl get events --field-selector involvedObject.name=myapp-hpa
# VPA recommendations
kubectl describe vpa myapp-vpa
# Cluster Autoscaler logs
kubectl logs -n kube-system -l app=cluster-autoscaler
Advanced HPA Behavior¶
Scale-Down Stabilization¶
Prevent flapping when metrics oscillate:
behavior:
  scaleDown:
    stabilizationWindowSeconds: 300  # Use the highest recommendation from the last 5 min
    policies:
    - type: Percent
      value: 50
      periodSeconds: 60  # Max 50% pods removed per minute
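The stabilization window works by remembering recent recommendations and acting on the highest one, so a brief dip in load does not remove pods. A minimal sketch, assuming recommendations arrive as `(timestamp_seconds, replicas)` pairs and at least one falls inside the window; the function name is hypothetical.

```python
def stabilized_scale_down(recommendations, now, window_seconds=300):
    """Pick the highest recommendation made within the last `window_seconds`."""
    recent = [replicas for t, replicas in recommendations
              if 0 <= now - t <= window_seconds]
    return max(recent)

history = [(0, 10), (120, 6), (240, 4), (330, 4)]
print(stabilized_scale_down(history[:3], now=240))  # 10: the t=0 value still counts
print(stabilized_scale_down(history, now=330))      # 6: t=0 has aged out of the window
```

Replicas only drop to 4 once every higher recommendation is older than the window.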
Scale-Up Policies¶
behavior:
  scaleUp:
    stabilizationWindowSeconds: 0  # Scale up immediately
    policies:
    - type: Percent
      value: 100
      periodSeconds: 30  # Double pods every 30 sec
    - type: Pods
      value: 4
      periodSeconds: 60  # Add max 4 pods per minute
    selectPolicy: Max  # Use policy that scales fastest
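How `selectPolicy` chooses between the two policies can be sketched as follows. This is a simplification that treats each policy as a single-step cap and ignores the different `periodSeconds` values and the history of earlier scale events, which the real controller tracks; the function name is hypothetical.

```python
import math

def scale_up_limit(current_replicas, percent=100, pods=4, select_policy="Max"):
    """Upper bound on replicas after one step, per the Percent and Pods policies."""
    by_percent = math.ceil(current_replicas * (1 + percent / 100))
    by_pods = current_replicas + pods
    pick = max if select_policy == "Max" else min
    return pick(by_percent, by_pods)

print(scale_up_limit(2))                        # 6: the Pods policy allows more
print(scale_up_limit(10))                       # 20: the Percent policy allows more
print(scale_up_limit(10, select_policy="Min"))  # 14: the stricter policy wins
```

With `Max`, small deployments grow by the fixed pod count and large ones by the percentage, whichever is faster.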
Key takeaways¶
- HPA automatically scales the number of Pod replicas based on CPU, memory, or custom metrics
- VPA adjusts resource requests and limits for individual Pods, optimising resource usage without changing replica count
- Cluster Autoscaler adds or removes nodes when Pods cannot be scheduled or nodes are underutilised
- KEDA enables event-driven autoscaling from external sources like queues, databases, and HTTP traffic
- PodDisruptionBudgets protect availability during voluntary disruptions, such as node drains and Cluster Autoscaler scale-down, by limiting the number of Pods that can be evicted simultaneously
Check your understanding¶
- What is the difference between HPA and VPA?
- What metrics does HPA use by default, and how can you extend it?
- At what point does the Cluster Autoscaler add a new node?
- What Kubernetes resource prevents too many Pods of a deployment from being evicted at once?
- Why might you use KEDA instead of the standard HPA?
Solution
- HPA scales the number of replicas horizontally; VPA adjusts the CPU/memory resource requests of existing Pods vertically
- Resource metrics (CPU and memory utilisation) served by the Metrics Server; you can extend it with custom metrics via the Custom Metrics API (for example through Prometheus Adapter) or with KEDA
- When a Pod cannot be scheduled because no existing node has sufficient available resources
- PodDisruptionBudget (PDB)
- KEDA supports scaling to zero and reacting to external event sources (e.g. queue depth, database rows) that the standard HPA cannot natively handle
Hands-on¶
Apply the concepts from this section in the lab exercises.
Next section¶
Once you've reviewed the content and completed the lab, proceed to the next section.