Lab: Autoscaling
Duration: 25 minutes
Objectives
- Install and verify Metrics Server
- Deploy an application with resource requests
- Configure a HorizontalPodAutoscaler
- Generate CPU load and observe scale-up
- Tune HPA scale-down behavior
- Understand why resource requests matter
Prerequisites
- Kind cluster running
- kubectl configured and working
- Basic understanding of Deployments and resource requests
Tasks
Task 1: Install Metrics Server
Install Metrics Server and make it work in kind.
Requirements:
- Apply the upstream Metrics Server manifest
- Patch the Metrics Server Deployment to skip kubelet TLS verification (kind kubelets serve self-signed certificates)
- Verify kubectl top nodes works
- Verify kubectl top pods --all-namespaces works
Hint
Apply the manifest and patch the args:
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
kubectl patch deployment metrics-server -n kube-system \
--type=json \
-p='[{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--kubelet-insecure-tls"}]'
Wait for it to be ready:
kubectl rollout status deployment/metrics-server -n kube-system
Solution
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
kubectl patch deployment metrics-server -n kube-system \
--type=json \
-p='[{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--kubelet-insecure-tls"}]'
kubectl rollout status deployment/metrics-server -n kube-system
kubectl top nodes
kubectl top pods --all-namespaces
Expected result: kubectl top nodes shows node CPU and memory usage. kubectl top pods shows Pod metrics across all namespaces.
Task 2: Deploy a CPU-Bound Application
Deploy an application that can consume CPU when requested.
Requirements:
- Namespace: autoscaling-lab
- Deployment name: php-apache
- Image: registry.k8s.io/hpa-example
- Label: app=php-apache
- Replicas: 1
- Container port: 80
- CPU request: 200m
- CPU limit: 500m
- Memory request: 128Mi
- Memory limit: 256Mi
- Service name: php-apache
- Service port: 80
Hint
Create the namespace first:
kubectl create namespace autoscaling-lab
kubectl config set-context --current --namespace=autoscaling-lab
The image registry.k8s.io/hpa-example runs a simple PHP app that computes square roots in a loop when requested. Be sure to set a CPU request: the HPA measures utilization as a percentage of the requested CPU, so without a request it has nothing to compute against.
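To make the role of the request concrete, here is a minimal sketch of the percentage a Utilization target is compared against, for a single Pod. The function name cpu_utilization_percent and the sample usage figures are illustrative assumptions, not output from this lab, and the integer division truncates.

```shell
# Hypothetical helper: CPU usage as a percentage of the CPU request.
# Both arguments are in millicores (for example 200 for "200m").
cpu_utilization_percent() {
  usage_m=$1
  request_m=$2
  if [ "$request_m" -le 0 ]; then
    # Without a request there is no denominator, so no percentage.
    echo "unknown"
    return 1
  fi
  echo $(( usage_m * 100 / request_m ))
}

cpu_utilization_percent 100 200   # prints 50
cpu_utilization_percent 500 200   # prints 250
```

With the 200m request and 500m limit used in this lab, a single Pod can therefore report at most about 250% utilization, well above the 50% target used later.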
Solution
kubectl create namespace autoscaling-lab
kubectl config set-context --current --namespace=autoscaling-lab
Create php-apache-deployment.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: php-apache
  labels:
    app: php-apache
spec:
  replicas: 1
  selector:
    matchLabels:
      app: php-apache
  template:
    metadata:
      labels:
        app: php-apache
    spec:
      containers:
        - name: php-apache
          image: registry.k8s.io/hpa-example
          ports:
            - containerPort: 80
          resources:
            requests:
              cpu: 200m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 256Mi
Create php-apache-service.yaml:
apiVersion: v1
kind: Service
metadata:
  name: php-apache
spec:
  selector:
    app: php-apache
  ports:
    - port: 80
      targetPort: 80
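Apply and verify:
kubectl apply -f php-apache-deployment.yaml
kubectl apply -f php-apache-service.yaml
kubectl get deployment php-apache
kubectl get service php-apache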
Task 3: Create a CPU-Based HPA
Create a HorizontalPodAutoscaler for the application.
Requirements:
- HPA name: php-apache
- Target Deployment: php-apache
- Minimum replicas: 1
- Maximum replicas: 6
- Target CPU utilization: 50%
- Use autoscaling/v2
- Verify the HPA can read CPU metrics
Hint
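The autoscaling/v2 spec uses a metrics array:
metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
After applying, run:
kubectl get hpa php-apache
kubectl describe hpa php-apache
The HPA status shows current CPU utilization and desired replicas.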
Solution
Create php-apache-hpa.yaml:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 6
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
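Apply and verify:
kubectl apply -f php-apache-hpa.yaml
kubectl get hpa php-apache
Expected output:
NAME         REFERENCE               TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache   1%/50%    1         6         1          30s
The current CPU usage should be very low at this point (no traffic yet). If TARGETS shows <unknown>/50%, wait a minute for Metrics Server to report its first samples and check again.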
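The TARGETS column feeds directly into the replica count. As a rough sketch, the HPA's core rule is desired = ceil(currentReplicas * currentUtilization / targetUtilization), bounded by the minimum and maximum replicas. The helper below is a hypothetical illustration of that arithmetic only. It ignores the controller's tolerance band, Pod readiness handling, and behavior policies, and the sample utilization values are assumptions.

```shell
# Hypothetical helper implementing
#   desired = ceil(current_replicas * current_util / target_util)
# clamped to [min_replicas, max_replicas]. Utilization values are percentages.
desired_replicas() {
  current=$1; util=$2; target=$3; min=$4; max=$5
  # Integer ceiling division: (a + b - 1) / b
  desired=$(( (current * util + target - 1) / target ))
  if [ "$desired" -lt "$min" ]; then desired=$min; fi
  if [ "$desired" -gt "$max" ]; then desired=$max; fi
  echo "$desired"
}

desired_replicas 1 1 50 1 6     # idle: prints 1
desired_replicas 1 250 50 1 6   # one saturated Pod: prints 5
desired_replicas 5 80 50 1 6    # still hot: ceil(8) capped at 6, prints 6
```

The first call mirrors the 1%/50% reading above: the result stays at the minimum of 1 replica.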
Task 4: Generate Load and Watch Scale-Up
Create traffic that pushes CPU above the HPA target.
Requirements:
- Run a temporary load generator Pod
- Continuously request the php-apache Service
- Watch HPA desired replicas increase
- Confirm the Deployment scales above 1 replica
- Check current CPU usage with kubectl top pods
Hint
Run a busy loop in a Pod that sends HTTP requests:
kubectl run load-generator --image=busybox:1.36 --restart=Never -- \
/bin/sh -c "while true; do wget -q -O- http://php-apache; done"
Watch the HPA in a separate terminal:
kubectl get hpa php-apache --watch
Also check deployment replicas and Pod CPU:
kubectl get deployment php-apache
kubectl top pods
Solution
kubectl run load-generator --image=busybox:1.36 --restart=Never -- \
/bin/sh -c "while true; do wget -q -O- http://php-apache; done"
kubectl get hpa php-apache --watch
kubectl get deployment php-apache
kubectl top pods
Expected result: The HPA target rises above 50%, desired replicas increase from 1 toward 6, and kubectl top pods shows higher CPU usage across scaled Pods.
Task 5: Tune HPA Behavior
Update the HPA to scale up quickly and scale down more slowly.
Requirements:
- Keep min replicas at 1 and max replicas at 6
- Keep CPU target at 50%
- Add scale-up behavior with no stabilization window
- Add scale-down behavior with a 120 second stabilization window
- Limit scale-down to 50% per minute
- Verify the behavior appears in the HPA YAML
Hint
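Add a behavior block to the HPA spec:
behavior:
  scaleDown:
    stabilizationWindowSeconds: 120
    policies:
      - type: Percent
        value: 50
        periodSeconds: 60
  scaleUp:
    stabilizationWindowSeconds: 0
    policies:
      - type: Percent
        value: 100
        periodSeconds: 30
      - type: Pods
        value: 2
        periodSeconds: 30
    selectPolicy: Max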
Solution
Update php-apache-hpa.yaml:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 6
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 120
      policies:
        - type: Percent
          value: 50
          periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
        - type: Percent
          value: 100
          periodSeconds: 30
        - type: Pods
          value: 2
          periodSeconds: 30
      selectPolicy: Max
Apply and inspect:
kubectl apply -f php-apache-hpa.yaml
kubectl get hpa php-apache -o yaml
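The two scaleUp policies interact through selectPolicy: Max, which lets the HPA use whichever policy permits the larger change. The sketch below estimates the ceiling after one 30-second period under this manifest's values (100 percent or 2 Pods). The function name scale_up_limit is hypothetical, and the sketch assumes no other scaling event occurred in the same period and does not apply maxReplicas.

```shell
# Hypothetical helper: highest replica count the scaleUp policies allow
# after one 30-second period, starting from $1 replicas.
scale_up_limit() {
  start=$1
  percent_limit=$(( start * 2 ))   # Percent policy: add up to 100%
  pods_limit=$(( start + 2 ))      # Pods policy: add up to 2 Pods
  # selectPolicy: Max keeps whichever policy permits the larger change.
  if [ "$percent_limit" -gt "$pods_limit" ]; then
    echo "$percent_limit"
  else
    echo "$pods_limit"
  fi
}

scale_up_limit 1   # Pods policy wins: prints 3
scale_up_limit 3   # Percent policy wins: prints 6
```

Starting from 1 replica, the Pods policy matters most because doubling a single Pod adds only one. In this lab the result is still bounded by maxReplicas: 6.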
Task 6: Stop Load and Observe Scale-Down
Stop the load generator and observe HPA stabilization.
Requirements:
- Delete the load generator Pod
- Watch CPU usage drop
- Observe that scale-down is slower than scale-up
- Confirm the Deployment eventually returns toward the minimum replica count
Hint
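Delete the load generator:
kubectl delete pod load-generator --ignore-not-found
Then watch the HPA and Deployment:
kubectl get hpa php-apache --watch
kubectl get deployment php-apache --watch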
Because of the 120-second stabilization window, the Deployment will not scale down immediately.
Solution
kubectl delete pod load-generator --ignore-not-found
kubectl get hpa php-apache --watch
kubectl get deployment php-apache --watch
Expected result: CPU usage drops quickly, but replicas reduce more slowly because of the 120-second scale-down stabilization window. The Deployment will eventually return toward 1 replica.
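The delay follows from how a scale-down stabilization window works: the HPA looks at the replica recommendations it computed during the window and acts on the highest one. A minimal sketch of that selection follows. The function name stabilized_recommendation and the sample recommendation lists are illustrative assumptions.

```shell
# Hypothetical helper: given the replica recommendations computed during
# the stabilization window (oldest to newest), print the one scale-down
# may act on, which is the highest of them.
stabilized_recommendation() {
  highest=$1
  for rec in "$@"; do
    if [ "$rec" -gt "$highest" ]; then
      highest=$rec
    fi
  done
  echo "$highest"
}

# Load stopped partway through the window: older samples still say 6.
stabilized_recommendation 6 6 3 1 1   # prints 6
# Once every sample in the window is low, scale-down can proceed.
stabilized_recommendation 1 1 1 1     # prints 1
```

Only after the high recommendations age out of the 120-second window does the replica count start to fall, and the Percent policy then limits each step to half the Pods per minute.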
Verification
Check your work:
kubectl get pods -n autoscaling-lab
kubectl get hpa php-apache -n autoscaling-lab
kubectl describe hpa php-apache -n autoscaling-lab
kubectl top pods -n autoscaling-lab
Expected outcomes:
- Metrics Server is running and kubectl top works
- HPA reports current CPU utilization
- Deployment scaled up under load
- Scale-down is slower than scale-up due to the stabilization window
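Cleanup
Delete the lab namespace, which removes the Deployment, Service, HPA, and any leftover load generator Pod, then point kubectl back at the default namespace:
kubectl delete namespace autoscaling-lab
kubectl config set-context --current --namespace=default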
Key takeaways
- Metrics Server is required for CPU-based HPA: without it, kubectl top and CPU-based autoscaling both fail
- Resource requests must be set on Pods for HPA to calculate CPU utilization percentage
- HPA scales up quickly but uses a stabilization window to prevent flapping on scale-down
- scaleDown behavior policies give fine-grained control over how fast and how many Pods are removed at once
- Custom and external metrics allow HPA to scale on application-level signals such as queue depth or request latency
Next section
Once you've reviewed the content and completed the lab, proceed to the next section.