Taming Kubernetes HPA Flapping with Stabilization Windows


Sam Farid

CTO/Founder

July 11, 2025

Kubernetes’ Horizontal Pod Autoscaler (HPA) is the built‑in mechanism for adjusting the replica count to match traffic demand. Out of the box it works reasonably well, but a poorly tuned HPA can be less stable and more expensive than simply pinning replicas at peak capacity. One of the most powerful yet most often overlooked levers you can adjust is the stabilization window.

Understanding the Stabilization Window

What it is

A stabilization window is a rolling period over which the HPA keeps every desired-replica recommendation it computes. When it is time to act, the HPA doesn't just use the most recent metric sample: before scaling down, it takes the highest recommendation seen inside the scale-down window, and before scaling up it takes the lowest recommendation seen inside the (independent) scale-up window.

For scale-down, this guarantees you’ll never drop below any peak that occurred inside the window, while smoothing out short bursts and dips. By default:

  • Scale‑up window: 0s. Burst traffic should be met as quickly as possible.
  • Scale‑down window: 300s. Five minutes gives pods time to start, serve traffic, and report metrics before they are considered for removal.
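
For reference, here is a minimal sketch of where these settings live in an autoscaling/v2 manifest. The Deployment name, CPU target, and replica bounds are placeholders; the behavior block is the part that matters.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web                # hypothetical HPA name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web              # hypothetical Deployment
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0    # default: react to bursts immediately
    scaleDown:
      stabilizationWindowSeconds: 300  # default: hold the recent peak for five minutes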

Additionally, HPA reconciles every 15 seconds by default. It's best to set stabilization windows to a multiple of this interval so that each controller loop fits cleanly inside the window.

You can shorten this sync period (controlled by the '--horizontal-pod-autoscaler-sync-period' flag on the 'kube-controller-manager'), but remember that each loop calls the metrics API and updates the control plane. So halving the interval doubles that load, and on busy clusters, the extra churn can outweigh the benefit of faster scaling.
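
If you do decide to change it, the flag lives on the controller manager, not on the HPA object. On a kubeadm-style control plane that might look like the following static pod manifest excerpt; the 10s value is only an illustration.

# /etc/kubernetes/manifests/kube-controller-manager.yaml (excerpt)
spec:
  containers:
    - name: kube-controller-manager
      command:
        - kube-controller-manager
        - --horizontal-pod-autoscaler-sync-period=10s   # default is 15s
        # ...existing flags unchanged...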

Why it's important

Without a stabilization window, a three‑second traffic spike can make HPA scale from three replicas to ten and then back again 30 seconds later. This flapping causes real pain:

  • Unmet demand: new pods may still be downloading images or warming caches when HPA starts deleting them, so users see 5xx errors or high latency.
  • Wasted spend: CPU and memory are allocated to pods that never handle sustained load.
  • Noisy signals: the sudden replica churn distorts CPU, latency, and queue‑length metrics, causing HPA to chase noise instead of real trends. That feedback loop can amplify the instability.

A well‑chosen window lets you react quickly enough while avoiding these costs.

Examples of Stabilization Effects

Take, for example, a workload that warms a cache on startup, so each new pod spends its first 60-120 seconds in a spike of CPU and memory usage before it can serve traffic.

1. Scale‑down window too short

behavior:
  scaleDown:
    stabilizationWindowSeconds: 30

  1. 12:00:00 – traffic spike causes HPA to scale 3 → 10
  2. 12:00:15 – traffic returns to baseline
  3. 12:00:45 – HPA scales 10 → 3 once the 30‑second window expires
  4. 12:01:00 – the new pods finally become Ready, but they are already being terminated

Result: users experienced elevated latency and some dropped requests because the extra replicas never served traffic. The cluster still paid for the CPU and memory consumed during start‑up.

2. Window tuned to start‑up time

behavior:
  scaleDown:
    stabilizationWindowSeconds: 90

  1. 12:00:00 – traffic spike causes scaling 3 → 10
  2. 12:00:15 – traffic returns to baseline
  3. 12:01:30 – window expires, scaling 10 → 3

Result: additional replicas were alive long enough to serve traffic once their caches were warm. Latency stayed within SLO and the cost increase was limited to about one minute of extra capacity.

3. Window excessively long

behavior:
  scaleDown:
    stabilizationWindowSeconds: 900

Result: every spike locks in 15 minutes of elevated replica count. Frequent spikes can double compute spend without improving reliability.

When to tune the HPA stabilization window

Watch for these symptoms and the signals that surface them:

  • Replica oscillations such as 5 → 8 → 4 → 9: 'kube_hpa_status_desired_replicas' and 'kube_hpa_status_current_replicas'
  • Scale decisions that lag or lead latency: application SLO dashboards
  • High pod churn: 'kube_pod_status_phase{phase="Pending"}' and container start counters
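
If you export these with kube-state-metrics and Prometheus, a simple alert on replica churn can tell you when a window needs tuning. This is only a sketch: the metric name and labels vary by kube-state-metrics version (newer releases use kube_horizontalpodautoscaler_status_desired_replicas), and the threshold of four changes in 30 minutes is an arbitrary starting point.

groups:
  - name: hpa-flapping
    rules:
      - alert: HPAReplicaFlapping
        # Fires when the desired replica count changes more than 4 times in 30 minutes
        expr: changes(kube_hpa_status_desired_replicas{hpa="web"}[30m]) > 4
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "HPA {{ $labels.hpa }} is flapping; consider a longer scale-down stabilization window"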

Stabilization windows are a one‑line YAML change that can eliminate most HPA‑induced flapping. Start by measuring your container start‑up time, set the scale‑down window to at least that value, and add a small (15–30 second) scale‑up window if you see overshoot during short spikes.
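
Putting that advice together, a behavior block for a service that needs roughly 90 seconds to warm up might look like the sketch below; both numbers are assumptions to replace with your own measurements.

behavior:
  scaleUp:
    stabilizationWindowSeconds: 15   # smooth out sub-15s blips without blunting real bursts
  scaleDown:
    stabilizationWindowSeconds: 90   # at least the measured start-up (cache warm) time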

Platform or SRE engineers typically own these metrics and can balance reliability against cost. Application teams can still experiment safely if you expose HPA settings through templates and guardrails.

Final thoughts

The HPA stabilization window helps Kubernetes applications avoid instability and inefficiency caused by flapping autoscaling behavior.

If HPA still cannot cope with your traffic patterns, consider event‑driven autoscaling with KEDA or see when to switch from HPA to KEDA. NOTE: If you do stack KEDA on top of HPA, be aware of KEDA's cooldownPeriod. It governs the final scale‑to‑zero step, while scaling between one replica and your maximum is still handled by the HPA that KEDA creates, subject to that HPA's own stabilization window. In practice both delays apply, so a workload only fully scales in after the stabilization window and the cooldown period have both been satisfied.
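
For illustration, a ScaledObject can set both knobs in one place. This is a sketch assuming KEDA v2 and a hypothetical Prometheus-backed request-rate trigger; the names, address, query, and thresholds are all placeholders.

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: web-scaler                    # hypothetical name
spec:
  scaleTargetRef:
    name: web                         # hypothetical Deployment
  minReplicaCount: 0
  maxReplicaCount: 10
  cooldownPeriod: 300                 # KEDA's delay before the final scale to zero
  advanced:
    horizontalPodAutoscalerConfig:
      behavior:
        scaleDown:
          stabilizationWindowSeconds: 90   # applied by the HPA that KEDA creates
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring:9090              # hypothetical address
        query: sum(rate(http_requests_total{service="web"}[2m]))      # hypothetical query
        threshold: "100"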

Or, consider having Flightcrew manage your autoscaling configs for you and automatically send PRs to update stabilization windows to the ideal values. Shoot us a note at hello@flightcrew.io if you're interested in trying it out.


Sam Farid

CTO/Founder

Before founding Flightcrew, Sam was a tech lead at Google, ensuring the integrity of YouTube viewcount and then advancing network throughput and isolation at Google Cloud Serverless. A Dartmouth College graduate, he began his career at Index (acquired by Stripe), where he wrote foundational infrastructure code that still powers Stripe servers. Find him on Bluesky or holosam.dev.
