How to Tune Custom HPA and KEDA Scalers


Sam Farid

CTO/Founder

2025-07-31
[Image: the lifecycle of how KEDA interacts with RabbitMQ]

Kubernetes autoscalers can react to custom metrics such as queue depth or request rate, but when you step off the beaten path of CPU / memory, you need to understand exactly how the scaler translates those numbers into replicas and how to tune the targets for your workload.

How HPA turns a metric into replicas

Take this config snippet from the KEDA RabbitMQ scaler docs (QueueLength mode):

triggers:
- type: rabbitmq
  metadata:
    mode: QueueLength
    host: amqp://localhost:5672/vhost
    queueName: testqueue
    protocol: auto
    value: "100.5"
    activationValue: "10.5"

1. The metric adapter (e.g. KEDA scaler or Prometheus adapter) polls the source at some interval (default 30s) and publishes the current raw value to 'external.metrics.k8s.io'.
2. Then the HPA controller wakes up every 15s to retrieve that value, divides it by the target value per pod (100.5 in this case), and rounds up:

    desiredReplicas = ceil(currentMetric / target)

The docs aren't very explicit about this behavior, so just to be very clear:

The configured target value is how many messages each pod is expected to handle.

Or rephrased, one pod should keep the metric at or below this number. So in this example, a queue length of 1000 pending messages means the HPA will scale up to ceil(1000 / 100.5) = 10 pods. If this contract is broken (i.e. the value is incorrectly tuned), performance will suffer.
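
For context, that trigger lives inside a ScaledObject, which also sets the replica bounds that the formula's result gets clamped to. Here's a minimal sketch; the workload name, replica bounds, and polling interval are illustrative assumptions, not recommendations:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: queue-consumer-scaler   # hypothetical name
spec:
  scaleTargetRef:
    name: queue-consumer        # hypothetical Deployment to scale
  pollingInterval: 30           # seconds between polls of RabbitMQ (KEDA's default)
  minReplicaCount: 1
  maxReplicaCount: 50           # desiredReplicas is capped here no matter how deep the queue gets
  triggers:
  - type: rabbitmq
    metadata:
      mode: QueueLength
      host: amqp://localhost:5672/vhost
      queueName: testqueue
      protocol: auto
      value: "100.5"            # target messages per pod
      activationValue: "10.5"   # below this raw value, the trigger is considered inactive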

How to tune a custom metric value

Before touching any numbers, create a dashboard with these live signals:

| Category | Metric | Why it matters |
| --- | --- | --- |
| Scaler metric | queue length, lag seconds, RPS, etc. | Direct input to HPA |
| Work in flight | backlog age or growth rate | Shows whether users are waiting |
| User latency | p95 / p99 end to end | Ultimate SLO (non-negotiable) |
| Resource headroom | pod CPU %, memory %, throttling | Ensures a pod can actually do more work if you raise the target |
| Churn | replica count and scaling events/min | High volatility means wasted nodes and cold-start latency |

Then look for patterns. Some simple examples:

  • Backlog is increasing but CPU utilization is low -> target value is too low
  • CPU utilization is too high -> target value is too high or CPU request is too small
  • Pod counts are flapping with small traffic changes -> look at your 'stabilizationWindow' or 'activationThreshold' behavior fields (see the sketch after this list)
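
Both of those knobs live on the ScaledObject: 'activationValue' sits on the trigger itself, while stabilization windows pass through KEDA's 'advanced' section to the underlying HPA behavior fields. A minimal sketch that nests under the same spec as above; the 300s window is illustrative, not a recommendation:

spec:
  advanced:
    horizontalPodAutoscalerConfig:
      behavior:
        scaleDown:
          stabilizationWindowSeconds: 300  # scale down only to the highest replica count computed over the last 5 min
  triggers:
  - type: rabbitmq
    metadata:
      mode: QueueLength
      host: amqp://localhost:5672/vhost
      queueName: testqueue
      value: "100.5"
      activationValue: "10.5"              # flapping near zero? raise this so tiny blips don't activate the scaler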

This is an iterative process. It's a best practice to make incremental (<=30%) changes and update only one field (value, CPU request, stabilization) at a time, so you can tell which change caused which behavior. A single iteration might look like the sketch below.
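
For instance, one round against the earlier config might lower only the per-pod target and leave CPU requests and behavior fields untouched (the new number is illustrative):

  triggers:
  - type: rabbitmq
    metadata:
      mode: QueueLength
      queueName: testqueue
      value: "70"   # was "100.5": a single ~30% step down; change nothing else this round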

Some specific examples

| Scaler (metric) | Starter target | Symptom when wrong | Possible fixes |
| --- | --- | --- | --- |
| QueueLength on RabbitMQ/Redis | 10 msgs | Backlog > 1000 while CPU <= 25% | Lower to 5, or keep 10 and raise 'maxReplicas'; verify 'channel.prefetch' isn't the bottleneck |
| LagSeconds for Kafka | 60s | Consumer lag > 300s; CPU > 80% | Halve the target, shorten the polling interval to 10s, tune 'fetch.max.bytes' |
| RequestsPerSecond from an API | 200 rps | p95 latency > 700 ms while CPU ~= 70% | Drop to 150 rps, add a joint CPU scaler at 75% |
| DB connection count | 50 conns | "Too many connections" errors despite low CPU | Reduce to 30 conns, bump the DB pool size, use 'cooldownPeriod' to stagger ramps |
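
As one concrete sketch, the "joint CPU scaler" fix from the RPS row is just a second trigger on the same ScaledObject; the HPA computes a replica count for each metric and takes the highest. The Prometheus address and query below are placeholder assumptions:

triggers:
- type: prometheus
  metadata:
    serverAddress: http://prometheus.monitoring:9090      # placeholder address
    query: sum(rate(http_requests_total{app="api"}[2m]))  # placeholder RPS query
    threshold: "150"          # target RPS per pod
- type: cpu
  metricType: Utilization
  metadata:
    value: "75"               # also scale when average CPU utilization crosses 75%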

Ongoing efforts

Traffic patterns, features, feature flags, and the underlying infrastructure all shift often. If you do find the right balance of configs, document the tuning process in a playbook so that when your initial assumptions no longer hold, you can revisit and iterate quickly.

Dialed-in scalers are the quiet guardians of a healthy platform: invisible when traffic is calm, quick to scale during spikes, and money-savers the rest of the time.

If this sounds a bit tedious, consider having Flightcrew manage your autoscaling configs for you, automatically sending PRs that keep scalers and custom values timely and optimal. Shoot us a note at hello@flightcrew.io if you'd like to learn more.


Sam Farid

CTO/Founder

Before founding Flightcrew, Sam was a tech lead at Google, ensuring the integrity of YouTube viewcount and then advancing network throughput and isolation at Google Cloud Serverless. A Dartmouth College graduate, he began his career at Index (acquired by Stripe), where he wrote foundational infrastructure code that still powers Stripe servers. Find him on Bluesky or at holosam.dev.
