Scaling Kubernetes with KEDA and Prometheus
Tim Nichols
CEO/Founder

TL;DR - Setting KEDA to scale on Prometheus metrics lets you apply powerful, flexible autoscaling rules to any metric you're already collecting. Here's a quick guide!
KEDA 101
Kubernetes Event-Driven Autoscaling (KEDA) extends Horizontal Pod Autoscaling to act on custom events and metrics. Specifically, KEDA opens up:
- Event-Driven Scaling: react to reactive or deterministic triggers (ex: batch jobs or push notifications)
- Custom-Metric Scaling: scale on metrics (ex: queue size) that are a more accurate and responsive measure of load
- Scale-to-Zero Behavior: Pods can scale down completely when no load is present.
Installing KEDA can improve the accuracy and responsiveness of Kubernetes autoscaling, and it lets more types of workloads enjoy the benefits of horizontal autoscaling. Read more here
Scaling KEDA with Prometheus
Prometheus is the de facto open-source toolkit for monitoring Kubernetes-based workloads. By pairing Prometheus with KEDA’s Prometheus scaler, you can seamlessly leverage any observed metric—even custom business metrics—as the trigger for scaling.
For example, you can:
- Scale on HTTP Traffic: when http_requests_total rises above a threshold, spin up more Pods.
- Scale on Custom SLIs: e.g., an orders_unprocessed metric you track in Prometheus (sketched after this list).
- Scale on Resource Metrics: drive CPU- or memory-based scaling through the same standardized PromQL interface.
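As a sketch of the custom-SLI case, here is what a Prometheus trigger might look like. The orders_unprocessed metric and the server address are placeholder assumptions; the full ScaledObject wrapper is covered step-by-step below.
triggers:
- type: prometheus
  metadata:
    serverAddress: http://prometheus-server.default.svc.cluster.local
    query: sum(orders_unprocessed)  # any PromQL expression can drive scaling
    threshold: "50"                 # target value per replica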
Pros:
- Unified Observability: Piggyback on the metrics you already collect in Prometheus.
- Extremely Flexible: Virtually any PromQL expression can drive autoscaling.
- Business-Centric Metrics: Scale on real business events (ex: num_orders) as the true measure of load, instead of simple utilization metrics.
Cons:
- Query Performance: Overly complex or frequent queries can stress Prometheus.
- Metric Hygiene Needed: This applies to Prometheus as a whole, but inconsistent labeling or low-quality metrics can cause both direct and downstream issues.
Step-by-Step Guide to Scaling KEDA on Prometheus
1. Install Prometheus
Deploy Prometheus using Helm:
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus prometheus-community/prometheus
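To confirm the install, check that the Prometheus pods are up; with the release name above you should see a prometheus-server pod in the Running state:
kubectl get pods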
2. Install KEDA
Deploy KEDA to your Kubernetes cluster using Helm:
helm repo add kedacore https://kedacore.github.io/charts
helm install keda kedacore/keda --namespace keda --create-namespace
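Verify that KEDA's operator and metrics API server pods are running in their namespace:
kubectl get pods -n keda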
3. Create a ScaledObject Trigger
Define a ScaledObject to scale based on a Prometheus query. In the example below, the payments Deployment scales on its HTTP request rate; and because KEDA supports scale-to-zero, if the query reports no requests over the past 5 minutes, KEDA can scale the application down to zero pods until traffic resumes.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: prometheus-scaledobject
  namespace: default
spec:
  scaleTargetRef:
    name: payments  # the Deployment to scale
  triggers:
  - type: prometheus
    metadata:
      serverAddress: http://prometheus-server.default.svc.cluster.local
      metricName: http_requests_total  # display name for the metric
      query: sum(rate(http_requests_total[5m]))  # PromQL that returns the scaling signal
      threshold: "100"  # target value per replica
4. Apply the Configuration
Save the ScaledObject definition to a file, e.g., prometheus-scaledobject.yaml, and apply it:
kubectl apply -f prometheus-scaledobject.yaml
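Behind the scenes, KEDA creates and manages an HPA for each ScaledObject (named keda-hpa-<scaledobject-name>). You can inspect both:
kubectl get scaledobject prometheus-scaledobject
kubectl get hpa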
5. Test the Autoscaling
Generate traffic to your app and observe the scaling behavior:
kubectl get hpa -w
You should see the Horizontal Pod Autoscaler (HPA) that KEDA manages dynamically adjust the number of pods based on the HTTP request rate.
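One quick way to generate traffic is a throwaway busybox Pod that hammers the service in a loop. This assumes a payments Service listening on port 80 in the default namespace:
kubectl run load-gen --rm -it --image=busybox --restart=Never -- \
  /bin/sh -c "while true; do wget -q -O- http://payments.default.svc.cluster.local; done"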
Scaling to Zero on Prometheus
Another way to scale to zero is to use relevant business metrics stored in Prometheus. For example, if Prometheus indicates there are no pending orders for this workload (e.g., orders_pending is below the threshold), KEDA can scale your application down to zero pods until orders resume.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: orders-scaledobject  # distinct name so it can coexist with the example above
  namespace: default
spec:
  scaleTargetRef:
    name: payments
  minReplicaCount: 0  # explicitly allow scale-to-zero (0 is also KEDA's default)
  triggers:
  - type: prometheus
    metadata:
      serverAddress: http://prometheus-server.default.svc.cluster.local
      metricName: orders_pending
      query: sum(orders_pending)
      threshold: "10"
Best Practices & Edge Cases
These apply to Prometheus as a whole, but they matter doubly when queries drive autoscaling:
- Subqueries & Summaries: Be selective with your PromQL to avoid “noisy” signals or performance overhead.
- Tune Query Frequency: Frequent polling makes scaling more responsive but adds load on Prometheus; balance the two with KEDA's pollingInterval.
- Verify Metric Consistency: Use consistent labels to ensure you’re querying the right timeseries.
- Prevent Churn: Avoid flapping by introducing a cooldown or smoothing your query metrics (a sketch follows this list).
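Here's a sketch combining both ideas: avg_over_time smooths a spiky request rate via a PromQL subquery, and cooldownPeriod delays scale-to-zero. Metric names and values are illustrative assumptions:
spec:
  cooldownPeriod: 300   # wait 5 minutes of no activity before scaling to zero
  triggers:
  - type: prometheus
    metadata:
      serverAddress: http://prometheus-server.default.svc.cluster.local
      # smooth the rate over 10 minutes so short spikes don't trigger churn
      query: avg_over_time(sum(rate(http_requests_total[5m]))[10m:1m])
      threshold: "100"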
Integrating KEDA with Prometheus opens up a full universe of scaling options - this can be overwhelming! Check out our tutorial on matching the right scaling strategies to your workloads.
Prometheus, KEDA and Flightcrew
Combining Prometheus with KEDA is a natural integration that extends horizontal autoscaling across your observability stack.
Once you've set up KEDA to scale on Prometheus, you'll need to tune your KEDA config so that it aligns with your pod resources and your underlying node lifecycle.
Flightcrew is an AI tool that can help with this, and other production engineering tasks. Let us know if we can help.
Tim Nichols
CEO/Founder
Tim was a Product Manager on Google Kubernetes Engine and led Machine Learning teams at Spotify before starting Flightcrew. He graduated from Stanford University and lives in New York City. Follow on Bluesky