Our client, a Series B fintech with 200+ microservices on EKS, was burning $47,000/month on Kubernetes compute alone. Their CTO's exact words: "We're spending more on infrastructure than on engineering salaries. Something is very wrong."
He wasn't wrong. After a 2-week audit, we found the usual suspects: overprovisioned pods, always-on dev/staging clusters, no spot instances, and Cluster Autoscaler fighting with node groups that didn't match workload patterns.
Here's exactly what we did to bring that $47K down to $14.3K (a roughly 70% reduction) without a single production incident.
## The Audit: Where the Money Was Going
First, we ran a full resource utilization analysis. The numbers were brutal:
| Metric | Value |
|---|---|
| Average CPU utilization | 12% |
| Average memory utilization | 23% |
| Nodes running 24/7 | 38 |
| Pods with no HPA | 87% |
| Spot instance usage | 0% |
| Dev/staging clusters uptime | 24/7 |
In other words: they were paying for 8x more compute than they needed, and running dev environments around the clock for a team that works 9-to-6.
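That "8x" follows directly from the utilization numbers above; a quick sanity check:

```python
# At 12% average CPU utilization, the cluster is provisioned for
# roughly 1 / 0.12 times the CPU it actually uses.
avg_cpu_utilization = 0.12
overprovisioning_factor = 1 / avg_cpu_utilization
print(round(overprovisioning_factor, 1))  # 8.3
```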
## Phase 1: Karpenter Replaces Cluster Autoscaler
Cluster Autoscaler works, but it's slow and rigid. It needs pre-defined node groups, can't mix instance types efficiently, and takes 3-5 minutes to scale up. Karpenter is the opposite: it looks at pending pods, finds the cheapest instance type that fits, and provisions it in under 60 seconds.
Our Karpenter NodePool config:
```yaml
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        - key: node.kubernetes.io/instance-type
          operator: In
          values:
            - m5.large
            - m5.xlarge
            - m5a.large
            - m5a.xlarge
            - m6i.large
            - m6i.xlarge
            - c5.large
            - c5.xlarge
            - c5a.large
            - r5.large
            - r5a.large
        - key: topology.kubernetes.io/zone
          operator: In
          values: ["us-east-1a", "us-east-1b", "us-east-1c"]
      nodeClassRef:
        name: default
  limits:
    cpu: "200"
    memory: 400Gi
  disruption:
    consolidationPolicy: WhenUnderutilized
    consolidateAfter: 30s
```
Key decisions:
- Wide instance type selection. More instance types = higher Spot availability and lower prices. We listed 11 instance types across seven families instead of locking into one.
- Multi-AZ. Spot capacity varies by AZ. Spreading across 3 zones means we almost never get interrupted.
- Aggressive consolidation. `consolidateAfter: 30s` means Karpenter repacks pods onto fewer nodes as soon as utilization drops. No more half-empty nodes.
### Karpenter vs Cluster Autoscaler Results
| Metric | Before (CA) | After (Karpenter) |
|---|---|---|
| Scale-up time | 3-5 min | 30-60 sec |
| Node count (avg) | 38 | 14 |
| Instance type diversity | 2 types | 11 types |
| Bin packing efficiency | ~30% | ~78% |
## Phase 2: Spot Instances for 80% of Workloads
Here's the thing about Spot: most teams are afraid of it because they think workloads will get killed randomly. In practice, with the right setup, Spot interruption rates are under 5% for diversified instance pools.
We classified every workload into three tiers:
### Tier 1: Spot-Ready (80% of pods)
Stateless services, background workers, batch jobs, anything with graceful shutdown handling.
```yaml
# Added to every Spot-eligible deployment
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 120
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: api-gateway
```
### Tier 2: On-Demand with Spot Fallback (15%)
Stateful services, database proxies, services with long-running connections.
### Tier 3: On-Demand Only (5%)
Kafka brokers, Redis primaries, anything where a node interruption would cause data loss.
We used Karpenter's `karpenter.sh/capacity-type` node label and node affinity rules to route workloads to the right nodes:
```yaml
# In the deployment spec for Tier 1 workloads
affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 90
        preference:
          matchExpressions:
            - key: karpenter.sh/capacity-type
              operator: In
              values: ["spot"]
```
Here, `preferredDuringSchedulingIgnoredDuringExecution` means: try Spot first, but fall back to On-Demand if no Spot capacity is available. No pod ever stays unscheduled.
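For Tier 3, the rule is the inverse: a hard requirement rather than a preference. A sketch of what that looks like (our illustration, not the client's exact manifest):

```yaml
# Tier 3: require On-Demand capacity; the scheduler will never
# place these pods on a Spot node.
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: karpenter.sh/capacity-type
              operator: In
              values: ["on-demand"]
```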
### Spot Savings
Average Spot discount across our instance mix: 67% off On-Demand. Since 80% of workloads ran on Spot, the blended discount was roughly 54%.
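The blended figure is just the Spot share multiplied by the Spot discount:

```python
spot_share = 0.80      # fraction of workloads running on Spot
spot_discount = 0.67   # average Spot discount vs On-Demand
blended_discount = spot_share * spot_discount
print(f"{blended_discount:.0%}")  # 54%
```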
## Phase 3: Scale-to-Zero for Dev and Staging
This was the easiest win with the biggest impact. Dev and staging clusters were running 24/7 for a team that works 9 AM to 6 PM, Monday to Friday. That's 72% wasted compute.
We implemented KEDA (Kubernetes Event-Driven Autoscaling) with cron-based scaling:
```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: api-gateway-dev
  namespace: development
spec:
  scaleTargetRef:
    name: api-gateway
  minReplicaCount: 0
  maxReplicaCount: 2
  triggers:
    - type: cron
      metadata:
        timezone: America/Argentina/Buenos_Aires
        start: "0 9 * * 1-5"   # Scale up Mon-Fri 9 AM
        end: "0 19 * * 1-5"    # Scale down Mon-Fri 7 PM
        desiredReplicas: "2"
```
For staging, we added an HTTP trigger so it scales up on-demand when someone hits the endpoint:
```yaml
triggers:
  - type: cron
    metadata:
      timezone: America/Argentina/Buenos_Aires
      start: "0 9 * * 1-5"
      end: "0 19 * * 1-5"
      desiredReplicas: "2"
  - type: prometheus
    metadata:
      serverAddress: http://prometheus:9090
      metricName: http_requests_total
      query: sum(rate(http_requests_total{namespace="staging"}[5m]))
      threshold: "1"
```
When all pods in dev/staging scale to zero, Karpenter's consolidation kicks in and removes the nodes entirely. Zero pods = zero nodes = zero cost outside business hours.
### Scale-to-Zero Impact
| Environment | Before (24/7) | After (Scheduled) | Savings |
|---|---|---|---|
| Development | $8,200/mo | $2,300/mo | 72% |
| Staging | $6,100/mo | $1,800/mo | 70% |
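Those savings line up with the schedule itself: under the 9 AM to 7 PM weekday cron window, the environments run 50 of 168 hours a week.

```python
# Weekly uptime under the cron schedule above (9 AM - 7 PM, Mon-Fri).
hours_up = 10 * 5        # 50 hours of scheduled uptime per week
hours_in_week = 24 * 7   # 168 hours in a week
idle_fraction = 1 - hours_up / hours_in_week
print(f"{idle_fraction:.0%}")  # 70%
```

The realized 70-72% savings track this theoretical ceiling closely.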
## Phase 4: Right-Sizing Production Pods
The final piece: most pods were requesting 2-4x more resources than they used. We deployed Vertical Pod Autoscaler in recommendation mode for 2 weeks, then applied the suggestions:
```yaml
# Before
resources:
  requests:
    cpu: "500m"
    memory: "512Mi"
  limits:
    cpu: "1000m"
    memory: "1Gi"

# After (based on VPA recommendations)
resources:
  requests:
    cpu: "100m"
    memory: "128Mi"
  limits:
    cpu: "300m"
    memory: "384Mi"
```
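Recommendation mode itself is just a VerticalPodAutoscaler object with updates switched off; a sketch, assuming a Deployment named `api-gateway`:

```yaml
# VPA in recommendation-only mode: it publishes suggested requests
# (visible via kubectl describe vpa) but never evicts or resizes pods.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-gateway
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-gateway
  updatePolicy:
    updateMode: "Off"
```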
This alone reduced the number of nodes needed by 40%, because Karpenter provisions smaller (cheaper) instances when pods request less.
## The Final Numbers
| Category | Before | After | Savings |
|---|---|---|---|
| Production compute | $32,700 | $10,200 | 69% |
| Development | $8,200 | $2,300 | 72% |
| Staging | $6,100 | $1,800 | 70% |
| Total | $47,000 | $14,300 | 69.6% |
Monthly savings: $32,700. Annual savings: $392,400.
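A quick check of the headline figures against the per-category table:

```python
before = {"production": 32_700, "development": 8_200, "staging": 6_100}
after = {"production": 10_200, "development": 2_300, "staging": 1_800}

monthly_savings = sum(before.values()) - sum(after.values())
print(monthly_savings)       # 32700
print(monthly_savings * 12)  # 392400
```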
The implementation took 3 weeks. ROI: approximately 6 hours.
## What We'd Do Differently
- Start with right-sizing before Spot. We did Karpenter first, but if we'd right-sized pods first, Karpenter would have been even more efficient from day one.
- Use Savings Plans for the On-Demand baseline. The 20% of workloads that must run On-Demand should be covered by 1-year Compute Savings Plans for an additional 30% discount. We're implementing this now.
- Set up cost alerts earlier. We built the FinOps dashboard after the optimization. We should have done it first so the team could see the impact in real time.
Spending too much on Kubernetes? We've done this optimization for 8 teams now. Book a free infrastructure assessment — we'll tell you exactly where the waste is.