Our client, a Series B fintech with 200+ microservices on EKS, was burning $47,000/month on Kubernetes compute alone. Their CTO's exact words: "We're spending more on infrastructure than on engineering salaries. Something is very wrong."
He wasn't wrong. After a 2-week audit, we found the usual suspects: overprovisioned pods, always-on dev/staging clusters, no spot instances, and Cluster Autoscaler fighting with node groups that didn't match workload patterns.
Here's exactly what we did to bring that $47K down to $14.3K (a roughly 70% reduction) without a single production incident.
## The Audit: Where the Money Was Going
First, we ran a full resource utilization analysis. The numbers were brutal:
| Metric | Value |
|---|---|
| Average CPU utilization | 12% |
| Average memory utilization | 23% |
| Nodes running 24/7 | 38 |
| Pods with no HPA | 87% |
| Spot instance usage | 0% |
| Dev/staging clusters uptime | 24/7 |
In other words: they were paying for 8x more compute than they needed, and running dev environments around the clock for a team that works 9-to-6.
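That "8x" follows directly from the utilization numbers above; a quick sanity check:

```python
# At 12% average CPU utilization, the cluster is provisioned for
# roughly 1 / 0.12 times the CPU it actually uses.
avg_cpu_utilization = 0.12
overprovisioning_factor = 1 / avg_cpu_utilization
print(round(overprovisioning_factor, 1))  # 8.3
```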
## Phase 1: Karpenter Replaces Cluster Autoscaler
Cluster Autoscaler works, but it's slow and rigid. It needs pre-defined node groups, can't mix instance types efficiently, and takes 3-5 minutes to scale up. Karpenter is the opposite: it looks at pending pods, finds the cheapest instance type that fits, and provisions it in under 60 seconds.
Our Karpenter NodePool config:
```yaml
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        - key: node.kubernetes.io/instance-type
          operator: In
          values:
            - m5.large
            - m5.xlarge
            - m5a.large
            - m5a.xlarge
            - m6i.large
            - m6i.xlarge
            - c5.large
            - c5.xlarge
            - c5a.large
            - r5.large
            - r5a.large
        - key: topology.kubernetes.io/zone
          operator: In
          values: ["us-east-1a", "us-east-1b", "us-east-1c"]
      nodeClassRef:
        name: default
  limits:
    cpu: "200"
    memory: 400Gi
  disruption:
    consolidationPolicy: WhenUnderutilized
    consolidateAfter: 30s
```
Key decisions:
- Wide instance type selection. More instance types = higher Spot availability and lower prices. We listed 11 instance types across seven families instead of locking into one.
- Multi-AZ. Spot capacity varies by AZ. Spreading across 3 zones means we almost never get interrupted.
- Aggressive consolidation. `consolidateAfter: 30s` means Karpenter repacks pods onto fewer nodes as soon as utilization drops. No more half-empty nodes.
### Karpenter vs Cluster Autoscaler Results
| Metric | Before (CA) | After (Karpenter) |
|---|---|---|
| Scale-up time | 3-5 min | 30-60 sec |
| Node count (avg) | 38 | 14 |
| Instance type diversity | 2 types | 11 types |
| Bin packing efficiency | ~30% | ~78% |
## Phase 2: Spot Instances for 80% of Workloads
Here's the thing about Spot: most teams are afraid of it because they think workloads will get killed randomly. In practice, with the right setup, Spot interruption rates are under 5% for diversified instance pools.
We classified every workload into three tiers:
### Tier 1: Spot-Ready (80% of pods)
Stateless services, background workers, batch jobs, anything with graceful shutdown handling.
```yaml
# Added to every Spot-eligible deployment
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 120
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: api-gateway
```
### Tier 2: On-Demand with Spot Fallback (15%)
Stateful services, database proxies, services with long-running connections.
### Tier 3: On-Demand Only (5%)
Kafka brokers, Redis primaries, anything where a node interruption would cause data loss.
We used Karpenter's `karpenter.sh/capacity-type` node label and node affinity rules to route workloads to the right nodes:
```yaml
# In the deployment spec for Tier 1 workloads
affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 90
        preference:
          matchExpressions:
            - key: karpenter.sh/capacity-type
              operator: In
              values: ["spot"]
```
Here, `preferredDuringSchedulingIgnoredDuringExecution` means: try Spot first, but fall back to On-Demand if no Spot capacity is available. No pod ever stays unscheduled.
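For Tier 3, the rule is the inverse: a hard requirement rather than a preference. A sketch of what that looks like (our illustration, not the client's exact manifest):

```yaml
# Tier 3: require On-Demand capacity; the scheduler will never
# place these pods on a Spot node.
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: karpenter.sh/capacity-type
              operator: In
              values: ["on-demand"]
```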
### Spot Savings
Average Spot discount across our instance mix: 67% off On-Demand. Since 80% of workloads ran on Spot, the blended discount was roughly 54%.
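The blended figure is just the Spot share multiplied by the Spot discount:

```python
spot_share = 0.80      # fraction of workloads running on Spot
spot_discount = 0.67   # average Spot discount vs On-Demand
blended_discount = spot_share * spot_discount
print(f"{blended_discount:.0%}")  # 54%
```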
## Phase 3: Scale-to-Zero for Dev and Staging
This was the easiest win with the biggest impact. Dev and staging clusters were running 24/7 for a team that works 9 AM to 6 PM, Monday to Friday. That's 72% wasted compute.
We implemented KEDA (Kubernetes Event-Driven Autoscaling) with cron-based scaling:
```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: api-gateway-dev
  namespace: development
spec:
  scaleTargetRef:
    name: api-gateway
  minReplicaCount: 0
  maxReplicaCount: 2
  triggers:
    - type: cron
      metadata:
        timezone: America/Argentina/Buenos_Aires
        start: "0 9 * * 1-5"   # Scale up Mon-Fri 9 AM
        end: "0 19 * * 1-5"    # Scale down Mon-Fri 7 PM
        desiredReplicas: "2"
```
For staging, we added an HTTP trigger so it scales up on-demand when someone hits the endpoint:
```yaml
triggers:
  - type: cron
    metadata:
      timezone: America/Argentina/Buenos_Aires
      start: "0 9 * * 1-5"
      end: "0 19 * * 1-5"
      desiredReplicas: "2"
  - type: prometheus
    metadata:
      serverAddress: http://prometheus:9090
      metricName: http_requests_total
      query: sum(rate(http_requests_total{namespace="staging"}[5m]))
      threshold: "1"
```
When all pods in dev/staging scale to zero, Karpenter's consolidation kicks in and removes the nodes entirely. Zero pods = zero nodes = zero cost outside business hours.
### Scale-to-Zero Impact
| Environment | Before (24/7) | After (Scheduled) | Savings |
|---|---|---|---|
| Development | $8,200/mo | $2,300/mo | 72% |
| Staging | $6,100/mo | $1,800/mo | 70% |
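Those savings line up with the schedule itself: under the 9 AM to 7 PM weekday cron window, the environments run 50 of 168 hours a week.

```python
# Weekly uptime under the cron schedule above (9 AM - 7 PM, Mon-Fri).
hours_up = 10 * 5        # 50 hours of scheduled uptime per week
hours_in_week = 24 * 7   # 168 hours in a week
idle_fraction = 1 - hours_up / hours_in_week
print(f"{idle_fraction:.0%}")  # 70%
```

The realized 70-72% savings track this theoretical ceiling closely.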
## Phase 4: Right-Sizing Production Pods
The final piece: most pods were requesting 2-4x more resources than they used. We deployed Vertical Pod Autoscaler in recommendation mode for 2 weeks, then applied the suggestions:
```yaml
# Before
resources:
  requests:
    cpu: "500m"
    memory: "512Mi"
  limits:
    cpu: "1000m"
    memory: "1Gi"

# After (based on VPA recommendations)
resources:
  requests:
    cpu: "100m"
    memory: "128Mi"
  limits:
    cpu: "300m"
    memory: "384Mi"
```
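Recommendation mode itself is just a VerticalPodAutoscaler object with updates switched off; a sketch, assuming a Deployment named `api-gateway`:

```yaml
# VPA in recommendation-only mode: it publishes suggested requests
# (visible via kubectl describe vpa) but never evicts or resizes pods.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-gateway
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-gateway
  updatePolicy:
    updateMode: "Off"
```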
This alone reduced the number of nodes needed by 40%, because Karpenter provisions smaller (cheaper) instances when pods request less.
## The Final Numbers
| Category | Before | After | Savings |
|---|---|---|---|
| Production compute | $32,700 | $10,200 | 69% |
| Development | $8,200 | $2,300 | 72% |
| Staging | $6,100 | $1,800 | 70% |
| Total | $47,000 | $14,300 | 69.6% |
Monthly savings: $32,700. Annual savings: $392,400.
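A quick check of the headline figures against the per-category table:

```python
before = {"production": 32_700, "development": 8_200, "staging": 6_100}
after = {"production": 10_200, "development": 2_300, "staging": 1_800}

monthly_savings = sum(before.values()) - sum(after.values())
print(monthly_savings)       # 32700
print(monthly_savings * 12)  # 392400
```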
The implementation took 3 weeks. ROI: approximately 6 hours.
## What We'd Do Differently
- Start with right-sizing before Spot. We did Karpenter first, but if we'd right-sized pods first, Karpenter would have been even more efficient from day one.
- Use Savings Plans for the On-Demand baseline. The 20% of workloads that must run On-Demand should be covered by 1-year Compute Savings Plans for an additional 30% discount. We're implementing this now.
- Set up cost alerts earlier. We built the FinOps dashboard after the optimization. We should have done it first so the team could see the impact in real time.
Spending too much on Kubernetes? We've done this optimization for 8 teams now. Book a free infrastructure assessment — we'll tell you exactly where the waste is.