A healthcare SaaS company came to us with a simple problem: their AWS bill had grown from $12K/month to $38K/month in 18 months, but their user base had only doubled. Costs were scaling superlinearly when, with any economy of scale, they should have been sublinear.
Their VP of Engineering put it bluntly: "We have 50,000 users and we're spending $38K/month. Our competitor has 200,000 users and spends less than us. What are we doing wrong?"
After a 2-week audit and 6 weeks of implementation, we brought their bill down to $18,200/month — a 52% reduction, saving $237,600/year. Here's every single thing we changed.
The Audit
We started by categorizing spend by AWS service:
| Service | Monthly Cost | % of Total |
|---|---|---|
| EC2 (EKS nodes) | $16,400 | 43% |
| RDS (PostgreSQL) | $7,200 | 19% |
| ElastiCache (Redis) | $3,100 | 8% |
| S3 + CloudFront | $2,800 | 7% |
| NAT Gateway | $2,600 | 7% |
| Data Transfer | $2,400 | 6% |
| EBS Volumes | $1,800 | 5% |
| Other | $1,700 | 5% |
| Total | $38,000 | 100% |
Every single line item had optimization potential. Let's go through them.
1. EC2/EKS: Right-Size + Spot + Karpenter ($16,400 → $6,800)
This was the biggest win. Their EKS cluster was running on m5.2xlarge On-Demand instances because "that's what the AWS Quick Start guide suggested."
Changes:
- Replaced Cluster Autoscaler with Karpenter
- Added 15 instance types to the allowed list
- Moved 75% of workloads to Spot instances
- Right-sized every pod based on 2 weeks of VPA data
- Added HPA to all stateless services
We wrote about the Karpenter setup in detail in our Karpenter + Spot + Scale-to-Zero post.
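The shape of that setup looks roughly like the NodePool below (Karpenter v1 API). The name, instance list, and limits here are illustrative, not the client's actual config — they allowed 15 instance types, we show a few:

```yaml
# Illustrative Karpenter NodePool: Spot-first capacity across several
# instance types, with consolidation enabled to pack nodes tightly.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: general-purpose
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["m5.large", "m5.xlarge", "m5a.xlarge", "m6i.large", "m6i.xlarge"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
  limits:
    cpu: "200"
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
```

The wider the instance-type list, the more Spot pools Karpenter can draw from, which is what keeps interruption rates manageable at 75% Spot.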
Savings: $9,600/month (58%)
2. RDS: Reserved Instances + Read Replicas ($7,200 → $3,600)
They were running a db.r6g.2xlarge PostgreSQL RDS instance — On-Demand, Multi-AZ. The database was at 15% CPU utilization on average.
Changes:
- Downsized to db.r6g.xlarge (CPU only hit 40% during peak on the smaller instance)
- Purchased a 1-year All Upfront Reserved Instance (42% discount)
- Added a read replica for analytics queries that were hammering the primary
- Moved nightly batch jobs to hit the replica instead of primary
```sql
-- Before: analytics queries on primary
SELECT date_trunc('day', created_at), count(*)
FROM patient_records
WHERE created_at > now() - interval '90 days'
GROUP BY 1;

-- After: same query routed to read replica via connection string
-- analytics_db_url = postgres://replica-endpoint:5432/healthdb
```
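On the application side, the routing can be as simple as picking a DSN per workload. This sketch uses hypothetical environment-variable names and workload labels, not the client's actual code:

```python
import os

# Hypothetical endpoints -- the real hostnames are specific to the client's VPC.
PRIMARY_DSN = os.environ.get("DATABASE_URL", "postgres://primary-endpoint:5432/healthdb")
REPLICA_DSN = os.environ.get("ANALYTICS_DB_URL", "postgres://replica-endpoint:5432/healthdb")

# Read-only workloads that tolerate a slightly lagging replica.
READ_ONLY_WORKLOADS = {"analytics", "reporting", "nightly_batch"}

def dsn_for(workload: str) -> str:
    """Route read-only workloads to the replica; writes stay on the primary."""
    return REPLICA_DSN if workload in READ_ONLY_WORKLOADS else PRIMARY_DSN
```

The key design point is that replica routing is decided by workload, not per query — batch jobs and dashboards get the replica DSN at startup and never touch the primary.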
Savings: $3,600/month (50%)
3. ElastiCache: Right-Size + Reserved ($3,100 → $1,400)
Running cache.r6g.xlarge with 3% memory utilization. They were caching session data for 50K users — that fits in a cache.r6g.large with room to spare.
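A back-of-envelope check supports that sizing claim. The session size and overhead factor below are our assumptions, not measured values:

```python
# Do 50K sessions fit comfortably in a cache.r6g.large (13.07 GiB of memory)?
SESSIONS = 50_000
AVG_SESSION_BYTES = 8 * 1024   # assumption: ~8 KB serialized session
OVERHEAD_FACTOR = 1.5          # rough allowance for per-key overhead + fragmentation
R6G_LARGE_GIB = 13.07          # ElastiCache cache.r6g.large memory

needed_gib = SESSIONS * AVG_SESSION_BYTES * OVERHEAD_FACTOR / 2**30
print(f"~{needed_gib:.2f} GiB needed vs {R6G_LARGE_GIB} GiB available")
```

Even with generous assumptions, session data for 50K users needs well under 1 GiB — more than an order of magnitude of headroom.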
Changes:
- Downsized to cache.r6g.large
- Purchased a 1-year Reserved Instance
- Implemented TTL on all cache keys (they had 2M keys with no expiry)
Savings: $1,700/month (55%)
4. NAT Gateway: The Silent Budget Killer ($2,600 → $800)
This one surprised everyone. NAT Gateway charges $0.045/GB for data processing — and their pods were pulling Docker images through NAT on every deploy.
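To see where $2,600/month of NAT spend can come from, here's a rough reconstruction. The $0.045/GB processing rate is AWS's us-east-1 price; the hourly rate and the three-gateway count (one per AZ) are our assumptions:

```python
# Rough reconstruction of a $2,600/month NAT Gateway bill.
HOURLY_RATE = 0.045     # $/hour per gateway (us-east-1, assumption)
DATA_RATE = 0.045       # $/GB processed
GATEWAYS = 3            # assumption: one per AZ
HOURS_PER_MONTH = 730

hourly_cost = GATEWAYS * HOURLY_RATE * HOURS_PER_MONTH
implied_gb = (2600 - hourly_cost) / DATA_RATE
print(f"hourly: ${hourly_cost:.0f}/mo, implied processing: ~{implied_gb/1000:.0f} TB/mo")
```

The hourly charges are pocket change; nearly the whole bill is data processing — roughly 55 TB/month of traffic that mostly didn't need to cross a NAT at all.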
Changes:
- Configured ECR VPC endpoints (no more NAT for image pulls)
- Added S3 VPC endpoint (logs and backups were going through NAT)
- Configured STS and CloudWatch VPC endpoints
- Moved non-essential traffic to instances with public IPs
```bash
# VPC endpoints we added -- image pulls need both ecr.api and ecr.dkr,
# and the S3 Gateway endpoint attaches to route tables instead of subnets
aws ec2 create-vpc-endpoint --vpc-id vpc-xxx \
    --service-name com.amazonaws.us-east-1.ecr.api \
    --vpc-endpoint-type Interface --subnet-ids subnet-xxx --security-group-ids sg-xxx

aws ec2 create-vpc-endpoint --vpc-id vpc-xxx \
    --service-name com.amazonaws.us-east-1.ecr.dkr \
    --vpc-endpoint-type Interface --subnet-ids subnet-xxx --security-group-ids sg-xxx

aws ec2 create-vpc-endpoint --vpc-id vpc-xxx \
    --service-name com.amazonaws.us-east-1.s3 \
    --vpc-endpoint-type Gateway --route-table-ids rtb-xxx
```
Savings: $1,800/month (69%)
NAT Gateway costs are one of the most overlooked line items in AWS bills. Every company we audit is overpaying for NAT.
5. S3 + CloudFront: Lifecycle Policies + Compression ($2,800 → $1,600)
They were storing every version of every file forever. Medical document uploads from 3 years ago were still in S3 Standard.
Changes:
- S3 Intelligent-Tiering for all buckets (auto-moves cold data to cheaper tiers)
- Lifecycle policy: move to Glacier after 1 year for compliance archives
- Enabled CloudFront compression (Brotli) — reduced bandwidth 40%
- Configured proper cache headers — CDN hit ratio went from 60% to 94%
```json
{
  "Rules": [
    {
      "ID": "ArchiveOldDocuments",
      "Status": "Enabled",
      "Filter": {"Prefix": ""},
      "Transitions": [
        {"Days": 90, "StorageClass": "STANDARD_IA"},
        {"Days": 365, "StorageClass": "GLACIER"}
      ]
    }
  ]
}
```
Savings: $1,200/month (43%)
6. Data Transfer: Keep Traffic Inside the VPC ($2,400 → $1,200)
Cross-AZ data transfer charges were eating them alive. Services in us-east-1a were talking to services in us-east-1c, paying $0.01/GB each way.
Changes:
- Configured topology-aware routing in Kubernetes (prefer same-AZ)
- Moved chatty services into the same AZ
- Compressed inter-service payloads (gRPC with protobuf instead of JSON)
```yaml
# Topology-aware routing: prefer same-AZ endpoints when capacity allows
apiVersion: v1
kind: Service
metadata:
  name: user-service
  annotations:
    service.kubernetes.io/topology-mode: Auto
```
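Their payload fix was gRPC with protobuf, but the effect of compact encodings is easy to demonstrate with a stand-in — here, zlib over a repetitive JSON payload:

```python
import json
import zlib

# Illustrative payload: 1,000 small records, the kind of chatty
# inter-service response that racks up cross-AZ transfer charges.
records = [{"user_id": i, "status": "active", "plan": "pro"} for i in range(1000)]
payload = json.dumps(records).encode()
compressed = zlib.compress(payload)
ratio = len(compressed) / len(payload)
print(f"{len(payload)} -> {len(compressed)} bytes ({ratio:.0%})")
```

At $0.01/GB each way, shrinking inter-service payloads by half or more compounds across every hop.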
Savings: $1,200/month (50%)
7. EBS Volumes: Delete Orphans + Change Types ($1,800 → $800)
22 unattached EBS volumes sitting there doing nothing. PersistentVolumes from deleted pods that nobody cleaned up. Classic.
Changes:
- Deleted 22 orphaned EBS volumes (saved $400/month immediately)
- Changed GP2 volumes to GP3 (20% cheaper, better performance)
- Reduced snapshot frequency from hourly to daily for non-critical volumes
```bash
# Find orphaned volumes
aws ec2 describe-volumes \
    --filters Name=status,Values=available \
    --query 'Volumes[*].{ID:VolumeId,Size:Size,Created:CreateTime}' \
    --output table
```
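The "20% cheaper" figure falls straight out of the per-GB rates (us-east-1 pricing at the time of writing; the fleet size below is illustrative, not the client's):

```python
# gp2 vs gp3 base storage pricing, us-east-1: $0.10 vs $0.08 per GB-month.
GP2_RATE = 0.10
GP3_RATE = 0.08

saving_pct = (GP2_RATE - GP3_RATE) / GP2_RATE
total_gib = 5000  # illustrative fleet size
monthly_saving = total_gib * (GP2_RATE - GP3_RATE)
print(f"{saving_pct:.0%} cheaper; ${monthly_saving:.0f}/month on {total_gib} GiB")
```

gp3 also decouples IOPS and throughput from volume size, so the "better performance" half of the claim comes free at the baseline 3,000 IOPS.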
Savings: $1,000/month (56%)
The Final Scorecard
| Service | Before | After | Savings | % |
|---|---|---|---|---|
| EC2/EKS | $16,400 | $6,800 | $9,600 | 58% |
| RDS | $7,200 | $3,600 | $3,600 | 50% |
| ElastiCache | $3,100 | $1,400 | $1,700 | 55% |
| NAT Gateway | $2,600 | $800 | $1,800 | 69% |
| S3/CloudFront | $2,800 | $1,600 | $1,200 | 43% |
| Data Transfer | $2,400 | $1,200 | $1,200 | 50% |
| EBS | $1,800 | $800 | $1,000 | 56% |
| Other | $1,700 | $2,000 | -$300 | -18% |
| Total | $38,000 | $18,200 | $19,800 | 52% |
"Other" went up slightly because we added monitoring tools (Kubecost, custom exporters) that have a small compute cost. Worth every penny.
Annual savings: $237,600
The entire project — audit, implementation, testing, documentation — took 8 weeks and cost them a fraction of one month's savings.
The Most Important Change
The technical optimizations were important, but the cultural change mattered more. We installed our FinOps dashboard on day one of the project, so the team could see costs in real-time from the start.
By week 3, engineers were coming to us with optimization ideas we hadn't thought of. One developer noticed their service was making 10x more S3 API calls than necessary due to a missing cache layer. Another found a cron job that was spinning up a large instance for 2 minutes every hour.
When you make costs visible, engineers optimize naturally. They just need the data.
AWS bill growing faster than your user base? That's normal — and fixable. Get a free infrastructure assessment and we'll show you exactly where the waste is.