What is the biggest lever in Kubernetes cost optimization?

Right-sizing pod resource requests. The scheduler reserves node capacity based on requests, not actual usage, so over-requested CPU and memory strand capacity you still pay for. Compare requests to real usage (kubectl top, or the Vertical Pod Autoscaler in recommendation mode) and lower requests to fit reality; this is what lets the cluster pack more pods onto fewer nodes.

Should I use Spot instances for Kubernetes nodes?

Yes for fault-tolerant, stateless or batch workloads — Spot (AKS Spot node pools, EKS Spot via Karpenter or managed node groups) is typically 60-90% cheaper than on-demand. Keep critical/stateful workloads on a small on-demand pool using taints, tolerations and node selectors, and run everything interruptible on Spot.

Do reservations or Savings Plans cover Kubernetes?

They cover the underlying compute. On AKS the worker nodes are Azure VMs, so Reserved VM Instances and Azure savings plans for compute apply. On EKS the nodes are EC2, covered by Compute Savings Plans or EC2 Reserved Instances, and EKS Fargate is covered by Compute Savings Plans. Right-size first, then commit to the steady-state baseline.

Kubernetes · Cost optimization · Updated June 2026

Kubernetes Cost Optimization: AKS & EKS Without the Waste

By the CloudFinOpsKit team. Applies to Azure Kubernetes Service and Amazon EKS. 11 min read.

Kubernetes is where cloud cost goes to hide. You don't pay for pods — you pay for nodes (VMs on AKS, EC2 on EKS), and the gap between what your pods request and what they actually use is pure waste you're billed for. Most clusters run at 30–40% real CPU utilization while paying for 100% of the nodes. Here's how to close that gap, in priority order.

First: see requests vs. actual usage

The scheduler packs nodes by resource requests, not real consumption. So the first job is to compare the two:

# actual usage right now (needs metrics-server)
kubectl top pods -A --sum=true
kubectl top nodes

# what each pod RESERVED (requests): the number that determines how many nodes you pay for
# (the column spec is quoted so the shell does not try to glob-expand [*])
kubectl get pods -A -o custom-columns="\
NS:.metadata.namespace,POD:.metadata.name,\
CPU_REQ:.spec.containers[*].resources.requests.cpu,\
MEM_REQ:.spec.containers[*].resources.requests.memory"

For allocation by namespace/team, add an open-source cost tool — OpenCost (the CNCF standard) or Kubecost — which maps node cost onto pods by their requests. That turns "the cluster costs $X" into "team A's namespace costs $Y", which is the start of showback.
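To make the requests-versus-usage gap concrete, here is a minimal sketch that turns the two sets of numbers into a per-namespace utilization figure. The namespaces and millicore values are hypothetical sample data, not output from a real cluster; in practice they would come from the kubectl commands above or from an allocation tool.

```shell
# Hypothetical sample: namespace, CPU requested (millicores), CPU used (millicores).
sample='team-a 4000 1200
team-b 2000 1500
team-c 6000 900'

# For each namespace, print the share of requested CPU actually in use
# and the millicores that are reserved but idle.
report=$(printf '%s\n' "$sample" | awk '{
  printf "%s used=%d%% idle=%dm\n", $1, ($3 * 100) / $2, $2 - $3
}')
printf '%s\n' "$report"
```

Namespaces with a low "used" percentage and a large "idle" figure are the first candidates for lower requests.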

Lever 1 — Right-size pod requests (the biggest win)

Over-requested CPU and memory strand node capacity nobody uses. Lower requests to match reality (with headroom for spikes). The Vertical Pod Autoscaler in recommendation mode (updateMode: "Off") suggests right-sized requests without evicting or resizing any running pod:

kubectl describe vpa <name>   # shows Target / Lower / Upper recommendations

Set requests near the VPA "target" (or your p90 usage), keep limits sane to avoid noisy-neighbour issues, and you'll immediately fit more pods per node. This one change routinely reclaims 20–40% of node spend.
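A recommendation-only VPA object might look like the sketch below. The Deployment name checkout-api and the namespace shop are hypothetical, and the sketch assumes the VPA components are already installed in the cluster. The block only writes the manifest to a file; applying it is left as a commented step.

```shell
# Write a recommendation-only VPA manifest for a hypothetical Deployment.
cat > vpa-recommend.yaml <<'EOF'
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: checkout-api-vpa
  namespace: shop
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: checkout-api
  updatePolicy:
    updateMode: "Off"   # recommend only; never evict or resize pods
EOF

# Then, against a cluster that has the VPA installed:
#   kubectl apply -f vpa-recommend.yaml
#   kubectl describe vpa checkout-api-vpa -n shop
```

Give the recommender some days of traffic before trusting its numbers, so the target reflects a representative load pattern rather than a quiet afternoon.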

Lever 2 — Bin-pack with the right autoscaler

Once requests are honest, let the cluster shrink to fit:

Horizontal Pod Autoscaler scales replicas to demand so you're not running peak capacity 24/7.
Cluster Autoscaler removes underused nodes. On AKS, enable the cluster autoscaler (and consider Node Autoprovisioning); on EKS, Karpenter is the modern choice — it provisions just-right instance types on demand and consolidates workloads onto fewer, cheaper nodes automatically.
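As an illustration of the first item, a CPU-based HorizontalPodAutoscaler could be sketched like this. The Deployment name, namespace, replica bounds, and 70% target are all assumptions chosen for the example, not recommendations.

```shell
# Write an HPA manifest for a hypothetical Deployment "checkout-api".
cat > hpa.yaml <<'EOF'
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: checkout-api-hpa
  namespace: shop
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: checkout-api
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # percent of the CPU request, not of the node
EOF

# Apply with: kubectl apply -f hpa.yaml
```

Because the utilization target is measured against each pod's CPU request, an inflated request makes the HPA scale too late; this is one more reason to right-size before tuning autoscalers.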

Lever 3 — Spot for anything interruptible

Stateless, batch, CI, and dev workloads should run on Spot — typically 60–90% off on-demand. Use AKS Spot node pools or EKS Spot (via Karpenter or managed node groups), and keep critical/stateful pods on a small on-demand pool using taints, tolerations and node selectors so only interruptible work lands on Spot.
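On AKS, Spot node pools carry the taint kubernetes.azure.com/scalesetpriority=spot:NoSchedule and a matching node label, so an interruptible workload needs a toleration plus a node selector. The sketch below shows one way to wire that up; the Deployment name, image, and resource figures are placeholders.

```shell
# Write a Deployment that is allowed onto, and pinned to, AKS Spot nodes.
cat > batch-on-spot.yaml <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: report-worker
spec:
  replicas: 3
  selector:
    matchLabels:
      app: report-worker
  template:
    metadata:
      labels:
        app: report-worker
    spec:
      # The toleration lets the pod schedule onto tainted Spot nodes.
      tolerations:
        - key: kubernetes.azure.com/scalesetpriority
          operator: Equal
          value: spot
          effect: NoSchedule
      # The node selector keeps the pod off on-demand nodes.
      nodeSelector:
        kubernetes.azure.com/scalesetpriority: spot
      containers:
        - name: worker
          image: registry.example.com/report-worker:1.0   # placeholder image
          resources:
            requests:
              cpu: 250m
              memory: 256Mi
EOF

# Apply with: kubectl apply -f batch-on-spot.yaml
```

On EKS the equivalent node labels are eks.amazonaws.com/capacityType: SPOT for managed node groups and karpenter.sh/capacity-type: spot for Karpenter. Those nodes are not tainted by default, so a toleration is only needed if you add your own taint.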

Lever 4 — Scale non-prod to zero off-hours

A dev/test cluster left running nights and weekends sits idle for roughly 64% of the week. Scale node pools to zero on a schedule, or use KEDA to scale workloads (and therefore nodes) to zero when idle. A dev cluster that sleeps 7pm–7am on weekdays and all weekend runs 60 of 168 hours, so its node cost is roughly a third of a 24/7 one.
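To check the saving for a different schedule, the arithmetic can be sketched in a few lines of shell. The 12-hour, five-day schedule is an assumption; adjust the two variables to match your own working hours. The result covers node hours only, since disks and load balancers keep billing while nodes are scaled to zero.

```shell
# Assumed schedule: nodes run 07:00-19:00, Monday to Friday, and are off otherwise.
hours_per_day=12
days_per_week=5

week_hours=$((7 * 24))                        # 168
on_hours=$((hours_per_day * days_per_week))   # 60
on_pct=$((on_hours * 100 / week_hours))       # integer division, rounds down

echo "running ${on_hours}h of ${week_hours}h per week (~${on_pct}% of a 24/7 cluster)"
```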

Lever 5 — Cover the baseline with commitments

After right-sizing, your cluster has a steady-state floor of nodes that runs all the time — commit to that, not your peak. AKS nodes are Azure VMs (Reserved VM Instances / Azure savings plans for compute); EKS nodes are EC2 (Compute Savings Plans or RIs), and EKS Fargate is covered by Compute Savings Plans. Right-size first so you don't commit to waste.
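One simple way to find that floor is to take the minimum node count over a representative period. The sketch below uses hypothetical hourly node counts and assumes a bash or POSIX sh shell (it relies on word splitting of the unquoted variable); real numbers would come from your monitoring or billing export.

```shell
# Hypothetical node counts sampled once per hour after right-sizing.
node_counts="6 6 6 7 9 12 14 12 9 7 6 6"

# The floor is the lowest count seen; the peak is the highest.
baseline=$(printf '%s\n' $node_counts | sort -n | head -n 1)
peak=$(printf '%s\n' $node_counts | sort -n | tail -n 1)

echo "commit to ${baseline} nodes; cover the other $((peak - baseline)) with on-demand, Spot, or autoscaling"
```

Using the minimum is deliberately conservative. A low percentile over several weeks may be a better fit if the cluster has rare, brief dips.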

Don't forget the leftovers

Kubernetes sheds orphaned cloud resources: unattached persistent-volume disks from deleted PVCs, idle LoadBalancer services (each spins up a billed cloud load balancer + public IP), old snapshots, and abandoned dev clusters. These don't show up in kubectl — they show up on the cloud bill. Sweep them with your normal cloud cost review.
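A sweep for the two most common leftovers could start from helper functions like these. The function names are made up for this sketch; the commands assume the az, aws, and kubectl CLIs are installed and authenticated, and they only list resources, never delete them.

```shell
# Azure: managed disks not attached to any VM.
list_unattached_azure_disks() {
  az disk list \
    --query "[?diskState=='Unattached'].{name:name, resourceGroup:resourceGroup, sizeGb:diskSizeGb}" \
    --output table
}

# AWS: EBS volumes in the "available" state, i.e. not attached to an instance.
list_unattached_ebs_volumes() {
  aws ec2 describe-volumes \
    --filters Name=status,Values=available \
    --query 'Volumes[].{id:VolumeId, sizeGiB:Size, created:CreateTime}' \
    --output table
}

# Kubernetes: Services of type LoadBalancer, each backed by a billed cloud load balancer.
# With --all-namespaces, TYPE is the third column of the default output.
list_loadbalancer_services() {
  kubectl get services --all-namespaces | awk 'NR == 1 || $3 == "LoadBalancer"'
}
```

Call whichever function matches your provider, for example list_unattached_azure_disks, and review the results by hand before deleting anything: an unattached disk may still hold data someone needs.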

See your cluster's node waste automatically. The CloudFinOpsKit tool checks AKS node pools and EKS node groups for right-sizing against real utilization, flags the orphaned disks and idle load balancers that clusters leave behind, and shows whether your steady-state nodes are covered by commitments — read-only, priced from your actual bill, for Azure and AWS.

FAQ

Is AKS or EKS cheaper?

The control plane pricing differs (AKS's standard tier and EKS both charge a small per-cluster hourly fee), but that's noise — the real cost is the worker nodes, and it's driven by how well you right-size and pack them, not by the provider. Optimization technique matters far more than the AKS-vs-EKS choice.

How do I allocate Kubernetes cost back to teams?

Use namespaces/labels per team and an allocation tool (OpenCost/Kubecost) that splits node cost by pod requests, then roll it into your wider cost-allocation statement. AI/ML workloads on GPU nodes especially need this.

What about GPU nodes?

GPU nodes are the most expensive thing in most clusters — keep them on their own pool, scale them to zero when no GPU jobs are queued, and never let general workloads schedule onto them (taint them).
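As a sketch of the tainting approach, a GPU job needs a toleration that matches the taint on the GPU pool, plus a GPU resource limit. The taint sku=gpu:NoSchedule, the node label pool=gpu, the Job name, and the image are assumptions for illustration; nvidia.com/gpu is the resource name exposed by the NVIDIA device plugin.

```shell
# Write a Job that tolerates the (assumed) GPU-pool taint and requests one GPU.
cat > gpu-job.yaml <<'EOF'
apiVersion: batch/v1
kind: Job
metadata:
  name: train-model
spec:
  template:
    spec:
      restartPolicy: Never
      tolerations:
        - key: sku            # assumed taint on the GPU pool: sku=gpu:NoSchedule
          operator: Equal
          value: gpu
          effect: NoSchedule
      containers:
        - name: trainer
          image: registry.example.com/trainer:1.0   # placeholder image
          resources:
            limits:
              nvidia.com/gpu: 1   # requires the NVIDIA device plugin on the node
EOF

# Taint the GPU nodes so nothing without the toleration schedules there:
#   kubectl taint nodes -l pool=gpu sku=gpu:NoSchedule
# Then submit the job:
#   kubectl apply -f gpu-job.yaml
```

Pods without the toleration are kept off the GPU pool, so once no GPU jobs are queued the pool is empty and the autoscaler can take it down to zero.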