Not long ago, I was pulled into an incident where a client was reporting serious performance degradation in their production environment. At first glance, everything seemed fine — healthy pods, normal resource usage, autoscaler working. And yet, the application was painfully slow.
After some digging, thinking, and a few cups of coffee, I believe I found the root cause. This is my breakdown of what happened — and how easily this could happen again in any Kubernetes setup.
The Context
- AKS cluster with the Cluster Autoscaler enabled.
- Each node: 8 vCPU, 32 GB RAM.
- The cluster had 5 nodes at the time.
- Multiple pods running.
- Several deployments were missing resource requests and limits.
As you may know, the Cluster Autoscaler scales based on requests only — it doesn’t care about actual usage or limits. So if you don’t declare what your pods need, Kubernetes may end up making poor decisions.
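To make that concrete, here is the only part of a pod spec those components look at when deciding where pods fit and whether to add nodes (a minimal sketch with made-up numbers):

```yaml
# Minimal sketch: the scheduler and the Cluster Autoscaler sum up the
# "requests" values when packing nodes and deciding whether to scale.
# The "limits" values are enforced at runtime but ignored for placement.
resources:
  requests:
    cpu: "1"
    memory: 4Gi
  limits:
    cpu: "2"
    memory: 8Gi
```

If the requests block is missing or set to zero, the pod is effectively invisible to both of them.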
A More Realistic Example
Let’s say we had these deployments in the cluster:
- pod-unicorn: request = 1 CPU / 16 GB, limit = 2 CPU / 16 GB
- pod-dragon: request = 0 CPU / 0 GB, limit = 6 CPU / 24 GB
- pod-penguin: request = 0 CPU / 0 GB, limit = 4 CPU / 16 GB
- pod-koala: request = 1 CPU / 4 GB, limit = 2 CPU / 4 GB
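To picture the worst offender, pod-dragon's container resources would have looked something like this (a sketch, not the client's actual manifest):

```yaml
# Sketch of pod-dragon's resources: large limits, zero requests.
# The scheduler treats this pod as costing nothing when packing the
# node, even though it may legitimately grow to 6 CPU / 24 GB.
resources:
  requests:
    cpu: "0"
    memory: "0"
  limits:
    cpu: "6"
    memory: 24Gi
```

The zero has to be explicit (or come from a LimitRange): if the requests block were simply omitted, Kubernetes would copy the limits into the requests.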
These pods ended up distributed across nodes in a way that looked fine on paper. For instance, one node ended up running:
- pod-unicorn (reserved: 1 CPU, 16 GB)
- pod-dragon (no reservations)
- pod-koala (reserved: 1 CPU, 4 GB)
So the scheduler saw:
- Reserved: 2 CPU, 20 GB
- Available: 6 CPU, 12 GB
- → All good, right?
But then pod-dragon starts using 20 GB of memory. Now we have a real problem:
- The node only had 16 GB available after accounting for pod-unicorn, so pod-dragon silently starves for memory.
- No alerts, no evictions, no autoscaling.
- Meanwhile, the app becomes sluggish.
Even worse, the Azure metrics dashboard shows memory usage like this:
- pod-unicorn: 4 GB → 25% (of its 16 GB limit)
- pod-dragon: 20 GB → 83% (of its 24 GB limit)
- Node memory usage: ~75%
Everything seems fine — but the pod is starving.
Why Wasn’t There an Eviction?
Eviction only happens under very specific conditions — actual node pressure like:
- MemoryPressure
- DiskPressure
- PIDPressure
In this case:
- The node still had some memory technically available.
- pod-dragon wasn't exceeding its limit.
- The pod had no request, so the scheduler and autoscaler weren't watching it.
Result: The pod was struggling without any red flags being raised.
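Those pressure signals live in the node's status conditions. On the node in question they would all have read False, so the kubelet had no reason to evict anything (illustrative snippet, not output captured from the real cluster):

```yaml
# Illustrative node status: no pressure conditions are set, so the
# kubelet never starts evicting pods, even while one of them starves.
status:
  conditions:
  - type: MemoryPressure
    status: "False"
  - type: DiskPressure
    status: "False"
  - type: PIDPressure
    status: "False"
  - type: Ready
    status: "True"
```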
What We Did to Fix It
During the incident, we took several steps:
- Set proper requests across all deployments.
- Adjusted replica counts where needed.
- Increased the minimum node count in the autoscaler from 3 to 6.
This caused the autoscaler to spin up more nodes, redistributing pods and reducing contention. The performance immediately improved.
But truth be told, this was more a workaround than a fix — because as long as limits ≠ requests, we’re still at risk.
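For reference, the per-deployment change was essentially this shape, with values picked to match what each pod really uses (the numbers below are representative, not the client's actual ones):

```yaml
# Representative fix for a deployment like pod-dragon: declare what the
# pod actually needs so the scheduler and autoscaler can account for it.
# The limits are left as they were, so limits still differ from requests.
resources:
  requests:
    cpu: "2"       # assumed steady-state usage
    memory: 20Gi
  limits:
    cpu: "6"
    memory: 24Gi
```

With real requests in place, a pod that doesn't fit stays Pending, and that is the signal the Cluster Autoscaler actually reacts to.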
What We Learned
- Always define requests and limits. Without them, Kubernetes makes naive scheduling decisions.
- Be cautious when limits ≠ requests. For non-elastic workloads (no HPA), it's safer to use limits = requests (see the sketch after this list).
- Don't trust default dashboards blindly. Percentages relative to the limit may hide actual overcommitment issues.
- Eviction is not your friend here. Kubernetes will not save you from poor resource planning.
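Here is what the limits = requests pattern looks like for a non-elastic workload (hypothetical values): requests equal to limits give the pod the Guaranteed QoS class, and what the scheduler reserves is exactly what the container is allowed to consume.

```yaml
# Hypothetical non-elastic workload with requests == limits.
# The node can never be overcommitted by this pod: its reservation
# and its ceiling are the same, and it gets the Guaranteed QoS class.
resources:
  requests:
    cpu: "2"
    memory: 8Gi
  limits:
    cpu: "2"
    memory: 8Gi
```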
Final Thoughts
This incident was a reminder that Kubernetes is only as smart as you configure it to be. On the surface, the system looked healthy — but in reality, some pods were choking and dragging everything down.
Have you experienced something similar? I’d love to hear how you handled it. And if you haven’t — maybe go double-check your requests.