Not long ago, I was pulled into an incident where a client was reporting serious performance degradation in their production environment. At first glance, everything seemed fine — healthy pods, normal resource usage, autoscaler working. And yet, the application was painfully slow.

After some digging, thinking, and a few cups of coffee, I believe I found the root cause. This is my breakdown of what happened — and how easily this could happen again in any Kubernetes setup.


The Context

  • AKS cluster with the Cluster Autoscaler enabled.
  • Each node: 8 vCPU, 32 GB RAM.
  • The cluster had 5 nodes at the time.
  • Multiple pods running.
  • Several deployments were missing resource requests and limits.

As you may know, the Cluster Autoscaler reasons about requests only: it scales up when pods can't be scheduled with the requests they declare, and it never looks at actual usage or at limits. The scheduler makes its placement decisions the same way. So if you don't declare what your pods need, Kubernetes effectively treats them as needing nothing and plans the cluster accordingly.
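
To make this concrete, here's a minimal sketch of the stanza both the scheduler and the Cluster Autoscaler read. The name, image, and numbers are made up for illustration:

  apiVersion: apps/v1
  kind: Deployment
  metadata:
    name: example-api              # hypothetical name
  spec:
    replicas: 2
    selector:
      matchLabels:
        app: example-api
    template:
      metadata:
        labels:
          app: example-api
      spec:
        containers:
        - name: api
          image: example/api:1.0   # placeholder image
          resources:
            requests:              # the only numbers scheduling and autoscaling decisions use
              cpu: "1"
              memory: 4Gi
            limits:                # enforced at runtime, ignored for placement and scale-up
              cpu: "2"
              memory: 4Gi

Remove the whole resources block and, as far as the scheduler and the autoscaler are concerned, that pod needs nothing at all.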


A More Realistic Example

Let’s say we had these deployments in the cluster:

  • pod-unicorn: request = 1 CPU / 16 GB, limit = 2 CPU / 16 GB
  • pod-dragon: request = 0 CPU / 0 GB, limit = 6 CPU / 24 GB
  • pod-penguin: request = 0 CPU / 0 GB, limit = 4 CPU / 16 GB
  • pod-koala: request = 1 CPU / 4 GB, limit = 2 CPU / 4 GB
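
pod-dragon is the dangerous one. Here's a rough sketch of its manifest, boiled down to a bare Pod for brevity and using the names from the list above. The sketch assumes the requests were explicitly set to zero, because if requests were simply omitted while limits were set, Kubernetes would copy the limits and use them as the requests:

  apiVersion: v1
  kind: Pod
  metadata:
    name: pod-dragon
  spec:
    containers:
    - name: dragon
      image: example/dragon:1.0   # placeholder image
      resources:
        requests:                 # explicitly zero: the scheduler budgets this pod as free
          cpu: "0"
          memory: "0"
        limits:                   # a runtime ceiling only; nothing actually reserves this capacity
          cpu: "6"
          memory: 24Gi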

These pods ended up distributed across nodes in a way that looked fine on paper. For instance, one node ended up running:

  • pod-unicorn (reserved: 1 CPU, 16 GB)
  • pod-dragon (no reservations)
  • pod-koala (reserved: 1 CPU, 4 GB)

So the scheduler saw:

  • Reserved: 2 CPU, 20 GB
  • Available: 6 CPU, 12 GB
  • → All good, right?

But then pod-dragon starts using 20 GB of memory. Now we have a real problem:

  • Once pod-unicorn's and pod-koala's requests are accounted for, only 12 GB of the node is unreserved.
  • pod-dragon is silently eating into memory the scheduler had promised to the other pods.
  • No alerts, no evictions, no autoscaling: judged by requests, the node still looks like it has plenty of room.
  • Meanwhile, the app becomes sluggish.

Even worse, the Azure metrics dashboard shows memory usage like this:

  • pod-unicorn: 4 GB used → 25% of its 16 GB limit
  • pod-dragon: 20 GB used → 83% of its 24 GB limit
  • Node memory usage: ~24 of 32 GB → ~75%

Everything looks fine on the dashboards, yet the node is quietly overcommitted and the workloads on it are starving.


Why Wasn’t There an Eviction?

Eviction only happens under very specific conditions — actual node pressure like:

  • MemoryPressure
  • DiskPressure
  • PIDPressure
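
These conditions are driven by the kubelet's eviction thresholds. For reference, the configuration looks roughly like this; the values below are illustrative (close to the upstream defaults), not necessarily what AKS ships, and on AKS the kubelet config is managed for you:

  apiVersion: kubelet.config.k8s.io/v1beta1
  kind: KubeletConfiguration
  evictionHard:
    memory.available: "100Mi"     # evict only once free memory drops this low
    nodefs.available: "10%"
    pid.available: "5%"           # illustrative value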

In this case:

  • The node hadn't crossed any of the kubelet's eviction thresholds; technically there was still memory available.
  • pod-dragon wasn't exceeding its own limit.
  • The pod had no request, so the scheduler had already placed it as if it needed nothing, and the autoscaler saw no unschedulable pods to react to.

Result: The pod was struggling without any red flags being raised.


What We Did to Fix It

During the incident, we took several steps:

  • Set proper requests across all deployments.
  • Adjusted replica counts where needed.
  • Increased the minimum node count in the autoscaler from 3 to 6.

Setting requests rolled out fresh pods, the higher minimum pushed the autoscaler to add nodes, and together that spread the workloads out and reduced contention. Performance improved almost immediately.

But truth be told, this was more a workaround than a fix: as long as limits are higher than requests, the nodes can still end up silently overcommitted.
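
For a workload that isn't behind an HPA, the safer shape is requests equal to limits, so the scheduler reserves exactly what the pod is allowed to consume. A hypothetical sketch, reusing the example name:

  apiVersion: v1
  kind: Pod
  metadata:
    name: pod-dragon
  spec:
    containers:
    - name: dragon
      image: example/dragon:1.0   # placeholder image
      resources:
        requests:                 # what the scheduler and autoscaler reserve
          cpu: "2"
          memory: 8Gi
        limits:                   # equal to requests: Guaranteed QoS, no silent overcommit
          cpu: "2"
          memory: 8Gi

It also puts the pod in the Guaranteed QoS class, which is the last to be touched when a node does come under real pressure.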


What We Learned

  1. Always define requests and limits.
    Without them, Kubernetes makes naive scheduling and scaling decisions (one way to enforce defaults is sketched after this list).

  2. Be cautious when limits ≠ requests.
    For non-elastic workloads (no HPA), it’s safer to use limits = requests.

  3. Don’t trust default dashboards blindly.
    Percentages relative to limit may hide actual overcommitment issues.

  4. Eviction is not your friend here.
    Kubernetes will not save you from poor resource planning.
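
On lesson 1: one way to enforce it is a namespace-level LimitRange, so containers that ship without a resources section still get defaults injected at admission time rather than landing with zero requests. A sketch with made-up values:

  apiVersion: v1
  kind: LimitRange
  metadata:
    name: default-container-resources
    namespace: production          # hypothetical namespace
  spec:
    limits:
    - type: Container
      defaultRequest:              # applied when a container omits requests
        cpu: 250m
        memory: 512Mi
      default:                     # applied when a container omits limits
        cpu: "1"
        memory: 1Gi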


Final Thoughts

This incident was a reminder that Kubernetes is only as smart as you configure it to be. On the surface, the system looked healthy — but in reality, some pods were choking and dragging everything down.


Have you experienced something similar? I’d love to hear how you handled it. And if you haven’t — maybe go double-check your requests.