How Resource Overcommit in Kubernetes Caused an Incident
Not long ago, I was pulled into an incident where a client was reporting serious performance degradation in their production environment. At first glance, everything seemed fine — healthy pods, normal resource usage, autoscaler working. And yet, the application was painfully slow. After some digging, thinking, and a few cups of coffee, I believe I found the root cause. This is my breakdown of what happened — and how easily this could happen again in any Kubernetes setup. ...