Problem#

On an on-prem K8s cluster (mini-might worker node), a Java app ran out of memory, causing the OOM Killer to fire and bring down the node. This post covers the analysis and resolution.


Symptoms#

  • All pods on the mini-might node showed Unknown status in k9s
  • Pods recovered sequentially after a reboot
  • The Events section of kubectl describe node mini-might showed a Rebooted entry:
NodeNotReady    5m ago    - node went down
Rebooted        58s ago   - reboot detected
NodeReady       17s ago   - recovery complete
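
The same node events can also be pulled directly, without scrolling through the full describe output (mini-might is the node from this incident):

# Node-scoped events, oldest first (-A because node events may land in any namespace)
kubectl get events -A --field-selector involvedObject.kind=Node,involvedObject.name=mini-might --sort-by=.lastTimestamp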

Root Cause Analysis#

1. Checking OOM logs in dmesg#

After SSHing into the mini-might node:

sudo grep -i "oom\|killed\|out of memory" /var/log/kern.log | tail -50

OOM log found:

2026-03-22T15:00:03 mini-might kernel: oom-kill:constraint=CONSTRAINT_MEMCG
task=java,pid=90057,uid=0
Memory cgroup out of memory: Killed process 90057 (java)
total-vm:2007864kB, anon-rss:518828kB

2. Analysis#

Item             Value
Process          java (PID 90057)
OOM type         CONSTRAINT_MEMCG (cgroup memory limit exceeded)
Memory usage     ~507MB (anon-rss: 518828kB)
Virtual memory   ~2GB (total-vm: 2007864kB)

3. Checking the problematic pod configuration#

kubectl describe pod java-cloud-gateway-6558b496dd-nhhfw -n dev-webs

java-service container resources:

resources:
  limits:
    cpu: 500m
    memory: 512Mi   # <- the problem!
  requests:
    cpu: 100m
    memory: 128Mi

JVM options:

env:
  - name: JAVA_OPTS
    value: "-XX:+UseContainerSupport -XX:MaxRAMPercentage=75.0"

4. Root cause#

Setting            Value     Description
Memory Limit       512Mi     Maximum container memory
MaxRAMPercentage   75%       JVM heap ratio
Calculated heap    ~384Mi    512 × 0.75
Actual usage       ~507MB    At time of OOM

The issue:

  • Java uses memory beyond just the heap (384Mi): Metaspace, Thread Stack, Native Memory
  • Heap + non-heap memory exceeded the 512Mi limit → OOM Killer triggered
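
To confirm this breakdown on a live pod, the JVM's Native Memory Tracking can be turned on and queried with jcmd. A quick sketch, assuming the image ships a full JDK (jcmd is not in JRE-only images) and the JVM runs as PID 1 in the container:

# 1. Add the tracking flag to JAVA_OPTS and restart the pod (roughly 5-10% overhead)
#    -XX:NativeMemoryTracking=summary
# 2. Query the running JVM
kubectl exec -n dev-webs java-cloud-gateway-6558b496dd-nhhfw -- jcmd 1 VM.native_memory summary
# The summary breaks usage into Java Heap, Class (Metaspace), Thread, Code, GC, Internal, ...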

Fix#

Option 1: Increase the memory limit#

resources:
  limits:
    cpu: 500m
    memory: 1Gi     # 512Mi → 1Gi
  requests:
    cpu: 100m
    memory: 512Mi   # 128Mi → 512Mi

Option 2: Lower the JVM heap ratio#

env:
  - name: JAVA_OPTS
    value: "-XX:+UseContainerSupport -XX:MaxRAMPercentage=50.0"

Option 3: Set heap size explicitly#

env:
  - name: JAVA_OPTS
    value: "-XX:+UseContainerSupport -Xmx256m -Xms256m"
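
Whichever option is chosen, it is worth confirming the heap the JVM actually picked up after the rollout. A small sketch, again assuming a full JDK in the image and the JVM as PID 1 (<pod-name> is the new pod):

# Print the final JVM flags of the running process and pull out the heap ceiling
kubectl exec -n dev-webs <pod-name> -- jcmd 1 VM.flags | grep -o 'MaxHeapSize=[0-9]*'
# e.g. with Option 2 (512Mi limit, MaxRAMPercentage=50) this should be roughly 268435456 (256Mi)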

Node Resource Status#

mini-might node state (after recovery)#

Item            Value                     Status
Memory          7.4GB / 28GB used         OK
Disk            55GB / 98GB used (59%)    OK
CPU Limits      118% overcommit           Warning
Memory Limits   40% (11GB/28GB)           OK
Swap            Disabled (0B)             Warning

Notes:

  • With swap disabled, memory exhaustion immediately triggers OOM
  • CPU overcommit (118%) can cause resource contention under burst conditions
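
Both numbers are easy to re-check: swap directly on the node, the overcommit figures via kubectl:

# On the node: no output from swapon means swap is disabled
swapon --show
free -h

# From kubectl: requests/limits vs allocatable, which is where the overcommit percentages come from
kubectl describe node mini-might | grep -A 8 "Allocated resources"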

Debugging Commands#

Check node status#

kubectl get nodes
kubectl describe node <node-name>

Check OOM logs (directly on node)#

# File logs persist across reboots
sudo grep -i "oom\|killed\|out of memory" /var/log/kern.log | tail -50
sudo grep -i "oom\|killed" /var/log/syslog | tail -50

# dmesg is cleared on reboot
sudo dmesg | grep -i "oom\|killed\|out of memory"

Check kubelet logs#

sudo journalctl -u kubelet --since "1 hour ago" | tail -100

Check memory/disk status#

free -h
df -h

Check pod resource configuration#

kubectl describe pod <pod-name> -n <namespace> | grep -A10 -iE "limits:|requests:|env:"

Java Container Memory Guide#

JVM Memory Layout#

Total Container Memory
├── Heap
│   └── Controlled by -Xmx or MaxRAMPercentage
├── Metaspace
│   └── Class metadata, unlimited by default
├── Thread Stacks
│   └── ~1MB per thread
├── Native Memory
│   └── JNI, native libraries
└── Direct Buffers
    └── NIO buffers

Container Limit   MaxRAMPercentage   Calculated Heap   Non-heap headroom
512Mi             50%                256Mi             256Mi
1Gi               75%                768Mi             256Mi
2Gi               75%                1.5Gi             512Mi
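
These figures are easy to reproduce locally before touching the cluster, for example with Docker and a Temurin image (image tag and memory flags here are illustrative, not from the incident):

# 512Mi container, 50% heap -> "Max. Heap Size (Estimated)" of roughly 256M (exact value varies slightly by GC)
docker run --rm -m 512m eclipse-temurin:17-jre java -XX:MaxRAMPercentage=50.0 -XshowSettings:vm -version

# 1Gi container, 75% heap -> roughly 768M
docker run --rm -m 1g eclipse-temurin:17-jre java -XX:MaxRAMPercentage=75.0 -XshowSettings:vm -version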

Safe JVM Options Example#

env:
  - name: JAVA_OPTS
    value: >-
      -XX:+UseContainerSupport
      -XX:MaxRAMPercentage=75.0
      -XX:MaxMetaspaceSize=128m
      -XX:+ExitOnOutOfMemoryError

-XX:+ExitOnOutOfMemoryError: Causes the JVM to exit immediately on OOM, allowing K8s to restart the pod.


Lessons Learned#

Why Java is particularly vulnerable to OOM Kills#

Java runs on a JVM (virtual machine), so memory is managed at two levels:

Container memory limit (e.g., 512Mi)
├── JVM Heap (-Xmx)        ← managed by GC
├── JVM Non-Heap            ← Metaspace, CodeCache, Thread Stack, etc.
├── Native Memory           ← JNI, NIO DirectBuffer, etc.
└── OS/overhead             ← JVM process overhead

  • The JVM only limits the heap; everything else is unconstrained. Even with -Xmx256m, non-heap and native memory can consume an additional 100–200MB.
  • GC only runs within the heap. Even if total container memory is near the limit, the JVM has no visibility into that and won’t trigger GC.
  • When container memory exceeds the limit, the kernel’s OOM Killer forcibly terminates the JVM process. From Java’s perspective, there’s no chance to throw an OutOfMemoryError — it gets killed instantly with SIGKILL (kill -9).

Languages like Go and Python take memory straight from the OS, so their usage tracks the container limit with no separate setting to keep in sync. Java sits behind the JVM, and the gap between the container limit and the JVM's own settings is exactly where OOM kills happen.
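
That gap can be made visible inside the container by comparing what the kernel enforces with what GC actually manages. A sketch, assuming cgroup v2 (on cgroup v1 the files live under /sys/fs/cgroup/memory/), a full JDK in the image, and the pod from this incident:

# Memory currently charged against the limit, and the limit itself (bytes)
kubectl exec -n dev-webs java-cloud-gateway-6558b496dd-nhhfw -- cat /sys/fs/cgroup/memory.current /sys/fs/cgroup/memory.max

# Heap usage as the JVM sees it; roughly everything between the two views is non-heap memory GC never reclaims
kubectl exec -n dev-webs java-cloud-gateway-6558b496dd-nhhfw -- jcmd 1 GC.heap_info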

Practical Guidelines#

  1. Set heap to 75% or less of the container memory limit — leave headroom for non-heap memory

  2. Check OOM logs in /var/log/kern.log — dmesg is cleared on reboot

  3. Memory management matters especially with swap disabled — K8s nodes typically run with swap off

  4. Set Memory Requests appropriately — affects node selection during scheduling

  5. Monitoring is essential — regularly check memory usage trends in Grafana
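
For quick spot-checks between Grafana looks (metrics-server must be installed for kubectl top to work):

# Current memory per pod in the affected namespace, heaviest first
kubectl top pod -n dev-webs --sort-by=memory

# And per node, to watch overall pressure on mini-might
kubectl top node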


Files to Update#

  • 303-goormgb-k8s-helm/charts/java-cloud-gateway/values.yaml
    • resources.limits.memory: 512Mi → 1Gi
    • resources.requests.memory: 128Mi → 512Mi
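
After editing values.yaml, the change still needs to be rolled out. A minimal sketch, assuming a Helm release and Deployment both named java-cloud-gateway and an app=java-cloud-gateway label (all three are assumptions, adjust to the actual names):

# Upgrade the release with the new resource values
helm upgrade java-cloud-gateway ./303-goormgb-k8s-helm/charts/java-cloud-gateway -n dev-webs

# Watch the rollout and confirm the applied limits on the new pod
kubectl rollout status deployment/java-cloud-gateway -n dev-webs
kubectl describe pod -n dev-webs -l app=java-cloud-gateway | grep -A 4 "Limits:"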