Java OOM Troubleshooting
Problem#
On an on-prem K8s cluster (mini-might worker node), a Java app ran out of memory, causing the OOM Killer to fire and bring down the node. This post covers the analysis and resolution.
Symptoms#
- All pods on the mini-might node showed `Unknown` status in k9s
- Pods recovered sequentially after a reboot
- `kubectl describe node mini-might` Events showed a `Rebooted` entry:
NodeNotReady 5m ago - node went down
Rebooted 58s ago - reboot detected
NodeReady 17s ago - recovery complete
Root Cause Analysis#
1. Checking kernel OOM logs#
After SSHing into the mini-might node:
sudo grep -i "oom\|killed\|out of memory" /var/log/kern.log | tail -50
OOM log found:
2026-03-22T15:00:03 mini-might kernel: oom-kill:constraint=CONSTRAINT_MEMCG
task=java,pid=90057,uid=0
Memory cgroup out of memory: Killed process 90057 (java)
total-vm:2007864kB, anon-rss:518828kB
2. Analysis#
| Item | Value |
|---|---|
| Process | java (PID 90057) |
| OOM type | CONSTRAINT_MEMCG (cgroup memory limit exceeded) |
| Memory usage | ~507MB (anon-rss: 518828kB) |
| Virtual memory | ~2GB (total-vm: 2007864kB) |
3. Checking the problematic pod configuration#
kubectl describe pod java-cloud-gateway-6558b496dd-nhhfw -n dev-webs
java-service container resources:
resources:
  limits:
    cpu: 500m
    memory: 512Mi   # <- the problem!
  requests:
    cpu: 100m
    memory: 128Mi
JVM options:
env:
  - name: JAVA_OPTS
    value: "-XX:+UseContainerSupport -XX:MaxRAMPercentage=75.0"
4. Root cause#
| Setting | Value | Description |
|---|---|---|
| Memory Limit | 512Mi | Maximum container memory |
| MaxRAMPercentage | 75% | JVM heap ratio |
| Calculated heap | ~384Mi | 512 × 0.75 |
| Actual usage | ~507MB | At time of OOM |
The issue:
- Java uses memory beyond just the heap (384Mi): Metaspace, Thread Stack, Native Memory
- Heap + non-heap memory exceeded the 512Mi limit → OOM Killer triggered (the heap portion can be confirmed with the sketch below)
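To confirm the effective heap ceiling from inside the container, a minimal check like the following can be run with the same `JAVA_OPTS`. This is only a sketch: the class name `HeapCheck` is illustrative and not part of the deployment, and `Runtime.maxMemory()` reports an approximation of the configured maximum heap.

```java
// HeapCheck.java - print the heap ceiling the JVM derived from the container
// limit and MaxRAMPercentage (or from an explicit -Xmx). Illustrative only.
public class HeapCheck {
    public static void main(String[] args) {
        long maxHeapBytes = Runtime.getRuntime().maxMemory();
        System.out.printf("Max heap: %d MiB%n", maxHeapBytes / (1024 * 1024));
        // With a 512Mi limit and MaxRAMPercentage=75.0 this prints roughly 384 MiB,
        // leaving only ~128Mi for Metaspace, thread stacks, and native memory.
    }
}
```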
Fix#
Option 1: Increase Memory Limit (recommended)#
resources:
  limits:
    cpu: 500m
    memory: 1Gi     # 512Mi → 1Gi
  requests:
    cpu: 100m
    memory: 512Mi   # 128Mi → 512Mi
Option 2: Lower the JVM heap ratio#
env:
  - name: JAVA_OPTS
    value: "-XX:+UseContainerSupport -XX:MaxRAMPercentage=50.0"
Option 3: Set heap size explicitly#
env:
  - name: JAVA_OPTS
    value: "-XX:+UseContainerSupport -Xmx256m -Xms256m"
An explicit -Xmx takes precedence over MaxRAMPercentage, so the heap stays fixed even if the container limit changes.
Node Resource Status#
mini-might node state (after recovery)#
| Item | Value | Status |
|---|---|---|
| Memory | 7.4GB / 28GB used | OK |
| Disk | 55GB / 98GB used (59%) | OK |
| CPU Limits | 118% overcommit | Warning |
| Memory Limits | 40% (11GB/28GB) | OK |
| Swap | Disabled (0B) | Warning |
Notes:
- With swap disabled, memory exhaustion immediately triggers OOM
- CPU overcommit (118%) can cause resource contention under burst conditions
Debugging Commands#
Check node status#
kubectl get nodes
kubectl describe node <node-name>
Check OOM logs (directly on node)#
# File logs persist across reboots
sudo grep -i "oom\|killed\|out of memory" /var/log/kern.log | tail -50
sudo grep -i "oom\|killed" /var/log/syslog | tail -50
# dmesg is cleared on reboot
sudo dmesg | grep -i "oom\|killed\|out of memory"
Check kubelet logs#
sudo journalctl -u kubelet --since "1 hour ago" | tail -100
Check memory/disk status#
free -h
df -h
Check pod resource configuration#
kubectl describe pod <pod-name> -n <namespace> | grep -A10 -iE "limits:|requests:|env:"
Java Container Memory Guide#
JVM Memory Layout#
Total Container Memory
├── Heap
│   └── Controlled by -Xmx or MaxRAMPercentage
├── Metaspace
│   └── Class metadata, unlimited by default
├── Thread Stacks
│   └── ~1MB per thread
├── Native Memory
│   └── JNI, native libraries
└── Direct Buffers
    └── NIO buffers
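The standard management beans give a rough (though incomplete) runtime view of these regions. A minimal sketch, assuming a plain JDK with no extra dependencies (the class name is illustrative):

```java
// MemoryRegions.java - print heap, JVM-tracked non-heap (Metaspace, code cache),
// and NIO buffer pools. Thread stacks and raw JNI allocations are not visible here.
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

public class MemoryRegions {
    public static void main(String[] args) {
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
        System.out.println("Heap:     " + mem.getHeapMemoryUsage());
        System.out.println("Non-heap: " + mem.getNonHeapMemoryUsage());
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            System.out.printf("Buffer pool %-8s used = %d KiB%n",
                    pool.getName(), pool.getMemoryUsed() / 1024);
        }
    }
}
```

For the regions the MXBeans cannot see (thread stacks, JNI), Native Memory Tracking is the usual tool: start the JVM with `-XX:NativeMemoryTracking=summary` and inspect it with `jcmd <pid> VM.native_memory summary`.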
Recommended Settings#
| Container Limit | MaxRAMPercentage | Calculated Heap | Non-heap headroom |
|---|---|---|---|
| 512Mi | 50% | 256Mi | 256Mi |
| 1Gi | 75% | 768Mi | 256Mi |
| 2Gi | 75% | 1.5Gi | 512Mi |
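The arithmetic behind the table is simple enough to sanity-check in a few lines; this sketch is purely illustrative (not a JVM API) and just reproduces the rows above:

```java
// HeapSizing.java - heap = limit * MaxRAMPercentage / 100, headroom = limit - heap.
public class HeapSizing {
    static void row(long limitMi, double pct) {
        long heapMi = (long) (limitMi * pct / 100);
        System.out.printf("limit=%4dMi  pct=%2.0f%%  heap=%4dMi  headroom=%4dMi%n",
                limitMi, pct, heapMi, limitMi - heapMi);
    }

    public static void main(String[] args) {
        row(512, 50);    // 256Mi heap, 256Mi headroom
        row(1024, 75);   // 768Mi heap, 256Mi headroom
        row(2048, 75);   // 1536Mi heap, 512Mi headroom
    }
}
```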
Safe JVM Options Example#
env:
  - name: JAVA_OPTS
    value: >-
      -XX:+UseContainerSupport
      -XX:MaxRAMPercentage=75.0
      -XX:MaxMetaspaceSize=128m
      -XX:+ExitOnOutOfMemoryError
`-XX:+ExitOnOutOfMemoryError`: causes the JVM to exit immediately when an OutOfMemoryError is thrown, allowing K8s to restart the pod cleanly.
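The effect of the flag is easy to reproduce locally. A sketch, with an arbitrary class name and heap size chosen just for the demo:

```java
// OomExitDemo.java - allocate until the heap is exhausted.
// java -Xmx64m -XX:+ExitOnOutOfMemoryError OomExitDemo
//   -> the JVM exits on the first OutOfMemoryError, so in K8s the container
//      would simply be restarted. Without the flag, the error is thrown and
//      a half-broken JVM may linger.
import java.util.ArrayList;
import java.util.List;

public class OomExitDemo {
    public static void main(String[] args) {
        List<byte[]> hold = new ArrayList<>();
        while (true) {
            hold.add(new byte[1024 * 1024]); // 1 MiB chunks, kept reachable
        }
    }
}
```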
Lessons Learned#
Why Java is particularly vulnerable to OOM Kills#
Java runs on a JVM (virtual machine), so memory is managed at two levels:
Container memory limit (e.g., 512Mi)
├── JVM Heap (-Xmx) ← managed by GC
├── JVM Non-Heap ← Metaspace, CodeCache, Thread Stack, etc.
├── Native Memory ← JNI, NIO DirectBuffer, etc.
└── OS/overhead ← JVM process overhead
- The JVM only limits the heap; everything else is unconstrained. Even with `-Xmx256m`, non-heap + native memory can consume an additional 100–200MB.
- GC only runs within the heap. Even if total container memory is near the limit, the JVM has no visibility into that and won’t trigger GC.
- When container memory exceeds the limit, the kernel’s OOM Killer forcibly terminates the JVM process. From Java’s perspective, there’s no chance to throw an OutOfMemoryError — it gets killed instantly with SIGKILL (kill -9).
Runtimes like Go and Python manage a single process-level pool of memory, so actual usage tracks the container limit fairly directly. Java adds the JVM as an intermediary, which creates a gap between the container limit and the JVM's own settings where OOM can occur.
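One way to make that gap visible is to compare the JVM's own heap accounting with the resident set size the kernel (and therefore the cgroup and the OOM Killer) sees. A Linux-only sketch; the class name and the ~100MiB allocation are illustrative:

```java
// HeapVsRss.java - contrast what the JVM tracks (heap) with what the kernel
// bills the process for (VmRSS). Linux-only: reads /proc/self/status.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class HeapVsRss {
    static long vmRssKb() throws IOException {
        for (String line : Files.readAllLines(Path.of("/proc/self/status"))) {
            if (line.startsWith("VmRSS:")) {
                return Long.parseLong(line.replaceAll("\\D+", ""));
            }
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        List<byte[]> hold = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            hold.add(new byte[1024 * 1024]); // grow the heap by ~1 MiB per step
        }
        Runtime rt = Runtime.getRuntime();
        long heapUsedMb = (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024);
        System.out.println("Heap used (JVM view): " + heapUsedMb + " MiB");
        System.out.println("VmRSS (kernel view):  " + vmRssKb() / 1024 + " MiB");
        // VmRSS is consistently larger: it also counts Metaspace, code cache,
        // thread stacks, and native allocations, which the GC never manages
        // but the OOM Killer fully counts against the cgroup limit.
    }
}
```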
Practical Guidelines#
- Set heap to 75% or less of the container memory limit — leave headroom for non-heap memory
- Check OOM logs in /var/log/kern.log — dmesg is cleared on reboot
- Memory management matters especially with swap disabled — K8s nodes typically run with swap off
- Set Memory Requests appropriately — affects node selection during scheduling
- Monitoring is essential — regularly check memory usage trends in Grafana
Files to Update#
`303-goormgb-k8s-helm/charts/java-cloud-gateway/values.yaml`
- resources.limits.memory: 512Mi → 1Gi
- resources.requests.memory: 128Mi → 512Mi