<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Troubleshooting on blog.212clab</title><link>https://212clab.pages.dev/en/troubleshooting/</link><description>Recent content in Troubleshooting on blog.212clab</description><generator>Hugo</generator><language>en</language><lastBuildDate>Mon, 20 Apr 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://212clab.pages.dev/en/troubleshooting/index.xml" rel="self" type="application/rss+xml"/><item><title>Environment-Specific Monitoring Dashboard Separation</title><link>https://212clab.pages.dev/en/troubleshooting/2026-04-20-monitoring-dashboard-env-split/</link><pubDate>Mon, 20 Apr 2026 00:00:00 +0000</pubDate><guid>https://212clab.pages.dev/en/troubleshooting/2026-04-20-monitoring-dashboard-env-split/</guid><description>&lt;h2 id="problem"&gt;Problem&lt;/h2&gt;
&lt;p&gt;Monitoring dashboards were broken because the infrastructure setup differs between the Dev (kubeadm) and Staging (EKS) environments.&lt;/p&gt;
&lt;h3 id="infrastructure-differences-by-environment"&gt;Infrastructure differences by environment&lt;/h3&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Component&lt;/th&gt;
 &lt;th&gt;Dev (kubeadm)&lt;/th&gt;
 &lt;th&gt;Staging (EKS)&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;PostgreSQL&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;Local Pod (postgres:16-alpine)&lt;/td&gt;
 &lt;td&gt;AWS RDS&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;Redis&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;Local Pod (redis:7-alpine)&lt;/td&gt;
 &lt;td&gt;AWS ElastiCache&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;Metrics collection&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;Prometheus exporter&lt;/td&gt;
 &lt;td&gt;CloudWatch&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;Data source&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;Prometheus&lt;/td&gt;
 &lt;td&gt;CloudWatch&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;h3 id="symptoms"&gt;Symptoms&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&amp;ldquo;RDS PostgreSQL Monitoring&amp;rdquo; dashboard in Dev → &lt;strong&gt;No data&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&amp;ldquo;ElastiCache Redis Monitoring&amp;rdquo; dashboard in Dev → &lt;strong&gt;No data&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Cause: no CloudWatch data source (not an AWS environment)&lt;/li&gt;
&lt;/ul&gt;
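&lt;p&gt;Dev has no CloudWatch data source, so any panel bound to one can only show No data. The fix below keeps one version of each dashboard per environment; with Grafana&amp;rsquo;s dashboard sidecar (an assumption about this setup, not confirmed by the post), each version ships as a labeled ConfigMap, roughly:&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;# sketch: names and the sidecar label convention are assumptions
apiVersion: v1
kind: ConfigMap
metadata:
  name: postgres-dashboard-dev
  namespace: monitoring
  labels:
    grafana_dashboard: &amp;#34;1&amp;#34;   # picked up by the Grafana dashboard sidecar
data:
  postgres-dev.json: |
    { &amp;#34;title&amp;#34;: &amp;#34;PostgreSQL Monitoring (Dev)&amp;#34;, &amp;#34;panels&amp;#34;: [] }
&lt;/code&gt;&lt;/pre&gt;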
&lt;hr&gt;
&lt;h2 id="fix"&gt;Fix&lt;/h2&gt;
&lt;h3 id="1-separate-dashboards-by-environment"&gt;1. Separate dashboards by environment&lt;/h3&gt;
&lt;p&gt;Created two versions of the same dashboard:&lt;/p&gt;</description></item><item><title>Monitoring Data S3 Object Storage Architecture</title><link>https://212clab.pages.dev/en/troubleshooting/2026-04-20-monitoring-s3-object-storage/</link><pubDate>Mon, 20 Apr 2026 00:00:00 +0000</pubDate><guid>https://212clab.pages.dev/en/troubleshooting/2026-04-20-monitoring-s3-object-storage/</guid><description>&lt;h2 id="problem"&gt;Problem&lt;/h2&gt;
&lt;p&gt;When Spot instances were reclaimed in the Staging environment and new nodes were provisioned in a different AZ, monitoring pods (Loki, Tempo, etc.) got stuck in Pending because they couldn&amp;rsquo;t mount their existing PVCs.&lt;/p&gt;
&lt;h3 id="symptoms"&gt;Symptoms&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Spot node reclaimed → new node provisioned in a different AZ (e.g., ap-northeast-2a → 2c)&lt;/li&gt;
&lt;li&gt;Monitoring pods rescheduled → existing PVC (EBS) was bound to the previous AZ, mount failed&lt;/li&gt;
&lt;li&gt;Pod: &lt;code&gt;Pending&lt;/code&gt; → &lt;code&gt;Warning: FailedAttachVolume&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Log and trace collection interrupted&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="root-cause-ebs-is-bound-to-a-single-az"&gt;Root cause: EBS is bound to a single AZ&lt;/h3&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;[Before]
Node (ap-northeast-2a) ← Spot
└── Loki Pod
    └── PVC → EBS (pinned to ap-northeast-2a)

[After Spot reclaim]
Node (ap-northeast-2c) ← newly provisioned
└── Loki Pod (Pending)
    └── PVC → EBS (ap-northeast-2a) ← cannot be mounted!
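
[Fix direction, per the title: S3 object storage]
Node (any AZ)
└── Loki/Tempo Pod
    └── S3 bucket (AZ-independent; no EBS PVC to pin)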
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;EBS volumes can only be used in the AZ where they were created. Because Spot instances can be provisioned in any available AZ, AZ mismatches occurred frequently.&lt;/p&gt;</description></item><item><title>OTel Metrics Dashboard Troubleshooting</title><link>https://212clab.pages.dev/en/troubleshooting/2026-04-20-otel-dashboard-gateway/</link><pubDate>Mon, 20 Apr 2026 00:00:00 +0000</pubDate><guid>https://212clab.pages.dev/en/troubleshooting/2026-04-20-otel-dashboard-gateway/</guid><description>&lt;h2 id="problem"&gt;Problem&lt;/h2&gt;
&lt;p&gt;In the Grafana &amp;ldquo;Application Monitoring (Spring Boot)&amp;rdquo; dashboard:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Only gateway appeared in the application dropdown&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Other services (auth-guard, order-core, seat, queue, etc.) were missing&lt;/li&gt;
&lt;li&gt;Querying &lt;code&gt;http_server_requests_seconds_count{namespace=&amp;quot;dev-webs&amp;quot;}&lt;/code&gt; only returned results for gateway&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="root-cause-analysis"&gt;Root Cause Analysis&lt;/h2&gt;
&lt;h3 id="1-two-separate-metric-collection-paths"&gt;1. Two separate metric collection paths&lt;/h3&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Path&lt;/th&gt;
 &lt;th&gt;Metric name&lt;/th&gt;
 &lt;th&gt;Labels&lt;/th&gt;
 &lt;th&gt;Targets&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;PodMonitor → Prometheus direct scrape&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;http_server_requests_seconds_*&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;status&lt;/code&gt;, &lt;code&gt;uri&lt;/code&gt;, &lt;code&gt;namespace&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;gateway only&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;OTel Agent → OTel Collector → Remote Write&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;http_server_request_duration_seconds_*&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;http_response_status_code&lt;/code&gt;, &lt;code&gt;http_route&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;all services&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
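&lt;p&gt;Path 1 is driven by a PodMonitor. A minimal sketch (selector and port names are illustrative, not the actual manifest):&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: gateway
  namespace: dev-webs
spec:
  selector:
    matchLabels:
      app: gateway                # assumption: label on the gateway pods
  podMetricsEndpoints:
    - port: http                  # assumption: named container port
      path: /actuator/prometheus  # Micrometer endpoint, hence gateway-only
&lt;/code&gt;&lt;/pre&gt;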
&lt;h3 id="2-why-only-gateway-had-micrometer-metrics"&gt;2. Why only gateway had Micrometer metrics&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;gateway: Spring Cloud Gateway with &lt;code&gt;micrometer-registry-prometheus&lt;/code&gt; dependency included&lt;/li&gt;
&lt;li&gt;Other services: OTel Java Agent only, no Micrometer dependency&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="3-missing-namespace-label-in-otel-metrics"&gt;3. Missing namespace label in OTel metrics&lt;/h3&gt;
&lt;p&gt;The OTel Collector&amp;rsquo;s transform processor was configured to set the &lt;code&gt;namespace&lt;/code&gt; label:&lt;/p&gt;</description></item><item><title>CloudTrail + S3 Bucket Policy Circular Dependency</title><link>https://212clab.pages.dev/en/troubleshooting/2026-04-13-cloudtrail-s3-circular-dependency/</link><pubDate>Mon, 13 Apr 2026 00:00:00 +0000</pubDate><guid>https://212clab.pages.dev/en/troubleshooting/2026-04-13-cloudtrail-s3-circular-dependency/</guid><description>&lt;h2 id="problem"&gt;Problem&lt;/h2&gt;
&lt;p&gt;Running &lt;code&gt;terraform apply&lt;/code&gt; on &lt;code&gt;stacks/audit-security/&lt;/code&gt; to enable CloudTrail failed with the error below.&lt;/p&gt;
&lt;h3 id="error"&gt;Error&lt;/h3&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;Error: creating CloudTrail Trail (playball-audit-trail): InsufficientS3BucketPolicyException: 
Incorrect S3 bucket policy is detected for bucket: playball-audit-logs
&lt;/code&gt;&lt;/pre&gt;&lt;h2 id="root-cause-analysis"&gt;Root Cause Analysis&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Circular Dependency:&lt;/strong&gt;&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;Creating CloudTrail → requires S3 bucket policy with write permission for CloudTrail
        ↕
S3 bucket policy → needs to reference the CloudTrail ARN (module.cloudtrail.source_arn)
        ↕
CloudTrail ARN → only exists after CloudTrail is created
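
[One way to break the cycle]
S3 bucket policy → build the trail ARN from parts known up front, not the module output:
  arn:aws:cloudtrail:&amp;lt;region&amp;gt;:&amp;lt;account-id&amp;gt;:trail/playball-audit-trail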
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The code used a &lt;code&gt;dynamic &amp;quot;statement&amp;quot;&lt;/code&gt; to add the CloudTrail permission only when &lt;code&gt;module.cloudtrail.source_arn != null&lt;/code&gt;. But since CloudTrail hasn&amp;rsquo;t been created yet, &lt;code&gt;source_arn = null&lt;/code&gt; → dynamic block is skipped → bucket policy has no CloudTrail permission → CloudTrail creation fails.&lt;/p&gt;</description></item><item><title>Route53 Record Deletion + Secrets Manager Reset</title><link>https://212clab.pages.dev/en/troubleshooting/2026-04-10-external-dns-sync-secrets/</link><pubDate>Fri, 10 Apr 2026 00:00:00 +0000</pubDate><guid>https://212clab.pages.dev/en/troubleshooting/2026-04-10-external-dns-sync-secrets/</guid><description>&lt;blockquote&gt;
&lt;p&gt;2026-04-10 | Route53 deletion and Secret reset triggered by a team member&amp;rsquo;s terraform apply&lt;/p&gt;
&lt;/blockquote&gt;
&lt;hr&gt;
&lt;h2 id="problem"&gt;Problem&lt;/h2&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Route53 records deleted&lt;/strong&gt;: DNS records disappeared after EKS restart/scale-down&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Secrets Manager reset&lt;/strong&gt;: DB/Redis passwords and other secrets were overwritten on &lt;code&gt;terraform apply&lt;/code&gt;&lt;/li&gt;
&lt;/ol&gt;
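&lt;p&gt;For the first problem, the mitigation comes down to a single external-dns Helm value (key names assumed from the upstream chart); the analysis follows below:&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;# external-dns values (sketch)
policy: upsert-only        # create/update records, never delete them
txtOwnerId: staging-eks    # assumption: scope record ownership to this cluster
&lt;/code&gt;&lt;/pre&gt;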
&lt;hr&gt;
&lt;h2 id="root-cause-1-external-dns-policy-sync"&gt;Root Cause 1: external-dns &lt;code&gt;policy: sync&lt;/code&gt;&lt;/h2&gt;
&lt;h3 id="the-issue"&gt;The issue&lt;/h3&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;policy&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;sync &lt;/span&gt; &lt;span style="color:#75715e"&gt;# allows record deletion&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;code&gt;policy: sync&lt;/code&gt; &lt;strong&gt;deletes Route53 records&lt;/strong&gt; when the corresponding Kubernetes Ingress/Service is removed.
Bringing down or scaling in EKS → Ingress deleted → external-dns deletes Route53 records.&lt;/p&gt;</description></item><item><title>Istio Sidecar Startup Timing (holdApplicationUntilProxyStarts)</title><link>https://212clab.pages.dev/en/troubleshooting/2026-04-08-istio-sidecar-startup-timing/</link><pubDate>Wed, 08 Apr 2026 00:00:00 +0000</pubDate><guid>https://212clab.pages.dev/en/troubleshooting/2026-04-08-istio-sidecar-startup-timing/</guid><description>&lt;h2 id="problem"&gt;Problem&lt;/h2&gt;
&lt;p&gt;Pods in dev-webs were stuck at &lt;code&gt;1/2 Running&lt;/code&gt;. Readiness probes were failing with &lt;code&gt;context deadline exceeded&lt;/code&gt;.&lt;/p&gt;
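&lt;p&gt;The fix named in the title makes the app container wait for the proxy. A minimal per-workload sketch (the flag can also be set mesh-wide via &lt;code&gt;meshConfig.defaultConfig&lt;/code&gt;):&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;# pod template fragment (sketch)
spec:
  template:
    metadata:
      annotations:
        proxy.istio.io/config: |
          holdApplicationUntilProxyStarts: true
&lt;/code&gt;&lt;/pre&gt;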
&lt;h3 id="symptoms"&gt;Symptoms&lt;/h3&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;kubectl get pods -n dev-webs
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# NAME READY STATUS RESTARTS&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# dev-api-gateway-xxx 1/2 Running 0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;kubectl describe pod dev-api-gateway-xxx -n dev-webs
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# Readiness probe failed: Get &amp;#34;http://10.x.x.x:8085/actuator/health/readiness&amp;#34;: context deadline exceeded&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="root-cause"&gt;Root Cause&lt;/h2&gt;
&lt;p&gt;The app container started before the Istio sidecar (the &lt;code&gt;istio-proxy&lt;/code&gt; Envoy container) was fully initialized.
When the app tried to connect to external services (DB, Redis, etc.), the sidecar wasn&amp;rsquo;t ready to handle traffic yet, causing the connections to fail.&lt;/p&gt;</description></item><item><title>k6-operator MaxVUs Parallelism Error</title><link>https://212clab.pages.dev/en/troubleshooting/2026-04-05-k6-operator-maxvus/</link><pubDate>Sun, 05 Apr 2026 00:00:00 +0000</pubDate><guid>https://212clab.pages.dev/en/troubleshooting/2026-04-05-k6-operator-maxvus/</guid><description>&lt;h2 id="problem"&gt;Problem&lt;/h2&gt;
&lt;p&gt;A 4000 VU load test failed immediately upon execution:&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;Status: error (Pod: Succeeded)
Summary not available (test may have ended too quickly)
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;k6-operator logs:&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;k6 inspect: {MaxVUs:1 ...}
ERROR: Parallelism argument cannot be larger than maximum VUs in the script
{&amp;#34;maxVUs&amp;#34;: 1, &amp;#34;parallelism&amp;#34;: 2, &amp;#34;error&amp;#34;: &amp;#34;number of instances &amp;gt; number of VUs&amp;#34;}
&lt;/code&gt;&lt;/pre&gt;&lt;h2 id="root-cause-analysis"&gt;Root Cause Analysis&lt;/h2&gt;
&lt;h3 id="k6-operator-execution-flow"&gt;k6-operator execution flow&lt;/h3&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;1. TestRun CR created
       ↓
2. Initializer Pod runs
   - k6 inspect: analyzes script (determines maxVUs) ← no env vars here!
   - k6 archive: compresses script
       ↓
3. Runner Pod created (VUS, DURATION env vars injected)
       ↓
4. Test runs
&lt;/code&gt;&lt;/pre&gt;&lt;h3 id="problematic-code"&gt;Problematic code&lt;/h3&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-javascript" data-lang="javascript"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;// k6 script generated by scripts.go
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;const&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;_vus&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;__ENV&lt;/span&gt;.&lt;span style="color:#a6e22e"&gt;K6_VUS&lt;/span&gt; &lt;span style="color:#f92672"&gt;?&lt;/span&gt; parseInt(&lt;span style="color:#a6e22e"&gt;__ENV&lt;/span&gt;.&lt;span style="color:#a6e22e"&gt;K6_VUS&lt;/span&gt;) &lt;span style="color:#f92672"&gt;:&lt;/span&gt; (parseInt(&lt;span style="color:#a6e22e"&gt;__ENV&lt;/span&gt;.&lt;span style="color:#a6e22e"&gt;VUS&lt;/span&gt;) &lt;span style="color:#f92672"&gt;||&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;// ↑
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;// default value 1!
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Stage&lt;/th&gt;
 &lt;th&gt;Env vars&lt;/th&gt;
 &lt;th&gt;_vus value&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;k6 inspect (Initializer)&lt;/td&gt;
 &lt;td&gt;none&lt;/td&gt;
 &lt;td&gt;&lt;strong&gt;1&lt;/strong&gt; (default)&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;k6 run (Runner)&lt;/td&gt;
 &lt;td&gt;VUS=4000&lt;/td&gt;
 &lt;td&gt;&lt;strong&gt;4000&lt;/strong&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
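&lt;p&gt;For reference, the TestRun CR behind this flow declares env vars only for the runner, which is why &lt;code&gt;k6 inspect&lt;/code&gt; never sees them. A sketch (names illustrative):&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;apiVersion: k6.io/v1alpha1
kind: TestRun
metadata:
  name: load-test
spec:
  parallelism: 2
  script:
    configMap:
      name: load-test-script
      file: test.js
  runner:              # env vars land here, on the Runner Pods only
    env:
      - name: VUS
        value: &amp;#34;4000&amp;#34;
&lt;/code&gt;&lt;/pre&gt;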
&lt;p&gt;&lt;code&gt;k6 inspect&lt;/code&gt; analyzes &lt;code&gt;export const options&lt;/code&gt; in the script to determine maxVUs.
Without env vars, the default value of 1 is used, so maxVUs is reported as 1.&lt;/p&gt;</description></item><item><title>DiskPressure &amp; CrashLoopBackOff Troubleshooting</title><link>https://212clab.pages.dev/en/troubleshooting/2026-04-02-disk-pressure-crashloop/</link><pubDate>Thu, 02 Apr 2026 00:00:00 +0000</pubDate><guid>https://212clab.pages.dev/en/troubleshooting/2026-04-02-disk-pressure-crashloop/</guid><description>&lt;h2 id="problem"&gt;Problem&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;1,253 argocd-repo-server pods created on the mini-gmk node (all in Evicted state)&lt;/li&gt;
&lt;li&gt;DiskPressure: True&lt;/li&gt;
&lt;li&gt;Multiple services in CrashLoopBackOff (ai-defense, seat, etc.)&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="root-cause-analysis"&gt;Root Cause Analysis&lt;/h2&gt;
&lt;h3 id="1-diskpressure-mini-gmk"&gt;1. DiskPressure (mini-gmk)&lt;/h3&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Path&lt;/th&gt;
 &lt;th&gt;Usage&lt;/th&gt;
 &lt;th&gt;Cause&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;/opt/local-path-provisioner&lt;/td&gt;
 &lt;td&gt;53G&lt;/td&gt;
 &lt;td&gt;Loki chunks 40G + orphan PVs&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Loki retention&lt;/td&gt;
 &lt;td&gt;not configured&lt;/td&gt;
 &lt;td&gt;Logs accumulated indefinitely&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;h3 id="2-crashloopbackoff"&gt;2. CrashLoopBackOff&lt;/h3&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Service&lt;/th&gt;
 &lt;th&gt;Cause&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;staging ai-defense&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;imagePullSecrets: []&lt;/code&gt; override missing, so the default ecr-pull-secret was used&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;dev ai-defense&lt;/td&gt;
 &lt;td&gt;Missing &lt;code&gt;TM_OFFLINE_LLM_AUDIT_PATH&lt;/code&gt; env var (PermissionError: logs/)&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;dev seat&lt;/td&gt;
 &lt;td&gt;Missing &lt;code&gt;SPRING_KAFKA_BOOTSTRAP_SERVERS&lt;/code&gt; env var&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;h2 id="fix"&gt;Fix&lt;/h2&gt;
&lt;h3 id="resolving-diskpressure"&gt;Resolving DiskPressure&lt;/h3&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# 1. Delete orphan PVs (~8G freed)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;sudo rm -rf /opt/local-path-provisioner/pvc-*_data_cloudbeaver-data
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;sudo rm -rf /opt/local-path-provisioner/pvc-*_monitoring_data-prometheus-*
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# (PVs not currently in use)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# 2. Delete Loki chunks (~40G freed)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;sudo find /opt/local-path-provisioner/pvc-*_monitoring_storage-loki-0/chunks -type f -mtime +1 -delete
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;sudo rm -rf /opt/local-path-provisioner/pvc-*_monitoring_storage-loki-0/chunks/*
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# 3. Clean up system logs&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;sudo journalctl --vacuum-size&lt;span style="color:#f92672"&gt;=&lt;/span&gt;200M
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# 4. Delete Evicted pod records&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;kubectl delete pods -A --field-selector&lt;span style="color:#f92672"&gt;=&lt;/span&gt;status.phase&lt;span style="color:#f92672"&gt;=&lt;/span&gt;Failed
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;kubectl delete pods -A --field-selector&lt;span style="color:#f92672"&gt;=&lt;/span&gt;status.phase&lt;span style="color:#f92672"&gt;=&lt;/span&gt;Succeeded
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="helm-values-updates"&gt;Helm Values Updates&lt;/h3&gt;
&lt;h4 id="1-add-loki-retention-devstaging"&gt;1. Add Loki retention (dev/staging)&lt;/h4&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# dev/values/monitoring/values-loki.yaml&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;loki&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;compactor&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;retention_enabled&lt;/span&gt;: &lt;span style="color:#66d9ef"&gt;true&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;delete_request_store&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;filesystem&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;limits_config&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;retention_period&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;72h &lt;/span&gt; &lt;span style="color:#75715e"&gt;# 3 days&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h4 id="2-add-revisionhistorylimit-common-charts"&gt;2. Add revisionHistoryLimit (common charts)&lt;/h4&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# common-charts/apps/java-service/templates/deployment.yaml&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# common-charts/apps/ai-service/templates/deployment.yaml&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;spec&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;revisionHistoryLimit&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h4 id="3-staging-ai-defense-imagepullsecrets"&gt;3. staging ai-defense imagePullSecrets&lt;/h4&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# staging/values/apps/values-ai-defense.yaml&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;imagePullSecrets&lt;/span&gt;: [] &lt;span style="color:#75715e"&gt;# Disabled since IRSA is used&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h4 id="4-dev-ai-defense-environment-variable"&gt;4. dev ai-defense environment variable&lt;/h4&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# dev/values/apps/values-ai-defense.yaml&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;env&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; - &lt;span style="color:#f92672"&gt;name&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;TM_OFFLINE_LLM_AUDIT_PATH&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;value&lt;/span&gt;: &lt;span style="color:#e6db74"&gt;&amp;#34;/tmp/logs/offline_llm_audit.jsonl&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h4 id="5-dev-seat-kafka-config"&gt;5. dev seat Kafka config&lt;/h4&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# dev/values/apps/values-seat.yaml&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;env&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; - &lt;span style="color:#f92672"&gt;name&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;SPRING_KAFKA_BOOTSTRAP_SERVERS&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;value&lt;/span&gt;: &lt;span style="color:#e6db74"&gt;&amp;#34;kafka.messaging.svc.cluster.local:9092&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="commit"&gt;Commit&lt;/h2&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;feat(sprint5/troubleshooting): resolve disk pressure and CrashLoopBackOff issues

- common-charts: add revisionHistoryLimit: 3
- dev/staging loki: add compactor for retention
- staging ai-defense: add imagePullSecrets: []
- dev ai-defense: add TM_OFFLINE_LLM_AUDIT_PATH
- dev seat: add Kafka bootstrap servers
&lt;/code&gt;&lt;/pre&gt;&lt;h2 id="lessons-learned"&gt;Lessons Learned&lt;/h2&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Loki retention is mandatory&lt;/strong&gt; — retention_period does nothing without the compactor&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;local-path-provisioner orphan PVs&lt;/strong&gt; — deleting a PVC does not automatically remove the directory on disk&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Staging uses IRSA&lt;/strong&gt; — &lt;code&gt;imagePullSecrets: []&lt;/code&gt; must be explicitly set to override the default&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;revisionHistoryLimit&lt;/strong&gt; — the default of 10 is too high; cap it at 3&lt;/li&gt;
&lt;/ol&gt;</description></item><item><title>WireGuard + Cilium IP Range Conflict Troubleshooting</title><link>https://212clab.pages.dev/en/troubleshooting/2026-03-29-wireguard-cilium-ip-conflict/</link><pubDate>Sun, 29 Mar 2026 00:00:00 +0000</pubDate><guid>https://212clab.pages.dev/en/troubleshooting/2026-03-29-wireguard-cilium-ip-conflict/</guid><description>&lt;h3 id="date"&gt;Date&lt;/h3&gt;
&lt;p&gt;2026-03-29&lt;/p&gt;
&lt;h2 id="problem"&gt;Problem&lt;/h2&gt;
&lt;p&gt;External WireGuard VPN connections stopped working after migrating to Cilium (eBPF).&lt;/p&gt;
&lt;h3 id="symptoms"&gt;Symptoms&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;ping 10.0.0.1&lt;/code&gt; failing from an external network (hotspot) — 100% packet loss&lt;/li&gt;
&lt;li&gt;&lt;code&gt;ssh grgb-vpn&lt;/code&gt; unreachable&lt;/li&gt;
&lt;li&gt;Everything worked fine from the home network&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="root-cause-analysis"&gt;Root Cause Analysis&lt;/h2&gt;
&lt;h3 id="diagnosis-step-1-tcpdump"&gt;Diagnosis step 1: tcpdump&lt;/h3&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;sudo tcpdump -i enp3s0 udp port &lt;span style="color:#ae81ff"&gt;51820&lt;/span&gt; -n
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# Result: 0 packets — packets never reached the server&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Initially suspected a router/ISP issue, but&amp;hellip;&lt;/p&gt;
&lt;h3 id="diagnosis-step-2-check-cilium-configuration"&gt;Diagnosis step 2: Check Cilium configuration&lt;/h3&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;kubectl -n kube-system get cm cilium-config -o yaml | grep cluster-pool
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# cluster-pool-ipv4-cidr: 10.0.0.0/8&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="diagnosis-step-3-check-routing-table"&gt;Diagnosis step 3: Check routing table&lt;/h3&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ip route | grep 10.0.0
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# 10.0.0.0/24 via 10.0.0.224 dev cilium_host&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="root-cause"&gt;Root cause&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Cilium claimed the entire &lt;code&gt;10.0.0.0/8&lt;/code&gt; range as its cluster network&lt;/li&gt;
&lt;li&gt;WireGuard was also using &lt;code&gt;10.0.0.0/24&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;IP range conflict&lt;/strong&gt;: Cilium was routing WireGuard traffic to &lt;code&gt;cilium_host&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
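&lt;p&gt;The fix below re-addresses the VPN; on the server that looks roughly like this (keys and peers omitted):&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;# /etc/wireguard/wg0.conf (sketch)
[Interface]
Address = 172.30.0.1/24   # was 10.0.0.1/24
ListenPort = 51820
# AllowedIPs for each peer moves to 172.30.0.x/32 accordingly
&lt;/code&gt;&lt;/pre&gt;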
&lt;h2 id="fix"&gt;Fix&lt;/h2&gt;
&lt;h3 id="change-wireguard-network-range"&gt;Change WireGuard network range&lt;/h3&gt;
&lt;p&gt;&lt;code&gt;10.0.0.0/24&lt;/code&gt; → &lt;code&gt;172.30.0.0/24&lt;/code&gt; (a private IP range that doesn&amp;rsquo;t overlap with Cilium)&lt;/p&gt;</description></item><item><title>Java OOM Troubleshooting</title><link>https://212clab.pages.dev/en/troubleshooting/2026-03-23-java-oom/</link><pubDate>Mon, 23 Mar 2026 00:00:00 +0000</pubDate><guid>https://212clab.pages.dev/en/troubleshooting/2026-03-23-java-oom/</guid><description>&lt;h2 id="problem"&gt;Problem&lt;/h2&gt;
&lt;p&gt;On an on-prem K8s cluster (mini-might worker node), a Java app ran out of memory, causing the OOM Killer to fire and bring down the node. This post covers the analysis and resolution.&lt;/p&gt;
&lt;hr&gt;
&lt;h3 id="symptoms"&gt;Symptoms&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;All pods on the mini-might node showed &lt;code&gt;Unknown&lt;/code&gt; status in k9s&lt;/li&gt;
&lt;li&gt;Pods recovered sequentially after a reboot&lt;/li&gt;
&lt;li&gt;&lt;code&gt;kubectl describe node mini-might&lt;/code&gt; Events showed a &lt;code&gt;Rebooted&lt;/code&gt; entry&lt;/li&gt;
&lt;/ul&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;NodeNotReady   5m ago    - node went down
Rebooted       58s ago   - reboot detected
NodeReady      17s ago   - recovery complete
&lt;/code&gt;&lt;/pre&gt;&lt;hr&gt;
&lt;h2 id="root-cause-analysis"&gt;Root Cause Analysis&lt;/h2&gt;
&lt;h3 id="1-checking-oom-logs-in-dmesg"&gt;1. Checking OOM logs in dmesg&lt;/h3&gt;
&lt;p&gt;After SSHing into the mini-might node:&lt;/p&gt;</description></item><item><title>ALB Elastic IP Auto-Assignment Issue</title><link>https://212clab.pages.dev/en/troubleshooting/2026-03-22-alb-eip-auto-assign/</link><pubDate>Sun, 22 Mar 2026 00:00:00 +0000</pubDate><guid>https://212clab.pages.dev/en/troubleshooting/2026-03-22-alb-eip-auto-assign/</guid><description>&lt;h2 id="problem"&gt;Problem&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Date&lt;/strong&gt;: 2026-03-22&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Environment&lt;/strong&gt;: EKS Staging (goormgb-staging-eks)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Symptom&lt;/strong&gt;: Elastic IPs automatically assigned to an internet-facing ALB, generating unexpected costs&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h3 id="symptoms"&gt;Symptoms&lt;/h3&gt;
&lt;h3 id="how-it-was-discovered"&gt;How It Was Discovered&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;Changed ArgoCD from &lt;code&gt;type: LoadBalancer&lt;/code&gt; (NLB) to &lt;code&gt;type: ClusterIP&lt;/code&gt; (ALB Ingress)&lt;/li&gt;
&lt;li&gt;After deleting the NLB, found 2 EIPs remaining&lt;/li&gt;
&lt;li&gt;Attempting to disassociate/release the EIPs resulted in a permission error&lt;/li&gt;
&lt;/ol&gt;
&lt;h3 id="error-message"&gt;Error Message&lt;/h3&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;An error occurred (AuthFailure) when calling the DisassociateAddress operation:
You do not have permission to access the specified resource.
&lt;/code&gt;&lt;/pre&gt;&lt;h3 id="confirmed-state"&gt;Confirmed State&lt;/h3&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;aws ec2 describe-network-interfaces &lt;span style="color:#ae81ff"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; --filters &lt;span style="color:#e6db74"&gt;&amp;#34;Name=addresses.private-ip-address,Values=10.0.18.47&amp;#34;&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; --query &lt;span style="color:#e6db74"&gt;&amp;#39;NetworkInterfaces[*].[Description,NetworkInterfaceId]&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# Result: ELB app/k8s-stagingalb-4f414fcf8f/...&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;hr&gt;
&lt;h2 id="root-cause"&gt;Root Cause&lt;/h2&gt;
&lt;h3 id="aws-official-answer"&gt;AWS Official Answer&lt;/h3&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Automatic EIP assignment to internet-facing ALBs is expected behavior.&lt;/strong&gt;&lt;/p&gt;</description></item><item><title>EKS Node Cluster Join Failure Checklist</title><link>https://212clab.pages.dev/en/troubleshooting/2026-03-21-eks-node-join-failure/</link><pubDate>Sat, 21 Mar 2026 00:00:00 +0000</pubDate><guid>https://212clab.pages.dev/en/troubleshooting/2026-03-21-eks-node-join-failure/</guid><description>&lt;h2 id="problem"&gt;Problem&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;EKS node group created but nodes are not joining the cluster&lt;/li&gt;
&lt;li&gt;&lt;code&gt;kubectl get nodes&lt;/code&gt; shows no nodes&lt;/li&gt;
&lt;li&gt;EC2 instances are in Running state&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="checklist"&gt;Checklist&lt;/h2&gt;
&lt;h3 id="1-vpc-dns-settings"&gt;1. VPC DNS Settings&lt;/h3&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Setting&lt;/th&gt;
 &lt;th&gt;Required Value&lt;/th&gt;
 &lt;th&gt;Check&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;enable_dns_hostnames&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;true&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;enable_dns_support&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;true&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;strong&gt;Verification:&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-hcl" data-lang="hcl"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;resource&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;aws_vpc&amp;#34; &amp;#34;main&amp;#34;&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; enable_dns_hostnames &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;true&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; enable_dns_support &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;true&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;hr&gt;
&lt;h3 id="2-subnet-tags"&gt;2. Subnet Tags&lt;/h3&gt;
&lt;p&gt;The EKS controller requires specific tags to recognize subnets.&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Subnet&lt;/th&gt;
 &lt;th&gt;Tag&lt;/th&gt;
 &lt;th&gt;Check&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;Public&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;kubernetes.io/role/elb = 1&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Private&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;kubernetes.io/role/internal-elb = 1&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Both&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;kubernetes.io/cluster/&amp;lt;cluster-name&amp;gt; = shared&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;strong&gt;Verification:&lt;/strong&gt;&lt;/p&gt;</description></item><item><title>EKS Troubleshooting - Node Group Creation Failure and Missing vpc-cni Addon</title><link>https://212clab.pages.dev/en/troubleshooting/2026-03-21-eks-nodegroup-creation-failure/</link><pubDate>Sat, 21 Mar 2026 00:00:00 +0000</pubDate><guid>https://212clab.pages.dev/en/troubleshooting/2026-03-21-eks-nodegroup-creation-failure/</guid><description>&lt;h2 id="problem"&gt;Problem&lt;/h2&gt;
&lt;p&gt;This post covers two major issues encountered while building an EKS 1.34 cluster, along with their resolutions.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="1-node-group-creation-failure---kubelet-label-issue"&gt;1. Node Group Creation Failure - kubelet Label Issue&lt;/h2&gt;
&lt;h3 id="symptoms"&gt;Symptoms&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Node Group status: &lt;code&gt;CREATE_FAILED&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Error message: &lt;code&gt;NodeCreationFailure - Unhealthy nodes in the kubernetes cluster&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;EC2 instances are running but not joining the cluster&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="root-cause-analysis"&gt;Root Cause Analysis&lt;/h3&gt;
&lt;p&gt;Connected to the node via SSM and checked kubelet logs:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;sudo journalctl -u kubelet -n &lt;span style="color:#ae81ff"&gt;50&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;strong&gt;kubelet error:&lt;/strong&gt;&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;unknown &amp;#39;kubernetes.io&amp;#39; or &amp;#39;k8s.io&amp;#39; labels specified with --node-labels
&lt;/code&gt;&lt;/pre&gt;&lt;h3 id="root-cause"&gt;Root Cause&lt;/h3&gt;
&lt;p&gt;The Terraform EKS module was using &lt;code&gt;node-role.kubernetes.io/infra&lt;/code&gt; as a node group label:&lt;/p&gt;</description></item><item><title>K8s Taint vs NodeSelector Troubleshooting</title><link>https://212clab.pages.dev/en/troubleshooting/2026-03-20-taint-nodeselector/</link><pubDate>Fri, 20 Mar 2026 00:00:00 +0000</pubDate><guid>https://212clab.pages.dev/en/troubleshooting/2026-03-20-taint-nodeselector/</guid><description>&lt;h2 id="problem"&gt;Problem&lt;/h2&gt;
&lt;p&gt;ArgoCD alerts triggered for Pods in dev-webs showing &lt;code&gt;Degraded&lt;/code&gt; status:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;dev-order-core&lt;/li&gt;
&lt;li&gt;dev-seat&lt;/li&gt;
&lt;li&gt;dev-queue&lt;/li&gt;
&lt;/ul&gt;
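&lt;p&gt;The root cause below is a scheduling imbalance; a matching fix is to let infra Pods tolerate the control-plane taint. A minimal sketch, assuming the standard kubeadm taint and node label:&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;# pod spec fragment (sketch)
tolerations:
  - key: node-role.kubernetes.io/control-plane
    operator: Exists
    effect: NoSchedule
nodeSelector:
  node-role.kubernetes.io/control-plane: &amp;#34;&amp;#34;   # assumption: pin to mini-gmk
&lt;/code&gt;&lt;/pre&gt;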
&lt;h2 id="root-cause"&gt;Root Cause&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;The app-dedicated node (mini-might) ran out of resources&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Cause: The control-plane node (mini-gmk) has a &lt;code&gt;node-role.kubernetes.io/control-plane:NoSchedule&lt;/code&gt; taint, which prevented infra Pods (istio, loki, calico, coredns, metrics-server, etc.) from being scheduled there. They all got pushed onto the &lt;strong&gt;app-only worker node&lt;/strong&gt; instead.&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;Infra Pods scheduled on mini-might (worker):
- istio-ingressgateway (x2)
- istiod
- kiali
- loki-0, loki-chunks-cache-0, loki-results-cache-0
- calico-apiserver (x2), calico-kube-controllers, calico-typha
- coredns (x2)
- metrics-server
- alloy
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;→ Resource contention when deploying app Pods, new Pods go Pending&lt;/p&gt;</description></item><item><title>Kubernetes Service Selector &amp; Endpoints Troubleshooting</title><link>https://212clab.pages.dev/en/troubleshooting/2026-03-19-service-selector-endpoints/</link><pubDate>Thu, 19 Mar 2026 00:00:00 +0000</pubDate><guid>https://212clab.pages.dev/en/troubleshooting/2026-03-19-service-selector-endpoints/</guid><description>&lt;h2 id="problem"&gt;Problem&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;API requests returning &lt;strong&gt;503 Service Unavailable&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Istio logs showing: &lt;code&gt;no_healthy_upstream&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Pods are in Running state but service routing is broken&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="root-cause"&gt;Root Cause&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Mismatch between Service selector labels and Pod labels&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# Check endpoints - &amp;lt;none&amp;gt; means something is wrong!&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;kubectl -n staging-webs get endpoints api-gateway
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;NAME          ENDPOINTS   AGE
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;api-gateway   &amp;lt;none&amp;gt;      34h   &lt;span style="color:#75715e"&gt;# ← no Pod IPs!&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="how-it-works"&gt;How It Works&lt;/h2&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;┌─────────────────────────────────────────────────────────┐
│ Service (api-gateway)                                   │
│   selector:                                             │
│     app: staging-api-gateway                            │
│     app.kubernetes.io/part-of: staging-webs  ← required!│
└─────────────────────────────────────────────────────────┘
          │
          ▼ find Pods with matching selector labels
┌─────────────────────────────────────────────────────────┐
│ Endpoints (auto-generated)                              │
│   addresses:                                            │
│     - 10.0.20.100 (Pod1 IP)                             │
│     - 10.0.20.101 (Pod2 IP)                             │
└─────────────────────────────────────────────────────────┘
          │
          ▼ load balance traffic
┌─────────────────────────────────────────────────────────┐
│ Pod (must have labels that match the selector)          │
│   labels:                                               │
│     app: staging-api-gateway  ✓                         │
│     app.kubernetes.io/part-of: staging-webs  ✓          │
└─────────────────────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;&lt;strong&gt;Key point: All labels in the Service selector must be present on the Pod for it to be registered in Endpoints&lt;/strong&gt;&lt;/p&gt;</description></item><item><title>Java Spring Boot Probe Troubleshooting</title><link>https://212clab.pages.dev/en/troubleshooting/2026-03-18-java-probe-crashloop/</link><pubDate>Wed, 18 Mar 2026 00:00:00 +0000</pubDate><guid>https://212clab.pages.dev/en/troubleshooting/2026-03-18-java-probe-crashloop/</guid><description>&lt;h2 id="problem"&gt;Problem&lt;/h2&gt;
&lt;p&gt;Java Spring Boot apps in the Staging EKS environment kept restarting in CrashLoopBackOff.&lt;/p&gt;
&lt;h3 id="symptoms"&gt;Symptoms&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Pods receive SIGTERM (exit code 143) within 60 seconds of starting&lt;/li&gt;
&lt;li&gt;kubelet forcefully kills the container due to liveness probe failures&lt;/li&gt;
&lt;li&gt;404 errors (initially) → 503/500 errors (after partial fix)&lt;/li&gt;
&lt;/ul&gt;
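&lt;p&gt;As the analysis below shows, the probes were missing each app&amp;rsquo;s context path. A corrected sketch for auth-guard (port illustrative):&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;# container spec fragment (sketch)
livenessProbe:
  httpGet:
    path: /auth/actuator/health/liveness    # was /actuator/health/liveness
    port: 8080
readinessProbe:
  httpGet:
    path: /auth/actuator/health/readiness
    port: 8080
&lt;/code&gt;&lt;/pre&gt;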
&lt;h2 id="root-cause-analysis"&gt;Root Cause Analysis&lt;/h2&gt;
&lt;h3 id="1-missing-context-path-primary-cause"&gt;1. Missing Context Path (Primary Cause)&lt;/h3&gt;
&lt;p&gt;Spring Boot apps use a context path:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;auth-guard: &lt;code&gt;/auth&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;queue: &lt;code&gt;/queue&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;seat: &lt;code&gt;/seat&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;order-core: &lt;code&gt;/order&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;But the probe configuration was:&lt;/p&gt;</description></item><item><title>OpenTelemetry + Istio mTLS Troubleshooting</title><link>https://212clab.pages.dev/en/troubleshooting/2026-03-18-otel-istio-mtls/</link><pubDate>Wed, 18 Mar 2026 00:00:00 +0000</pubDate><guid>https://212clab.pages.dev/en/troubleshooting/2026-03-18-otel-istio-mtls/</guid><description>&lt;h2 id="problem"&gt;Problem&lt;/h2&gt;
&lt;p&gt;Java services in the EKS Staging environment were failing to start, stuck in CrashLoopBackOff. Log analysis revealed that the OTEL (OpenTelemetry) agent was failing to connect to the OTEL Collector.&lt;/p&gt;
&lt;h3 id="environment"&gt;Environment&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Kubernetes&lt;/strong&gt;: EKS 1.34&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Istio&lt;/strong&gt;: mTLS STRICT mode enabled&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;OTEL Agent&lt;/strong&gt;: opentelemetry-javaagent v2.11.0&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;OTEL Collector&lt;/strong&gt;: opentelemetry-collector (Deployment)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Namespace&lt;/strong&gt;: staging-webs (apps), monitoring (OTEL Collector)&lt;/li&gt;
&lt;/ul&gt;
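&lt;p&gt;One common mitigation for this class of failure, not necessarily the one adopted here, is to bypass the sidecar for OTLP egress (assuming OTLP gRPC on 4317):&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;# pod template annotation (sketch; one of several possible fixes)
metadata:
  annotations:
    traffic.sidecar.istio.io/excludeOutboundPorts: &amp;#34;4317&amp;#34;
&lt;/code&gt;&lt;/pre&gt;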
&lt;hr&gt;
&lt;h3 id="symptoms"&gt;Symptoms&lt;/h3&gt;
&lt;h3 id="pod-status"&gt;Pod Status&lt;/h3&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;staging-webs   auth-guard-xxx   0/1   CrashLoopBackOff   17
staging-webs   order-core-xxx   0/1   CrashLoopBackOff   21
staging-webs   queue-xxx        0/1   CrashLoopBackOff   22
staging-webs   seat-xxx         0/1   CrashLoopBackOff   21
&lt;/code&gt;&lt;/pre&gt;&lt;h3 id="error-logs"&gt;Error Logs&lt;/h3&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;[otel.javaagent] ERROR io.opentelemetry.exporter.internal.grpc.GrpcExporter -
Failed to export metrics. Server is UNAVAILABLE.
Make sure your collector is running and reachable from this network.
Full error message: upstream connect error or disconnect/reset before headers.
retried and the latest reset reason: remote connection failure,
transport failure reason: TLS_error:|268435703:SSL routines:OPENSSL_internal:WRONG_VERSION_NUMBER:TLS_error_end
&lt;/code&gt;&lt;/pre&gt;&lt;hr&gt;
&lt;h2 id="root-cause-analysis"&gt;Root Cause Analysis&lt;/h2&gt;
&lt;h3 id="1-normal-istio-mtls-traffic-flow"&gt;1. Normal Istio mTLS Traffic Flow&lt;/h3&gt;
&lt;p&gt;For typical service-to-service communication:&lt;/p&gt;</description></item><item><title>Chrome QUIC/HTTP3 Intermittent 404 Troubleshooting</title><link>https://212clab.pages.dev/en/troubleshooting/2026-03-17-chrome-quic-http3-detail/</link><pubDate>Tue, 17 Mar 2026 00:00:00 +0000</pubDate><guid>https://212clab.pages.dev/en/troubleshooting/2026-03-17-chrome-quic-http3-detail/</guid><description>&lt;blockquote&gt;
&lt;p&gt;Resolving intermittent 404 errors in Chrome on a home server (kubeadm) + SK Broadband environment&lt;/p&gt;
&lt;/blockquote&gt;
&lt;hr&gt;
&lt;h2 id="home-server-infrastructure"&gt;Home Server Infrastructure&lt;/h2&gt;
&lt;h3 id="11-overall-network-architecture"&gt;1.1 Overall Network Architecture&lt;/h3&gt;
&lt;pre tabindex="0"&gt;&lt;code class="language-mermaid" data-lang="mermaid"&gt;flowchart TB
 subgraph Internet[&amp;#34;Internet&amp;#34;]
 CF[&amp;#34;Cloudflare Edge&amp;lt;br/&amp;gt;(Seoul PoP)&amp;#34;]
 ISP[&amp;#34;SK Broadband&amp;lt;br/&amp;gt;Public IP: Dynamic&amp;lt;br/&amp;gt;(39.119.192.15)&amp;#34;]
 end

 subgraph HomeNetwork[&amp;#34;Home Network (192.168.45.0/24)&amp;#34;]
 subgraph Router[&amp;#34;SK Broadband Router&amp;#34;]
 NAT[&amp;#34;NAT/DHCP&amp;#34;]
 FW_R[&amp;#34;Firewall&amp;#34;]
 PF[&amp;#34;Port Forwarding&amp;lt;br/&amp;gt;:80 → .154:80&amp;lt;br/&amp;gt;:443 → .154:443&amp;#34;]
 DHCP_RES[&amp;#34;DHCP Reservation&amp;lt;br/&amp;gt;mini-gmk: .123&amp;lt;br/&amp;gt;mini-might: .154&amp;#34;]
 end

 subgraph ControlPlane[&amp;#34;mini-gmk (Control Plane)&amp;lt;br/&amp;gt;192.168.45.123&amp;#34;]
 UFW_C[&amp;#34;ufw firewall&amp;lt;br/&amp;gt;22, 6443, 10250-10252&amp;lt;br/&amp;gt;2379-2380, 179&amp;#34;]
 K8S_CP[&amp;#34;Kubernetes Control Plane&amp;lt;br/&amp;gt;kube-apiserver&amp;lt;br/&amp;gt;etcd, scheduler&amp;lt;br/&amp;gt;controller-manager&amp;#34;]
 ISTIOD[&amp;#34;istiod (pilot)&amp;#34;]
 CERTM[&amp;#34;cert-manager&amp;#34;]
 DDNS[&amp;#34;ddns-cloudflare&amp;lt;br/&amp;gt;CronJob&amp;#34;]
 ESO[&amp;#34;external-secrets-operator&amp;#34;]
 CALICO_CP[&amp;#34;calico-typha&amp;lt;br/&amp;gt;calico-kube-controllers&amp;#34;]
 end

 subgraph Worker[&amp;#34;mini-might (Worker)&amp;lt;br/&amp;gt;192.168.45.154&amp;#34;]
 UFW_W[&amp;#34;ufw firewall&amp;lt;br/&amp;gt;22, 80, 443&amp;lt;br/&amp;gt;10250, 30000-32767&amp;#34;]
 INGRESS[&amp;#34;istio-ingressgateway&amp;lt;br/&amp;gt;externalIPs: .154&amp;#34;]
 GW[&amp;#34;java-cloud-gateway&amp;lt;br/&amp;gt;:8085&amp;#34;]
 AUTH[&amp;#34;auth-guard (x2)&amp;lt;br/&amp;gt;:8080&amp;#34;]
 APPS[&amp;#34;order-core :8083&amp;lt;br/&amp;gt;seat :8082&amp;lt;br/&amp;gt;queue :8081&amp;#34;]
 CALICO_W[&amp;#34;calico-node&amp;#34;]
 end

 subgraph VPN_Access[&amp;#34;Team Member Access&amp;#34;]
 VPN[&amp;#34;Tailscale/WireGuard&amp;lt;br/&amp;gt;VPN&amp;#34;]
 SSH[&amp;#34;SSH Tunnel&amp;lt;br/&amp;gt;:22&amp;#34;]
 end
 end

 subgraph Clients[&amp;#34;Clients&amp;#34;]
 Chrome[&amp;#34;Chrome&amp;lt;br/&amp;gt;(QUIC/HTTP3)&amp;#34;]
 Safari[&amp;#34;Safari&amp;lt;br/&amp;gt;(HTTP/2)&amp;#34;]
 Firefox[&amp;#34;Firefox&amp;lt;br/&amp;gt;(HTTP/2)&amp;#34;]
 Mobile[&amp;#34;Mobile LTE&amp;#34;]
 TeamMember[&amp;#34;Team Member (VPN)&amp;#34;]
 end

 Chrome --&amp;gt;|&amp;#34;QUIC/UDP:443&amp;lt;br/&amp;gt;❌ intermittent failure&amp;#34;| ISP
 Safari --&amp;gt;|&amp;#34;HTTP/2/TCP:443&amp;#34;| ISP
 Firefox --&amp;gt;|&amp;#34;HTTP/2/TCP:443&amp;#34;| ISP
 Mobile --&amp;gt;|&amp;#34;HTTP/2&amp;#34;| ISP

 Chrome --&amp;gt;|&amp;#34;QUIC/UDP:443&amp;lt;br/&amp;gt;✅ stable&amp;#34;| CF
 CF --&amp;gt;|&amp;#34;HTTP/2/TCP:443&amp;#34;| ISP

 ISP --&amp;gt; NAT
 NAT --&amp;gt; FW_R
 FW_R --&amp;gt; PF
 PF --&amp;gt;|&amp;#34;:443&amp;#34;| UFW_W
 UFW_W --&amp;gt; INGRESS

 INGRESS --&amp;gt; GW
 GW --&amp;gt; AUTH
 AUTH --&amp;gt; APPS

 TeamMember --&amp;gt;|&amp;#34;VPN&amp;#34;| VPN
 VPN --&amp;gt; SSH
 SSH --&amp;gt; UFW_C
 SSH --&amp;gt; UFW_W

 DDNS --&amp;gt;|&amp;#34;API call&amp;#34;| CF

 ISTIOD -.-&amp;gt;|&amp;#34;xDS&amp;#34;| INGRESS
 CERTM -.-&amp;gt;|&amp;#34;TLS cert&amp;#34;| INGRESS
 ESO -.-&amp;gt;|&amp;#34;Secret sync&amp;#34;| DDNS

 classDef problem fill:#ff6b6b,stroke:#c0392b,color:#fff
 classDef solution fill:#2ecc71,stroke:#27ae60,color:#fff
 classDef infrastructure fill:#3498db,stroke:#2980b9,color:#fff

 class Chrome problem
 class CF solution
 class Router,ControlPlane,Worker infrastructure
&lt;/code&gt;&lt;/pre&gt;&lt;h3 id="12-sk-broadband-router-settings"&gt;1.2 SK Broadband Router Settings&lt;/h3&gt;
&lt;pre tabindex="0"&gt;&lt;code class="language-mermaid" data-lang="mermaid"&gt;flowchart LR
 subgraph SKRouter[&amp;#34;SK Broadband Router Settings&amp;#34;]
 direction TB

 subgraph DHCP[&amp;#34;DHCP Settings&amp;#34;]
 DHCP_RANGE[&amp;#34;DHCP range: 192.168.45.100 ~ .200&amp;#34;]
 DHCP_RES1[&amp;#34;Reservation 1: mini-gmk&amp;lt;br/&amp;gt;MAC: XX:XX:XX:XX:XX:XX&amp;lt;br/&amp;gt;IP: 192.168.45.123&amp;#34;]
 DHCP_RES2[&amp;#34;Reservation 2: mini-might&amp;lt;br/&amp;gt;MAC: YY:YY:YY:YY:YY:YY&amp;lt;br/&amp;gt;IP: 192.168.45.154&amp;#34;]
 end

 subgraph PortForward[&amp;#34;Port Forwarding&amp;#34;]
 PF_HTTP[&amp;#34;External :80 → 192.168.45.154:80&amp;#34;]
 PF_HTTPS[&amp;#34;External :443 → 192.168.45.154:443&amp;#34;]
 PF_VPN[&amp;#34;External :51820 → 192.168.45.123:51820&amp;lt;br/&amp;gt;(WireGuard, optional)&amp;#34;]
 end

 subgraph Firewall[&amp;#34;Firewall Settings&amp;#34;]
 FW_IN[&amp;#34;Inbound: allow 80, 443&amp;#34;]
 FW_OUT[&amp;#34;Outbound: allow all&amp;#34;]
 FW_ICMP[&amp;#34;ICMP: allow (ping)&amp;#34;]
 end

 subgraph NAT_Config[&amp;#34;NAT Settings&amp;#34;]
 NAT_TYPE[&amp;#34;NAT type: Symmetric&amp;#34;]
 NAT_UDP[&amp;#34;UDP timeout: 30s (root cause)&amp;#34;]
 NAT_TCP[&amp;#34;TCP timeout: 3600s&amp;#34;]
 end
 end

 style NAT_UDP fill:#ff6b6b,stroke:#c0392b,color:#fff
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;&lt;strong&gt;Router settings detail:&lt;/strong&gt;&lt;/p&gt;</description></item><item><title>Resolving Swagger 403, ArgoCD Dashboard, and CORS Issues</title><link>https://212clab.pages.dev/en/troubleshooting/2026-03-11-swagger-argocd-cors/</link><pubDate>Wed, 11 Mar 2026 00:00:00 +0000</pubDate><guid>https://212clab.pages.dev/en/troubleshooting/2026-03-11-swagger-argocd-cors/</guid><description>&lt;blockquote&gt;
&lt;p&gt;2026-03-11 | Resolving Swagger 403, ArgoCD Dashboard, and CORS Issues&lt;/p&gt;
&lt;/blockquote&gt;
&lt;hr&gt;
&lt;h2 id="1-swagger-403--oauth-login-redirect"&gt;1. Swagger 403 → OAuth Login Redirect&lt;/h2&gt;
&lt;h3 id="problem"&gt;Problem&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Accessing &lt;code&gt;swagger.dev.goormgb.space&lt;/code&gt; shows a &lt;strong&gt;403 Forbidden&lt;/strong&gt; page&lt;/li&gt;
&lt;li&gt;No redirect to the login page&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="root-cause"&gt;Root Cause&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Browser has a &lt;strong&gt;stale cookie&lt;/strong&gt; (expired &lt;code&gt;_oauth2_proxy&lt;/code&gt; cookie)&lt;/li&gt;
&lt;li&gt;OAuth2 Proxy validates the cookie, finds it invalid → returns &lt;strong&gt;403&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Normal flow: no cookie → 302 redirect → login&lt;/li&gt;
&lt;/ul&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;[Normal flow - no cookie]
User → OAuth2 Proxy → no cookie → 302 → Google login

[Problem - expired cookie]
User → OAuth2 Proxy → cookie present (expired) → 403 Forbidden
                                                 ↑ stops here!
&lt;/code&gt;&lt;/pre&gt;&lt;h3 id="fix-envoyfilter-to-convert-403302-redirect"&gt;Fix: EnvoyFilter to convert 403→302 redirect&lt;/h3&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: swagger-403-redirect
  namespace: istio-system
spec:
  workloadSelector:
    labels:
      istio: ingressgateway
  configPatches:
  - applyTo: HTTP_FILTER
    match:
      context: GATEWAY
      listener:
        filterChain:
          filter:
            name: envoy.filters.network.http_connection_manager
            subFilter:
              name: envoy.filters.http.router
    patch:
      operation: INSERT_BEFORE
      value:
        name: envoy.filters.http.lua
        typed_config:
          &amp;#34;@type&amp;#34;: type.googleapis.com/envoy.extensions.filters.http.lua.v3.Lua
          inlineCode: |
            -- &amp;#34;:authority&amp;#34; is a request pseudo-header and is absent from the
            -- response header map, so capture host and path at request time and
            -- hand them to the response phase via dynamic metadata.
            function envoy_on_request(request_handle)
              local meta = request_handle:streamInfo():dynamicMetadata()
              meta:set(&amp;#34;swagger_redirect&amp;#34;, &amp;#34;host&amp;#34;,
                       request_handle:headers():get(&amp;#34;:authority&amp;#34;) or &amp;#34;&amp;#34;)
              meta:set(&amp;#34;swagger_redirect&amp;#34;, &amp;#34;path&amp;#34;,
                       request_handle:headers():get(&amp;#34;:path&amp;#34;) or &amp;#34;/&amp;#34;)
            end

            function envoy_on_response(response_handle)
              local meta = response_handle:streamInfo():dynamicMetadata():get(&amp;#34;swagger_redirect&amp;#34;) or {}
              local host = meta[&amp;#34;host&amp;#34;] or &amp;#34;&amp;#34;
              local status = response_handle:headers():get(&amp;#34;:status&amp;#34;)

              -- only handle 403 responses on the swagger domain
              if string.find(host, &amp;#34;swagger&amp;#34;) and status == &amp;#34;403&amp;#34; then
                local redirect_url = &amp;#34;/oauth2/start?rd=&amp;#34; .. (meta[&amp;#34;path&amp;#34;] or &amp;#34;/&amp;#34;)

                -- convert to 302 redirect
                response_handle:headers():replace(&amp;#34;:status&amp;#34;, &amp;#34;302&amp;#34;)
                response_handle:headers():add(&amp;#34;location&amp;#34;, redirect_url)
                response_handle:headers():add(&amp;#34;cache-control&amp;#34;, &amp;#34;no-cache, no-store&amp;#34;)
              end
            end
&lt;/code&gt;&lt;/pre&gt;&lt;h3 id="key-points"&gt;Key Points&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;envoy_on_request&lt;/strong&gt;: records host and original path in dynamic metadata (&lt;code&gt;:authority&lt;/code&gt; is not available on the response header map)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;envoy_on_response&lt;/strong&gt;: intercepts at the response phase&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;swagger domain only&lt;/strong&gt;: no impact on other services&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;403 → 302&lt;/strong&gt; conversion + &lt;code&gt;/oauth2/start&lt;/code&gt; redirect&lt;/li&gt;
&lt;li&gt;OAuth2 Proxy then runs the login flow from the beginning&lt;/li&gt;
&lt;/ul&gt;
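&lt;p&gt;One way to verify the conversion (a sketch, not part of the original fix; the cookie value is a stand-in for any expired &lt;code&gt;_oauth2_proxy&lt;/code&gt; cookie) is to replay the failure case and assert that a 302 with a &lt;code&gt;Location&lt;/code&gt; header comes back instead of a 403:&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RedirectCheck {
    public static void main(String[] args) throws Exception {
        // do not follow redirects: we want to observe the raw 302 itself
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NEVER)
                .build();

        // simulate the failure case: a present-but-invalid _oauth2_proxy cookie
        HttpRequest request = HttpRequest.newBuilder(
                        URI.create(&amp;#34;https://swagger.dev.goormgb.space/&amp;#34;))
                .header(&amp;#34;Cookie&amp;#34;, &amp;#34;_oauth2_proxy=stale&amp;#34;)
                .GET()
                .build();

        HttpResponse&lt;Void&gt; response =
                client.send(request, HttpResponse.BodyHandlers.discarding());

        // expect 302 plus Location: /oauth2/start?rd=... (a 403 means the filter missed)
        System.out.println(response.statusCode());
        System.out.println(response.headers().firstValue(&amp;#34;location&amp;#34;).orElse(&amp;#34;(none)&amp;#34;));
    }
}
&lt;/code&gt;&lt;/pre&gt;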
&lt;hr&gt;
&lt;h2 id="2-cors-errors-causing-frontend-blank-page"&gt;2. CORS Errors Causing Frontend Blank Page&lt;/h2&gt;
&lt;h3 id="problem-1"&gt;Problem&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;CORS errors when the frontend makes API calls&lt;/li&gt;
&lt;li&gt;The only evidence is errors in the browser console; the page renders blank&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="root-cause-1"&gt;Root Cause&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;The backend &lt;strong&gt;does not include CORS headers in 400/500 error responses&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;[Successful 200 response]
Access-Control-Allow-Origin: https://dev.goormgb.space ✓
→ Frontend can read the response body

[Error 500 response]
Access-Control-Allow-Origin: (absent) ✗
→ Browser blocks the response → frontend cannot determine the error cause
→ Error handling fails → blank page
&lt;/code&gt;&lt;/pre&gt;&lt;h3 id="why-cors-headers-are-needed-even-on-error-responses"&gt;Why CORS headers are needed even on error responses&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Browser security policy&lt;/strong&gt;: when origins differ, the browser blocks any response that lacks CORS headers, regardless of status code&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Error handling&lt;/strong&gt;: with the response blocked, &lt;code&gt;response.json()&lt;/code&gt; itself fails, so the frontend never learns why the call failed&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;UX improvement&lt;/strong&gt;: once the headers are present, error messages can be surfaced to the user instead of a blank page&lt;/li&gt;
&lt;/ol&gt;
&lt;h3 id="fix-forwarded-to-backend-team"&gt;Fix (forwarded to backend team)&lt;/h3&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;// Spring Boot - include CORS headers in error responses too
@Configuration
public class CorsConfig implements WebMvcConfigurer {
    @Override
    public void addCorsMappings(CorsRegistry registry) {
        registry.addMapping(&amp;#34;/**&amp;#34;)
                .allowedOrigins(&amp;#34;https://dev.goormgb.space&amp;#34;)
                .allowedMethods(&amp;#34;*&amp;#34;)
                .allowCredentials(true);
    }
}

// Or add headers to all exception responses via @ControllerAdvice
&lt;/code&gt;&lt;/pre&gt;
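&lt;p&gt;For the &lt;code&gt;@ControllerAdvice&lt;/code&gt; route, a minimal sketch of what that could look like; the class name, the catch-all handler, and the error payload shape are illustrative assumptions, not the backend team&amp;rsquo;s actual code:&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;import java.util.Map;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.ExceptionHandler;
import org.springframework.web.bind.annotation.RestControllerAdvice;

// Illustrative sketch: set the CORS headers explicitly on every exception
// response so the browser lets the frontend read the error body.
@RestControllerAdvice
public class GlobalExceptionHandler {

    @ExceptionHandler(Exception.class)
    public ResponseEntity&lt;Map&lt;String, String&gt;&gt; handleAll(Exception e) {
        String message = e.getMessage() != null ? e.getMessage() : &amp;#34;internal error&amp;#34;;
        return ResponseEntity
                .status(HttpStatus.INTERNAL_SERVER_ERROR)
                .header(&amp;#34;Access-Control-Allow-Origin&amp;#34;, &amp;#34;https://dev.goormgb.space&amp;#34;)
                .header(&amp;#34;Access-Control-Allow-Credentials&amp;#34;, &amp;#34;true&amp;#34;)
                .body(Map.of(&amp;#34;error&amp;#34;, message));
    }
}
&lt;/code&gt;&lt;/pre&gt;&lt;hr&gt;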
&lt;h2 id="3-authtokenrefresh-404-error"&gt;3. /auth/token/refresh 404 Error&lt;/h2&gt;
&lt;h3 id="problem-2"&gt;Problem&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Frontend calls &lt;code&gt;/auth/token/refresh&lt;/code&gt; and receives 404&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="root-cause-path-mismatch"&gt;Root Cause: Path Mismatch&lt;/h3&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Side&lt;/th&gt;
 &lt;th&gt;Path&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;Frontend call&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;/auth/token/refresh&lt;/code&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;API Gateway routing&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;Path=/auth/**&lt;/code&gt; → Auth-Guard&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Auth-Guard actual path&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;/token/refresh&lt;/code&gt; (no class-level @RequestMapping)&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;Frontend: /auth/token/refresh
 ↓
API Gateway: matches /auth/** → forwards to Auth-Guard
 ↓
Auth-Guard: receives /auth/token/refresh
 ↓
404! (only /token/refresh actually exists)
&lt;/code&gt;&lt;/pre&gt;&lt;h3 id="fix-forwarded-to-backend-team-1"&gt;Fix (forwarded to backend team)&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Option 1: Add StripPrefix filter&lt;/strong&gt; (recommended)&lt;/p&gt;
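&lt;p&gt;A sketch of what that could look like with the Spring Cloud Gateway Java DSL; the route id and the &lt;code&gt;lb://auth-guard&lt;/code&gt; URI are illustrative assumptions, and the real routes may just as well live in &lt;code&gt;application.yml&lt;/code&gt;:&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;import org.springframework.cloud.gateway.route.RouteLocator;
import org.springframework.cloud.gateway.route.builder.RouteLocatorBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class GatewayRoutes {

    @Bean
    public RouteLocator authGuardRoute(RouteLocatorBuilder builder) {
        return builder.routes()
                // strip the first segment: /auth/token/refresh -&gt; /token/refresh
                .route(&amp;#34;auth-guard&amp;#34;, r -&gt; r
                        .path(&amp;#34;/auth/**&amp;#34;)
                        .filters(f -&gt; f.stripPrefix(1))
                        .uri(&amp;#34;lb://auth-guard&amp;#34;))
                .build();
    }
}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;With the prefix stripped, Auth-Guard receives &lt;code&gt;/token/refresh&lt;/code&gt;, which matches the controller&amp;rsquo;s actual mapping.&lt;/p&gt;</description></item></channel></rss>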