Kubernetes Service Selector & Endpoints Troubleshooting
Problem#
- API requests returning 503 Service Unavailable
- Istio logs showing: no_healthy_upstream
- Pods are in Running state but service routing is broken
Root Cause#
Mismatch between Service selector labels and Pod labels: the Service selects no Pods, so its Endpoints list is empty and there is nowhere to route traffic
# Check endpoints - <none> means something is wrong!
kubectl -n staging-webs get endpoints api-gateway
NAME ENDPOINTS AGE
api-gateway <none> 34h # ← no Pod IPs!
How It Works#
┌─────────────────────────────────────────────────────────┐
│ Service (api-gateway) │
│ selector: │
│ app: staging-api-gateway │
│ app.kubernetes.io/part-of: staging-webs ← required! │
└─────────────────────────────────────────────────────────┘
│
▼ find Pods with matching selector labels
┌─────────────────────────────────────────────────────────┐
│ Endpoints (auto-generated) │
│ addresses: │
│ - 10.0.20.100 (Pod1 IP) │
│ - 10.0.20.101 (Pod2 IP) │
└─────────────────────────────────────────────────────────┘
│
▼ load balance traffic
┌─────────────────────────────────────────────────────────┐
│ Pod (must have labels that match the selector) │
│ labels: │
│ app: staging-api-gateway ✓ │
│ app.kubernetes.io/part-of: staging-webs ✓ │
└─────────────────────────────────────────────────────────┘
Key point: Every label in the Service selector must be present on the Pod, with the same value, for the Pod to be registered in Endpoints. Extra labels on the Pod are ignored.
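The matching rule above can be sketched as a small shell function. This is only an illustration of the subset check, not how Kubernetes implements it; `selector_matches` is a hypothetical helper, and both arguments are assumed to be comma-separated `key=value` lists.

```shell
# selector_matches SELECTOR LABELS
# Hypothetical helper: returns 0 when every key=value entry in SELECTOR
# also appears in LABELS, and 1 as soon as one entry is missing.
selector_matches() {
  local selector=$1 labels=$2 entry
  local IFS=,
  for entry in $selector; do
    case ",$labels," in
      *",$entry,"*) ;;     # this selector entry is present on the Pod
      *) return 1 ;;       # one missing entry is enough to exclude the Pod
    esac
  done
  return 0
}
```

For example, a Pod labeled only `app=staging-api-gateway` fails the check against a selector that also requires `app.kubernetes.io/part-of=staging-webs`, while extra Pod labels such as `pod-template-hash` do not affect the result.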
Diagnosis#
1. Check Endpoints#
kubectl -n <namespace> get endpoints <service-name>
# Normal: IPs present
NAME ENDPOINTS AGE
api-gateway 10.0.20.100:8085 34h
# Problem: <none>
NAME ENDPOINTS AGE
api-gateway <none> 34h
2. Check Service Selector#
kubectl -n <namespace> get svc <service-name> -o yaml | grep -A 10 "selector:"
3. Check Pod Labels#
kubectl -n <namespace> get pod <pod-name> --show-labels
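Steps 2 and 3 can also be combined by asking which Pods the Service selector actually matches. The sketch below assumes a hypothetical helper name, `pods_for_service`; it reads the selector with kubectl's `go-template` output and passes it to `get pods -l`. An empty Pod list points to a label mismatch, whereas Pods that are listed but absent from Endpoints are more likely failing readiness.

```shell
# pods_for_service NAMESPACE SERVICE
# Hypothetical helper: lists the Pods matched by the Service's selector.
pods_for_service() {
  local ns=$1 svc=$2 sel
  # Render the selector map as "k1=v1,k2=v2," (note the trailing comma)
  sel=$(kubectl -n "$ns" get svc "$svc" \
    -o go-template='{{range $k, $v := .spec.selector}}{{$k}}={{$v}},{{end}}')
  sel=${sel%,}   # drop the trailing comma left by the template
  if [ -z "$sel" ]; then
    echo "service $svc has no selector" >&2
    return 1
  fi
  kubectl -n "$ns" get pods -l "$sel" --show-labels
}
```

Usage: `pods_for_service staging-webs api-gateway`.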
4. Compare Labels#
# Service selector
selector:
app: staging-api-gateway
app.kubernetes.io/component: backend
app.kubernetes.io/instance: staging-api-gateway
app.kubernetes.io/name: java-service
app.kubernetes.io/part-of: staging-webs # ← check if this exists!
# Pod labels
app=staging-api-gateway ✓
app.kubernetes.io/component=backend ✓
app.kubernetes.io/instance=staging-api-gateway ✓
app.kubernetes.io/name=java-service ✓
app.kubernetes.io/part-of=??? ✗ ← missing = match failure!
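The manual comparison in step 4 can be automated. The following is a rough sketch, assuming a hypothetical helper named `missing_selector_labels`; it prints each selector entry that is not present, with the same value, on a given Pod. The output format is an arbitrary choice for illustration.

```shell
# missing_selector_labels NAMESPACE SERVICE POD
# Hypothetical helper: prints every Service selector entry that the Pod lacks.
missing_selector_labels() {
  local ns=$1 svc=$2 pod=$3 selector labels entry
  # One key=value per line for the Service selector
  selector=$(kubectl -n "$ns" get svc "$svc" \
    -o go-template='{{range $k, $v := .spec.selector}}{{$k}}={{$v}}{{"\n"}}{{end}}')
  # One key=value per line for the Pod labels
  labels=$(kubectl -n "$ns" get pod "$pod" \
    -o go-template='{{range $k, $v := .metadata.labels}}{{$k}}={{$v}}{{"\n"}}{{end}}')
  while IFS= read -r entry; do
    [ -n "$entry" ] || continue
    # -F: fixed string, -x: whole line must match, -q: quiet
    printf '%s\n' "$labels" | grep -Fxq -- "$entry" || echo "missing on pod: $entry"
  done <<EOF
$selector
EOF
}
```

Usage: `missing_selector_labels <namespace> <service-name> <pod-name>`. No output means the Pod carries every label the selector requires.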
Fix#
Option 1: Redeploy app (ArgoCD sync)#
# ArgoCD CLI
argocd app sync <app-name>
# Trigger sync via kubectl
kubectl -n argocd patch application <app-name> \
--type merge \
-p '{"operation":{"initiatedBy":{"username":"admin"},"sync":{}}}'
Option 2: Manually add label (temporary fix)#
# The label is lost when the Pod is recreated; fix the Deployment or chart afterwards
kubectl -n <namespace> label pod <pod-name> app.kubernetes.io/part-of=staging-webs
Option 3: Patch the Deployment labels#
kubectl -n <namespace> patch deployment <deployment-name> \
--type merge \
-p '{"spec":{"template":{"metadata":{"labels":{"app.kubernetes.io/part-of":"staging-webs"}}}}}'
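Whichever option is used, it helps to confirm that the Service actually picked up Pod IPs afterwards. A minimal sketch, assuming a hypothetical helper `wait_for_endpoints` that polls the Endpoints object; the 2-second interval and default of 30 attempts are arbitrary.

```shell
# wait_for_endpoints NAMESPACE SERVICE [ATTEMPTS]
# Hypothetical helper: polls until the Service has at least one ready address.
wait_for_endpoints() {
  local ns=$1 svc=$2 tries=${3:-30} ips i=0
  while [ "$i" -lt "$tries" ]; do
    # Empty output means the Endpoints object has no ready addresses yet
    ips=$(kubectl -n "$ns" get endpoints "$svc" \
      -o jsonpath='{.subsets[*].addresses[*].ip}')
    if [ -n "$ips" ]; then
      echo "endpoints ready: $ips"
      return 0
    fi
    i=$((i + 1))
    sleep 2
  done
  echo "no endpoints for $svc after $tries attempts" >&2
  return 1
}
```

Usage: `wait_for_endpoints staging-webs api-gateway`. A non-zero exit status suggests the labels still do not match or the Pods are not Ready.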
Prevention#
Keep selectorLabels consistent in Helm charts
- Service selector and Deployment pod template labels must match exactly
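Treat selectorLabels as immutable
- A Deployment's spec.selector cannot be changed after creation (apps/v1); a Service's selector can, but the two must stay in sync
- Best to set them once and not change them
- If they must change, update the Service and recreate the Deployment together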
Be careful with ArgoCD
- After modifying a chart, sync all related apps
- Don’t leave apps in OutOfSync state
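As a periodic or post-sync check, the namespace can be scanned for Services whose Endpoints are empty. This is a sketch only; `services_without_endpoints` is a hypothetical helper, and a hit is not always an error (for example, a workload intentionally scaled to zero).

```shell
# services_without_endpoints NAMESPACE
# Hypothetical helper: prints the name of each Endpoints object with no ready addresses.
services_without_endpoints() {
  local ns=$1
  # Emit "name<TAB>ip ip ..." per Endpoints object, then keep names with no IPs
  kubectl -n "$ns" get endpoints \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.subsets[*].addresses[*].ip}{"\n"}{end}' |
    awk -F '\t' '$1 != "" && $2 == "" { print $1 }'
}
```

Usage: `services_without_endpoints staging-webs`. Each name printed is a candidate for the selector and label comparison described in the Diagnosis section.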
Quick Diagnostic Commands#
# Full diagnosis in one go
NS=staging-webs
SVC=api-gateway
APP=staging-api-gateway   # value of the Pods' "app" label; differs from the Service name here
echo "=== Endpoints ==="
kubectl -n "$NS" get endpoints "$SVC"
echo "=== Service Selector ==="
kubectl -n "$NS" get svc "$SVC" -o json | jq '.spec.selector'
echo "=== Pod Labels ==="
kubectl -n "$NS" get pods -l "app=$APP" --show-labels
echo "=== Istio logs (recent errors) ==="
kubectl -n istio-system logs -l app=istio-ingressgateway --tail=10 | grep "no_healthy"
References#
- Kubernetes Service: https://kubernetes.io/docs/concepts/services-networking/service/
- Label Selectors: https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/