# Environment-Specific Monitoring Dashboard Separation
## Problem
Grafana monitoring dashboards showed no data because the infrastructure setup differs between the Dev (kubeadm) and Staging (EKS) environments.
### Infrastructure differences by environment
| Component | Dev (kubeadm) | Staging (EKS) |
|---|---|---|
| PostgreSQL | Local Pod (postgres:16-alpine) | AWS RDS |
| Redis | Local Pod (redis:7-alpine) | AWS ElastiCache |
| Metrics collection | Prometheus exporter | CloudWatch |
| Data source | Prometheus | CloudWatch |
### Symptoms
- “RDS PostgreSQL Monitoring” dashboard in Dev → No data
- “ElastiCache Redis Monitoring” dashboard in Dev → No data
- Cause: no CloudWatch data source (not an AWS environment)
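A CloudWatch data source exists only where Grafana is provisioned with one, and Dev's Grafana knows only Prometheus. For illustration, a minimal provisioning entry of the kind Staging would carry (name, auth mode, and region are assumptions, not the actual config):

```yaml
# Grafana data source provisioning (sketch) - present in Staging, absent in Dev
apiVersion: 1
datasources:
  - name: CloudWatch
    type: cloudwatch
    jsonData:
      authType: default        # e.g., the node/pod IAM role
      defaultRegion: us-east-1 # illustrative region
```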
## Fix
### 1. Separate dashboards by environment
Created two versions of the same dashboard:
| Dashboard | Data source | Environment |
|---|---|---|
| database-rds-postgresql.json | CloudWatch | Staging |
| database-postgresql-local.json | Prometheus | Dev |
| cache-elasticache-redis.json | CloudWatch | Staging |
| cache-redis-local.json | Prometheus | Dev |
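Each pair is the same dashboard with the data source and query syntax swapped. A hypothetical excerpt of the local PostgreSQL variant (title, uid, and query shape are illustrative; the CloudWatch variant points `datasource` at type `cloudwatch` and uses namespace/metric fields instead of `expr`):

```json
{
  "title": "PostgreSQL Monitoring (Local)",
  "panels": [
    {
      "title": "Connections",
      "datasource": { "type": "prometheus", "uid": "prometheus" },
      "targets": [{ "expr": "pg_stat_activity_count", "refId": "A" }]
    }
  ]
}
```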
### 2. Per-environment exclude configuration
Exclude dashboards that don’t apply to each environment in the Helm values:
Dev environment (`dev/values/monitoring/values-prometheus-stack.yaml`):

```yaml
dashboards:
  devTeam:
    exclude:
      # Exclude CloudWatch-based dashboards (Dev uses local exporters)
      - "database-rds-postgresql.json"
      - "cache-elasticache-redis.json"
```
Staging environment (`staging/values/values-prometheus-stack.yaml`):

```yaml
dashboards:
  devTeam:
    exclude:
      # Exclude local exporter dashboards (EKS uses CloudWatch)
      - "database-postgresql-local.json"
      - "cache-redis-local.json"
```
### 3. Update ConfigMap template
Modified the dashboard ConfigMap generation to respect the exclude list:
```yaml
{{- $dashboardFiles := .Files.Glob "files/grafana-dashboards/dev-team-observability/*.json" -}}
{{- $excludeList := .Values.dashboards.devTeam.exclude | default list -}}
{{- if gt (len $dashboardFiles) 0 }}
...
{{- range $path, $_ := $dashboardFiles }}
{{- $filename := base $path -}}
{{- if not (has $filename $excludeList) }}
  {{ $filename }}: |-
{{ $.Files.Get $path | nindent 4 }}
{{- end }}
{{- end }}
{{- end }}
```
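The template assumes `dashboards.devTeam.exclude` exists, so the chart-level values.yaml ships an empty default that each environment then overrides. A minimal sketch consistent with the `| default list` guard above (the actual file is the `common-charts/.../values.yaml` listed under File Changes):

```yaml
# Chart default (sketch): exclude nothing unless an environment says otherwise
dashboards:
  devTeam:
    exclude: []
```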
## Metric Comparison
### PostgreSQL
| Metric | Dev (postgres_exporter) | Staging (CloudWatch) |
|---|---|---|
| CPU | process_cpu_seconds_total | CPUUtilization |
| Connections | pg_stat_activity_count | DatabaseConnections |
| Cache hit | pg_stat_database_blks_hit | N/A (RDS-managed) |
| DB size | pg_database_size_bytes | FreeStorageSpace (inverse) |
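CloudWatch exposes no cache-hit metric for RDS, but in Dev the ratio can be computed from the exporter's counters. A sketch of such a panel query; `pg_stat_database_blks_read` is the standard postgres_exporter companion counter, not shown in the table above:

```promql
# Buffer cache hit ratio over 5m (Dev, postgres_exporter)
sum(rate(pg_stat_database_blks_hit[5m]))
/
(sum(rate(pg_stat_database_blks_hit[5m])) + sum(rate(pg_stat_database_blks_read[5m])))
```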
### Redis
| Metric | Dev (redis_exporter) | Staging (CloudWatch) |
|---|---|---|
| CPU | redis_cpu_sys_seconds_total | CPUUtilization, EngineCPUUtilization |
| Memory | redis_memory_used_bytes | BytesUsedForCache |
| Connections | redis_connected_clients | CurrConnections |
| Cache hit | redis_keyspace_hits_total | CacheHits |
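The Dev Redis hit ratio works the same way; `redis_keyspace_misses_total` is the standard redis_exporter counterpart to the hits counter:

```promql
# Keyspace hit ratio over 5m (Dev, redis_exporter)
rate(redis_keyspace_hits_total[5m])
/
(rate(redis_keyspace_hits_total[5m]) + rate(redis_keyspace_misses_total[5m]))
```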
## File Changes
### Added files
- `common-charts/.../dev-team-observability/database-postgresql-local.json`
- `common-charts/.../dev-team-observability/cache-redis-local.json`
### Modified files
- `common-charts/.../templates/grafana-dashboards-dev-team-observability-configmap.yaml` - Added exclude logic
- `common-charts/.../values.yaml` - Added `dashboards.devTeam.exclude` default value
- `dev/values/monitoring/values-prometheus-stack.yaml` - Exclude CloudWatch dashboards
- `staging/values/values-prometheus-stack.yaml` - Exclude local exporter dashboards
## Related Issue: Prometheus PodMonitor Label Mismatch
### Symptoms
`istio_requests_total` metrics were not being scraped by Prometheus in Staging.
### Cause
Prometheus was configured to select only PodMonitors/ServiceMonitors carrying the label `release: kube-prometheus-stack`, but the Istio PodMonitor had `release: prometheus`.
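With kube-prometheus-stack's default behavior, the operator derives the monitor selectors from the Helm release name, so any monitor labeled differently is simply never selected. Roughly what the rendered Prometheus resource contains (a sketch, not the actual manifest):

```yaml
# Prometheus CR (sketch): only monitors carrying this exact label are scraped
spec:
  podMonitorSelector:
    matchLabels:
      release: kube-prometheus-stack
  serviceMonitorSelector:
    matchLabels:
      release: kube-prometheus-stack
```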
### Fix
Unified all monitor labels to `release: kube-prometheus-stack`:

```yaml
# Before
labels:
  release: prometheus

# After
labels:
  release: kube-prometheus-stack
```
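In each manifest the label sits in the monitor's metadata. For example, the Istio Envoy PodMonitor after the fix looks roughly like this (name and namespace are illustrative, not copied from the repo):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: envoy-stats            # illustrative name
  namespace: istio-system      # illustrative namespace
  labels:
    release: kube-prometheus-stack  # must match Prometheus's monitor selector
```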
Modified files:

- `common-charts/infra/istio/istiod/templates/podmonitor-envoy.yaml`
- `common-charts/infra/istio/istiod/templates/servicemonitor.yaml`
- `common-charts/infra/argocd/templates/servicemonitor.yaml`
### Dev vs Staging Prometheus configuration difference
| Setting | Dev | Staging |
|---|---|---|
| serviceMonitorSelectorNilUsesHelmValues | false (collect all monitors) | default (label-filtered) |
| podMonitorSelectorNilUsesHelmValues | false (collect all monitors) | default (label-filtered) |
Dev collects all monitors regardless of labels. Staging filters by label, so labels must match.
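In values terms, the Dev behavior comes from turning off the two switches under `prometheusSpec`; a sketch of the relevant keys in the Dev values file:

```yaml
# Dev (sketch): a nil selector now means "select all monitors",
# so the release label no longer matters
prometheus:
  prometheusSpec:
    serviceMonitorSelectorNilUsesHelmValues: false
    podMonitorSelectorNilUsesHelmValues: false
```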
## Verification
```bash
# 1. Check Prometheus targets
kubectl exec -n monitoring sts/prometheus-prom -c prometheus -- \
  wget -qO- 'http://localhost:9090/api/v1/targets' | grep envoy

# 2. Query Istio metrics
kubectl exec -n monitoring sts/prometheus-prom -c prometheus -- \
  wget -qO- 'http://localhost:9090/api/v1/query?query=istio_requests_total'

# 3. List dashboard keys in the ConfigMap
#    (the character class includes "-" because the filenames are hyphenated)
kubectl get configmap grafana-dashboards-dev-team-observability -n monitoring -o yaml \
  | grep -E '^[[:space:]]+[A-Za-z0-9._-]+\.json'
```
## References
- Prometheus postgres_exporter: `prometheuscommunity/postgres-exporter:v0.15.0`
- Prometheus redis_exporter: `oliver006/redis_exporter:v1.66.0`
- CloudWatch metric namespaces:
  - RDS: `AWS/RDS`
  - ElastiCache: `AWS/ElastiCache`