## Problem

Grafana monitoring dashboards were broken because the underlying infrastructure differs between the Dev (kubeadm) and Staging (EKS) environments.

### Infrastructure differences by environment

| Component | Dev (kubeadm) | Staging (EKS) |
|---|---|---|
| PostgreSQL | Local Pod (`postgres:16-alpine`) | AWS RDS |
| Redis | Local Pod (`redis:7-alpine`) | AWS ElastiCache |
| Metrics collection | Prometheus exporter | CloudWatch |
| Data source | Prometheus | CloudWatch |

## Symptoms

- “RDS PostgreSQL Monitoring” dashboard in Dev → No data
- “ElastiCache Redis Monitoring” dashboard in Dev → No data
- Cause: no CloudWatch data source (not an AWS environment)
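
For reference, the CloudWatch data source only exists where Grafana is provisioned with one. A minimal sketch of that provisioning on the Staging side, assuming kube-prometheus-stack's `grafana.additionalDataSources` mechanism (the region and auth mode are placeholders, not values from the actual setup):

```yaml
# Sketch: CloudWatch data source provisioned in Staging only
# (authType/defaultRegion are placeholder assumptions)
grafana:
  additionalDataSources:
    - name: CloudWatch
      type: cloudwatch
      jsonData:
        authType: default          # e.g. node role / IRSA
        defaultRegion: ap-northeast-2
```

Dev has no such entry, so any panel bound to the CloudWatch data source renders “No data”.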

## Fix

### 1. Separate dashboards by environment

Created an environment-specific version of each dashboard:

| Dashboard | Data source | Environment |
|---|---|---|
| `database-rds-postgresql.json` | CloudWatch | Staging |
| `database-postgresql-local.json` | Prometheus | Dev |
| `cache-elasticache-redis.json` | CloudWatch | Staging |
| `cache-redis-local.json` | Prometheus | Dev |

### 2. Per-environment exclude configuration

In the Helm values, each environment excludes the dashboards that don't apply to it:

Dev environment (`dev/values/monitoring/values-prometheus-stack.yaml`):

```yaml
dashboards:
  devTeam:
    exclude:
      # Exclude CloudWatch-based dashboards (Dev uses local exporters)
      - "database-rds-postgresql.json"
      - "cache-elasticache-redis.json"
```

Staging environment (`staging/values/values-prometheus-stack.yaml`):

```yaml
dashboards:
  devTeam:
    exclude:
      # Exclude local exporter dashboards (EKS uses CloudWatch)
      - "database-postgresql-local.json"
      - "cache-redis-local.json"
```

### 3. Update ConfigMap template

Modified the dashboard ConfigMap generation to respect the exclude list:

```yaml
{{- /* All dashboard JSON files bundled with the chart */ -}}
{{- $dashboardFiles := .Files.Glob "files/grafana-dashboards/dev-team-observability/*.json" -}}
{{- /* Per-environment exclude list; defaults to empty */ -}}
{{- $excludeList := .Values.dashboards.devTeam.exclude | default list -}}
{{- if gt (len $dashboardFiles) 0 }}
...
{{- range $path, $_ := $dashboardFiles }}
{{- $filename := base $path -}}
{{- /* Skip any dashboard listed in this environment's exclude list */ -}}
{{- if not (has $filename $excludeList) }}
  {{ $filename }}: |-
{{ $.Files.Get $path | nindent 4 }}
{{- end }}
{{- end }}
{{- end }}
```
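
The template falls back to an empty list when nothing is configured, so the chart ships a matching default in its `values.yaml` (see File Changes below):

```yaml
# Chart default: no dashboards excluded
dashboards:
  devTeam:
    exclude: []
```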

## Metric Comparison

### PostgreSQL

| Metric | Dev (postgres_exporter) | Staging (CloudWatch) |
|---|---|---|
| CPU | `process_cpu_seconds_total` | `CPUUtilization` |
| Connections | `pg_stat_activity_count` | `DatabaseConnections` |
| Cache hit | `pg_stat_database_blks_hit` | N/A (RDS-managed) |
| DB size | `pg_database_size_bytes` | `FreeStorageSpace` (inverse) |
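
In Dev, the exporter-side metrics above come from postgres_exporter (image version in References) scraping the local Pod. A minimal sketch of wiring it as a sidecar; the container layout and DSN here are assumptions, not the actual manifest:

```yaml
# Sketch: postgres_exporter sidecar next to the local PostgreSQL container.
# The DSN is a placeholder; use a Secret in real manifests.
containers:
  - name: postgres
    image: postgres:16-alpine
  - name: postgres-exporter
    image: prometheuscommunity/postgres-exporter:v0.15.0
    env:
      - name: DATA_SOURCE_NAME   # the exporter's connection-string env var
        value: "postgresql://postgres:<password>@localhost:5432/postgres?sslmode=disable"
    ports:
      - name: metrics
        containerPort: 9187      # postgres_exporter default port
```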

### Redis

| Metric | Dev (redis_exporter) | Staging (CloudWatch) |
|---|---|---|
| CPU | `redis_cpu_sys_seconds_total` | `CPUUtilization`, `EngineCPUUtilization` |
| Memory | `redis_memory_used_bytes` | `BytesUsedForCache` |
| Connections | `redis_connected_clients` | `CurrConnections` |
| Cache hit | `redis_keyspace_hits_total` | `CacheHits` |
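
Likewise for Redis in Dev: redis_exporter (image in References) exposes these metrics. A sketch under the same sidecar assumption:

```yaml
# Sketch: redis_exporter sidecar next to the local Redis container
containers:
  - name: redis
    image: redis:7-alpine
  - name: redis-exporter
    image: oliver006/redis_exporter:v1.66.0
    args: ["--redis.addr=redis://localhost:6379"]
    ports:
      - name: metrics
        containerPort: 9121   # redis_exporter default port
```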

## File Changes

### Added files

- `common-charts/.../dev-team-observability/database-postgresql-local.json`
- `common-charts/.../dev-team-observability/cache-redis-local.json`

### Modified files

- `common-charts/.../templates/grafana-dashboards-dev-team-observability-configmap.yaml`
  - Added the exclude logic
- `common-charts/.../values.yaml`
  - Added the `dashboards.devTeam.exclude` default value
- `dev/values/monitoring/values-prometheus-stack.yaml`
  - Excludes the CloudWatch dashboards
- `staging/values/values-prometheus-stack.yaml`
  - Excludes the local exporter dashboards

## Symptoms

`istio_requests_total` metrics were not being scraped by Prometheus in Staging.

## Cause

Prometheus was configured to select only PodMonitors/ServiceMonitors carrying the label `release: kube-prometheus-stack`, but the Istio PodMonitor had `release: prometheus`.
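
In other words, the operator renders label-filtered selectors onto the Staging Prometheus custom resource, roughly:

```yaml
# Effective selectors on the Staging Prometheus CR (sketch):
# monitors without this label are silently ignored
serviceMonitorSelector:
  matchLabels:
    release: kube-prometheus-stack
podMonitorSelector:
  matchLabels:
    release: kube-prometheus-stack
```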

## Fix

Unified all monitor labels to `release: kube-prometheus-stack`:

```yaml
# Before
labels:
  release: prometheus

# After
labels:
  release: kube-prometheus-stack
```
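
For context, a trimmed sketch of the corrected Envoy PodMonitor; the metadata matches the fix above, while the spec fields follow Istio's standard sample and are assumptions here:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: envoy-stats
  labels:
    release: kube-prometheus-stack   # must match Prometheus's podMonitorSelector
spec:
  namespaceSelector:
    any: true                        # Envoy sidecars live in app namespaces
  selector:
    matchExpressions:
      - key: istio-prometheus-ignore
        operator: DoesNotExist
  podMetricsEndpoints:
    - path: /stats/prometheus        # merged Envoy/app metrics endpoint
      interval: 15s
```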

Modified files:

- `common-charts/infra/istio/istiod/templates/podmonitor-envoy.yaml`
- `common-charts/infra/istio/istiod/templates/servicemonitor.yaml`
- `common-charts/infra/argocd/templates/servicemonitor.yaml`

### Dev vs Staging Prometheus configuration difference

| Setting | Dev | Staging |
|---|---|---|
| `serviceMonitorSelectorNilUsesHelmValues` | `false` (collect all monitors) | default (label-filtered) |
| `podMonitorSelectorNilUsesHelmValues` | `false` (collect all monitors) | default (label-filtered) |

Dev collects all monitors regardless of labels. Staging filters by label, so labels must match.
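
In kube-prometheus-stack values terms, the Dev row of that table corresponds to:

```yaml
# Dev only: opt out of label filtering so Prometheus picks up
# every ServiceMonitor/PodMonitor in the cluster
prometheus:
  prometheusSpec:
    serviceMonitorSelectorNilUsesHelmValues: false
    podMonitorSelectorNilUsesHelmValues: false
```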


## Verification

```bash
# 1. Check Prometheus targets
kubectl exec -n monitoring sts/prometheus-prom -c prometheus -- \
  wget -qO- 'http://localhost:9090/api/v1/targets' | grep envoy

# 2. Query Istio metrics
kubectl exec -n monitoring sts/prometheus-prom -c prometheus -- \
  wget -qO- 'http://localhost:9090/api/v1/query?query=istio_requests_total'

# 3. Check dashboard ConfigMap (\S+ also matches hyphenated filenames)
kubectl get configmap grafana-dashboards-dev-team-observability -n monitoring -o yaml | grep -E "^\s+\S+\.json"
```

## References

- Prometheus postgres_exporter: `prometheuscommunity/postgres-exporter:v0.15.0`
- Prometheus redis_exporter: `oliver006/redis_exporter:v1.66.0`
- CloudWatch metric namespaces:
  - RDS: `AWS/RDS`
  - ElastiCache: `AWS/ElastiCache`