| 6.1 | Deploy Grafana Alloy + Loki | Deploy Grafana Alloy as a DaemonSet to collect container logs from every node and ship them to a Loki instance with persistent storage. | Queries equivalent to kubectl logs work in Grafana Explore via the Loki data source; logs from all namespaces are indexed. |
| 6.2 | Deploy Prometheus | Deploy Prometheus via Helm with service discovery for all Kubernetes pods, persistent storage, and a 30-day retention policy. | PromQL queries return metrics for cluster nodes and pods as well as custom application metrics; the Prometheus UI is accessible internally. |
| 6.3 | Deploy Grafana | Deploy Grafana with persistent storage, configure Loki and Prometheus as data sources, and expose it at grafana.redactedworld.com behind Traefik with Keycloak OAuth2 proxy for SSO. | Grafana loads at the public URL; users log in via Keycloak; both data sources show "connected" in the data source settings. |
| 6.4 | Service metrics instrumentation | Add Prometheus client libraries to every NestJS service (api-gateway, auth, user, org, chat, forum, notification, domain, scan, report) exposing /metrics with standard HTTP, gRPC, and business metrics. | Each service's /metrics endpoint returns valid Prometheus exposition format; Prometheus scrapes all targets successfully. |
| 6.5 | Dashboards (grafana.redactedworld.com) | Create pre-built Grafana dashboards: Cluster Overview, Service Health, Scan Pipeline (active jobs, durations, failure rates), and NATS Throughput. Export dashboards as JSON and store in version control. | All four dashboards render with live data; dashboard JSON files are committed to the repository under grafana/dashboards/. |
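
The acceptance criterion for 6.4 refers to the Prometheus exposition format. As a rough illustration of what a /metrics response body looks like, the following TypeScript sketch renders a single counter by hand. It is dependency-free for clarity only; in the services themselves a Prometheus client library would produce this output. The function names (`renderCounter`, `formatLabels`, `escapeLabelValue`) and the `http_requests_total` metric are hypothetical and not taken from the project.

```typescript
// Hypothetical, dependency-free sketch of the Prometheus text exposition format.
// A real service would rely on a Prometheus client library instead.

type Labels = Record<string, string>;

interface CounterSample {
  labels: Labels;
  value: number;
}

// Escape a label value as the text format requires:
// backslash, double quote, and newline are backslash-escaped.
function escapeLabelValue(value: string): string {
  return value
    .replace(/\\/g, "\\\\")
    .replace(/"/g, '\\"')
    .replace(/\n/g, "\\n");
}

// Render labels as {name="value",...}, sorted by name for stable output.
// Returns an empty string when there are no labels.
function formatLabels(labels: Labels): string {
  const pairs = Object.keys(labels)
    .sort()
    .map((name) => `${name}="${escapeLabelValue(String(labels[name]))}"`);
  return pairs.length > 0 ? `{${pairs.join(",")}}` : "";
}

// Render one counter family: HELP line, TYPE line, then one line per sample.
// Assumes the help text contains no backslashes or newlines.
function renderCounter(
  name: string,
  help: string,
  samples: CounterSample[]
): string {
  const lines: string[] = [
    `# HELP ${name} ${help}`,
    `# TYPE ${name} counter`,
  ];
  for (const sample of samples) {
    lines.push(`${name}${formatLabels(sample.labels)} ${sample.value}`);
  }
  return lines.join("\n") + "\n";
}

// Example: a single sample for an illustrative HTTP request counter.
const body = renderCounter(
  "http_requests_total",
  "Total HTTP requests handled.",
  [{ labels: { method: "GET", status: "200" }, value: 42 }]
);
console.log(body);
```

Under these assumptions the example prints a HELP line, a TYPE line, and the sample line `http_requests_total{method="GET",status="200"} 42`. A body of this shape is what Prometheus expects to parse when it scrapes a target.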