Internal refactor of monitoring addons

We’ve completely refactored how we manage our monitoring components, such as Prometheus, Grafana, and the various exporters and alerts. For you, the platform user, nothing will change, although some disruption to Grafana and Prometheus is expected during the rollout.

The change has already been rolled out to non-production setups; production will follow on Monday 11/04 during the day.

This change will make it easier for us to update and maintain the monitoring system.

What we changed under the hood

Originally, we used a cluster-monitoring wrapper chart around the upstream kube-prometheus-stack project. This wrapper chart bundled several of our component-specific alerts and Grafana dashboards and was, in turn, deployed through our Terraform stacks.

These Terraform stacks are largely responsible for the setup and maintenance of our environments. Over the years, we’ve been adding monitoring-related deployments directly alongside the components they belong to (e.g. fluent-bit). The cluster-monitoring chart remained the odd one out: it was maintained separately, which caused extra overhead during changes and updates.

Now kube-prometheus-stack and all our own custom resources are merged directly into our Terraform stacks.