Hotfix for Grafana and InfoInhibitor alert info

There are no actions to take, and all changes have been rolled out to all environments.

Grafana datasources missing

Last Monday (28/02), we pushed the latest monitoring upgrades to production environments. Soon afterwards, some customers noticed missing Grafana datasources (e.g. Prometheus and Loki). This was caused by a race condition between Grafana startup and the datasource reloader sidecar it uses.

On Tuesday we deployed a hotfix for this issue to all clusters. As a side effect, we have disabled (for now) the ability to specify custom Grafana datasources via configmap. This feature was neither documented nor in use in any of our managed environments.

InfoInhibitor alert

Meanwhile, you might also have noticed new InfoInhibitor alerts in your channels. This is a special alert that bundles severity: info alerts together. The goal is that you do not get notified by it unless other alerts with a higher severity are firing in the same namespace.

We had initially “muted” this alert, as intended, but we applied this only to the infrastructure namespaces managed by us. As a result, on several environments this alert started firing for customer-managed namespaces. We have since deployed an updated config to properly silence and inhibit this alert for all namespaces.
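
To illustrate the intended behaviour, here is a minimal Python sketch of the inhibition logic. It is not our actual Alertmanager config. It assumes that InfoInhibitor fires for a namespace that has info alerts but no warning or critical alerts, that a firing InfoInhibitor suppresses the info alerts in that same namespace, and that InfoInhibitor itself is never sent to a channel. The Alert class, the function names and the example alert and namespace names are hypothetical.

```python
from dataclasses import dataclass

# Severities treated as "higher than info" (assumption for this sketch).
HIGHER = {"warning", "critical"}


@dataclass(frozen=True)
class Alert:
    name: str
    severity: str
    namespace: str


def add_info_inhibitor(alerts):
    """Append an InfoInhibitor alert for every namespace that has info
    alerts but no warning/critical alerts (assumed firing condition)."""
    severities_by_ns = {}
    for alert in alerts:
        severities_by_ns.setdefault(alert.namespace, set()).add(alert.severity)
    inhibitors = [
        Alert("InfoInhibitor", "none", ns)
        for ns, severities in sorted(severities_by_ns.items())
        if "info" in severities and not (severities & HIGHER)
    ]
    return list(alerts) + inhibitors


def notified(alerts):
    """Return the alerts that would reach a channel in this model."""
    firing = add_info_inhibitor(alerts)
    inhibited_ns = {a.namespace for a in firing if a.name == "InfoInhibitor"}
    result = []
    for alert in firing:
        if alert.name == "InfoInhibitor":
            continue  # silenced: never sent to a channel
        if alert.severity == "info" and alert.namespace in inhibited_ns:
            continue  # inhibited by InfoInhibitor in the same namespace
        result.append(alert)
    return result
```

In this model, an info alert in a namespace with no other alerts produces no notification at all, while the same info alert in a namespace that also has a warning alert is delivered together with that warning.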

More info can be found in the InfoInhibitor’s runbook.

Extra

Additional updates deployed as part of this release: