Fixed regression in Elasticsearch monitoring for Prometheus

We discovered a regression in our Elasticsearch monitoring for Prometheus, introduced some time ago during a routine chart upgrade. Because of it, some Elasticsearch metrics, such as available storage space, were not being reported to Prometheus. As a result, we didn’t pick up some problematic situations in an Elasticsearch cluster in time.

More …

Velero S3 backups replication

Our EKS-based reference solution for Velero backups on AWS S3 now supports automatic replication to an additional S3 bucket in an AWS region of your choice. The feature is disabled by default; contact your lead engineer to discuss enabling it for your cluster(s) if needed.

Monitoring upgrades

As part of our regular upgrade cycle, the following Kubernetes cluster components have been updated. These updates are being rolled out to all clusters and will be finished by the end of the week.

More …

Grafana main dashboard updated

We have fixed a problem in our main Grafana dashboard. Previously we counted the resources for all pods, including those that had already completed. This gave an incorrect indication of cluster usage. Now we filter out failed and succeeded pods, so the dashboard gives a more accurate picture of cluster usage.
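The effect of the filter can be illustrated with a minimal sketch. It is not the actual dashboard query; the `Pod` class, its `cpu_request` field, and the sample pods are hypothetical and only serve to show why counting finished pods overstates usage.

```python
from dataclasses import dataclass

# Pod phases that no longer consume cluster resources.
FINISHED_PHASES = {"Succeeded", "Failed"}


@dataclass
class Pod:
    name: str
    phase: str          # Kubernetes pod phase: Pending, Running, Succeeded, Failed, Unknown
    cpu_request: float  # requested CPU in cores (hypothetical field for this sketch)


def requested_cpu(pods, include_finished=False):
    """Sum CPU requests, skipping finished pods unless explicitly included."""
    return sum(
        p.cpu_request
        for p in pods
        if include_finished or p.phase not in FINISHED_PHASES
    )


# Hypothetical sample data: two of the four pods have already finished.
pods = [
    Pod("api-0", "Running", 0.5),
    Pod("migrate-job-x", "Succeeded", 1.0),
    Pod("batch-y", "Failed", 2.0),
    Pod("worker-0", "Pending", 0.25),
]

old_total = requested_cpu(pods, include_finished=True)  # previous behaviour: all pods
new_total = requested_cpu(pods)                         # new behaviour: finished pods excluded
```

In this made-up sample, the finished pods account for most of the requested CPU, which is the kind of overstatement the dashboard fix removes.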

More …

New log shipper added - Fluent Bit

We already offer a cost-effective default logging solution based on Grafana Loki. However, we realize that this solution is not a perfect fit for everybody, so we now also allow deploying and configuring Fluent Bit through our reference solution offering, with all the usual benefits such as regular updates.

More …

Fixed issue with Vault certificate renewal

For increased security, our Vault setups are configured to terminate TLS sessions directly at the Vault server process. To do so, we use cert-manager to provision Let’s Encrypt certificates that the Vault server Pods can use. There was an issue with this setup: the Vault servers didn’t reload the certificate when cert-manager renewed it, rendering Vault insecure / unavailable.
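As an illustration of the general idea only, and not necessarily the fix we applied, a renewed certificate can be detected by comparing file fingerprints and then asking the server to reload. The sketch below relies on the fact that Vault reloads its listener TLS certificate on SIGHUP; the function names and the polling approach are assumptions made for this example.

```python
import hashlib
import os
import signal
from pathlib import Path


def fingerprint(cert_path):
    """Return the SHA-256 hex digest of the certificate file's bytes."""
    return hashlib.sha256(Path(cert_path).read_bytes()).hexdigest()


def reload_if_renewed(cert_path, last_seen, reload):
    """Call reload() if the file differs from last_seen; return the current fingerprint."""
    current = fingerprint(cert_path)
    if current != last_seen:
        reload()
    return current


def sighup(pid):
    """Build a reload callback that sends SIGHUP to the given process ID."""
    return lambda: os.kill(pid, signal.SIGHUP)
```

A watcher would call `reload_if_renewed` periodically with the path of the mounted certificate and a callback such as `sighup(vault_pid)`, keeping the returned fingerprint for the next check. Both the path and the process ID depend on the deployment and are not specified here.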

More …