Introducing alerts for Fluent Bit errors
Considering we’re moving more and more log processing to Fluent Bit, it’s important to get notified when logs are not making it to the storage solutions (“outputs”) like Elasticsearch, Logz.io and S3.
We have upgraded our Concourse setups to the latest version 7.5.0.
Let’s Encrypt certificates are (usually) cross-signed with the DST Root CA X3 root certificate. However, this root certificate expired on September 30th 2021.
Every piece of infrastructure we create is managed via Terraform. This is to ensure that everything we deploy is repeatable, follows best practices and is fully tracked.
On the 5th of October a high-severity notice for CVE-2021-39226 went out, impacting the Grafana deployments.
In some cases, a disaster recovery plan might require RDS snapshots to be replicated (copied over) to a different AWS account and region. We can now set up this replication process for the managed RDS instances of our customers. Note that this works in conjunction with the normal automated daily RDS snapshots that AWS already performs.
As part of our regular upgrade cycle, the following Kubernetes cluster components have been updated. We’ve already rolled these out to all clusters.
We’ve seen on multiple occasions that, due to resource starvation in a cluster, the kubelet starts evicting critical infrastructure Pods. This can lead to significant downtime and disruptions.
We’ve improved the look and the content of the EC2 instance interruption notifications that we receive in Slack.
We have started rolling out AKS and EKS 1.21. This brings both our supported AKS and EKS platforms to Kubernetes v1.21.2.
We have upgraded Istio on all clusters that use it. The version was upgraded from 1.10.0 to 1.11.2. The new version comes with some features meant for operators and no breaking changes that you should be concerned about.
We have muted the critical KubeAPIErrorBudgetBurn alerts.
Over the last year we have tested the Vertical Pod Autoscaler on several of our workloads and customers. The results were positive, so we decided to roll out the VPA on all our clusters.
Last month we rolled out a major Grafana update, going from 7.5 to 8.1. While initially everything looked in order, some customers experienced issues with custom dashboards that had worked perfectly in the previous release. Mainly, data coming from SQL data sources and visualized through the “old” graph panel is sorted differently or visualized completely wrong.
We’ve upgraded Cert-manager to version 1.4.4 on all our Kubernetes clusters. This patch upgrade contains a bug fix for a renewal time issue that affected some of our clusters.
We are in the process of upgrading our Kubernetes-based Vault setups to the latest version 1.8.2.
We are in the process of upgrading our Kubernetes-based Vault setups to the latest version 1.8.1.
As part of our regular upgrade cycle, the following Kubernetes cluster components have been updated. We’ve already rolled these out to all clusters.
In an effort to optimize the resources used by the infrastructure components running on our Reference Solution Kubernetes platforms as much as possible, we’ve considerably reduced the memory used by the Cluster Autoscaler by optimizing its configuration. This means that the autoscaler will now run more reliably, and that there’ll be a bit more memory available for other workloads running on the K8s clusters.
We’ve upgraded all Teleport clusters to version 6.2.8. Coming from version 4.x, this is a (double) major release that brings many new features: