Upgraded AKS and EKS clusters to 1.22
We have started rolling out AKS and EKS 1.22. This brings our supported AKS platforms to v1.22.11 and EKS to v1.22.10.
As part of our regular upgrade cycle, the following Kubernetes cluster components have been updated. We’ve already started rolling these out to all non-production clusters. Production upgrades are scheduled for next week during business hours. As usual, no workload interruptions are expected.
We’ve upgraded all Teleport clusters and nodes from version 8.2.0 to 9.3.7. Teleport is a tool we mostly use internally to provide secure and audited access to EC2 instances and Kubernetes clusters.
On AWS EKS clusters we use Calico to provide NetworkPolicy functionality as an optional feature. With these NetworkPolicies you can control the traffic flow within a Kubernetes cluster between Pods, Services and external resources.
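As an illustration of what such a policy looks like, here is a minimal sketch that builds a standard networking.k8s.io/v1 NetworkPolicy as a Python dict and prints it as JSON. The labels app=api and app=frontend, port 8080 and the helper name allow_ingress_from are hypothetical examples, not part of our platform. Once a Pod is selected by a policy with the Ingress policy type, only the traffic the policy explicitly allows can reach it; Calico is what enforces this on the nodes.

```python
import json


def allow_ingress_from(target_app, source_app, port, namespace="default"):
    """Build a NetworkPolicy that only lets pods labelled app=<source_app>
    (in the same namespace) reach pods labelled app=<target_app> on a TCP port.

    Hypothetical helper for illustration; label values are assumptions.
    """
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": f"allow-{source_app}-to-{target_app}", "namespace": namespace},
        "spec": {
            # Pods this policy applies to.
            "podSelector": {"matchLabels": {"app": target_app}},
            # Only inbound traffic is restricted; anything not allowed below is denied.
            "policyTypes": ["Ingress"],
            "ingress": [
                {
                    # Allowed sources: pods with app=<source_app> in the policy's namespace.
                    "from": [{"podSelector": {"matchLabels": {"app": source_app}}}],
                    "ports": [{"protocol": "TCP", "port": port}],
                }
            ],
        },
    }


policy = allow_ingress_from("api", "frontend", 8080)
# kubectl accepts JSON manifests too, e.g. `python policy.py | kubectl apply -f -`.
print(json.dumps(policy, indent=2))
```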
We’re in the process of upgrading the Nginx Ingress Controller from the legacy v0.51.0 version to mainline v1.2.1. This is in preparation for the AKS and EKS upgrades to Kubernetes 1.22, which will follow in the coming weeks.
Kubernetes events are a great resource for debugging and troubleshooting problems with workloads and other cluster components. The problem is that the K8s API only keeps them for 1 hour. To keep those events for longer, we used an open-source component called eventrouter, which streamed all cluster events into Loki. This project has been deprecated and unmaintained for some time, so we needed to find a replacement for it.
Some of our customers have experienced performance limitations in our Loki & Fluent-bit setup, mainly on queries that need to scan a large volume of data. At the moment we run a monolithic Loki architecture, so a single Loki Pod performs all the roles (ingester, querier, query-scheduler, frontend, …). This setup works well enough for most of our customers, but it shows its limits on large-volume queries, which rely on brute-force scanning and could benefit from more parallelism.
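To show why parallelism helps with large-volume queries, here is a minimal sketch of the general split-and-merge idea: a long time range is cut into smaller intervals, each sub-query runs concurrently, and the partial results are combined. This is not Loki’s code; fake_query is a hypothetical stand-in for a querier (it pretends every minute holds 10 log lines), and the 30-minute interval and worker count are arbitrary assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime, timedelta, timezone


def split_range(start, end, interval):
    """Split [start, end) into consecutive sub-ranges no longer than `interval`."""
    ranges = []
    cursor = start
    while cursor < end:
        nxt = min(cursor + interval, end)
        ranges.append((cursor, nxt))
        cursor = nxt
    return ranges


def fake_query(sub_range):
    """Hypothetical querier stand-in: counts 10 log lines per minute of range."""
    s, e = sub_range
    return int((e - s).total_seconds() // 60) * 10


def parallel_count(start, end, interval, workers=4):
    """Run one sub-query per interval concurrently and merge (sum) the results."""
    sub_ranges = split_range(start, end, interval)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(fake_query, sub_ranges))


start = datetime(2022, 6, 1, tzinfo=timezone.utc)
end = start + timedelta(hours=6)
print(len(split_range(start, end, timedelta(minutes=30))))  # 12 sub-queries
print(parallel_count(start, end, timedelta(minutes=30)))    # 3600 lines in total
```

In a monolithic setup all of these sub-queries compete for one Pod; splitting the roles lets them spread over several queriers instead.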
As part of our regular upgrade cycle, the following Kubernetes cluster components have been updated. And it’s a big one! We’ve already rolled these out to all non-production clusters. Production upgrades are scheduled for Monday 16/05 during business hours. As usual, no workload interruptions are expected.
We are preparing to upgrade our platforms to Kubernetes 1.22, which drops support for many deprecated API versions! This is the final call to make sure you’ve updated your manifests, Helm charts (*), etc. to use the newest APIs (mainly Ingress, but some others too). Make sure to check out the deprecated API migration guide for more info. Below you will find more details on some of the most common deprecated resources used by our customers.
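A quick way to spot affected resources is to check each manifest’s apiVersion/kind pair against the versions that 1.22 no longer serves. The sketch below assumes your manifests are already parsed into Python dicts (for example rendered Helm output loaded with a YAML parser of your choice), and its table is a deliberately partial subset of the removals; the upstream deprecated API migration guide remains the authoritative list.

```python
# Partial (not exhaustive) map of apiVersion/kind pairs no longer served in
# Kubernetes 1.22, to the version to migrate to.
REMOVED_IN_1_22 = {
    ("extensions/v1beta1", "Ingress"): "networking.k8s.io/v1",
    ("networking.k8s.io/v1beta1", "Ingress"): "networking.k8s.io/v1",
    ("apiextensions.k8s.io/v1beta1", "CustomResourceDefinition"): "apiextensions.k8s.io/v1",
    ("rbac.authorization.k8s.io/v1beta1", "Role"): "rbac.authorization.k8s.io/v1",
    ("rbac.authorization.k8s.io/v1beta1", "RoleBinding"): "rbac.authorization.k8s.io/v1",
    ("rbac.authorization.k8s.io/v1beta1", "ClusterRole"): "rbac.authorization.k8s.io/v1",
    ("rbac.authorization.k8s.io/v1beta1", "ClusterRoleBinding"): "rbac.authorization.k8s.io/v1",
    ("admissionregistration.k8s.io/v1beta1", "ValidatingWebhookConfiguration"): "admissionregistration.k8s.io/v1",
    ("admissionregistration.k8s.io/v1beta1", "MutatingWebhookConfiguration"): "admissionregistration.k8s.io/v1",
}


def find_removed(manifests):
    """Return (kind, name, old_version, new_version) for each manifest using a removed API."""
    findings = []
    for m in manifests:
        key = (m.get("apiVersion"), m.get("kind"))
        if key in REMOVED_IN_1_22:
            name = m.get("metadata", {}).get("name", "<unnamed>")
            findings.append((m["kind"], name, key[0], REMOVED_IN_1_22[key]))
    return findings


manifests = [
    {"apiVersion": "networking.k8s.io/v1beta1", "kind": "Ingress", "metadata": {"name": "web"}},
    {"apiVersion": "apps/v1", "kind": "Deployment", "metadata": {"name": "web"}},
]
for kind, name, old, new in find_removed(manifests):
    print(f"{kind}/{name}: {old} -> {new}")
# Ingress/web: networking.k8s.io/v1beta1 -> networking.k8s.io/v1
```

Note that bumping the apiVersion alone is not always enough: the networking.k8s.io/v1 Ingress schema also changed, for example backend.serviceName/servicePort became backend.service.name/backend.service.port.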
For a long time now, we’ve been using the AWS Node Termination Handler to catch Spot instance interruption notices, allowing Kubernetes to respond appropriately by draining these Spot nodes before they are terminated. You may have noticed this behavior via the “🚧 Instance interruption” notices in the Slack alerts.
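For context on what such a handler reacts to: AWS publishes a Spot interruption notice about two minutes in advance, among other places on the instance metadata path `http://169.254.169.254/latest/meta-data/spot/instance-action`, as a small JSON document with an `action` and a UTC `time`. The sketch below only parses such a notice and computes the remaining time; it does not fetch metadata or drain anything, and the helper name parse_interruption and the sample timestamps are hypothetical.

```python
import json
from datetime import datetime, timezone


def parse_interruption(notice_body, now):
    """Parse a Spot instance-action notice and return (action, seconds_left).

    The notice looks like {"action": "terminate", "time": "2022-05-10T12:02:00Z"}.
    Hypothetical helper; a real handler would then cordon and drain the node.
    """
    notice = json.loads(notice_body)
    when = datetime.strptime(notice["time"], "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    seconds_left = max(0, int((when - now).total_seconds()))
    return notice["action"], seconds_left


body = '{"action": "terminate", "time": "2022-05-10T12:02:00Z"}'
now = datetime(2022, 5, 10, 12, 0, 0, tzinfo=timezone.utc)
print(parse_interruption(body, now))  # ('terminate', 120)
```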
You can now create GP3 Persistent Volumes through the gp3 and gp3-encrypted Storage Classes. This is in addition to the previously available GP2 (gp2, gp2-encrypted) and EFS (efs) volumes.
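Requesting a GP3 volume is a matter of setting storageClassName on a PersistentVolumeClaim. Here is a minimal sketch that builds such a claim as a Python dict; the claim name "data", the 20Gi size and the helper name gp3_claim are hypothetical, and ReadWriteOnce is used because an EBS-backed volume attaches to one node at a time.

```python
import json


def gp3_claim(name, size_gi, encrypted=True, namespace="default"):
    """Build a PersistentVolumeClaim that requests a GP3 volume (hypothetical helper)."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Storage Class names as announced: "gp3" or "gp3-encrypted".
            "storageClassName": "gp3-encrypted" if encrypted else "gp3",
            # EBS volumes attach to a single node at a time.
            "accessModes": ["ReadWriteOnce"],
            "resources": {"requests": {"storage": f"{size_gi}Gi"}},
        },
    }


claim = gp3_claim("data", 20)
print(json.dumps(claim, indent=2))  # can be piped into `kubectl apply -f -`
```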
We’ve completely refactored how we manage our monitoring components, like Prometheus, Grafana and many exporters and alerts. For you, the platform user, nothing will change, although we expect some disruption in Grafana and Prometheus during the rollout.
It has come to our attention that many of our 24/7 escalation phone numbers could only receive calls from domestic numbers. We have generated new telephone numbers, so make sure to verify your new number in your repo’s README!
We’ve rolled out an nginx-ingress update (v0.51.0) to all clusters, with the following fixes:
Last year we rolled out a major Grafana update, going from 7.5 to 8.1, but had to roll back because a small number of customers were impacted by a breaking change regarding SQL datasources.
We’d like to inform you that we’re renaming and re-tagging many network-related resources, like the VPC name, subnet names, route tables, etc. Normally this shouldn’t have any impact on you; however, if you rely on the VPC name for anything, you will need to update those workloads. (*)
We have upgraded our Concourse setups to the latest version, 7.7.0. This new version brings several small features and bug fixes. You can check the full changelog on the Concourse releases page.
We have upgraded Istio on all clusters that use it, from version 1.12.0 to 1.13.2.
All Vault setups have been updated from 1.9.0 to the latest patch version, 1.9.4.
As part of our regular upgrade cycle, the following Kubernetes cluster components have been updated. We’ve already rolled these out to all non-production clusters. Production upgrades will happen on Monday 21/03 during business hours.