New feature: Jaeger tracing

Today we’re adding a new feature to our Kubernetes AWS reference solution. It’s now possible to deploy one or more Jaeger setups on your EKS clusters. AKS clusters will follow in the near future, depending on customer demand.

More …

Upgraded Teleport to version 10.1.4

We’ve upgraded all Teleport clusters and nodes from version 9.3.7 to 10.1.4. Teleport is a tool we mostly use internally to provide secure and audited access to (EC2) instances, Kubernetes clusters and several dashboards.

More …

Upgraded cluster add-ons

As part of our regular upgrade cycle, the following Kubernetes cluster components have been updated. We’ve already rolled these out to all non-production clusters over the past few days. Production upgrades are scheduled to happen next week during business hours. As usual, no workload interruptions are expected.

More …

Upgraded cluster add-ons

As part of our regular upgrade cycle, the following Kubernetes cluster components have been updated. We’ve already started rolling these out to all non-production clusters. Production upgrades are scheduled to happen next week during business hours. As usual, no workload interruptions are expected.

More …

Upgraded Teleport to version 9.3.7

We’ve upgraded all Teleport clusters and nodes from version 8.2.0 to 9.3.7. Teleport is a tool we mostly use internally to provide secure and audited access to (EC2) instances and Kubernetes clusters.

More …

Replacing eventrouter component for persisting K8s events

Kubernetes events are a great resource for debugging and troubleshooting problems with workloads and other cluster components. The problem is that the K8s API stores them for only 1 hour. To retain those events for longer, we used an open-source component called eventrouter, which streamed all cluster events into Loki. This project has been deprecated and unmaintained for a while now, so we needed to find a replacement for it.

More …

Improving Loki performance and usability

Some of our customers have experienced performance-related limitations in our Loki & Fluent-bit setup, mainly on queries that require scanning a large volume of data. At the moment we run a monolithic Loki architecture, so a single Loki Pod performs all the roles (ingester, querier, query-scheduler, frontend, …). This setup works well enough for most of our customers, but it shows its limitations on large-volume queries, which rely on brute-force scanning and could benefit from more parallelism.

More …

Upgraded cluster add-ons

As part of our regular upgrade cycle, the following Kubernetes cluster components have been updated. And it’s a big one! We’ve already rolled these out to all non-production clusters. Production upgrades are scheduled to happen on Monday 16/05 during business hours. As usual, no workload interruptions are expected.

More …

[Action required] Final call - Deprecated API removal, upgrade your Ingresses etc.

We are preparing to upgrade our platforms to Kubernetes 1.22, which drops support for many deprecated API versions! This is the final call to make sure you’ve updated your manifests, Helm charts (*), etc. to make use of the newest APIs (mainly Ingress, but some others too). Make sure to check out the deprecated API migration guide for more info. Below you will find more details on some of the most common deprecated resources used by our customers.
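As a rough illustration of what the Ingress change involves, here is a minimal Python sketch that rewrites an already-parsed v1beta1 Ingress manifest (a plain dict) into the networking.k8s.io/v1 shape. The function names migrate_ingress and convert_backend are hypothetical helpers for this example, not part of any tool we ship, and defaulting a missing pathType to ImplementationSpecific is an assumption you should review per path. Treat it as a starting point and always validate the result against your own cluster.

```python
import copy

# API versions for Ingress that Kubernetes 1.22 no longer serves.
OLD_INGRESS_VERSIONS = {"extensions/v1beta1", "networking.k8s.io/v1beta1"}


def convert_backend(old):
    """Translate a v1beta1 backend (serviceName/servicePort) to the v1 shape."""
    if "serviceName" not in old:
        # Not a service backend (or already converted): leave it untouched.
        return copy.deepcopy(old)
    port = old.get("servicePort")
    # v1 distinguishes numeric ports from named ports.
    port_field = {"number": port} if isinstance(port, int) else {"name": port}
    return {"service": {"name": old["serviceName"], "port": port_field}}


def migrate_ingress(manifest):
    """Return a copy of `manifest` using networking.k8s.io/v1 (hypothetical helper)."""
    new = copy.deepcopy(manifest)
    if new.get("kind") != "Ingress" or new.get("apiVersion") not in OLD_INGRESS_VERSIONS:
        return new  # nothing to migrate
    new["apiVersion"] = "networking.k8s.io/v1"
    spec = new.get("spec", {})
    # spec.backend was renamed to spec.defaultBackend in v1.
    if "backend" in spec:
        spec["defaultBackend"] = convert_backend(spec.pop("backend"))
    for rule in spec.get("rules", []):
        for path in rule.get("http", {}).get("paths", []):
            if "backend" in path:
                path["backend"] = convert_backend(path["backend"])
            # pathType is required in v1; this default is an assumption.
            path.setdefault("pathType", "ImplementationSpecific")
    return new
```

You could run something like this over manifests loaded with a YAML parser and compare the output against your existing files before committing the change.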

More …

Internal refactor of monitoring addons

We’ve completely refactored how we manage our monitoring components, like Prometheus, Grafana and many exporters and alerts. For you, the platform user, nothing will change, although we expect some disruption to Grafana and Prometheus during the rollout.

More …

[Important] New 24/7 escalation phone numbers!

It has come to our attention that many of our 24/7 escalation phone numbers could only receive calls from domestic numbers. We have generated new telephone numbers, so make sure to verify your new number in your repo’s README!