Upgraded cluster add-ons
The following updates have been rolled out to all non-production clusters. As usual there are also improvements across various other add-ons, ensuring enhanced performance and security:
We’re choosing to share this incident publicly, although not without hesitation. Being transparent about security issues is uncomfortable and feels scary. But we believe it’s important to follow our own values. Trust is built not by hiding mistakes, but by owning them, learning from them, and showing our commitment to doing better. We communicated a more detailed version to our existing customers and we are working with them on any concerns they may have. We hope this openness reinforces our dedication to security, accountability, and the long-term trust of our customers and community.
We have finished deploying our announced changes to the default indexed labels and structured metadata we assign to Loki logs. Please read on to learn what has changed and how this impacts you. Please reach out if you have any questions, need guidance, or would like extra, custom labels to be indexed.
We’re rolling out a small new feature which creates a file called ${cluster-name}-platform-info.yaml in the k8s-clusters directory of our customer git repositories. This file contains some basic, non-sensitive information about the platform, such as networking information, EKS cluster information and so forth. This file is intended to be used in the future as an alternative to Terragrunt dependencies and outputs, decoupling application stacks from the core platform modules. In essence, this allows us to run Terragrunt stacks without requiring direct access to the core platform modules. It also provides a single source of information for your application stacks, such as the VPC ID, EKS cluster name, and so forth.
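As a rough illustration of how an application stack might consume this file, here is a minimal Python sketch. The key names (`vpc.id`, `eks.cluster_name`) and the example values are assumptions made up for illustration; check the generated file for its real structure. Parsing is left to a YAML library (for example PyYAML's `yaml.safe_load`), so the sketch works on an already-parsed mapping.

```python
from typing import Any, Mapping


def lookup(info: Mapping[str, Any], dotted_path: str) -> Any:
    """Walk a nested mapping using a dotted path such as 'vpc.id'."""
    node: Any = info
    for key in dotted_path.split("."):
        if not isinstance(node, Mapping) or key not in node:
            raise KeyError(f"{dotted_path!r} not found (missing {key!r})")
        node = node[key]
    return node


# Hypothetical content, standing in for the parsed platform-info file.
# The real keys may differ; these are assumptions for the example.
platform_info = {
    "vpc": {"id": "vpc-0123456789abcdef0"},
    "eks": {"cluster_name": "example-cluster"},
}

# Collect only the values this (hypothetical) application stack needs.
stack_inputs = {
    "vpc_id": lookup(platform_info, "vpc.id"),
    "cluster_name": lookup(platform_info, "eks.cluster_name"),
}
print(stack_inputs)
```

Failing loudly on a missing key is a deliberate choice here: a stack that silently falls back to an empty VPC ID is harder to debug than one that stops early.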
Last week we rolled out a fix for a bug in our VPC route table association logic. This bug could cause some EKS worker node subnets to not be associated with the correct route table for their respective Availability Zone (AZ). This could lead to non-optimal Internet traffic routing over the wrong NAT gateway, with some increased latency and unnecessary AZ-transfer costs as a result.
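The kind of mismatch described above can be checked with a few lines of Python. This is only a sketch under stated assumptions: the subnet and route table IDs are invented, and the input data would in practice come from your own inventory of subnets, route tables and NAT gateways. It is not the logic used in the platform modules.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Subnet:
    subnet_id: str
    az: str
    route_table_id: str


def misrouted_subnets(
    subnets: Iterable[Subnet], nat_az_by_route_table: Dict[str, str]
) -> List[Subnet]:
    """Return subnets whose route table points at a NAT gateway in another AZ.

    Subnets with an unknown route table are flagged too, since .get()
    returns None, which never equals the subnet's AZ.
    """
    return [
        s for s in subnets
        if nat_az_by_route_table.get(s.route_table_id) != s.az
    ]


# Hypothetical inventory: route table -> AZ of the NAT gateway it routes to.
nat_az = {"rtb-a": "eu-west-1a", "rtb-b": "eu-west-1b"}
subnets = [
    Subnet("subnet-a", "eu-west-1a", "rtb-a"),  # correct
    Subnet("subnet-b", "eu-west-1b", "rtb-a"),  # crosses AZs via rtb-a
]

for s in misrouted_subnets(subnets, nat_az):
    print(f"{s.subnet_id} in {s.az} uses {s.route_table_id}")
```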
On 21/05 Grafana disclosed a high severity security vulnerability, identified as CVE-2025-4123. We want to inform you that last week we rolled out mitigations for this vulnerability to all our managed clusters. In the end we don’t believe our particular deployments were affected, as we allow no anonymous access and have no vulnerable plugins installed. However, to be sure, we deployed the latest version of Grafana, which includes the fix, and enabled a Content Security Policy.
In this post, we’re excited to share that we have successfully completed the migration of our Kubernetes (K8s) add-ons as announced in our previous post. The management of these add-ons has now transitioned from our existing OpenTofu-based approach (using Terragrunt and Concourse CI) to Flux. Here is what you need to know about this migration, the benefits it brings, and what to expect moving forward.
The following updates have been rolled out to all clusters. As usual there are improvements across various add-ons, ensuring enhanced performance and security. Some actions may be required on your side regarding Grafana, so please read the entire post.
We intend to make breaking changes to the Loki labels we assign to logs collected from your clusters. These changes aim to improve Loki’s performance by reducing high-cardinality labels and removing duplicates (e.g. namespace removed in favor of namespace_name). If you rely on any of the labels we will no longer include automatically, your queries may need to be updated. We plan to roll out this change in the week of 2025-06-02. Your feedback is important to us, so please let us know if you have any questions or concerns about this change.
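To get a feel for what updating a query involves, here is a minimal Python sketch that renames labels inside LogQL stream selectors. Only the namespace to namespace_name mapping comes from the announcement; the helper name and the example query are made up. The regular expression is deliberately simple and does not handle label values that themselves contain braces, so treat it as a starting point for reviewing queries, not as a safe bulk rewrite.

```python
import re

# Mapping from a removed label to its replacement (from the announcement).
RENAMED_LABELS = {"namespace": "namespace_name"}

# A stream selector is approximated as a brace-delimited block without
# nested braces. This is an assumption that holds for simple queries only.
_SELECTOR = re.compile(r"\{[^{}]*\}")


def migrate_query(query: str) -> str:
    """Rename old label names used as matchers inside stream selectors."""

    def rewrite_selector(match: "re.Match[str]") -> str:
        selector = match.group(0)
        for old, new in RENAMED_LABELS.items():
            # Match the bare label name only when a matcher operator follows.
            pattern = rf"\b{re.escape(old)}\b(?=\s*(?:=~|!~|!=|=))"
            selector = re.sub(pattern, new, selector)
        return selector

    return _SELECTOR.sub(rewrite_selector, query)


print(migrate_query('{namespace="prod", app="api"} |= "error"'))
```

Queries that already use namespace_name are left untouched, because the word boundary check does not match inside the longer label name.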
The following updates have been rolled out to all non-production clusters. As usual there are also improvements across various other add-ons, ensuring enhanced performance and security. There are no notable major updates.
Concourse CI has been upgraded to version 7.13.1. This release only updates the bundled resource-types, specifically the s3 and registry-image resources. Both resources had bugs related to their upgrade to v2 of the AWS Go SDK.
Concourse CI has been upgraded to version 7.13.0. This update includes the following changes:
Last night Kubernetes sent out a Security Advisory regarding multiple vulnerabilities in the Nginx Ingress Controller, including the critical CVE-2025-1974. If you’re an AWS account owner, you likely also received an email from AWS warning about these vulnerabilities. We are happy to announce we have taken immediate action and upgraded all clusters to the latest version of the Nginx Ingress Controller, mitigating these vulnerabilities.
We are excited to announce that we have added two new controllers to our Flux setup: the Flux Image reflector and the Flux Image automation controller. These work together to update a Git repository when new container images are available.
Update 2025-03-24: These changes have been rolled out to all clusters.
In this post, we’re excited to announce our plans to shift the management of Kubernetes (K8s) add-ons from our existing OpenTofu-based approach (using Terragrunt and Concourse CI) to Flux. This change is part of our broader effort to simplify our platform management, improve reliability, and enhance visibility for both our internal teams and our customers. Below, we’ll cover what’s happening, why we’re making these changes, how you’ll benefit, and what to expect during and after the migration.
Since our previous announcement regarding the new DockerHub rate limits starting March 1st, we have updated all system workloads managed by Skyscrapers, like ingress controllers, monitoring, istio etc., and made sure they will not be affected by these new rate limits. We ensured all these components are using different mirrors [1]. If you haven’t taken action regarding your own application workloads yet, we encourage you to follow the steps outlined in our documentation (updated) and check your specific GitHub issue for more details. Our colleagues are available to guide you through this process.
[1] Grafana Labs images (Grafana, Loki) still use DockerHub as their repository. Since these images are from a “Verified Publisher”, they are not subject to the rate limits.
Starting March 1, 2025, DockerHub will further limit unauthenticated image pulls, from 100 per 6 hours per IP address to 10 per hour per IP address: https://docs.docker.com/docker-hub/usage/. This is a reduction of 40%! Please read this post carefully to understand what it is about and to determine whether you need to take action.
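The 40% figure follows from comparing both limits over the same time window. A small Python check of that arithmetic, using only the two limits quoted above (the function name is ours):

```python
from fractions import Fraction


def pulls_per_hour(pulls: int, window_hours: int) -> Fraction:
    """Normalise a rate limit to pulls per hour, kept exact as a fraction."""
    return Fraction(pulls, window_hours)


old_rate = pulls_per_hour(100, 6)  # 100 pulls per 6 hours
new_rate = pulls_per_hour(10, 1)   # 10 pulls per hour

# Relative reduction of the sustained hourly rate.
reduction = 1 - new_rate / old_rate

print(f"old: {float(old_rate):.2f}/h, new: {float(new_rate):.2f}/h")
print(f"reduction: {float(reduction):.0%}")
```

Note that the shorter window also removes burst capacity: under the old limit a single IP address could pull up to 100 images in quick succession, whereas the new limit caps any one hour at 10.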
Concourse CI has been upgraded to version 7.12.1. This update brings some small improvements. Relevant Concourse CI changelog: https://github.com/concourse/concourse/releases/tag/v7.12.1
We are rolling out EKS v1.32. Please make sure to update to our recommended client versions matching this upgrade. This upgrade includes an important change in how PersistentVolumeClaims (PVCs) are handled by StatefulSets. When a StatefulSet is deleted, the PVCs created by the StatefulSet will now be automatically removed too. This is a change in behavior from previous versions of Kubernetes.
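If you need a StatefulSet's data to survive its deletion, it is worth being explicit about this in the manifest. In upstream Kubernetes the behavior is controlled per StatefulSet through the `spec.persistentVolumeClaimRetentionPolicy` field, with `whenDeleted` and `whenScaled` each set to `Retain` or `Delete`. To our understanding the upstream default is `Retain`, so please verify what your own manifests and charts set. The sketch below assumes you template manifests in Python; the helper name and the trimmed example manifest are hypothetical.

```python
import copy
import json
from typing import Any, Dict


def with_pvc_retention(
    manifest: Dict[str, Any],
    when_deleted: str = "Retain",
    when_scaled: str = "Retain",
) -> Dict[str, Any]:
    """Return a copy of a StatefulSet manifest with an explicit PVC policy."""
    allowed = {"Retain", "Delete"}
    if when_deleted not in allowed or when_scaled not in allowed:
        raise ValueError("policy values must be 'Retain' or 'Delete'")
    updated = copy.deepcopy(manifest)  # leave the caller's manifest untouched
    updated.setdefault("spec", {})["persistentVolumeClaimRetentionPolicy"] = {
        "whenDeleted": when_deleted,
        "whenScaled": when_scaled,
    }
    return updated


# Hypothetical, heavily trimmed manifest (selector, template and
# volumeClaimTemplates are omitted for brevity).
statefulset = {
    "apiVersion": "apps/v1",
    "kind": "StatefulSet",
    "metadata": {"name": "example-db"},
    "spec": {"serviceName": "example-db", "replicas": 1},
}

pinned = with_pvc_retention(statefulset)
print(json.dumps(pinned, indent=2))
```

Setting the policy explicitly makes the intent visible in code review, whichever value you choose.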