Defaulting to capacity-optimized for Spot nodepools

Most of our EKS clusters use Spot instances as a cost-efficient way to provide compute nodes. Historically, we defaulted to the “lowest price” allocation strategy to maximize cost savings. However, this can lead to more interruptions than we want to tolerate, and often to a large imbalance in the spread across AZs when price pressure increases. We’ve therefore changed our default to the “capacity optimized” strategy for increased stability, at a (possibly) marginally higher cost.
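As a rough illustration of what this switch looks like at the Auto Scaling group level, the sketch below builds the `InstancesDistribution` part of a mixed instances policy in the shape accepted by the EC2 Auto Scaling API. The helper name `spot_instances_distribution` and the on-demand values are hypothetical and chosen for illustration only; our actual nodepool configuration may be managed differently.

```python
# Allocation strategies accepted by EC2 Auto Scaling for Spot capacity.
ALLOWED_STRATEGIES = {
    "lowest-price",
    "capacity-optimized",
    "capacity-optimized-prioritized",
    "price-capacity-optimized",
}


def spot_instances_distribution(strategy="capacity-optimized", on_demand_base=0):
    """Build an InstancesDistribution dict for a MixedInstancesPolicy.

    Hypothetical helper: it only builds the dict and makes no AWS calls.
    """
    if strategy not in ALLOWED_STRATEGIES:
        raise ValueError(f"unknown Spot allocation strategy: {strategy}")

    distribution = {
        "OnDemandBaseCapacity": on_demand_base,
        # 0 means everything above the base capacity runs on Spot.
        "OnDemandPercentageAboveBaseCapacity": 0,
        "SpotAllocationStrategy": strategy,
    }
    # SpotInstancePools only applies to the lowest-price strategy.
    if strategy == "lowest-price":
        distribution["SpotInstancePools"] = 2
    return distribution


new_default = spot_instances_distribution()
old_default = spot_instances_distribution("lowest-price")
```

The resulting dict could be passed as `MixedInstancesPolicy["InstancesDistribution"]` when creating or updating an Auto Scaling group, for example through boto3.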

More …

Upgraded EKS cluster add-ons

As part of our regular upgrade cycle, the following EKS cluster components have been updated. We’ve already rolled these out to all non-production clusters. Production upgrades are scheduled for the next few days, during business hours. As usual, no workload interruptions are expected.

More …

Istio upgraded to version 1.16.3

We have upgraded Istio on all clusters that use it. The version was upgraded from 1.16.1 to 1.16.3. These releases contain bug fixes to improve robustness and security fixes in the underlying Go packages. You can check the full release notes here. We’ve also upgraded Kiali to the latest version, 1.64.0 (changelog).

More …

Post Mortem - Loki log loss

After deploying our Grafana Loki refactor, several issues started popping up, cascading into a loss of logs for up to 12 hours on 22/02 or 23/02. All environments using Loki as their main logging provider were affected. Environments logging to other systems, like CloudWatch Logs and Elasticsearch, were not affected.

More …

Upgraded Teleport to version 12.0.2

We’ve upgraded all Teleport clusters from version 11.1.1 to 12.0.2. Teleport is a tool we mostly use internally to provide secure and audited access to (EC2) instances, Kubernetes clusters and several dashboards. The nodes will gradually be upgraded to the new version as new instances are launched.

More …

Vault upgraded to 1.12.2

All Vault setups have been updated from 1.12.0 to the latest version 1.12.2. This release brings small improvements and bug fixes. Please refer to the upstream changelogs to see what’s changed:

More …

Upgraded Teleport to version 11.1.1 for security fix

We’ve upgraded all Teleport clusters from version 11.0.3 to 11.1.1. This upgrade was done on all Teleport servers to fix a potential vulnerability:

Fixed issue where an attacker with physical access to user’s computer and raw access to the filesystem could potentially recover the seed QR code.

More …

Upgraded Teleport to version 11.0.3

We’ve upgraded all Teleport clusters from version 10.1.4 to 11.0.3. Teleport is a tool we mostly use internally to provide secure and audited access to (EC2) instances, Kubernetes clusters and several dashboards. The nodes will gradually be upgraded to the new version as new instances are launched.

More …