Dedicated system nodepool + reduced system component footprint

To improve our service we've restructured the Kubernetes nodepools. Previously a single default nodepool ran a mix of Kubernetes add-ons and application deployments, which made capacity planning and maintenance more complex than they needed to be. We've now created a dedicated system nodepool on which all add-ons are scheduled. As part of this change we also took a closer look at the resource requests of all add-ons and made adjustments where needed. For most of our customer environments we've been able to reduce the cluster size by at least one node. A handful are break-even for now, but we have further optimisations planned as follow-ups.

System nodepool

All system add-ons now run in a dedicated nodepool (a sketch of how such pinning typically works follows the list below).

Benefits of this are:

  • clear and transparent base cost of the platform
  • better utilisation of EC2 instances
  • easier capacity planning, as system and application workloads have different requirements
  • easier for Skyscrapers to roll out maintenance updates without affecting application workloads
  • increased workload isolation
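
As an illustration of how this pinning typically works, here is a minimal sketch. The `role: system` label and the `dedicated=system:NoSchedule` taint are assumed names for this example, not necessarily the ones used in our clusters: the system nodes carry a taint that keeps application pods off, and each add-on gets a matching toleration plus a node selector.

    # Illustrative taint on every node in the system nodepool:
    #   kubectl taint nodes <node-name> dedicated=system:NoSchedule
    #
    # Pod template snippet of an add-on Deployment pinned to that pool:
    spec:
      nodeSelector:
        role: system            # assumed label on the system nodepool nodes
      tolerations:
        - key: dedicated
          operator: Equal
          value: system
          effect: NoSchedule    # tolerate the taint that repels other pods

With something like this in place the default nodepool only ever runs application workloads, which is what makes the isolation and maintenance benefits above possible.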

Evaluation of add-on resources and scaling options

Historically we increased the resources of some components whenever they needed more memory, and when adding new add-ons to our reference solution we often defaulted to the upstream recommendations to guarantee stability. However, when updating add-ons we didn't always re-evaluate whether their resource usage had dropped, so requests accumulated over time.

In combination with the rollout of the system nodepool we are also rolling out the revised resource requests. This significantly lowers the overall resource reservation of the cluster.
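
To make the kind of adjustment concrete, here is a hedged before/after sketch (the numbers are illustrative, not the actual values of any specific add-on): the requests are shrunk towards observed usage, with a memory limit kept as a safety net.

    # Before: upstream defaults, well above observed usage
    resources:
      requests:
        cpu: 500m
        memory: 512Mi

    # After: aligned with observed usage plus some headroom
    resources:
      requests:
        cpu: 100m
        memory: 192Mi
      limits:
        memory: 384Mi

Multiplied over a few dozen add-on pods, reductions of this size explain the reservation drop shown in the example below.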

An example:

The overall CPU reservation dropped from 55% to 32% and the memory reservation from 51% to 39%. This allowed us to go from a 3x m5.xlarge cluster to a 3x m5.large cluster, halving the EC2 cost of those nodes.
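
Since on-demand pricing within the m5 family scales linearly with instance size (an m5.large has exactly half the vCPU and memory of an m5.xlarge), the saving is a clean 50%. As an illustration with us-east-1 on-demand prices (prices differ per region, the ratio doesn't):

    3 x m5.xlarge: 3 x $0.192/h x 730 h ≈ $420/month
    3 x m5.large:  3 x $0.096/h x 730 h ≈ $210/month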

Cluster usage before optimisations:

ip-10-12-166-51.eu-west-1.compute.internal  cpu    █████████████████████████░░░░░░░░░░  72% (38 pods) m5.xlarge - - Ready
                                            memory █████████████████████████░░░░░░░░░░  73%
ip-10-12-147-220.eu-west-1.compute.internal cpu    ████████████████████░░░░░░░░░░░░░░░  56% (30 pods) m5.xlarge - - Ready
                                            memory ███████████░░░░░░░░░░░░░░░░░░░░░░░░  32%
ip-10-12-135-162.eu-west-1.compute.internal cpu    █████████████░░░░░░░░░░░░░░░░░░░░░░  36% (16 pods) m5.xlarge - - Ready
                                            memory █████████████████░░░░░░░░░░░░░░░░░░  48%

Cluster usage after optimisations (this snapshot was taken before the resize, so the nodes still show as m5.xlarge):

ip-10-12-147-220.eu-west-1.compute.internal cpu    ██████████████░░░░░░░░░░░░░░░░░░░░░  41% (43 pods) m5.xlarge - - Ready
                                            memory █████████████░░░░░░░░░░░░░░░░░░░░░░  36%
ip-10-12-135-162.eu-west-1.compute.internal cpu    █████████████░░░░░░░░░░░░░░░░░░░░░░  36% (28 pods) m5.xlarge - - Ready
                                            memory █████████████████░░░░░░░░░░░░░░░░░░  47%
ip-10-12-168-169.eu-west-1.compute.internal cpu    ██████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░  17% (14 pods) m5.xlarge - - Ready
                                            memory ████████████░░░░░░░░░░░░░░░░░░░░░░░  35%