Upgrade Vault to 1.0.1

A Vault upgrade for our setups was long overdue. We’ve upgraded our Vault installation tools from version 0.9.3 to 1.0.1, the latest Vault version available at the moment. Because Vault runs in a high-availability (HA) setup, the downtime during the upgrade will be minimal: normally between half a second and a couple of seconds, which is the time the failover takes. The upgrade procedure to achieve that minimal downtime is the following:

More …

Set resource requests and limits for all infrastructure pods

We’ve recently adjusted resource requests and limits for all Pods running in the infrastructure namespace. Previously, some of them had neither requests nor limits, and others had unnecessarily high values. We’ve reviewed the CPU and memory usage of those Pods over the last couple of weeks and adjusted their requests and limits accordingly. This is now rolled out to all staging clusters, and we’ll proceed with the production clusters next week if no issues are spotted.
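To give an idea of how observed usage can be turned into requests and limits, here is a minimal sketch. The heuristic (request at the 90th percentile of the samples, limit at the observed peak plus 50% headroom), the function name `suggest_resources` and the sample values are all assumptions made for illustration, not necessarily the rule applied to our Pods.

```python
import math


def suggest_resources(samples, request_percentile=90, limit_headroom=1.5):
    """Suggest a request and a limit from usage samples (hypothetical heuristic)."""
    if not samples:
        raise ValueError("need at least one usage sample")
    ordered = sorted(samples)
    # Nearest-rank percentile, computed with integer arithmetic (ceiling division).
    rank = max(1, (request_percentile * len(ordered) + 99) // 100)
    request = ordered[rank - 1]
    # Limit: observed peak plus headroom, never lower than the request.
    limit = max(request, math.ceil(ordered[-1] * limit_headroom))
    return {"request": request, "limit": limit}


# Made-up memory samples in MiB for a single Pod, including one spike.
memory_mib = [110, 120, 118, 125, 130, 122, 119, 121, 260, 117]
print(suggest_resources(memory_mib))
```

Taking the request from a percentile rather than the maximum keeps a single spike from inflating what the scheduler reserves, while the limit still leaves room for that spike.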

More …

Updated memory metrics in the Grafana Pods dashboard

We’ve updated the Pods dashboard so that it displays the actual container memory usage (container_memory_working_set_bytes) next to the previous metric, which includes caches (container_memory_usage_bytes). You can find this dashboard in your Grafana deployment as Pods v2.
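The relation between the two metrics can be sketched as follows. To our understanding, cAdvisor derives the working set by subtracting the inactive file cache from the total usage and flooring the result at zero; the function name and the byte values below are made up for illustration.

```python
def working_set_bytes(usage_bytes, inactive_file_bytes):
    """Approximate container_memory_working_set_bytes from total usage.

    usage_bytes corresponds to container_memory_usage_bytes (includes caches);
    inactive_file_bytes is the part of the page cache that can be reclaimed.
    """
    return max(0, usage_bytes - inactive_file_bytes)


# Illustrative values: 800 MiB total usage, of which 300 MiB is inactive file cache.
MIB = 1024 * 1024
print(working_set_bytes(800 * MIB, 300 * MIB) // MIB)
```

The working set is the better number to compare against a memory limit, since reclaimable cache is not what gets a container OOM-killed.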

More …

Releasing our user-level documentation repository

Today we’re releasing a new user-level knowledge base for our products and services. It aims to help you become more confident and autonomous in managing your applications on our platforms. You can find it in the following GitHub repository: https://github.com/skyscrapers/documentation

More …

Teleport upgrade to 2.7.5

Teleport has been upgraded to version 2.7.5 for all users. This upgrade includes various bug fixes and performance improvements, as well as additional functionality such as scp (secure copy) from the web interface.

More …

Vault data is now backed up

Our Vault setup is configured to store its data in a DynamoDB table, using Vault’s DynamoDB storage backend. DynamoDB already replicates all the data in a table across three availability zones, giving Vault high availability and data durability. From today, we’re also enabling point-in-time recovery for the DynamoDB table, which provides continuous backups of the data for the last 35 days. This lets you restore your Vault data if it gets deleted or corrupted by accident, or if you just want to go back to a previous state.
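For reference, this is roughly what enabling and using point-in-time recovery looks like with boto3. It is a sketch, not our actual tooling: the table names `vault-data` and `vault-data-restored` are hypothetical, and the helper functions are introduced here only to keep the request parameters visible. Note that DynamoDB restores into a new table rather than overwriting the existing one.

```python
from datetime import datetime, timezone


def pitr_request(table_name):
    """Parameters for the DynamoDB UpdateContinuousBackups call."""
    return {
        "TableName": table_name,
        "PointInTimeRecoverySpecification": {"PointInTimeRecoveryEnabled": True},
    }


def restore_request(source_table, target_table, restore_time=None):
    """Parameters for RestoreTableToPointInTime (restores into a new table)."""
    request = {"SourceTableName": source_table, "TargetTableName": target_table}
    if restore_time is None:
        # No timestamp given: restore to the most recent restorable point.
        request["UseLatestRestorableTime"] = True
    else:
        request["RestoreDateTime"] = restore_time
    return request


def enable_pitr(table_name):
    """Enable point-in-time recovery; needs boto3 and AWS credentials."""
    import boto3  # third-party dependency, imported only when actually called

    client = boto3.client("dynamodb")
    return client.update_continuous_backups(**pitr_request(table_name))


def restore_table(source_table, target_table, restore_time=None):
    """Restore a table to a point in time; needs boto3 and AWS credentials."""
    import boto3

    client = boto3.client("dynamodb")
    return client.restore_table_to_point_in_time(
        **restore_request(source_table, target_table, restore_time)
    )


# Hypothetical table names and timestamp, shown without calling AWS.
when = datetime(2019, 1, 10, 12, 0, tzinfo=timezone.utc)
print(restore_request("vault-data", "vault-data-restored", when))
```

After a restore, Vault would have to be pointed at the restored table (or the data copied back) before it serves the recovered state.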

More …

Reduced number of NAT gateways

We’ve reduced the number of NAT gateways per VPC. In the previous setup we created one NAT gateway per VPC, through which we routed all the non-k8s traffic, and three more NAT gateways just for the k8s cluster (one for each Availability Zone). That meant 4 NAT gateways per environment, plus one for the tools stack, which with two environments adds up to a total of 9.
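The count for the previous setup can be checked with a few lines of arithmetic. The sketch below assumes two environments, which is what the total of 9 implies; the function and parameter names are made up for illustration.

```python
def nat_gateways(environments, availability_zones=3, shared_per_vpc=1, tools_stack=1):
    """Count NAT gateways in the previous setup."""
    # Each environment: one shared gateway for non-k8s traffic,
    # plus one gateway per Availability Zone for the k8s cluster.
    per_environment = shared_per_vpc + availability_zones
    return per_environment * environments + tools_stack


# Assumption: two environments.
print(nat_gateways(environments=2))
```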

More …