Use encrypted EBS volumes for etcd storage and (optionally) encrypt k8s node root volumes
We’re rolling out a major update for our Kubernetes etcd clusters to now use encrypted EBS volumes for storing all of the Kubernetes state.
We’re updating our Kubernetes staging clusters with CoreDNS, the new DNS server that replaces KubeDNS. After in-depth analysis and testing, we’ve verified that the performance and stability of the two solutions are almost identical. Here you can find more details on why we decided to move to CoreDNS.
A Vault upgrade for our setups was long overdue. We’ve upgraded our Vault installation tools from version 0.9.3 to 1.0.1, which is the latest Vault version available at the moment. Because Vault is set up in HA mode, the downtime of the upgrade will be minimal, normally between half a second and a couple of seconds, which is the time the failover takes. The upgrade procedure to achieve that minimal downtime is the following:
Update: Changed Kubernetes update from 1.10.12 to 1.11.6
Update 2 (2018-12-03): Since our last update, the people at Kubernetes updated their documentation to add an important fix in the 1.10.11 changelog:
We’ve deployed a Prometheus monitoring system on all our ECS-managed staging clusters.
Following our efforts to improve the overall stability of our Kubernetes clusters, we’ve now set resource reservations for kubelet and other system processes. This will ensure that these critical processes always have enough CPU and memory available to function properly, regardless of what the actual cluster workloads are.
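To illustrate how such reservations affect what workloads can use, here is a minimal sketch of the node allocatable calculation as described in the Kubernetes documentation: node capacity minus the kubelet reservation (kube-reserved), the system reservation (system-reserved) and the hard eviction threshold. The node size and all reservation values below are made-up examples, not necessarily the values set on the clusters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    """CPU in millicores and memory in MiB."""
    cpu_millicores: int
    memory_mib: int


def allocatable(capacity: Resources,
                kube_reserved: Resources,
                system_reserved: Resources,
                eviction_memory_mib: int) -> Resources:
    """Return what is left for Pods once the reservations are subtracted."""
    cpu = (capacity.cpu_millicores
           - kube_reserved.cpu_millicores
           - system_reserved.cpu_millicores)
    memory = (capacity.memory_mib
              - kube_reserved.memory_mib
              - system_reserved.memory_mib
              - eviction_memory_mib)
    if cpu <= 0 or memory <= 0:
        raise ValueError("reservations exceed node capacity")
    return Resources(cpu, memory)


# Hypothetical node: 2 vCPUs and 8 GiB of memory.
node = Resources(cpu_millicores=2000, memory_mib=8192)
result = allocatable(
    node,
    kube_reserved=Resources(100, 256),    # assumed kubelet reservation
    system_reserved=Resources(100, 256),  # assumed system reservation
    eviction_memory_mib=100,              # assumed hard eviction threshold
)
print(result)
```

With these example numbers, Pods can be scheduled against 1800 millicores and 7580 MiB, while the remainder stays available to kubelet and the other system processes.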
As announced in our previous update, we have migrated our cluster-monitoring stack to use the new stable/prometheus-operator chart as its base. These updates have already been rolled out across staging clusters.
Our cluster monitoring stack is based on the prometheus-operator developed by the people at CoreOS; more concretely, we used kube-prometheus as a starting point for a complete setup.
We’re moving the Let’s Encrypt service on our Kubernetes clusters from the deprecated kube-lego to cert-manager.
We’ve recently adjusted resource requests and limits for all Pods running in the infrastructure namespace. Previously, some of them had neither requests nor limits, and others had unnecessarily high values. We’ve reviewed the CPU and memory usage of those Pods over the last couple of weeks and adjusted their requests and limits accordingly. This is now rolled out to all staging clusters, and we’ll proceed with the production clusters next week if no issues are spotted.
We’ve updated the Pods dashboard so it displays the actual container memory usage (container_memory_working_set_bytes) next to the previous metric, which includes caches (container_memory_usage_bytes). You can find this dashboard in your Grafana deployment as Pods v2.
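As a rough illustration of how the two metrics relate: cAdvisor derives the working set by subtracting the inactive file cache from total usage, never going below zero. The sketch below reproduces that relationship; the function name and the sample byte counts are made up for the example.

```python
MIB = 1024 * 1024


def working_set_bytes(usage_bytes: int, inactive_file_bytes: int) -> int:
    """Approximate container_memory_working_set_bytes.

    usage_bytes corresponds to container_memory_usage_bytes, which
    includes page cache; inactive_file_bytes is the part of that cache
    the kernel can reclaim easily.
    """
    return max(usage_bytes - inactive_file_bytes, 0)


# Hypothetical container: 512 MiB total usage, 200 MiB of it inactive cache.
usage = 512 * MIB
inactive_cache = 200 * MIB
working_set = working_set_bytes(usage, inactive_cache)
print(working_set // MIB, "MiB working set out of", usage // MIB, "MiB usage")
```

This is why a container that reads many files can show high usage while its working set, the figure that matters for memory limits and eviction, stays much lower.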
Today we’re releasing a new user-level knowledge base of our products and services. It aims to help you be more confident and autonomous in managing your applications on our platforms. You can find it in the following GitHub repository: https://github.com/skyscrapers/documentation
Now that all open issues are resolved, we have successfully upgraded our test cluster to Debian stretch and tested it.
Teleport has been upgraded to version 2.7.5 for all users. This upgrade includes various bugfixes and performance improvements, as well as additional functionality such as scp (secure copy) from the web interface.
Today we’re adding the Kubernetes Cluster Autoscaler to our clusters. Since we’ll be enabling the autoscaler by default, we’ll initially deploy it on staging; production clusters will follow in a couple of days.
Our Vault setup is configured to store its data in a DynamoDB table, using Vault’s DynamoDB storage backend. DynamoDB already replicates all the data in a table across three availability zones, giving Vault high availability and data durability. From today, we’re also enabling point-in-time recovery for the DynamoDB table, which provides continuous backups of the data for the last 35 days. This makes it possible to restore your Vault data in case it gets deleted or corrupted by accident, or if you just want to go back to a previous state.
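For reference, point-in-time recovery can be switched on through the DynamoDB UpdateContinuousBackups API. The following is a minimal sketch using boto3, not necessarily how it is rolled out in our setups; the table name vault-data and the helper function names are assumptions made for the example.

```python
def pitr_request(table_name: str) -> dict:
    """Build the parameters for DynamoDB's UpdateContinuousBackups call."""
    return {
        "TableName": table_name,
        "PointInTimeRecoverySpecification": {
            "PointInTimeRecoveryEnabled": True,
        },
    }


def enable_pitr(table_name: str, client=None):
    """Enable point-in-time recovery on the given table.

    A client can be injected for testing; otherwise a boto3 DynamoDB
    client is created using the default AWS credentials and region.
    """
    if client is None:
        import boto3  # imported lazily so the sketch loads without boto3
        client = boto3.client("dynamodb")
    return client.update_continuous_backups(**pitr_request(table_name))


# "vault-data" is a hypothetical table name used only for illustration.
request = pitr_request("vault-data")
print(request)
```

Calling enable_pitr("vault-data") against a real account would apply the change; the sketch only prints the request so it can be run without AWS access.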
We have updated our internal infrastructure tools to their latest versions. These upgrades add bugfixes and several new features.
We’ve reduced the number of NAT gateways per VPC. In the previous setup we created one NAT gateway per VPC where we routed all the non-k8s traffic, and we had three NAT gateways just for the k8s cluster (one for each Availability Zone). In total we ended up having 4 NAT gateways per environment, plus one for the tools stack, so a total of 9.
We use Elasticsearch with Kibana to aggregate logs from Kubernetes and our customers’ applications. Today this stack got upgraded to 6.3, bringing several improvements and bug fixes.