Upgrade to Kubernetes 1.11.9 [CVE-2019-1002100, CVE-2019-9946, CVE-2019-3874, CVE-2019-1002101]
We are in the process of upgrading our managed Kubernetes clusters from v1.11.6 to v1.11.9.
We’ve made the AWS Service Operator available for deployment on our managed Kubernetes clusters.
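For reference, the operator manages AWS resources through Kubernetes CRDs. Here is a minimal sketch of what declaring an S3 bucket looks like, based on the upstream awslabs examples; the exact `spec` fields vary by operator version, and the bucket name is a placeholder:

```yaml
# Sketch: declare an S3 bucket as a Kubernetes resource; the operator
# reconciles it into an actual AWS bucket. Field names follow the
# upstream awslabs/aws-service-operator examples and may vary by version.
apiVersion: service-operator.aws/v1alpha1
kind: S3Bucket
metadata:
  name: example-bucket        # hypothetical name
spec:
  versioning:
    enabled: false
  accessControl: Private      # keep the bucket private by default
```

Applying the manifest with kubectl is all it takes; the operator takes care of creating the bucket and keeping it in sync.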
Update (18-03-2019): We found that the existing default alerts already cover all cases of cronjob failures, with separate alerts covering the different failure cases.
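To illustrate the kind of default rule involved, here is a sketch in Prometheus rule format built on the kube_job_status_failed metric from kube-state-metrics; the alert name, threshold and duration are illustrative, not the exact rules shipped in the clusters:

```yaml
# Illustrative rule that fires when a Job (e.g. one spawned by a
# CronJob) reports failed pods. The metric comes from kube-state-metrics;
# name and thresholds here are examples only.
groups:
  - name: cronjob-failures
    rules:
      - alert: KubeJobFailed
        expr: kube_job_status_failed > 0
        for: 15m
        labels:
          severity: warning
        annotations:
          message: "Job {{ $labels.namespace }}/{{ $labels.job_name }} failed to complete."
```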
We are in the process of upgrading the components of our staging Kubernetes clusters to the latest stable releases. Production clusters will follow in 1 to 2 weeks (to be announced), once we have confirmed there are no issues with our customers' workloads.
We have updated the format of the monitoring Slack notifications. You might have already noticed that the monitoring messages in your Slack channels are now more structured and contain more useful information. We’ve already started rolling out the changes in staging clusters, and we’ll start rolling them out to production clusters during this week.
We have updated the clusters with support for MongoDB monitoring, alerts and dashboards. If you run a MongoDB cluster, you will see that there is now a MongoDB dashboard in Grafana and that we have added MongoDB-specific alert rules in Prometheus.
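As an illustration of the MongoDB alerting, a basic liveness rule built on the mongodb_up metric from the standard mongodb_exporter might look like this (the rule name and thresholds are assumptions, not the exact rules we deployed):

```yaml
# Illustrative MongoDB alert in Prometheus rule format; assumes the
# standard mongodb_exporter metric mongodb_up. Actual rule names and
# thresholds on the clusters may differ.
groups:
  - name: mongodb
    rules:
      - alert: MongodbDown
        expr: mongodb_up == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          message: "MongoDB exporter {{ $labels.instance }} cannot reach its MongoDB instance."
```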
We’ve upgraded all the k8s clusters with a new etcd backup implementation. The old backup solution relied on daily snapshots taken by a service running on the master nodes.
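For context, a snapshot-based backup of the kind the old solution performed can be sketched as a Kubernetes CronJob wrapping etcdctl snapshot save; the endpoint, certificate paths, image tag and schedule below are assumptions for illustration:

```yaml
# Sketch: a daily etcd snapshot via etcdctl, expressed as a Kubernetes
# CronJob. Endpoint, certificate paths, image tag and schedule are
# assumptions; volumeMounts for /backup and the client certificates are
# omitted for brevity.
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: etcd-backup
  namespace: kube-system
spec:
  schedule: "0 3 * * *"                      # once a day at 03:00
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: backup
              image: quay.io/coreos/etcd:v3.3.10   # hypothetical image/tag
              command:
                - /bin/sh
                - -c
                - >
                  ETCDCTL_API=3 etcdctl
                  --endpoints=https://127.0.0.1:2379
                  --cacert=/etc/etcd/ca.crt
                  --cert=/etc/etcd/client.crt
                  --key=/etc/etcd/client.key
                  snapshot save /backup/etcd-$(date +%Y%m%d).db
```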
Update: Added the other affected services alongside Kubernetes.
We’re rolling out a major update for our Kubernetes etcd clusters so that they now use encrypted EBS volumes for storing all of the Kubernetes state.
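The change itself boils down to one EBS property. As an illustration, here is what it looks like in a CloudFormation volume definition (the AZ, size and KMS key are placeholders, and our actual provisioning tooling may differ):

```yaml
# Illustration: the relevant setting is simply the EBS "Encrypted" flag,
# shown here as a CloudFormation volume definition.
Resources:
  EtcdDataVolume:
    Type: AWS::EC2::Volume
    Properties:
      AvailabilityZone: eu-west-1a   # hypothetical AZ
      Size: 100                      # GiB, hypothetical
      VolumeType: gp2
      Encrypted: true                # encrypt the Kubernetes state at rest
      KmsKeyId: alias/aws/ebs        # default EBS KMS key (assumption)
```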
We’re updating our Kubernetes staging clusters with CoreDNS, the new DNS server that replaces KubeDNS. After an in-depth analysis and testing, we’ve verified that the performance and stability of the two solutions are almost identical. Here you can find more details on why we decided to move to CoreDNS.
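For reference, CoreDNS is configured through a Corefile, typically shipped as a ConfigMap. The sketch below shows the kube-dns-equivalent configuration of that era; plugin options may differ slightly between CoreDNS versions:

```yaml
# The kube-dns-equivalent CoreDNS configuration, for reference.
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        errors
        health
        kubernetes cluster.local in-addr.arpa ip6.arpa {
            pods insecure
            upstream
            fallthrough in-addr.arpa ip6.arpa
        }
        prometheus :9153
        proxy . /etc/resolv.conf
        cache 30
    }
```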
A Vault upgrade for our setups was long overdue. We’ve upgraded our Vault installation tools from version 0.9.3 to 1.0.1, the latest Vault version available at the moment. As Vault is set up in HA mode, the downtime of the upgrade will be minimal, normally between half a second and a couple of seconds, which is the time the fail-over takes. The upgrade procedure to achieve that minimal downtime follows the standard HA pattern: upgrade the standby nodes first, trigger a fail-over so an already-upgraded standby becomes active, and finally upgrade the former active node.
Update: Changed the Kubernetes update from 1.10.12 to 1.11.6.
Update 2 (2018-12-03): Since our last update, the Kubernetes maintainers updated their documentation to add an important fix to the 1.10.11 changelog.
We’ve deployed a Prometheus monitoring system on all our managed ECS staging clusters.
Following our efforts to improve the overall stability of our Kubernetes clusters, we’ve now set resource reservations for kubelet and other system processes. This will ensure that these critical processes always have enough CPU and memory available to function properly, regardless of what the actual cluster workloads are.
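As a sketch of how such reservations are expressed, here is a kubelet configuration file (kubelet.config.k8s.io/v1beta1) with kubeReserved and systemReserved set; the concrete values below are illustrative, not the ones we deployed:

```yaml
# Sketch of kubelet resource reservations via the kubelet config file.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
kubeReserved:            # reserved for kubelet, container runtime, etc.
  cpu: 100m
  memory: 500Mi
systemReserved:          # reserved for OS daemons (sshd, journald, ...)
  cpu: 100m
  memory: 500Mi
evictionHard:            # evict pods before the node itself runs dry
  memory.available: 200Mi
```

With these set, the allocatable capacity the scheduler sees is the node's full capacity minus the reservations, so workloads can no longer starve the kubelet itself.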
As announced in our previous update, we have migrated our cluster-monitoring stack to use the new stable/prometheus-operator as its base chart. By now these updates have already been rolled out across staging clusters. Our cluster monitoring stack is based on the prometheus-operator developed by the people at CoreOS; more concretely, we used kube-prometheus as a starting point for a complete setup.
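As a rough sketch of the migration starting point, a minimal values file for the stable/prometheus-operator chart could look like this (key names follow the upstream chart; the concrete values are assumptions, not our production settings):

```yaml
# Sketch of values for the stable/prometheus-operator chart.
# Installed with: helm upgrade --install monitoring stable/prometheus-operator -f values.yaml
alertmanager:
  enabled: true
grafana:
  enabled: true
  adminPassword: "change-me"   # hypothetical
prometheus:
  prometheusSpec:
    retention: 14d             # illustrative retention window
    resources:
      requests:
        memory: 2Gi
```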
We’re moving the Let's Encrypt service on our Kubernetes clusters from the deprecated kube-lego to cert-manager.
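To give an idea of what replaces a kube-lego annotation, here is a hedged sketch using the certmanager.k8s.io/v1alpha1 API of that period; the issuer name, e-mail and domain are placeholders:

```yaml
# Sketch of the cert-manager equivalent of a kube-lego managed
# certificate, using the v1alpha1 API of that period.
apiVersion: certmanager.k8s.io/v1alpha1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: ops@example.com            # placeholder
    privateKeySecretRef:
      name: letsencrypt-prod
    http01: {}                        # enable the HTTP-01 solver
---
apiVersion: certmanager.k8s.io/v1alpha1
kind: Certificate
metadata:
  name: example-tls
spec:
  secretName: example-tls
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
  dnsNames:
    - www.example.com                 # placeholder domain
  acme:
    config:
      - http01:
          ingressClass: nginx
        domains:
          - www.example.com
```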
We’ve recently adjusted resource requests and limits for all Pods running in the infrastructure namespace. Previously, some of them had neither requests nor limits, and others had unnecessarily high values. We’ve reviewed the CPU and memory usage of those Pods over the last couple of weeks and adjusted their requests and limits accordingly. This is now rolled out to all staging clusters, and we’ll proceed with the production clusters next week if no issues are spotted.
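For reference, the shape of what we set per container looks like this (the numbers below are placeholders; the real values are derived from the observed usage):

```yaml
# Example of the per-container request/limit shape; values are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: example
  namespace: infrastructure
spec:
  containers:
    - name: app
      image: example/app:1.0        # placeholder image
      resources:
        requests:                   # what the scheduler reserves
          cpu: 50m
          memory: 128Mi
        limits:                     # hard ceiling before throttling/OOM
          cpu: 200m
          memory: 256Mi
```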
We’ve updated the Pods dashboard so it displays the actual container memory usage (container_memory_working_set_bytes) next to the previous metric, which includes caches (container_memory_usage_bytes). You can find this dashboard in your Grafana deployment as Pods v2.
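If you want to query the two metrics side by side yourself, the underlying expressions can be sketched as Prometheus recording rules like these (the rule names are ours for illustration; note that the 1.11-era cadvisor metrics label pods as pod_name):

```yaml
# The two memory metrics side by side, as illustrative recording rules.
groups:
  - name: pod-memory
    rules:
      - record: pod:memory_working_set:bytes   # actual usage, excludes reclaimable page cache
        expr: sum(container_memory_working_set_bytes{container_name!=""}) by (namespace, pod_name)
      - record: pod:memory_usage:bytes         # includes caches, so it reads higher
        expr: sum(container_memory_usage_bytes{container_name!=""}) by (namespace, pod_name)
```

The working-set metric excludes reclaimable page cache, which is why it is the better signal for actual memory pressure and OOM risk.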