If a Kubernetes Service had no active Endpoints, for example when a Deployment was scaled to 0, requests to that Service would time out. Instead, the Service should reject traffic with the appropriate ICMP response.
More …
We now offer the option to configure custom endpoints and custom routes in Alertmanager. This is useful if you want to route your Prometheus alerts to custom Slack endpoints or use an escalation tool like PagerDuty or OpsGenie.
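As a rough sketch of what such a configuration could look like (the receiver names, channels and keys below are placeholders, not our actual setup):

```yaml
# Hypothetical Alertmanager configuration: send team-specific alerts to a
# dedicated Slack channel and escalate critical alerts to PagerDuty.
global:
  slack_api_url: 'https://hooks.slack.com/services/REPLACE/ME'  # placeholder webhook

route:
  receiver: default
  routes:
    - match:
        team: backend
      receiver: backend-slack
    - match:
        severity: critical
      receiver: pagerduty-escalation

receivers:
  - name: default
    slack_configs:
      - channel: '#alerts'
  - name: backend-slack
    slack_configs:
      - channel: '#backend-alerts'
  - name: pagerduty-escalation
    pagerduty_configs:
      - service_key: '<pagerduty-integration-key>'  # placeholder
```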
More …
We now offer the option to enable and use an internal-only Nginx Ingress Controller, alongside the public one we offer by default. This is useful if you want to expose services running in K8s only within the private AWS VPC network.
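A minimal sketch of how an Ingress could target such an internal controller, assuming it is selected via an ingress class annotation (the class name, hostname and service below are hypothetical):

```yaml
# Hypothetical Ingress that targets an internal-only Nginx Ingress Controller
# via its ingress class; only reachable from within the private VPC network.
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: internal-dashboard
  annotations:
    kubernetes.io/ingress.class: nginx-internal  # assumed class name
spec:
  rules:
    - host: dashboard.example.internal
      http:
        paths:
          - path: /
            backend:
              serviceName: dashboard
              servicePort: 80
```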
More …
We now offer the option to parse custom log labels with Promtail so you can use them in Loki.
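For illustration, a Promtail pipeline sketch that extracts a label from JSON log lines (the job name and label are examples, not a prescribed configuration):

```yaml
# Hypothetical Promtail scrape config fragment: parse JSON log lines,
# extract the "level" field and attach it as a Loki label.
scrape_configs:
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod
    pipeline_stages:
      - json:
          expressions:
            level: level
      - labels:
          level:
```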
More …
Today a notice for CVE-2020-11053 with a severity of High went out, impacting the oauth2-proxy we use for authentication to our internal dashboards.
More …
We have updated the following cluster components to their latest version:
More …
We have updated our EKS control planes and nodes to the latest supported version: 1.15. This brings EKS to Kubernetes v1.15.11.
More …
You might have noticed the NodeFilesystemSpaceFillingUp alert passing by on some occasions. That alert triggers when Prometheus predicts that a node’s disk will run out of space, based on the trend of the last few hours.
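For reference, the upstream node-exporter rule this alert is based on looks roughly like the following (the thresholds and lookback/prediction windows are illustrative, not necessarily the exact values we run):

```yaml
# Illustrative Prometheus alerting rule, similar in spirit to the upstream
# node-exporter NodeFilesystemSpaceFillingUp alert; values here are examples.
groups:
  - name: node-filesystem
    rules:
      - alert: NodeFilesystemSpaceFillingUp
        expr: |
          predict_linear(node_filesystem_avail_bytes{fstype!=""}[6h], 24 * 60 * 60) < 0
          and
          node_filesystem_avail_bytes{fstype!=""} / node_filesystem_size_bytes{fstype!=""} * 100 < 40
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: Filesystem is predicted to run out of space within 24 hours.
```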
More …
We’re updating the format of our monitoring Slack messages. As you already know, all the alerts produced by your Kubernetes clusters show up in Slack. The goal is to provide visibility into what’s going on in your infrastructure and applications, and to improve the response time to alerts.
More …
We’re deprecating the Prometheus setup of our ECS clusters. We released that setup a while ago as an alternative for both infrastructure and application monitoring for our ECS clusters, similar to what we have in place for Kubernetes. But we’ve found that it hasn’t really been used in any of the clusters. Operational coverage for ECS clusters is still provided by our central monitoring system, and customers have their own monitoring in place to cover the application.
More …
We have migrated all our managed cluster add-ons and our CI to Helm 3.
More …
Our AWS Elasticsearch Terraform module now supports auto-configuration for multi-AZ deployment. Unless specified otherwise by the user, it always enables multiple Availability Zones, up to 3 zones, within the available resources.
More …
In the context of the current COVID-19 pandemic, we’re making it easy for us and our customers to commit part of our infrastructure’s spare resources to the Folding@Home project. In short, Folding@Home uses distributed computing to process large amounts of data for medical research. Amid the current crisis, they’ve also started generating work units (WU) aimed at gathering as much information as possible about the virus.
More …
We’ve rolled out some minor updates to the monitoring components.
More …
Over the past weeks we’ve rolled out a bunch of updates to our Kubernetes addons stack for all staging and production clusters.
More …
Using EBS-backed Persistent Volumes on Kubernetes comes with some caveats. Among those is the (silent) limit on the maximum number of volume attachments per EC2 instance. For more information about this issue, you can check the documentation.
More …
We have upgraded the core cluster components running in kube-system to their latest recommended versions (for EKS 1.14):
More …
We have deployed Calico to our EKS setups as a network policy engine.
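With network policies now enforced, a standard Kubernetes NetworkPolicy like the sketch below (the labels and port are just examples) can be used to restrict pod-to-pod traffic:

```yaml
# Example NetworkPolicy that Calico can enforce: only pods labelled
# app=frontend may reach pods labelled app=backend on TCP port 8080.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend
  namespace: default
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend
      ports:
        - protocol: TCP
          port: 8080
```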
More …
We rolled out Concourse version 5.8.1 to all our setups.
More …
We rolled out version 1.0.4 of the Caddy web server to all our setups that use on-demand “whitelabel” domains. All these certificates are now requested and renewed against the ACMEv2 API.
More …