Bugfix - loki-promtail wasn't scheduled on tainted nodes
We offer Grafana Loki as our default logging solution, which relies on the Promtail DaemonSet to gather logs on each K8s node and ship them to Loki.
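A DaemonSet only gets scheduled on tainted nodes when its pod spec tolerates those taints, which is why Promtail was skipping them. As a minimal sketch (assuming Promtail is deployed through its Helm chart and that a catch-all toleration is acceptable), the fix comes down to something like:

```yaml
# Hypothetical Helm values override for the Promtail DaemonSet:
# a toleration with only `operator: Exists` matches every taint,
# so Promtail pods can also land on dedicated/tainted node groups.
tolerations:
  - operator: Exists
```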
For some customers with more complex dashboards, Grafana has recently become unstable at times due to hitting our configured memory limits.
In our quest to automate most components of our infrastructure, we’ve set up CI/CD pipelines that handle the rollout of Teleport servers and their nodes.
Earlier changes to how we tag our AWS Auto Scaling Groups (ASGs), and to which tags the Kubernetes cluster-autoscaler uses to automatically discover those ASGs, caused the autoscaler to stop working properly. This could result in clusters not automatically removing unneeded nodes, or not adding extra ones when more capacity is needed.
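The upstream cluster-autoscaler discovers ASGs through its --node-group-auto-discovery flag, which names the tags the ASGs must carry. A rough sketch of the relevant container arguments (the cluster name is a placeholder) looks like this:

```yaml
# Fragment of the cluster-autoscaler container spec; "my-cluster" is a placeholder.
# Auto-discovery only finds ASGs that carry exactly the tags referenced in the flag.
containers:
  - name: cluster-autoscaler
    command:
      - ./cluster-autoscaler
      - --cloud-provider=aws
      - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/my-cluster
```

If the tags on the ASGs and the tags in this flag drift apart, the autoscaler simply sees no node groups to scale.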
It has come to our attention that in certain cases our Prometheus-based ElasticSearch monitoring wasn’t correctly detecting issues and sending alerts.
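As an illustration of the kind of check involved (assuming the Prometheus elasticsearch_exporter, with example thresholds and labels), an alert on a red cluster health status could look roughly like this:

```yaml
# Hypothetical Prometheus alerting rule; the metric comes from elasticsearch_exporter,
# the 5m delay and the severity label are example values.
groups:
  - name: elasticsearch
    rules:
      - alert: ElasticsearchClusterRed
        expr: elasticsearch_cluster_health_status{color="red"} == 1
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Elasticsearch cluster health is RED"
```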
Over the coming days, we’ll roll out Concourse version 5.7.2 to all our setups.
We have updated the alert routing so that notifications from k8s-spot-termination-handler go to our shared Slack channel, to increase visibility. We’ve rolled this change out to all our clusters over the last couple of days.
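If the alerts flow through Alertmanager, this kind of change amounts to one extra route; a sketch under that assumption (the alert name, receiver and channel are placeholders, not our actual configuration):

```yaml
# Illustrative Alertmanager fragment; all names are placeholders.
route:
  routes:
    - match:
        alertname: SpotTerminationNotice   # hypothetical alert for spot terminations
      receiver: shared-slack
receivers:
  - name: shared-slack
    slack_configs:
      - channel: '#shared-alerts'
        send_resolved: true
```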
To complement today’s barrage of changelog updates, here are some miscellaneous additions that didn’t make it into another post 😁:
We’ve made several improvements to our Kubernetes stacks, allowing us to deploy in different AWS Regions (e.g. us-east-1) and allowing more dynamic usage of the Availability Zones in those regions.
In the past weeks we’ve revisited the resource reservations (requests and limits) we made for running all the cluster Add-ons.
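For reference, such a reservation is just the standard requests/limits block on the add-on’s container; an illustrative example (the numbers are made up, not the values we settled on):

```yaml
# Example container resources for a cluster add-on; values are illustrative only.
containers:
  - name: example-addon
    resources:
      requests:
        cpu: 50m
        memory: 64Mi
      limits:
        cpu: 200m
        memory: 128Mi
```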
In the past week we’ve rolled out a bunch of updates to our Kubernetes cluster-monitoring stack.
Previously we used Project Calico as the networking plugin (CNI) on all our Kubernetes clusters (KOPS & EKS). However, with our move to EKS as the base for our reference solution, we will be defaulting to AWS’ own VPC CNI.
Previously we shipped your logs with Fluentd to CloudWatch Logs and optionally sent them to an ElasticSearch/Kibana cluster (“EFK” stack) for analytics. This setup, however, was expensive, had quite some scaling problems, and was overkill for most of our customers anyway. Because of that, we researched alternatives with the following requirements:
On Tuesday a notice went out for CVE-2019-14287, affecting Sudo versions prior to 1.8.28.
We’ve recently upgraded our Vault setups to version 1.2.3, which is the latest Vault version available at the moment. Compared to version 1.0.1, there are a bunch of bug fixes and multiple improvements under the hood. You can check the full changelog here.
Yesterday a notice for CVE-2019-11253 with a severity of High went out, impacting all versions of Kubernetes.
Terraform is an automation tool that allows you to define infrastructure as code, and we use it to manage most of our customers’ infrastructure. To get to that point, we’ve developed a lot of Terraform code over the last few years, along with a number of Terraform modules that can easily be reused across multiple projects and use cases.
Today we’ll roll out Concourse version 5.5.3 to all our setups.
We manage multiple Kubernetes clusters and regularly set up new ones from scratch. There are also a bunch of extra components deployed on each cluster that we need to maintain and keep up to date.
Over the coming days, we’ll roll out Concourse version 5.5.1 to all our setups.