In the context of the current global situation around the COVID-19 pandemic, we're making it easy for ourselves and our customers to commit part of our infrastructure's spare resources to the Folding@Home project. In short, Folding@Home uses distributed computing to process large amounts of data for medical research. Amid the current COVID-19 crisis, they've also started generating work units (WUs) aimed at gathering as much information as possible about the virus.
We’ve rolled out some minor updates to the monitoring components.
Over the past weeks we’ve rolled out a bunch of updates to our Kubernetes addons stack for all staging and production clusters.
Using EBS-backed Persistent Volumes on Kubernetes comes with some caveats. Among them is the (silent) limit on the maximum number of volume attachments per EC2 instance. For more information about this issue, you can check the documentation.
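The limit surfaces as an allocatable resource on each node, which the scheduler takes into account when placing pods that use EBS volumes. A hedged illustration of what this looks like in the node status (the exact numbers vary per EC2 instance type):

```yaml
# Fragment of a node's status as shown by "kubectl get node <name> -o yaml";
# the values below are illustrative and differ per instance type.
status:
  allocatable:
    attachable-volumes-aws-ebs: "25"   # max EBS volumes the scheduler will place on this node
    cpu: 3920m
    memory: 15155112Ki
```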
We have upgraded the core cluster components running in kube-system to their latest recommended versions for EKS 1.14.
We have deployed Calico to our EKS setups as a network policy engine.
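To illustrate the kind of control this enables, here is a minimal example NetworkPolicy (names and labels are placeholders, not taken from an actual customer setup) that only allows ingress to a backend from pods labelled as frontend in the same namespace:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend   # placeholder name
  namespace: my-app                 # placeholder namespace
spec:
  podSelector:
    matchLabels:
      app: backend                  # pods this policy protects
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend         # only these pods may connect
      ports:
        - protocol: TCP
          port: 8080
```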
We rolled out Concourse version 5.8.1 to all our setups.
We rolled out version 1.0.4 of the Caddy web server to all our setups that use on-demand “whitelabel” domains. The TLS certificates for these domains are now requested and renewed against the ACMEv2 API.
We rolled out Concourse version 5.8.0 to all our setups.
As of now we offer the option to deploy Vault on our reference solution out of the box.
As you may know, we define our Kubernetes clusters' desired state in a yaml file, which is stored in the customer's private Git repository. That file is then fed into our CI, which is responsible for rolling out the cluster.
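Purely as a hypothetical illustration of the idea (none of these keys reflect our actual schema), such a desired-state file could describe the cluster version and its node pools declaratively:

```yaml
# Hypothetical cluster definition, for illustration only; the real schema
# used in the customer repositories is different.
cluster:
  name: production
  kubernetes_version: "1.14"
  region: eu-west-1
node_pools:
  - name: workers
    instance_type: m5.large
    min_size: 3
    max_size: 10
addons:
  monitoring: true
  logging: true
```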
We use Velero as our solution to back up complete K8s cluster workloads (both K8s resources and Persistent Volumes).
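As a sketch of how such backups can be declared (the name, schedule and retention below are illustrative, not our actual settings), a Velero Schedule resource looks roughly like this:

```yaml
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: nightly-cluster-backup   # illustrative name
  namespace: velero
spec:
  schedule: "0 2 * * *"          # run every night at 02:00
  template:
    includedNamespaces:
      - "*"                      # back up all namespaces
    snapshotVolumes: true        # also snapshot the Persistent Volumes
    ttl: 720h                    # keep backups for 30 days
```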
During our migrations from KOPS to EKS clusters, some customer Pods had issues launching due to hitting the fs.inotify.max_user_instances and/or fs.inotify.max_user_watches limits. It turns out these sysctl values had been raised from their defaults in the KOPS base images, but the EKS AMIs still use the OS defaults.
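Since these are node-level sysctls, they have to be raised on the worker nodes themselves. One common pattern on EKS (a sketch, not necessarily the exact mechanism we rolled out, and with purely illustrative values) is a small privileged DaemonSet that applies them on every node:

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-sysctl-tuning          # illustrative name
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: node-sysctl-tuning
  template:
    metadata:
      labels:
        app: node-sysctl-tuning
    spec:
      initContainers:
        - name: apply-sysctls
          image: busybox:1.31
          securityContext:
            privileged: true        # needed to write node-level sysctls
          command:
            - sh
            - -c
            - |
              sysctl -w fs.inotify.max_user_instances=8192
              sysctl -w fs.inotify.max_user_watches=524288
      containers:
        - name: pause
          image: k8s.gcr.io/pause:3.1   # keeps the DaemonSet pod alive after tuning
```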
We now make it possible to run (part of) your Kubernetes and/or Concourse worker nodes in public subnets, if the situation requires it. However, our default is still to deploy these instances in private subnets.
The Concourse team is working hard on an implementation to accommodate feature environments in Concourse. However, this is still a work in progress, and at the request of our customers we researched a way to have feature environments with Concourse in the meantime.
We offer Grafana Loki as our default logging solution, which relies on the Promtail DaemonSet to gather logs on each K8s node and ship them to Loki.
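As a rough sketch of how that pipeline is wired (the Loki URL, port and paths below are placeholders, and the real relabelling rules are more extensive), a minimal Promtail configuration looks like this:

```yaml
# Minimal illustrative Promtail configuration; endpoint and paths are placeholders.
server:
  http_listen_port: 3101
positions:
  filename: /run/promtail/positions.yaml      # tracks how far each log file has been read
clients:
  - url: http://loki:3100/loki/api/v1/push    # placeholder Loki push endpoint
scrape_configs:
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod                              # discover pods via the Kubernetes API
    relabel_configs:
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod
      - source_labels: [__meta_kubernetes_pod_uid, __meta_kubernetes_pod_container_name]
        separator: /
        target_label: __path__
        replacement: /var/log/pods/*$1/*.log   # where the node stores container logs
```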
For some customers with more complex dashboards, Grafana has recently become unstable at times due to hitting our configured memory limits.
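The relevant knob is the resources block on the Grafana container; an illustrative fragment (the numbers are placeholders, not the limits we actually configure per customer) looks like this:

```yaml
# Illustrative resources fragment for the Grafana container; values are placeholders.
resources:
  requests:
    cpu: 100m
    memory: 256Mi
  limits:
    cpu: 500m
    memory: 512Mi   # the limit being hit; raising it avoids the OOM kills
```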
In our quest to automate most of the components of our infrastructure, we’ve set up CI/CD pipelines to automate the rollout of Teleport servers and their nodes.
Some earlier changes in how we label our AWS AutoScaling Groups (ASGs), and which labels the Kubernetes cluster-autoscaler uses for automatically detecting these ASGs, caused the scaler to stop working properly. This could result in clusters not automatically removing unneeded nodes, or not adding extra ones when more capacity was needed.
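For context, the upstream cluster-autoscaler only manages ASGs that carry the tags named in its auto-discovery flag, which is why a tagging change can silently break scaling. An illustrative fragment of its container spec (the image tag and cluster name are placeholders):

```yaml
# Illustrative fragment of the cluster-autoscaler Deployment; "my-cluster" is a placeholder.
containers:
  - name: cluster-autoscaler
    image: k8s.gcr.io/cluster-autoscaler:v1.14.7
    command:
      - ./cluster-autoscaler
      - --cloud-provider=aws
      # Only ASGs tagged with BOTH of these keys are picked up by the autoscaler.
      - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/my-cluster
      - --balance-similar-node-groups
```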
It has come to our attention that in certain cases our Prometheus-based ElasticSearch monitoring wasn’t correctly detecting issues and sending alerts.
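The exact rules involved differ per setup, but as a sketch of the kind of alert in question (assuming the prometheus-operator CRDs and the standard elasticsearch_exporter metrics), a rule alerting on a red cluster looks like this:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: elasticsearch-health     # illustrative name
  namespace: monitoring
spec:
  groups:
    - name: elasticsearch
      rules:
        - alert: ElasticsearchClusterRed
          # elasticsearch_exporter reports 1 for the currently active health colour
          expr: elasticsearch_cluster_health_status{color="red"} == 1
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "Elasticsearch cluster health is RED"
```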