Improved etcd backups

We’ve upgraded all the k8s clusters with a new etcd backup implementation. The old backup solution relied on daily snapshots taken by a service running on the master nodes.

The new approach uses AWS Data Lifecycle Manager to take daily snapshots of the 6 etcd EBS volumes (2 per master node, 3 master nodes).

This new solution makes the backups more reliable and more efficient.

We’ve successfully tested and documented the restore procedure so that disaster recovery, should it ever be needed, is quick and easy.

The default backup retention period we configured is 14 days.
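
For illustration only, a Data Lifecycle Manager policy matching the description above (daily snapshots of tagged EBS volumes, kept for 14 days) could be sketched with boto3 roughly as follows. This is not our actual configuration: the tag key and value, the schedule name, the 03:00 UTC start time, and the helper function names are all hypothetical, and the execution role ARN has to be supplied by the caller.

```python
from typing import Any, Dict


def build_etcd_snapshot_policy(
    tag_key: str,
    tag_value: str,
    retain_days: int = 14,
    start_time_utc: str = "03:00",
) -> Dict[str, Any]:
    """Build the PolicyDetails document for a daily EBS snapshot policy.

    tag_key/tag_value select the volumes to snapshot; the values used by
    the real clusters are not stated in this note, so they are parameters.
    """
    return {
        "ResourceTypes": ["VOLUME"],
        "TargetTags": [{"Key": tag_key, "Value": tag_value}],
        "Schedules": [
            {
                "Name": "daily-etcd-snapshots",  # hypothetical schedule name
                "CopyTags": True,
                # One snapshot every 24 hours, starting at the given UTC time.
                "CreateRule": {
                    "Interval": 24,
                    "IntervalUnit": "HOURS",
                    "Times": [start_time_utc],
                },
                # Age-based retention: snapshots older than retain_days are deleted.
                "RetainRule": {"Interval": retain_days, "IntervalUnit": "DAYS"},
            }
        ],
    }


def create_policy(execution_role_arn: str, details: Dict[str, Any]) -> str:
    """Create the lifecycle policy in AWS and return its policy ID."""
    import boto3  # imported here so the builder above works without AWS access

    dlm = boto3.client("dlm")
    response = dlm.create_lifecycle_policy(
        ExecutionRoleArn=execution_role_arn,
        Description="Daily snapshots of etcd EBS volumes",
        State="ENABLED",
        PolicyDetails=details,
    )
    return response["PolicyId"]
```

With a policy like this, every volume carrying the chosen tag is picked up automatically, so the 6 etcd volumes would only need to share one tag. Calling `create_policy("<role ARN>", build_etcd_snapshot_policy("role", "etcd"))` would register it, where `role=etcd` is again only an example tag.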