Previously we shipped your logs with Fluentd to CloudWatch Logs and optionally sent them to an Elasticsearch/Kibana cluster (the “EFK” stack) for analytics. This setup, however, was expensive, scaled poorly, and was overkill for most of our customers. We therefore researched alternatives with the following requirements:
- simple setup with low maintenance overhead
- we can provide proper guidance and advice to our customers on how to use it
- benefits need to outweigh the cost
During KubeCon earlier this year we came across Grafana Loki, and after an initial proof of concept (POC) it looked like a good candidate for our reference solution.
Loki is a horizontally-scalable, highly-available, multi-tenant log aggregation system inspired by Prometheus. It is designed to be very cost effective and easy to operate. It does not index the contents of the logs, but rather a set of labels for each log stream.
We configured Loki so that chunk data is stored in S3 and index data is stored in DynamoDB. This setup is significantly cheaper than the Fluentd/CloudWatch and/or Elasticsearch solution and integrates nicely with Grafana.
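For reference, the relevant part of such a Loki configuration looks roughly like the sketch below, expressed here as a Python dict that can be dumped to the YAML Loki expects. The bucket name, region, schema date and index prefix are placeholders, not our actual values.

```python
# Sketch of a Loki storage setup with chunks in S3 and the index in DynamoDB.
# Values below are placeholders, not our production settings.
import yaml

loki_storage = {
    "schema_config": {
        "configs": [
            {
                "from": "2019-01-01",     # placeholder start date for this schema
                "store": "aws",            # index stored in DynamoDB
                "object_store": "s3",      # chunk data stored in S3
                "schema": "v9",
                "index": {"prefix": "loki_index_", "period": "168h"},
            }
        ]
    },
    "storage_config": {
        "aws": {
            "s3": "s3://eu-central-1/example-loki-chunks",            # placeholder bucket
            "dynamodb": {"dynamodb_url": "dynamodb://eu-central-1"},  # placeholder region
        }
    },
}

# Print the corresponding YAML sections of the Loki config file.
print(yaml.safe_dump(loki_storage, sort_keys=False))
```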
Unlike other logging systems, Loki is built around the idea of only indexing metadata about your logs: labels (just like Prometheus labels). Log data itself is then compressed and stored in chunks in object stores such as S3 or GCS, or even locally on the filesystem. A small index and highly compressed chunks simplifies the operation and significantly lowers the cost of Loki.
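To give an idea of what this label-based model means in practice, here is a minimal sketch of querying Loki over its HTTP API: you select log streams by their labels and only then filter the log lines themselves. It assumes Loki is reachable on its default port 3100 and uses illustrative label names; Grafana issues essentially the same kind of LogQL queries on your behalf.

```python
# Minimal sketch: query Loki's HTTP API for log lines from the last hour.
# Assumes Loki at localhost:3100; the namespace/app labels are illustrative.
import time
import requests

LOKI_URL = "http://localhost:3100/loki/api/v1/query_range"

params = {
    # Prometheus-style stream selector by labels, plus a line filter.
    "query": '{namespace="default", app="my-app"} |= "error"',
    "start": int((time.time() - 3600) * 1e9),  # last hour, in nanoseconds
    "end": int(time.time() * 1e9),
    "limit": 100,
}

resp = requests.get(LOKI_URL, params=params)
resp.raise_for_status()

for stream in resp.json()["data"]["result"]:
    labels = stream["stream"]          # the indexed labels of this stream
    for ts, line in stream["values"]:  # the (un-indexed) log lines themselves
        print(labels.get("app"), ts, line)
```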
As of today we have deployed Grafana Loki to all our K8s clusters. We will keep the CloudWatch Logs pipeline enabled for now, until we have verified that everything is working as expected, and we will coordinate with you before turning it off.