Improving Loki performance & scalability

Update 2023-02-15: These changes have now been rolled out everywhere.

In the coming days we are rolling out a significant change to the Loki setup: a migration to the “simple scalable deployment mode”. In this mode, the Loki services are split across three targets: read, write and backend.

Some of our customers have experienced performance-related limitations in our Grafana Loki setup, mainly on queries that need to scan a large volume of data. Until recently we ran a monolithic Loki architecture, in which a single Loki Pod performed all the roles (ingester, querier, query-scheduler, frontend, …). While we have kept tweaking that setup, it has been hitting its limits more and more often. This change allows us to scale each of the components at a finer granularity, according to a customer’s logging needs. It also allows large-volume queries to be split into smaller chunks that are processed in parallel by multiple read Pods. We’re also adding horizontal Pod autoscaling for the read target, through KEDA.
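The idea of splitting a large query into chunks that are processed in parallel can be illustrated with a minimal Python sketch. This is not Loki's implementation: the function names (split_range, run_subquery, run_split_query) are hypothetical, the worker threads merely stand in for read Pods, and the "result" of each sub-query is just the number of seconds it covers.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime, timedelta


def split_range(start, end, interval):
    """Split [start, end) into consecutive sub-ranges of at most `interval`."""
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    chunks = []
    cursor = start
    while cursor < end:
        chunk_end = min(cursor + interval, end)
        chunks.append((cursor, chunk_end))
        cursor = chunk_end
    return chunks


def run_subquery(chunk):
    """Hypothetical stand-in for one read Pod executing one sub-query."""
    chunk_start, chunk_end = chunk
    # A real querier would scan log data here; this sketch only
    # reports how many seconds of the time range the chunk covers.
    return (chunk_end - chunk_start).total_seconds()


def run_split_query(start, end, interval, max_workers=4):
    """Run all sub-queries in parallel and merge the partial results."""
    chunks = split_range(start, end, interval)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(run_subquery, chunks))
    return chunks, sum(partials)
```

For example, calling run_split_query with a range of two and a half hours and a one-hour interval produces three sub-queries, the last of which covers only the remaining 30 minutes. With more workers available, more of those sub-queries can run at the same time, which is why scaling the read target helps large queries.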

Since this is a significant change to the Loki architecture on our setups, some initial resource or scaling issues may occur once it is deployed on your environments. We’ve made initial estimates based on our testing and your current Loki resource configuration, but some OOMKill crashes are to be expected at the beginning. Once the deployments have finished, we invite you to run your most demanding Loki queries on the new setup so we can better fine-tune each environment’s requirements.

In addition to the scalable deployment model, we are also migrating away from DynamoDB as the index backend, in favor of the boltdb-shipper recommended by Loki. We’ve been using it successfully for the past 6 months on our Azure AKS clusters, and starting the 1st of March, Loki indices on our EKS setup will use it too. The DynamoDB indices will eventually be cleaned up, depending on your retention period. This migration is transparent to you and requires no action on your part.

Grafana Loki has also been upgraded to its latest release, 2.7.3.