
Bug: Shuffle sharding is not working as desired for ingesters #10531

Open
rishabhkumar92 opened this issue Jan 29, 2025 · 5 comments
Labels
bug Something isn't working component/ingester

Comments

@rishabhkumar92

What is the bug?

Hi Team,

We are currently on Grafana Mimir 2.14.3, and we were trying to decrease the tenant shard size by following the steps in the runbook.

We noticed a discrepancy in the number of metrics as soon as we tried to reduce the tenant shard size for a tenant after disabling shuffle sharding (-querier.shuffle-sharding-ingesters-enabled=false). This flag was set in the querier block and was verified by SSHing into the querier pods.

Shuffle sharding is not working as desired.

How to reproduce it?

  1. Start Grafana Mimir 2.14.3.
  2. Disable shuffle sharding so that the querier can query all ingesters.
  3. Reduce the tenant shard size for a certain tenant. We saw a reduction in the number of metrics (count({__name__=~".+"})) at a given point in time. Screenshots of before/after modifying the shard size are attached; the shard size was reduced from 41 to 16 for the tenant.
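The check in step 3 can be expressed as a single instant query, evaluated at the same timestamp before and after the change. If the querier really fans out to all ingesters, the result should not depend on the tenant shard size:

```promql
# Total number of series at the evaluation timestamp.
# A shard-size change alone should not alter this result
# when the read path queries all ingesters.
count({__name__=~".+"})
```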

What did you think would happen?

For a given point in time, the metric count shouldn't change, since the querier should be able to query all ingesters.

What was your environment?

Kubernetes

Any additional context to share?

(Screenshots: series count before and after modifying the shard size)

@rishabhkumar92 rishabhkumar92 added the bug Something isn't working label Jan 29, 2025
@dimitarvdimitrov
Contributor

Hi, can you provide some more information? Can you find the specific series that have disappeared? After that, you can try to find which ingesters hosted these series by using tools/grpcurl-query-ingesters and confirm whether the series belong to any ingester.

@rishabhkumar92
Author

I will try to get more data on which series disappeared, but we reduced the shard size a couple of times, and every time we saw a reduction in the number of series on reads. Is there an easy way to verify which ingesters are getting queried for a certain query?

@dimitarvdimitrov
Contributor

Is there an easy way to verify what ingesters are getting queried for a certain query?

There are traces, which would include all ingesters involved in a query. See this article to configure tracing collection: https://grafana.com/docs/mimir/latest/configure/configure-tracing/

@rishabhkumar92
Author

Thanks for linking that. In addition to the above option, is there a metric which can tell us whether shuffle sharding is enabled on the read path?

@dimitarvdimitrov
Contributor

I don't think so. If it is enabled on the write path, then it is enabled on the read path too. So you can compare the number of ingesters with series for a tenant (count(group by (pod) (cortex_ingester_active_series{user=}))) with all ingesters (count(group by (pod) (cortex_ingester_active_series{}))). If they're not the same, then shuffle sharding is likely enabled. But there's nothing tracking how many ingesters each query touches.
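The comparison above can be written out as two instant queries. Note that "<tenant-id>" is a placeholder for the tenant in question (the original comment left the user label value blank):

```promql
# Number of ingesters holding active series for one tenant
count(group by (pod) (cortex_ingester_active_series{user="<tenant-id>"}))

# Number of ingesters holding active series for any tenant
count(group by (pod) (cortex_ingester_active_series))
```

If the first count is smaller than the second, that tenant's series are confined to a subset of ingesters, which suggests shuffle sharding is in effect on the write path.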
