
Bug: Shuffle sharding is not working as desired for ingesters #10531

Open
rishabhkumar92 opened this issue Jan 29, 2025 · 5 comments
Labels
bug Something isn't working component/ingester

Comments

@rishabhkumar92

What is the bug?

Hi Team,

We are currently on Grafana Mimir 2.14.3, and we were trying to decrease the tenant shard size by following the steps in the runbook.

We noticed a discrepancy in the number of metrics as soon as we tried to reduce the tenant shard size for a tenant after disabling shuffle sharding (-querier.shuffle-sharding-ingesters-enabled=false). This flag was set in the querier block and was verified by SSHing into the querier pods.

Shuffle sharding is not working as desired.

How to reproduce it?

  1. Start Grafana Mimir 2.14.3.
  2. Disable shuffle sharding so that the querier can query all ingesters.
  3. Reduce the tenant shard size for a certain tenant. We saw a reduction in the number of metrics (count({__name__=~".+"})) at a given point in time. Screenshots of before/after modifying the shard size are attached; the shard size was reduced from 41 to 16 for the tenant.
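The check in step 3 can be expressed as a single instant query, evaluated at the same timestamp before and after the change. If the querier really fans out to all ingesters, the result should not depend on the tenant shard size:

```promql
# Total number of series at the evaluation timestamp.
# A shard-size change alone should not alter this result
# when the read path queries all ingesters.
count({__name__=~".+"})
```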

What did you think would happen?

For a given point in time, the metric count shouldn't change, since the querier should be able to query all ingesters.

What was your environment?

Kubernetes

Any additional context to share?

(Screenshots: series count before and after modifying the shard size)

@rishabhkumar92 rishabhkumar92 added the bug Something isn't working label Jan 29, 2025
@dimitarvdimitrov
Contributor

Hi, can you provide some more information? Can you find the specific series that have disappeared? After that, you can try to find which ingesters hosted these series by using tools/grpcurl-query-ingesters and confirm whether the series belong to any ingester.

@rishabhkumar92
Author

I will try to get more data on which series disappeared, but we reduced the shard size a couple of times, and every time we saw a reduction in the number of series on reads. Is there an easy way to verify which ingesters are getting queried for a certain query?

@dimitarvdimitrov
Contributor

Is there an easy way to verify what ingesters are getting queried for a certain query?

There are traces, which would include all ingesters involved in a query. See this article to configure tracing collection: https://grafana.com/docs/mimir/latest/configure/configure-tracing/

@rishabhkumar92
Author

Thanks for linking that. In addition to the above option, is there a metric which can tell us whether shuffle sharding is enabled on the read path?

@dimitarvdimitrov
Contributor

I don't think so. If it is enabled on the write path, then it is enabled on the read path too. So you can compare the number of ingesters with series for a tenant (count(group by (pod) (cortex_ingester_active_series{user=}))) with all ingesters (count(group by (pod) (cortex_ingester_active_series{}))). If they're not the same, then shuffle sharding is likely enabled. But there's nothing tracking how many ingesters each query touches.
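The comparison above can be written out as two instant queries. Note that "<tenant-id>" is a placeholder for the tenant in question (the original comment left the user label value blank):

```promql
# Number of ingesters holding active series for one tenant
count(group by (pod) (cortex_ingester_active_series{user="<tenant-id>"}))

# Number of ingesters holding active series for any tenant
count(group by (pod) (cortex_ingester_active_series))
```

If the first count is smaller than the second, that tenant's series are confined to a subset of ingesters, which suggests shuffle sharding is in effect on the write path.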
