Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sort ingesters when generating token lookup table fix #6513 #6535

Merged
merged 2 commits into from
Feb 18, 2025

Conversation

dsabsay
Copy link
Contributor

@dsabsay dsabsay commented Jan 22, 2025

What this PR does:

Sort ingesters when generating token lookup table in ring to fix consistency issues.

Since map iteration order is undefined, successive calls to getTokensInfo() can return different results when tokens are owned by multiple instances in the ring. I don't know enough about the ring to say whether that is expected to occur, but in the tests it does and this fixes that test.

See the linked issue for more details and here for a reproducible test: https://github.com/dsabsay/cortex/tree/flaky-ring-consistency-test-repro

I validated the test locally by running my reproducer (linked above) twice (failed both times), applying this patch and running the same test (passed both times). I don't think it's worth merging the reproducer as it takes 5mins to run utilizing 16 cores fully.

Which issue(s) this PR fixes:
Fixes #6513

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

Daniel Sabsay added 2 commits January 23, 2025 14:36
@dsabsay dsabsay force-pushed the flaky-ring-consistency-test-fix branch from 2b87236 to 28860c7 Compare January 23, 2025 22:36
@dsabsay dsabsay marked this pull request as draft January 23, 2025 23:09
@dsabsay
Copy link
Contributor Author

dsabsay commented Jan 23, 2025

Converting to draft because after rebasing master, the reproducer test is no longer failing. I need to test some more.

@dsabsay dsabsay marked this pull request as ready for review January 24, 2025 18:23
@dsabsay
Copy link
Contributor Author

dsabsay commented Jan 24, 2025

Okay, repro is behaving as expected again (failed twice in a row, passed twice in a row with this fix). This is ready for review.

@alanprot
Copy link
Member

Ok.. this LGTM.. This function is only called when something in the ring changes so it should not have any perf problem.

@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Jan 30, 2025
@justinjung04
Copy link
Contributor

But do we ever expect same token to be shared by multliple members of the ring? I believe the chance is pretty low, but in theory it would cause shards to be slightly uneven.

@dsabsay
Copy link
Contributor Author

dsabsay commented Feb 13, 2025

@justinjung04 I believe the same token is shared by multiple members only when replication factor > num zones.

CharlieTLe added a commit to CharlieTLe/cortex that referenced this pull request Feb 13, 2025
Use 128 tokens instead of 10,000. Use ringDesc when generating
testValues and creating a ring of ingesters.

Fixes: cortexproject#6513

Related: cortexproject#6535

Signed-off-by: Charlie Le <[email protected]>
@yeya24 yeya24 merged commit a0e7e35 into cortexproject:master Feb 18, 2025
17 checks passed
alanprot pushed a commit that referenced this pull request Mar 4, 2025
Use 128 tokens instead of 10,000. Use ringDesc when generating
testValues and creating a ring of ingesters.

Fixes: #6513

Related: #6535

Signed-off-by: Charlie Le <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component/ring lgtm This PR has been approved by a maintainer size/XS
Projects
None yet
4 participants