-
Notifications
You must be signed in to change notification settings - Fork 810
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sort ingesters when generating token lookup table fix #6513 #6535
sort ingesters when generating token lookup table fix #6513 #6535
Conversation
Signed-off-by: Daniel Sabsay <[email protected]>
Signed-off-by: Daniel Sabsay <[email protected]>
2b87236
to
28860c7
Compare
Converting to draft because after rebasing |
Okay, repro is behaving as expected again (failed twice in a row, passed twice in a row with this fix). This is ready for review. |
Ok.. this LGTM.. This function is only called when something in the ring changes so it should not have any perf problem. |
But do we ever expect same token to be shared by multliple members of the ring? I believe the chance is pretty low, but in theory it would cause shards to be slightly uneven. |
@justinjung04 I believe the same token is shared by multiple members only when |
Use 128 tokens instead of 10,000. Use ringDesc when generating testValues and creating a ring of ingesters. Fixes: cortexproject#6513 Related: cortexproject#6535 Signed-off-by: Charlie Le <[email protected]>
Use 128 tokens instead of 10,000. Use ringDesc when generating testValues and creating a ring of ingesters. Fixes: #6513 Related: #6535 Signed-off-by: Charlie Le <[email protected]>
What this PR does:
Sort ingesters when generating token lookup table in ring to fix consistency issues.
Since map iteration order is undefined, successive calls to
getTokensInfo()
can return different results when tokens are owned by multiple instances in the ring. I don't know enough about the ring to say whether that is expected to occur, but in the tests it does and this fixes that test.See the linked issue for more details and here for a reproducible test: https://github.com/dsabsay/cortex/tree/flaky-ring-consistency-test-repro
I validated the test locally by running my reproducer (linked above) twice (failed both times), applying this patch and running the same test (passed both times). I don't think it's worth merging the reproducer as it takes 5mins to run utilizing 16 cores fully.
Which issue(s) this PR fixes:
Fixes #6513
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]