-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RW Separation] Search replica recovery flow breaks when search shard allocated to new node after node drop #17334
Comments
@vinaykpud This will impact cases where a SR node is restarted, can you check our recovery logic to see if thats the case? Ie. we still want to diff local segments and ensure we only fetch wahts required to recover. |
Added integ test for reproducing this : d89c1cf |
@mch2 Yes. If the SR node restarts, we should consider loading the available local files instead of starting with an empty directory. In this case, the node restart causes the search replica to become unassigned. When the node comes back up, the allocator attempts to reassign the search replica to the same node. Since the shard was previously assigned to this node, it should already have the necessary files. Therefore, if any local files exist, we should load them. |
Following is the method that executes when allocating a shard to a node:
We need to handle two scenarios here. Assume we created an index with 1 primary (P) and 1 Search Replica (S), which were assigned as follows:
Scenario 1: Node3 leaves the cluster and joins backIf Node3 leaves the cluster and later rejoins, and we attempt to allocate the shard using the The existing comment in the code states:
I didn’t fully understand the intent behind this, but it seems to be handling a case where there are leftover/dangling files that need cleanup. Proposed ApproachTo handle this case, I suggest modifying this logic to load the files even when the strategy is Additionally, we can simplify the method using the Strategy pattern to remove the existing complex and confusing I’m confident in this change because
Since @mch2 Could you confirm if this approach makes sense? Scenario 2: Node3 leaves permanently, and Node4 is addedIn this case, the shard is allocated with the |
The behavior of Unassigned Shards differs for Primary and Replica shards.
So thats why this logic breaks : OpenSearch/server/src/main/java/org/opensearch/cluster/routing/ShardRouting.java Line 443 in 91a93da
|
I would prefer not to update the meaning of EMPTY_STORE. EMPTY_STORE should mean just that and start fresh. Can we update the recoverySource to be existing store and iff the shard copy does not exist on the node we fall back to empty? |
Is your feature request related to a problem? Please describe
Context:
Created a 5 node cluster and created an index with 1P and 1 Search replica.
Now following is the shard assignment:
simulate a node drop, Since I am running the cluster locally using docker, I stopped the node3.
After 1 min,
AllocationService
will try to allocate the search shard to node4 and it will fail with bellow exceptionDescribe the solution you'd like
This is happens because in
ShardRouting
OpenSearch/server/src/main/java/org/opensearch/cluster/routing/ShardRouting.java
Line 443 in d0a65d3
moveToUnassigned
is called, we set therecoverySource
asExistingStoreRecoverySource
for the search replica.Since this scenario involves in recovering the shard in another node, it wont have any files in the local store for recovery and fails with exception. So solution is when its search replica always use
EmptyStoreRecoverySource
.Related component
Search:Performance
Describe alternatives you've considered
No response
Additional context
No response
The text was updated successfully, but these errors were encountered: