[2.x] Adding new stop-replication action in ism #1370

Open
wants to merge 7 commits into
base: 2.x

Conversation

aggarwalShivani
Contributor

Description

Backport of #1198 to the 2.x branch. Please see the PR on the main branch for the complete background.
The main changes are:

  • Imported replication utilities from the common-utils project.
  • Added a new action, "stop-replication", for ISM policies.
  • Added UTs and one integration test (with installation of the CCR plugin in testClusters.integTest).

Related Issues

Resolves #726

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

opensearch-trigger-bot bot and others added 2 commits February 6, 2025 17:38
…ject#1368)

* Add release notes for v2.19



* Added release notes for v2.19.0



* Added Release Notes for v2.19.0



---------


(cherry picked from commit eb6afa8)

Signed-off-by: Vaibhav Agarwal <[email protected]>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* Increment version to 2.19.1-SNAPSHOT

Signed-off-by: opensearch-ci-bot <[email protected]>

* Update glibc solution

Signed-off-by: Peter Zhu <[email protected]>

---------

Signed-off-by: opensearch-ci-bot <[email protected]>
Signed-off-by: Peter Zhu <[email protected]>
Co-authored-by: opensearch-ci-bot <[email protected]>
Co-authored-by: Peter Zhu <[email protected]>
@bowenlan-amzn bowenlan-amzn changed the base branch from 2.x to 2.19 February 22, 2025 05:07
@bowenlan-amzn
Member

bowenlan-amzn commented Feb 22, 2025

@aggarwalShivani it seems the current stable branch is 2.19; can you click the "Update branch" button once for this PR?
2.x is somehow still on 2.19.0, so I triggered PR #1380.

Member

@bowenlan-amzn bowenlan-amzn left a comment

I am good with these changes. I will try to run them locally to make sure they work, and then merge them in.

@@ -313,6 +319,7 @@ def jobSchedulerFile = resolvePluginFile("opensearch-job-scheduler")
def notificationsCoreFile = resolvePluginFile("opensearch-notifications-core")
def notificationsFile = resolvePluginFile("notifications")
def securityPluginFile = resolvePluginFile("opensearch-security")
def ccrFile = resolvePluginFile("opensearch-security")
Member

Suggested change
- def ccrFile = resolvePluginFile("opensearch-security")
+ def ccrFile = resolvePluginFile("opensearch-cross-cluster-replication")

Comment on lines +71 to +73
ccr_build_download = 'https://ci.opensearch.org/ci/dbc/distribution-build-opensearch/' + opensearch_no_snapshot +
'/latest/linux/x64/tar/builds/opensearch/plugins/opensearch-cross-cluster-replication-' + ccr_no_snapshot + '.zip'
ccr_resource_folder = "src/test/resources/replication"
Member

Suggested change
ccr_build_download = 'https://ci.opensearch.org/ci/dbc/distribution-build-opensearch/' + opensearch_no_snapshot +
'/latest/linux/x64/tar/builds/opensearch/plugins/opensearch-cross-cluster-replication-' + ccr_no_snapshot + '.zip'
ccr_resource_folder = "src/test/resources/replication"

@@ -35,6 +35,7 @@ buildscript {
job_scheduler_no_snapshot = opensearch_build
notifications_no_snapshot = opensearch_build
security_no_snapshot = opensearch_build
ccr_no_snapshot = opensearch_build
Member

Suggested change
ccr_no_snapshot = opensearch_build

Comment on lines +40 to +45
val replicationPluginInterface: ReplicationPluginInterface = mock()
whenever(replicationPluginInterface.stopReplication(any(), any(), any()))
.thenAnswer { invocation ->
val listener = invocation.getArgument<ActionListener<AcknowledgedResponse>>(2)
listener.onResponse(stopReplicationResponse) // Simulate a successful response
}
Member

If you have time, can you try mocking the client instead? Although there is another layer, ReplicationPluginInterface, in between, we still make the call through the client in the end, so I hope it can work.
That way we don't need the setter for ReplicationPluginInterface in the source code.
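
One way to read this suggestion, as a minimal sketch rather than the PR's actual code: if the step reaches replication only through the client, a test can substitute a fake client and the intermediate layer needs no injection point. Every name below (`Listener`, `Ack`, `Client`, `ReplicationFacade`, `StopReplicationStep`, `FakeClient`, `StepStatus`) is a hypothetical, simplified stand-in and not the real OpenSearch, common-utils, or ISM API.

```kotlin
// Hypothetical, simplified stand-ins for illustration only.
// None of these are the real OpenSearch, common-utils, or ISM types.

/** Minimal callback contract, loosely modelled on an async action listener. */
interface Listener<T> {
    fun onResponse(response: T)
    fun onFailure(e: Exception)
}

data class Ack(val acknowledged: Boolean)

/** The lowest layer: every replication call ends up here. */
interface Client {
    fun execute(action: String, index: String, listener: Listener<Ack>)
}

/** Intermediate layer that only forwards to the client (assumed behaviour). */
object ReplicationFacade {
    fun stopReplication(client: Client, index: String, listener: Listener<Ack>) {
        client.execute("stop_replication", index, listener)
    }
}

enum class StepStatus { COMPLETED, FAILED }

/** Step under test. It takes only a client, so no setter for the facade is needed. */
class StopReplicationStep(private val client: Client) {
    var status: StepStatus? = null
        private set
    var info: String? = null
        private set

    fun execute(index: String) {
        ReplicationFacade.stopReplication(client, index, object : Listener<Ack> {
            override fun onResponse(response: Ack) {
                status = if (response.acknowledged) StepStatus.COMPLETED else StepStatus.FAILED
                info = "acknowledged=${response.acknowledged}"
            }

            override fun onFailure(e: Exception) {
                status = StepStatus.FAILED
                info = "cause=${e.message}"
            }
        })
    }
}

/** Test double for the client: answers the listener immediately, like a stubbed mock. */
class FakeClient(private val failure: Exception? = null) : Client {
    val calls = mutableListOf<String>()

    override fun execute(action: String, index: String, listener: Listener<Ack>) {
        calls += "$action:$index"
        if (failure != null) listener.onFailure(failure) else listener.onResponse(Ack(true))
    }
}
```

In a test, `StopReplicationStep(FakeClient())` followed by `execute("some-index")` drives the success path, while passing an exception to `FakeClient` drives the failure path. A mocking library could play the same role as `FakeClient`.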

Member

It seems you already tried this in #1198 (comment).

Let me give it a try, then.

Member

@aggarwalShivani please check this commit bowenlan-amzn@81c5626

Contributor Author

Thank you so much @bowenlan-amzn, this helps a lot 😃
I've tried the UTs this way and it works. I will verify thoroughly with these changes and push in the next commit.

waitFor {
val metadataInfo = getExplainManagedIndexMetaData(indexName).info.toString()
assertTrue(
metadataInfo.contains("cause=No replication in progress for index:" + indexName),
Member

Is this a failure? Can we also assert something like the step status being failed?

Contributor Author

Yes, that's right, we're testing the negative case here.
Sure, I'll add the step status check too. Thanks!
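
A sketch of what the combined check could look like, covering both the failure cause and the step status. The types `StepMetaData` and `ManagedIndexMetaData` below are simplified, hypothetical stand-ins defined only for this example (the plugin's real classes and the `getExplainManagedIndexMetaData` helper are not reproduced), and the status string "failed" is an assumption.

```kotlin
// Hypothetical, simplified stand-ins for the explain output; not the plugin's real classes.
data class StepMetaData(val name: String, val stepStatus: String)

data class ManagedIndexMetaData(val stepMetaData: StepMetaData?, val info: Map<String, Any>?)

/**
 * Returns the list of problems found. An empty list means the negative case
 * looks as expected: the step failed AND the cause names the index.
 */
fun stopReplicationFailureProblems(metadata: ManagedIndexMetaData, indexName: String): List<String> {
    val problems = mutableListOf<String>()

    // Check 1: the step itself must be reported as failed (status value is an assumption).
    val status = metadata.stepMetaData?.stepStatus
    if (status != "failed") problems += "expected step status 'failed' but was '$status'"

    // Check 2: the same substring check as the test under review.
    val expectedCause = "cause=No replication in progress for index:$indexName"
    if (!metadata.info.toString().contains(expectedCause)) problems += "missing '$expectedCause' in info"

    return problems
}
```

Checking the status alongside the message guards against a step that logs the expected cause but is not actually marked as failed.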

@bowenlan-amzn
Member

Ran this successfully locally. Here's the related log:

[2025-02-21T22:57:57,065][INFO ][o.o.i.i.ManagedIndexRunner] [b0f1d87158aa] Executing attempt_stop_replication for follower-log-test-01.07.2024
[2025-02-21T22:57:57,071][INFO ][o.o.r.a.s.TransportStopIndexReplicationAction] [b0f1d87158aa] Stopping index replication on index:follower-log-test-01.07.2024
[2025-02-21T22:57:57,098][INFO ][o.o.c.m.MetadataIndexStateService] [b0f1d87158aa] closing indices [follower-log-test-01.07.2024/vHuX1LI7Tb6k00fXalDYSw]
[2025-02-21T22:57:57,170][INFO ][o.o.c.m.MetadataIndexStateService] [b0f1d87158aa] completed closing of indices [follower-log-test-01.07.2024]
[2025-02-21T22:57:57,171][WARN ][o.o.c.r.a.AllocationService] [b0f1d87158aa] Falling back to single shard assignment since batch mode disable or multiple custom allocators set
[2025-02-21T22:57:57,227][INFO ][o.o.r.t.s.ShardReplicationTask] [b0f1d87158aa] [follower-log-test-01.07.2024][0] opensearch[b0f1d87158aa][replication_follower][T#1]: Unable to get changes from seqNo: 1. java.util.concurrent.CancellationException: [follower-log-test-01.07.2024][0] was closed.
	at kotlinx.coroutines.ExceptionsKt.CancellationException(Exceptions.kt:22)
	at kotlinx.coroutines.CoroutineScopeKt.cancel(CoroutineScope.kt:295)
	at kotlinx.coroutines.CoroutineScopeKt.cancel$default(CoroutineScope.kt:295)
	at org.opensearch.replication.task.CrossClusterReplicationTask.cancelTask(CrossClusterReplicationTask.kt:88)
	at org.opensearch.replication.task.CrossClusterReplicationTask.onIndexShardClosed(CrossClusterReplicationTask.kt:163)
	at org.opensearch.replication.task.IndexCloseListener.beforeIndexShardClosed(IndexCloseListener.kt:46)
	at org.opensearch.index.CompositeIndexEventListener.beforeIndexShardClosed(CompositeIndexEventListener.java:121)
	at org.opensearch.index.IndexService.closeShard(IndexService.java:875)
	at org.opensearch.index.IndexService.removeShard(IndexService.java:859)
	at org.opensearch.index.IndexService.close(IndexService.java:573)
	at org.opensearch.indices.IndicesService.removeIndex(IndicesService.java:1227)
	at org.opensearch.indices.cluster.IndicesClusterStateService.removeIndices(IndicesClusterStateService.java:443)
	at org.opensearch.indices.cluster.IndicesClusterStateService.applyClusterState(IndicesClusterStateService.java:284)
	at org.opensearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:637)
	at org.opensearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:623)
	at org.opensearch.cluster.service.ClusterApplierService.applyChanges(ClusterApplierService.java:589)
	at org.opensearch.cluster.service.ClusterApplierService.runTask(ClusterApplierService.java:510)
	at org.opensearch.cluster.service.ClusterApplierService$UpdateTask.run(ClusterApplierService.java:205)
	at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:955)
	at org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedOpenSearchThreadPoolExecutor.java:283)
	at org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedOpenSearchThreadPoolExecutor.java:246)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
	at java.base/java.lang.Thread.run(Thread.java:1583)

[2025-02-21T22:57:57,228][INFO ][o.o.r.t.s.ShardReplicationTask] [b0f1d87158aa] [follower-log-test-01.07.2024][0] opensearch[b0f1d87158aa][replication_follower][T#2]: Received cancellation of ShardReplicationTask java.util.concurrent.CancellationException: [follower-log-test-01.07.2024][0] was closed.
	at kotlinx.coroutines.ExceptionsKt.CancellationException(Exceptions.kt:22)
	at kotlinx.coroutines.CoroutineScopeKt.cancel(CoroutineScope.kt:295)
	at kotlinx.coroutines.CoroutineScopeKt.cancel$default(CoroutineScope.kt:295)
	at org.opensearch.replication.task.CrossClusterReplicationTask.cancelTask(CrossClusterReplicationTask.kt:88)
	at org.opensearch.replication.task.CrossClusterReplicationTask.onIndexShardClosed(CrossClusterReplicationTask.kt:163)
	at org.opensearch.replication.task.IndexCloseListener.beforeIndexShardClosed(IndexCloseListener.kt:46)
	at org.opensearch.index.CompositeIndexEventListener.beforeIndexShardClosed(CompositeIndexEventListener.java:121)
	at org.opensearch.index.IndexService.closeShard(IndexService.java:875)
	at org.opensearch.index.IndexService.removeShard(IndexService.java:859)
	at org.opensearch.index.IndexService.close(IndexService.java:573)
	at org.opensearch.indices.IndicesService.removeIndex(IndicesService.java:1227)
	at org.opensearch.indices.cluster.IndicesClusterStateService.removeIndices(IndicesClusterStateService.java:443)
	at org.opensearch.indices.cluster.IndicesClusterStateService.applyClusterState(IndicesClusterStateService.java:284)
	at org.opensearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:637)
	at org.opensearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:623)
	at org.opensearch.cluster.service.ClusterApplierService.applyChanges(ClusterApplierService.java:589)
	at org.opensearch.cluster.service.ClusterApplierService.runTask(ClusterApplierService.java:510)
	at org.opensearch.cluster.service.ClusterApplierService$UpdateTask.run(ClusterApplierService.java:205)
	at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:955)
	at org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedOpenSearchThreadPoolExecutor.java:283)
	at org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedOpenSearchThreadPoolExecutor.java:246)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
	at java.base/java.lang.Thread.run(Thread.java:1583)

[2025-02-21T22:57:57,229][INFO ][o.o.r.t.s.ShardReplicationTask] [b0f1d87158aa] [follower-log-test-01.07.2024][0] Going to mark ShardReplicationTask:136 task as completed
[2025-02-21T22:57:57,231][WARN ][o.o.c.r.a.AllocationService] [b0f1d87158aa] Falling back to single shard assignment since batch mode disable or multiple custom allocators set
[2025-02-21T22:57:57,247][INFO ][o.o.r.s.RemoteClusterRetentionLeaseHelper] [b0f1d87158aa] Removed retention lease with id - replication:followerCluster:iGmzvR8aR9eVYupjsXoz6Q:[follower-log-test-01.07.2024][0]
[2025-02-21T22:57:57,310][INFO ][o.o.p.PluginsService     ] [b0f1d87158aa] PluginService:onIndexModule index:[follower-log-test-01.07.2024/vHuX1LI7Tb6k00fXalDYSw]
[2025-02-21T22:57:57,335][INFO ][o.o.p.PluginsService     ] [b0f1d87158aa] PluginService:onIndexModule index:[.tasks/92pgbRUvS5eJ3fdGYq3IaQ]
[2025-02-21T22:57:57,338][INFO ][o.o.c.m.MetadataCreateIndexService] [b0f1d87158aa] [.tasks] creating index, cause [auto(task api)], templates [], shards [1]/[1]
[2025-02-21T22:57:57,339][INFO ][o.o.c.r.a.AllocationService] [b0f1d87158aa] updating number_of_replicas to [0] for indices [.tasks]
[2025-02-21T22:57:57,342][WARN ][o.o.c.r.a.AllocationService] [b0f1d87158aa] Falling back to single shard assignment since batch mode disable or multiple custom allocators set
[2025-02-21T22:57:57,400][INFO ][o.o.p.PluginsService     ] [b0f1d87158aa] PluginService:onIndexModule index:[.tasks/92pgbRUvS5eJ3fdGYq3IaQ]
[2025-02-21T22:57:57,422][INFO ][o.o.c.r.a.AllocationService] [b0f1d87158aa] Cluster health status changed from [RED] to [YELLOW] (reason: [shards started [[follower-log-test-01.07.2024][0]]]).
[2025-02-21T22:57:57,625][INFO ][o.o.r.t.s.ShardReplicationTask] [b0f1d87158aa] [follower-log-test-01.07.2024][0] Successfully persisted task status
[2025-02-21T22:57:57,695][INFO ][o.o.i.c.n.f.IndexOperationActionFilter] [b0f1d87158aa] Add notification action listener for tasks: TY3ZubVWTiKJH4Qo860NrA:600 and action: indices:admin/open 
[2025-02-21T22:57:57,701][INFO ][o.o.c.m.MetadataIndexStateService] [b0f1d87158aa] opening indices [[follower-log-test-01.07.2024/vHuX1LI7Tb6k00fXalDYSw]]
[2025-02-21T22:57:57,702][INFO ][o.o.p.PluginsService     ] [b0f1d87158aa] PluginService:onIndexModule index:[follower-log-test-01.07.2024/vHuX1LI7Tb6k00fXalDYSw]
[2025-02-21T22:57:57,704][WARN ][o.o.c.r.a.AllocationService] [b0f1d87158aa] Falling back to single shard assignment since batch mode disable or multiple custom allocators set
[2025-02-21T22:57:57,739][WARN ][o.o.c.r.a.AllocationService] [b0f1d87158aa] Falling back to single shard assignment since batch mode disable or multiple custom allocators set
[2025-02-21T22:57:57,786][INFO ][o.o.p.PluginsService     ] [b0f1d87158aa] PluginService:onIndexModule index:[follower-log-test-01.07.2024/vHuX1LI7Tb6k00fXalDYSw]
[2025-02-21T22:57:57,855][INFO ][o.o.c.r.a.AllocationService] [b0f1d87158aa] Cluster health status changed from [RED] to [YELLOW] (reason: [shards started [[follower-log-test-01.07.2024][0]]]).
[2025-02-21T22:57:57,885][WARN ][o.o.c.r.a.AllocationService] [b0f1d87158aa] Falling back to single shard assignment since batch mode disable or multiple custom allocators set
[2025-02-21T22:57:57,933][INFO ][o.o.r.t.i.IndexReplicationTask] [b0f1d87158aa] [follower-log-test-01.07.2024] Cancelling index replication stop
[2025-02-21T22:57:57,934][INFO ][o.o.r.a.s.TransportInternalStopIndexReplicationAction] [b0f1d87158aa] Stop replication successful for index[follower-log-test-01.07.2024] with response: true
[2025-02-21T22:57:57,934][INFO ][o.o.i.i.ManagedIndexRunner] [b0f1d87158aa] Finished executing attempt_stop_replication for follower-log-test-01.07.2024
[2025-02-21T22:57:57,933][ERROR][o.o.r.t.i.IndexReplicationTask] [b0f1d87158aa] [follower-log-test-01.07.2024] Exception encountered in CrossClusterReplicationTask - coroutine:isActive=false Context=[org.opensearch.replication.util.OpenSearchThreadContextElement@343a9eb7, StandaloneCoroutine{Cancelling}@8a64050, OpenSearchThreadPoolExecutor[name = b0f1d87158aa/replication_follower, org.opensearch.common.util.concurrent.OpenSearchThreadPoolExecutor@66de00f2[Running, pool size = 3, active threads = 1, queued tasks = 0, completed tasks = 3983]]]
java.util.concurrent.CancellationException: Index replication task received an interrupt.
	at kotlinx.coroutines.ExceptionsKt.CancellationException(Exceptions.kt:22) ~[?:?]
	at kotlinx.coroutines.CoroutineScopeKt.cancel(CoroutineScope.kt:295) ~[?:?]
	at kotlinx.coroutines.CoroutineScopeKt.cancel$default(CoroutineScope.kt:295) ~[?:?]
	at org.opensearch.replication.task.CrossClusterReplicationTask.cancelTask(CrossClusterReplicationTask.kt:88) ~[?:?]
	at org.opensearch.replication.task.index.IndexReplicationTask.clusterChanged(IndexReplicationTask.kt:938) ~[?:?]
	at org.opensearch.cluster.service.ClusterApplierService.callClusterStateListener(ClusterApplierService.java:662) ~[opensearch-2.19.1-SNAPSHOT.jar:2.19.1-SNAPSHOT]
	at org.opensearch.cluster.service.ClusterApplierService.callClusterStateListeners(ClusterApplierService.java:648) ~[opensearch-2.19.1-SNAPSHOT.jar:2.19.1-SNAPSHOT]
	at org.opensearch.cluster.service.ClusterApplierService.applyChanges(ClusterApplierService.java:606) ~[opensearch-2.19.1-SNAPSHOT.jar:2.19.1-SNAPSHOT]
	at org.opensearch.cluster.service.ClusterApplierService.runTask(ClusterApplierService.java:510) ~[opensearch-2.19.1-SNAPSHOT.jar:2.19.1-SNAPSHOT]
	at org.opensearch.cluster.service.ClusterApplierService$UpdateTask.run(ClusterApplierService.java:205) ~[opensearch-2.19.1-SNAPSHOT.jar:2.19.1-SNAPSHOT]
	at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:955) [opensearch-2.19.1-SNAPSHOT.jar:2.19.1-SNAPSHOT]
	at org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedOpenSearchThreadPoolExecutor.java:283) ~[opensearch-2.19.1-SNAPSHOT.jar:2.19.1-SNAPSHOT]
	at org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedOpenSearchThreadPoolExecutor.java:246) ~[opensearch-2.19.1-SNAPSHOT.jar:2.19.1-SNAPSHOT]
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) [?:?]
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) [?:?]
	at java.base/java.lang.Thread.run(Thread.java:1583) [?:?]
[2025-02-21T22:57:57,934][INFO ][o.o.r.t.i.IndexReplicationTask] [b0f1d87158aa] [follower-log-test-01.07.2024] Going to mark IndexReplicationTask:35 task as completed
[2025-02-21T22:57:57,935][INFO ][o.o.r.t.i.IndexReplicationTask] [b0f1d87158aa] [follower-log-test-01.07.2024] Completed the task with id:35
[2025-02-21T22:57:57,936][INFO ][o.o.r.t.i.IndexReplicationTask] [b0f1d87158aa] [follower-log-test-01.07.2024] Verifying task details - currentTask={isAssigned=true,executorNode=TY3ZubVWTiKJH4Qo860NrA}
[2025-02-21T22:57:57,944][INFO ][o.o.r.t.i.IndexReplicationTask] [b0f1d87158aa] [follower-log-test-01.07.2024] Successfully persisted task status

@aggarwalShivani aggarwalShivani changed the base branch from 2.19 to 2.x February 27, 2025 12:19
@aggarwalShivani
Contributor Author

aggarwalShivani commented Feb 27, 2025

> @aggarwalShivani seems current stable branch is 2.19, can you click update branch button once for this PR. 2.x somehow is still on 2.19.0, trigger a PR #1380

Hi @bowenlan-amzn,
I was no longer getting the "Update branch" button on the PR; it showed some conflicts to be resolved.
For now, I have changed the base branch to 2.x, as that had the latest 2.20 update. I resolved the conflicts from the GitHub UI and created a merge commit. I hope that is right? 🤔

I'm not sure why, but the file release-notes/opensearch-index-management.release-notes-2.19.1.0.md shows up in my committed files, even though I don't see this file in the 2.x branch.
And when I pulled these changes locally, it said "Your branch is ahead of 'origin/2.x' by 62 commits." I think it has a mix of the 2.19 and 2.x branches 🤔?
