SegmentReplicationIT.testSendCorruptBytesToReplica flaky test #11079

Closed
dreamer-89 opened this issue Nov 3, 2023 · 1 comment
Labels: bug (Something isn't working), flaky-test (Random test failure that succeeds on second run)

Comments

@dreamer-89
Member

https://build.ci.opensearch.org/job/gradle-check/28601/testReport/org.opensearch.indices.replication/SegmentReplicationIT/testSendCorruptBytesToReplica/

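The failure trace shows the shard failing because segment replication was cancelled (innermost cause: ExecutionCancelledException, "retryable action was cancelled").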

[2023-10-20T18:14:14,383][WARN ][o.o.i.c.IndicesClusterStateService] [node_t2] [test-idx-1][0] marking and sending shard failed due to [failed recovery]
org.opensearch.indices.recovery.RecoveryFailedException: [test-idx-1][0]: Recovery failed from {node_t1}{3WBDS_xFTjSOSfKO7q-IHQ}{4OxgSCO2SBCfaG2P6ZC0aA}{127.0.0.1}{127.0.0.1:40373}{d}{shard_indexing_pressure_enabled=true} into {node_t2}{szE1NPrtRyaRO2qYYaAEWA}{alb_SLApSEKX9tYbACe47Q}{127.0.0.1}{127.0.0.1:45207}{d}{shard_indexing_pressure_enabled=true} ([test-idx-1][0]: Recovery failed from {node_t1}{3WBDS_xFTjSOSfKO7q-IHQ}{4OxgSCO2SBCfaG2P6ZC0aA}{127.0.0.1}{127.0.0.1:40373}{d}{shard_indexing_pressure_enabled=true} into {node_t2}{szE1NPrtRyaRO2qYYaAEWA}{alb_SLApSEKX9tYbACe47Q}{127.0.0.1}{127.0.0.1:45207}{d}{shard_indexing_pressure_enabled=true})
	at org.opensearch.indices.recovery.RecoveryTarget.notifyListener(RecoveryTarget.java:134) [main/:?]
	at org.opensearch.indices.replication.common.ReplicationTarget.fail(ReplicationTarget.java:177) [main/:?]
	at org.opensearch.indices.replication.common.ReplicationCollection.fail(ReplicationCollection.java:212) [main/:?]
	at org.opensearch.indices.recovery.PeerRecoveryTargetService$RecoveryResponseHandler.onException(PeerRecoveryTargetService.java:738) [main/:?]
	at org.opensearch.indices.recovery.PeerRecoveryTargetService$RecoveryResponseHandler.handleException(PeerRecoveryTargetService.java:668) [main/:?]
	at org.opensearch.telemetry.tracing.handler.TraceableTransportResponseHandler.handleException(TraceableTransportResponseHandler.java:81) [main/:?]
	at org.opensearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1503) [main/:?]
	at org.opensearch.transport.InboundHandler.lambda$handleException$3(InboundHandler.java:438) [main/:?]
	at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:849) [main/:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) [?:?]
	at java.lang.Thread.run(Thread.java:1623) [?:?]
Caused by: org.opensearch.indices.recovery.RecoveryFailedException: [test-idx-1][0]: Recovery failed from {node_t1}{3WBDS_xFTjSOSfKO7q-IHQ}{4OxgSCO2SBCfaG2P6ZC0aA}{127.0.0.1}{127.0.0.1:40373}{d}{shard_indexing_pressure_enabled=true} into {node_t2}{szE1NPrtRyaRO2qYYaAEWA}{alb_SLApSEKX9tYbACe47Q}{127.0.0.1}{127.0.0.1:45207}{d}{shard_indexing_pressure_enabled=true}
	... 9 more
Caused by: org.opensearch.transport.RemoteTransportException: [node_t1][127.0.0.1:40373][internal:index/shard/recovery/start_recovery]
Caused by: org.opensearch.transport.RemoteTransportException: [node_t2][127.0.0.1:45207][internal:index/shard/replication/segments_sync]
Caused by: org.opensearch.indices.replication.common.ReplicationFailedException: Segment Replication failed
	at org.opensearch.indices.replication.SegmentReplicationTargetService$3.onFailure(SegmentReplicationTargetService.java:532) ~[main/:?]
	at org.opensearch.core.action.ActionListener$1.onFailure(ActionListener.java:90) ~[opensearch-core-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.action.ActionRunnable.onFailure(ActionRunnable.java:104) ~[main/:?]
	at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:54) ~[main/:?]
	at org.opensearch.common.util.concurrent.OpenSearchExecutors$DirectExecutorService.execute(OpenSearchExecutors.java:341) ~[main/:?]
	at org.opensearch.common.util.concurrent.ListenableFuture.notifyListener(ListenableFuture.java:120) ~[main/:?]
	at org.opensearch.common.util.concurrent.ListenableFuture.lambda$done$0(ListenableFuture.java:112) ~[main/:?]
	at java.util.ArrayList.forEach(ArrayList.java:1511) ~[?:?]
	at org.opensearch.common.util.concurrent.ListenableFuture.done(ListenableFuture.java:112) ~[main/:?]
	at org.opensearch.common.util.concurrent.BaseFuture.setException(BaseFuture.java:178) ~[main/:?]
	at org.opensearch.common.util.concurrent.ListenableFuture.onFailure(ListenableFuture.java:149) ~[main/:?]
	at org.opensearch.action.StepListener.innerOnFailure(StepListener.java:84) ~[main/:?]
	at org.opensearch.core.action.NotifyOnceListener.onFailure(NotifyOnceListener.java:62) ~[opensearch-core-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.core.action.ActionListener$4.onFailure(ActionListener.java:192) ~[opensearch-core-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.core.action.ActionListener$6.onFailure(ActionListener.java:311) ~[opensearch-core-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.action.support.RetryableAction$RetryingListener.onFinalFailure(RetryableAction.java:218) ~[main/:?]
	at org.opensearch.action.support.RetryableAction$RetryingListener.onFailure(RetryableAction.java:210) ~[main/:?]
	at org.opensearch.action.ActionListenerResponseHandler.handleException(ActionListenerResponseHandler.java:75) ~[main/:?]
	at org.opensearch.telemetry.tracing.handler.TraceableTransportResponseHandler.handleException(TraceableTransportResponseHandler.java:81) ~[main/:?]
	at org.opensearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1503) ~[main/:?]
	at org.opensearch.transport.InboundHandler.lambda$handleException$3(InboundHandler.java:438) ~[main/:?]
	at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:849) ~[main/:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
	at java.lang.Thread.run(Thread.java:1623) ~[?:?]
Caused by: org.opensearch.transport.RemoteTransportException: [node_t1][127.0.0.1:40373][internal:index/shard/replication/get_segment_files]
Caused by: org.opensearch.common.util.CancellableThreads$ExecutionCancelledException: retryable action was cancelled
	at org.opensearch.indices.recovery.RetryableTransportClient.cancel(RetryableTransportClient.java:116) ~[main/:?]
	at org.opensearch.indices.replication.RemoteSegmentFileChunkWriter.cancel(RemoteSegmentFileChunkWriter.java:128) ~[main/:?]
	at org.opensearch.indices.replication.SegmentReplicationSourceHandler.cancel(SegmentReplicationSourceHandler.java:193) ~[main/:?]
	at org.opensearch.indices.replication.OngoingSegmentReplications.cancel(OngoingSegmentReplications.java:183) ~[main/:?]
	at org.opensearch.indices.replication.OngoingSegmentReplications.cancelHandlers(OngoingSegmentReplications.java:278) ~[main/:?]
	at org.opensearch.indices.replication.OngoingSegmentReplications.cancel(OngoingSegmentReplications.java:171) ~[main/:?]
	at org.opensearch.indices.replication.SegmentReplicationSourceService.beforeIndexShardClosed(SegmentReplicationSourceService.java:225) ~[main/:?]
	at org.opensearch.indices.replication.SegmentReplicationIT.testCancellation(SegmentReplicationIT.java:592) ~[internalClusterTest/:?]
	at jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104) ~[?:?]
	at java.lang.reflect.Method.invoke(Method.java:578) ~[?:?]
	at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1750) ~[randomizedtesting-runner-2.7.1.jar:?]
	at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:938) ~[randomizedtesting-runner-2.7.1.jar:?]
	at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:974) ~[randomizedtesting-runner-2.7.1.jar:?]
	at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:988) ~[randomizedtesting-runner-2.7.1.jar:?]
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) ~[randomizedtesting-runner-2.7.1.jar:?]
	at org.junit.rules.RunRules.evaluate(RunRules.java:20) ~[junit-4.13.2.jar:4.13.2]
	at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48) ~[lucene-test-framework-9.8.0.jar:9.8.0 d914b3722bd5b8ef31ccf7e8ddc638a87fd648db - 2023-09-21 21:57:47]
	at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43) ~[lucene-test-framework-9.8.0.jar:9.8.0 d914b3722bd5b8ef31ccf7e8ddc638a87fd648db - 2023-09-21 21:57:47]
	at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45) ~[lucene-test-framework-9.8.0.jar:9.8.0 d914b3722bd5b8ef31ccf7e8ddc638a87fd648db - 2023-09-21 21:57:47]
	at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60) ~[lucene-test-framework-9.8.0.jar:9.8.0 d914b3722bd5b8ef31ccf7e8ddc638a87fd648db - 2023-09-21 21:57:47]
	at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44) ~[lucene-test-framework-9.8.0.jar:9.8.0 d914b3722bd5b8ef31ccf7e8ddc638a87fd648db - 2023-09-21 21:57:47]
	at org.junit.rules.RunRules.evaluate(RunRules.java:20) ~[junit-4.13.2.jar:4.13.2]
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) ~[randomizedtesting-runner-2.7.1.jar:?]
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368) ~[randomizedtesting-runner-2.7.1.jar:?]
	at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817) ~[randomizedtesting-runner-2.7.1.jar:?]
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468) ~[randomizedtesting-runner-2.7.1.jar:?]
	at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:947) ~[randomizedtesting-runner-2.7.1.jar:?]
	at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:832) ~[randomizedtesting-runner-2.7.1.jar:?]
	at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:883) ~[randomizedtesting-runner-2.7.1.jar:?]
	at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:894) ~[randomizedtesting-runner-2.7.1.jar:?]
	at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43) ~[lucene-test-framework-9.8.0.jar:9.8.0 d914b3722bd5b8ef31ccf7e8ddc638a87fd648db - 2023-09-21 21:57:47]
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) ~[randomizedtesting-runner-2.7.1.jar:?]
	at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38) ~[lucene-test-framework-9.8.0.jar:9.8.0 d914b3722bd5b8ef31ccf7e8ddc638a87fd648db - 2023-09-21 21:57:47]
	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40) ~[randomizedtesting-runner-2.7.1.jar:?]
	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40) ~[randomizedtesting-runner-2.7.1.jar:?]
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) ~[randomizedtesting-runner-2.7.1.jar:?]
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) ~[randomizedtesting-runner-2.7.1.jar:?]
	at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53) ~[lucene-test-framework-9.8.0.jar:9.8.0 d914b3722bd5b8ef31ccf7e8ddc638a87fd648db - 2023-09-21 21:57:47]
	at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43) ~[lucene-test-framework-9.8.0.jar:9.8.0 d914b3722bd5b8ef31ccf7e8ddc638a87fd648db - 2023-09-21 21:57:47]
	at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44) ~[lucene-test-framework-9.8.0.jar:9.8.0 d914b3722bd5b8ef31ccf7e8ddc638a87fd648db - 2023-09-21 21:57:47]
	at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60) ~[lucene-test-framework-9.8.0.jar:9.8.0 d914b3722bd5b8ef31ccf7e8ddc638a87fd648db - 2023-09-21 21:57:47]
	at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47) ~[lucene-test-framework-9.8.0.jar:9.8.0 d914b3722bd5b8ef31ccf7e8ddc638a87fd648db - 2023-09-21 21:57:47]
	at org.junit.rules.RunRules.evaluate(RunRules.java:20) ~[junit-4.13.2.jar:4.13.2]
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) ~[randomizedtesting-runner-2.7.1.jar:?]
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368) ~[randomizedtesting-runner-2.7.1.jar:?]
	at java.lang.Thread.run(Thread.java:1623) ~[?:?]
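
For anyone reading a trace like the one above, the useful part is usually the last "Caused by" entry. As a rough illustration only, here is a minimal sketch of walking a cause chain down to its innermost exception using the standard `Throwable.getCause()`. The class name `RootCauseFinder` and the method `rootCause` are hypothetical, and the exceptions in `main` are plain JDK stand-ins, not the OpenSearch classes from the trace.

```java
public class RootCauseFinder {

    /** Follows getCause() links until an exception with no cause is reached. */
    static Throwable rootCause(Throwable t) {
        Throwable current = t;
        while (current.getCause() != null) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(String[] args) {
        // Stand-in chain mimicking the shape of the trace (assumed, simplified):
        // recovery failure -> replication failure -> cancellation.
        Throwable cancelled = new IllegalStateException("retryable action was cancelled");
        Throwable replication = new RuntimeException("Segment Replication failed", cancelled);
        Throwable recovery = new RuntimeException("Recovery failed", replication);

        // Prints the message of the innermost exception.
        System.out.println(rootCause(recovery).getMessage());
    }
}
```

Run against the stand-in chain, this prints "retryable action was cancelled", matching the innermost cause in the log.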
dreamer-89 added the bug, untriaged, and flaky-test labels on Nov 3, 2023
@Poojita-Raj
Contributor

This failure is from a gradle check run that predates the fix, which was merged on Oct 25th. We can reopen if we see any further failures.
Fix: #10897
There was one additional failure after Oct 25, but it was on a 2.11 branch (failure) and is fixed by this backport to 2.x: #10923


4 participants