[BUG] org.opensearch.remotemigration.RemoteMigrationIndexMetadataUpdateIT.testIndexSettingsUpdatedOnlyForMigratingIndex is flaky #14141

Closed
himshikha opened this issue Jun 10, 2024 · 2 comments
Assignees
Labels
bug Something isn't working Storage:Remote untriaged

Comments

@himshikha
Contributor

Describe the bug

Test case org.opensearch.remotemigration.RemoteMigrationIndexMetadataUpdateIT.testIndexSettingsUpdatedOnlyForMigratingIndex is flaky

java.lang.AssertionError: expected:<8> but was:<9>
	at __randomizedtesting.SeedInfo.seed([46E8231A5E55D2AC:4E8F0231CB23B60F]:0)
	at org.junit.Assert.fail(Assert.java:89)
	at org.junit.Assert.failNotEquals(Assert.java:835)
	at org.junit.Assert.assertEquals(Assert.java:647)
	at org.junit.Assert.assertEquals(Assert.java:633)
	at org.opensearch.remotemigration.RemoteMigrationIndexMetadataUpdateIT.testIndexSettingsUpdatedOnlyForMigratingIndex(RemoteMigrationIndexMetadataUpdateIT.java:262)
	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
	at java.base/java.lang.reflect.Method.invoke(Method.java:580)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1750)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:938)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:974)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:988)
	at org.opensearch.test.OpenSearchTestClusterRule$1.evaluate(OpenSearchTestClusterRule.java:369)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
	at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
	at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
	at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
	at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
	at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:947)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:832)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:883)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:894)
	at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
	at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
	at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
	at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
	at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
	at java.base/java.lang.Thread.run(Thread.java:1583)

Related component

Storage:Remote

To Reproduce

./gradlew ':server:internalClusterTest' --tests "org.opensearch.remotemigration.RemoteMigrationIndexMetadataUpdateIT.testIndexSettingsUpdatedOnlyForMigratingIndex" -Dtests.seed=46E8231A5E55D2AC

Expected behavior

The test should pass every time.

@shourya035
Member

From the logs of the Jenkins build, it looks like initalMetadataVersion was captured before the replica shard assignment had completed.

[2024-06-10T20:33:50,569][INFO ][o.o.r.RemoteMigrationIndexMetadataUpdateIT] [testIndexSettingsUpdatedOnlyForMigratingIndex] ---> Asserting remote index settings
[2024-06-10T20:33:50,570][INFO ][o.o.r.RemoteMigrationIndexMetadataUpdateIT] [testIndexSettingsUpdatedOnlyForMigratingIndex] ---> Creating a new index on remote enabled nodes
[2024-06-10T20:33:50,589][INFO ][o.o.p.PluginsService     ] [node_t0] PluginService:onIndexModule index:[remote-index/5CdHnotdQbeA1hacPqPpJg]
[2024-06-10T20:33:50,591][INFO ][o.o.c.m.MetadataCreateIndexService] [node_t0] [remote-index] creating index, cause [api], templates [], shards [1]/[1]
[2024-06-10T20:33:50,591][WARN ][o.o.c.r.a.AllocationService] [node_t0] Falling back to single shard assignment since batch mode disable or multiple custom allocators set
[2024-06-10T20:33:50,593][INFO ][o.o.c.r.a.IndexMetadataUpdater] [node_t0] Adding remote store custom data for index [remote-index] during migration
[2024-06-10T20:33:50,652][INFO ][o.o.p.PluginsService     ] [node_t3] PluginService:onIndexModule index:[remote-index/5CdHnotdQbeA1hacPqPpJg]
[2024-06-10T20:33:50,980][INFO ][o.o.i.t.RemoteFsTranslog ] [node_t3] [remote-index][0] Downloaded data from remote translog till maxSeqNo = -1
[2024-06-10T20:33:51,101][INFO ][o.o.i.s.RemoteStoreRefreshListener] [node_t3] [remote-index][0] Skipped syncing segments with primaryMode=false indexShardState=RECOVERING engineType=MockInternalEngine recoverySourceType=EMPTY_STORE primary=true
[2024-06-10T20:33:51,101][INFO ][o.o.i.s.RemoteStoreRefreshListener] [node_t3] [remote-index][0] Skipped syncing segments with primaryMode=false indexShardState=RECOVERING engineType=MockInternalEngine recoverySourceType=EMPTY_STORE primary=true
[2024-06-10T20:33:51,102][INFO ][o.o.i.s.RemoteStoreRefreshListener] [node_t3] [remote-index][0] Skipped syncing segments with primaryMode=false indexShardState=RECOVERING engineType=MockInternalEngine recoverySourceType=EMPTY_STORE primary=true
[2024-06-10T20:33:51,102][INFO ][o.o.i.s.RemoteStoreRefreshListener] [node_t3] [remote-index][0] Skipped syncing segments with primaryMode=false indexShardState=RECOVERING engineType=MockInternalEngine recoverySourceType=EMPTY_STORE primary=true
[2024-06-10T20:33:51,262][INFO ][o.o.i.s.RemoteStoreRefreshListener] [node_t3] [remote-index][0] Skipped syncing segments with primaryMode=false indexShardState=STARTED engineType=MockInternalEngine recoverySourceType=EMPTY_STORE primary=true
[2024-06-10T20:33:51,262][INFO ][o.o.i.s.RemoteStoreRefreshListener] [node_t3] [remote-index][0] Skipped syncing segments with primaryMode=false indexShardState=STARTED engineType=MockInternalEngine recoverySourceType=EMPTY_STORE primary=true
[2024-06-10T20:33:51,263][INFO ][o.o.i.s.RemoteStoreRefreshListener] [node_t3] [remote-index][0] Scheduled retry with didRefresh=true
[2024-06-10T20:33:51,476][WARN ][o.o.c.r.a.AllocationService] [node_t0] Falling back to single shard assignment since batch mode disable or multiple custom allocators set
[2024-06-10T20:33:51,531][INFO ][o.o.p.PluginsService     ] [node_t4] PluginService:onIndexModule index:[remote-index/5CdHnotdQbeA1hacPqPpJg]
[2024-06-10T20:33:51,657][INFO ][o.o.i.s.RemoteStoreRefreshListener] [node_t4] [migration-index][0] Skipped syncing segments with primaryMode=false indexShardState=STARTED engineType=NRTReplicationEngine recoverySourceType=PEER primary=false
[2024-06-10T20:33:51,658][INFO ][o.o.i.s.RemoteStoreRefreshListener] [node_t4] [migration-index][0] Skipped syncing segments with primaryMode=false indexShardState=STARTED engineType=NRTReplicationEngine recoverySourceType=PEER primary=false
[2024-06-10T20:33:51,681][INFO ][o.o.p.PluginsService     ] [node_t0] PluginService:onIndexModule index:[remote-index/5CdHnotdQbeA1hacPqPpJg]
[2024-06-10T20:33:51,683][INFO ][o.o.c.m.MetadataMappingService] [node_t0] [remote-index/5CdHnotdQbeA1hacPqPpJg] create_mapping
[2024-06-10T20:33:51,887][INFO ][o.o.p.PluginsService     ] [node_t0] PluginService:onIndexModule index:[remote-index/5CdHnotdQbeA1hacPqPpJg]
[2024-06-10T20:33:51,896][INFO ][o.o.p.PluginsService     ] [node_t0] PluginService:onIndexModule index:[remote-index/5CdHnotdQbeA1hacPqPpJg]
[2024-06-10T20:33:51,899][INFO ][o.o.c.m.MetadataMappingService] [node_t0] [remote-index/5CdHnotdQbeA1hacPqPpJg] update_mapping [_doc]
[2024-06-10T20:33:52,042][INFO ][o.o.p.PluginsService     ] [node_t0] PluginService:onIndexModule index:[remote-index/5CdHnotdQbeA1hacPqPpJg]
[2024-06-10T20:33:52,045][INFO ][o.o.c.m.MetadataMappingService] [node_t0] [remote-index/5CdHnotdQbeA1hacPqPpJg] update_mapping [_doc]
[2024-06-10T20:33:52,087][INFO ][o.o.p.PluginsService     ] [node_t0] PluginService:onIndexModule index:[remote-index/5CdHnotdQbeA1hacPqPpJg]
[2024-06-10T20:33:52,090][INFO ][o.o.c.m.MetadataMappingService] [node_t0] [remote-index/5CdHnotdQbeA1hacPqPpJg] update_mapping [_doc]
[2024-06-10T20:33:52,181][INFO ][o.o.p.PluginsService     ] [node_t0] PluginService:onIndexModule index:[remote-index/5CdHnotdQbeA1hacPqPpJg]
[2024-06-10T20:33:52,185][INFO ][o.o.c.m.MetadataMappingService] [node_t0] [remote-index/5CdHnotdQbeA1hacPqPpJg] update_mapping [_doc]
[2024-06-10T20:33:52,202][INFO ][o.o.i.r.RecoverySourceHandler] [node_t3] [remote-index][0][recover to node_t4] finalizing recovery took [90.4ms]
[2024-06-10T20:33:52,354][INFO ][o.o.c.r.a.AllocationService] [node_t0] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[remote-index][0]]]).
[2024-06-10T20:33:52,513][WARN ][o.o.c.r.a.AllocationService] [node_t0] Falling back to single shard assignment since batch mode disable or multiple custom allocators set
[2024-06-10T20:33:52,928][INFO ][o.o.r.RemoteMigrationIndexMetadataUpdateIT] [testIndexSettingsUpdatedOnlyForMigratingIndex] ---> Asserting remote index settings
[2024-06-10T20:33:52,929][INFO ][o.o.r.RemoteMigrationIndexMetadataUpdateIT] [testIndexSettingsUpdatedOnlyForMigratingIndex] ---> Assert metadata version is not changed

Completion of the replica shard assignment would have bumped the metadata version from 8 to 9, which in turn tripped the assertion. We need to ensure that the index is green before capturing the initial indexMetadata version.
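
To illustrate the ordering problem, here is a minimal, self-contained sketch. It is a toy model, not the actual test code: `ToyIndex`, `replicaStarted`, `awaitGreen`, and `versionDrift` are hypothetical names made up for this example, and the starting version of 8 is taken from the assertion message above. In the real test, a helper such as `ensureGreen(...)` from `OpenSearchIntegTestCase` would presumably play the role of `awaitGreen()`.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicLong;

public class MetadataVersionRaceSketch {

    /** Toy stand-in for an index's metadata (hypothetical, not an OpenSearch class). */
    static final class ToyIndex {
        private final AtomicLong metadataVersion = new AtomicLong(8);
        private final CountDownLatch green = new CountDownLatch(1);

        /** Simulates the replica shard starting: bumps the version, then the index goes green. */
        void replicaStarted() {
            metadataVersion.incrementAndGet();
            green.countDown();
        }

        void awaitGreen() throws InterruptedException {
            green.await();
        }

        long version() {
            return metadataVersion.get();
        }
    }

    /**
     * Returns (final version - captured initial version).
     * waitForGreen=false captures the version before the replica has started;
     * waitForGreen=true waits for green first and captures afterwards.
     */
    public static long versionDrift(boolean waitForGreen) {
        try {
            ToyIndex index = new ToyIndex();
            Thread replica = new Thread(index::replicaStarted);
            long initial;
            if (waitForGreen) {
                replica.start();
                index.awaitGreen();        // replica assignment has finished
                initial = index.version(); // captured after the bump
            } else {
                initial = index.version(); // captured too early
                replica.start();
            }
            replica.join();
            return index.version() - initial;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("capture before green, drift = " + versionDrift(false)); // 1
        System.out.println("capture after green,  drift = " + versionDrift(true));  // 0
    }
}
```

A drift of 1 corresponds to the `expected:<8> but was:<9>` failure; capturing the version only after the index is green leaves nothing pending that could bump it before the final assertion.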

I will raise a PR for this soon.

@shourya035 shourya035 moved this from 🆕 New to 🏗 In progress in Storage Project Board Jun 13, 2024
@gbbafna
Collaborator

gbbafna commented Jun 27, 2024

[Storage Triage - attendees 1 2 3 4 5 6 7 8 9]

Closing as we are now moving to autocuts.

@gbbafna gbbafna closed this as completed Jun 27, 2024
@github-project-automation github-project-automation bot moved this from 🏗 In progress to ✅ Done in Storage Project Board Jun 27, 2024
Projects
Status: ✅ Done
Development

No branches or pull requests

3 participants