[BUG] org.opensearch.index.engine.InternalEngineTests.testLastRefreshCheckpoint is flaky #9029

Closed
reta opened this issue Aug 1, 2023 · 3 comments · Fixed by #9365
Labels
bug Something isn't working distributed framework flaky-test Random test failure that succeeds on second run Indexing Indexing, Bulk Indexing and anything related to indexing

Comments

@reta
Collaborator

reta commented Aug 1, 2023

Describe the bug

The org.opensearch.index.engine.InternalEngineTests.testLastRefreshCheckpoint is flaky:

com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=649, name=Thread-197, state=RUNNABLE, group=TGRP-InternalEngineTests]
	at __randomizedtesting.SeedInfo.seed([B448FCE4648CA4EB:7A3CB282E14976E1]:0)
Caused by: java.lang.AssertionError: 
Expected: a value equal to or greater than <0L>
     but: <-1L> was less than <0L>
	at __randomizedtesting.SeedInfo.seed([B448FCE4648CA4EB]:0)
	at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
	at org.junit.Assert.assertThat(Assert.java:964)
	at org.junit.Assert.assertThat(Assert.java:930)
	at org.opensearch.index.engine.InternalEngineTests.lambda$testLastRefreshCheckpoint$84(InternalEngineTests.java:6643)
	at java.base/java.lang.Thread.run(Thread.java:1623)

Please see: https://build.ci.opensearch.org/job/gradle-check/21580/testReport/junit/org.opensearch.index.engine/InternalEngineTests/testLastRefreshCheckpoint/

To Reproduce

./gradlew ':server:test' --tests "org.opensearch.index.engine.InternalEngineTests.testLastRefreshCheckpoint" -Dtests.seed=B448FCE4648CA4EB

Expected behavior
Test must always pass

Plugins
Standard

Screenshots
If applicable, add screenshots to help explain your problem.

Host/Environment (please complete the following information):

  • CI

Additional context
The flaky assertion was introduced in #7653

CC @ankitkala

@reta reta added bug Something isn't working flaky-test Random test failure that succeeds on second run labels Aug 1, 2023
@reta reta removed the untriaged label Aug 1, 2023
@saratvemulapalli saratvemulapalli added distributed framework Indexing Indexing, Bulk Indexing and anything related to indexing labels Aug 2, 2023
@ankitkala
Member

ankitkala commented Aug 3, 2023

currentOngoingRefreshCheckpoint might be getting initialized to -1 in some cases. I'm trying to repro this locally but haven't hit a failure yet, even after 8K test iterations.

currentOngoingRefreshCheckpoint/LastRefreshedCheckpointListener.pendingCheckpoint gets initialised from localCheckpointTracker.getProcessedCheckpoint which should never be negative (code link).

@ankitkala
Member

I was not able to reproduce this even after 50K test iterations.

pendingCheckpoint and refreshedCheckpoint are initialized with the same value. pendingCheckpoint keeps track of the processedCheckpoint for the ongoing refresh, and refreshedCheckpoint is updated with that value after the refresh. The only way the error above (pendingCheckpoint=-1 && refreshedCheckpoint=0) can happen is if pendingCheckpoint was set to -1 during a refresh while refreshedCheckpoint had been initialized with 0.
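
To make that scenario concrete, here is a minimal Java sketch of the two-checkpoint bookkeeping described above. It is a simplified model, not the OpenSearch source: the class name RefreshCheckpointSketch, its method names, and the LongSupplier standing in for localCheckpointTracker.getProcessedCheckpoint are all hypothetical. It only shows how pendingCheckpoint can drop below refreshedCheckpoint if the supplier reports -1 when a refresh starts; it does not explain why the real tracker would do that.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

// Hypothetical, simplified model of the pending/refreshed checkpoint pair.
public class RefreshCheckpointSketch {
    // Assumption: stands in for localCheckpointTracker.getProcessedCheckpoint.
    private final LongSupplier processedCheckpoint;
    private final AtomicLong refreshedCheckpoint;
    private volatile long pendingCheckpoint;

    public RefreshCheckpointSketch(LongSupplier processedCheckpoint) {
        this.processedCheckpoint = processedCheckpoint;
        long initial = processedCheckpoint.getAsLong();
        // Both values start from the same processed checkpoint.
        this.refreshedCheckpoint = new AtomicLong(initial);
        this.pendingCheckpoint = initial;
    }

    // Capture the processed checkpoint for the refresh that is about to run.
    public void beforeRefresh() {
        pendingCheckpoint = processedCheckpoint.getAsLong();
    }

    // After a successful refresh, advance refreshedCheckpoint (never backwards).
    public void afterRefresh(boolean didRefresh) {
        if (didRefresh) {
            refreshedCheckpoint.accumulateAndGet(pendingCheckpoint, Math::max);
        }
    }

    public long pending() {
        return pendingCheckpoint;
    }

    public long refreshed() {
        return refreshedCheckpoint.get();
    }

    public static void main(String[] args) {
        AtomicLong processed = new AtomicLong(0);
        RefreshCheckpointSketch listener = new RefreshCheckpointSketch(processed::get);

        // Simulate the supplier reporting -1 at the start of a refresh.
        processed.set(-1);
        listener.beforeRefresh();

        // A reader that checks "pending >= refreshed" at this point would fail.
        System.out.println("pending=" + listener.pending() + " refreshed=" + listener.refreshed());
        // prints "pending=-1 refreshed=0"
    }
}
```

In this model the max-based update keeps refreshedCheckpoint at 0 even after the refresh completes, so the only window in which the two values disagree in the "wrong" direction is between beforeRefresh and the next capture of a non-negative value.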
