[Metricbeat] Flaky test in Elasticsearch module #10866
Comments
Pinging @elastic/stack-monitoring
Refs elastic#10866

A bug fixed in Go a while ago looks very much like what is happening here. When a series of requests was made with the HTTP client, a race condition could be triggered in which the first request returned an EOF error and subsequent requests failed. This was a bug in the connection reuse code. It is supposedly fixed upstream and is discussed in depth here: golang/go#4677

However, it is similar enough to what we are seeing that I disabled connection reuse in these tests to see whether things improve for us. If they do not, there may be no harm in simply adding some 1s sleeps here as an easy mitigation.
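A minimal Go sketch of what disabling connection reuse can look like, assuming the tests construct their own `http.Client`. The helper names `newNoReuseClient`, `fetchOnce`, and `fetchSeries` are hypothetical and not taken from the Beats code base; the only standard-library knob relied on is `DisableKeepAlives` on `http.Transport`, which makes every request use a fresh connection.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// newNoReuseClient (hypothetical helper) returns a client whose transport
// disables keep-alives, so no connection is reused between requests.
func newNoReuseClient() *http.Client {
	return &http.Client{
		Transport: &http.Transport{DisableKeepAlives: true},
	}
}

// fetchOnce performs a single GET and returns the response body as a string.
func fetchOnce(client *http.Client, url string) (string, error) {
	resp, err := client.Get(url)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return string(body), nil
}

// fetchSeries issues n sequential requests against a throwaway local test
// server (a stand-in for Elasticsearch) and returns the bodies it received.
func fetchSeries(n int) ([]string, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "ok")
	}))
	defer srv.Close()

	client := newNoReuseClient()
	bodies := make([]string, 0, n)
	for i := 0; i < n; i++ {
		body, err := fetchOnce(client, srv.URL)
		if err != nil {
			return nil, fmt.Errorf("request %d: %w", i, err)
		}
		bodies = append(bodies, body)
	}
	return bodies, nil
}
```

Opening a new connection per request costs some latency, which is usually acceptable in an integration test but would not be a good default for production clients.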
@cachedout Some Elasticsearch test errors went away when I commented out the |
@ruflin Hmm. I have seen other issues around those tests in recent days. I, too, suspect that something funny is going on with them.
I also couldn't exactly reproduce the |
After repeatedly running the Elasticsearch module integration test in Metricbeat, I found that Elasticsearch sometimes doesn't get enough time to perform CCR and generate CCR stats. This causes the following error, but only intermittently:

```
--- FAIL: TestFetch (2.44s)
    --- FAIL: TestFetch/ccr (0.08s)
        elasticsearch_integration_test.go:92:
            Error Trace:    elasticsearch_integration_test.go:92
            Error:          Should NOT be empty, but was []
            Test:           TestFetch/ccr
```

So this PR adds a 300ms sleep to give Elasticsearch enough time to perform CCR and generate CCR stats. After testing various sleep durations, I found that 300ms was the lowest (round) value that consistently passed this test.

Possibly related: #10866
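The PR uses a fixed 300ms sleep. For comparison, here is a hedged sketch of a polling variant that retries until the result is non-empty or a deadline passes. This is not what the PR does; `waitForNonEmpty`, `defaultInterval`, and `defaultTimeout` are hypothetical names, and the `fetch` callback stands in for whatever call returns the CCR stats.

```go
package main

import (
	"errors"
	"time"
)

// Hypothetical defaults, chosen only for illustration.
const (
	defaultInterval = 10 * time.Millisecond
	defaultTimeout  = 2 * time.Second
)

// waitForNonEmpty calls fetch repeatedly until it returns at least one item,
// returns an error, or the timeout elapses. It sleeps for interval between
// attempts, so a fast response is not delayed by a fixed wait.
func waitForNonEmpty(fetch func() ([]string, error), interval, timeout time.Duration) ([]string, error) {
	deadline := time.Now().Add(timeout)
	for {
		items, err := fetch()
		if err != nil {
			return nil, err
		}
		if len(items) > 0 {
			return items, nil
		}
		if time.Now().After(deadline) {
			return nil, errors.New("timed out waiting for a non-empty result")
		}
		time.Sleep(interval)
	}
}
```

Compared with a fixed sleep, polling returns as soon as data is available and tolerates slower machines up to the timeout, at the cost of a little more test code.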
I did some more testing here. It appears that the |
I'm going to be debugging the ES test flakiness over here in this draft PR: #11224
* Add sleep to allow ES sufficient time for CCR (#11172)
* Fixing formatting
@ycombinator Since you have a PR up, I reassigned this. Please let me know if that's all right.
@sayden The flaky tests mentioned in this issue have been un-skipped since June 4. AFAICT there haven't been any flakiness issues since then. Are you okay resolving this issue now?
@sayden I'm closing this issue per my previous comment. If you disagree, feel free to reopen.
…lastic#12437)
* Add sleep to allow ES sufficient time for CCR (elastic#11172)
* Fixing formatting
Flaky Test
Stack Trace