
Retriable Error "KafkaJSRequestTimeoutError" was thrown uncaught during a rebalance #1410

Closed
ModernTrollfare opened this issue Jul 8, 2022 · 1 comment


@ModernTrollfare

Describe the bug
A "KafkaJSRequestTimeoutError" was thrown uncaught in our application. The error was marked "retriable: true", but no automatic retry appears to have been carried out.

To Reproduce
There are no clear steps to reproduce this, but we observed the following sequence:

  1. A rebalance was triggered. The topic has exactly one partition, and each group has exactly one consumer, because every consumer is assigned a unique consumer group ID when its KafkaJS instance is created. Given this setup, we assume the rebalance was caused by a heartbeat timeout.
  2. KafkaJS logged the error "This is not the correct coordinator for this group".
  3. KafkaJS logged the warning "The group is rebalancing, re-joining".
  4. After a few seconds, the following error was thrown:
    KafkaJSRequestTimeoutError: Request SyncGroup(key: 14, version: 3) timed out.

The timeout and retry settings used in our KafkaJS configuration:

connectionTimeout: 5000
requestTimeout: 10000
maxRetryTime: 30000
retries: 3

restartOnFailure: default (always returns true)
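For anyone trying to reproduce this, the settings above map onto KafkaJS option objects roughly as follows. This is a sketch, not our actual code: it assumes the retry settings were passed at the client level, and the clientId, broker address and groupId are placeholders. The objects are plain data so the snippet runs without the kafkajs package; in an application they would be passed to `new Kafka(clientConfig)` and `kafka.consumer(consumerConfig)`.

```javascript
// Client-level options (would be passed to `new Kafka(clientConfig)`).
// clientId and brokers are placeholders for the redacted values.
const clientConfig = {
  clientId: "example-client", // placeholder
  brokers: ["broker.example:9092"], // placeholder; a single broker, as in the report
  connectionTimeout: 5000, // ms to wait for a successful connection
  requestTimeout: 10000, // ms to wait for a response (e.g. to SyncGroup)
  retry: {
    retries: 3, // retry attempts before the retrier gives up
    maxRetryTime: 30000, // upper bound on a single retry wait (ms)
  },
};

// Consumer-level options (would be passed to `kafka.consumer(consumerConfig)`).
// Each consumer gets its own group ID, so every group has exactly one member.
const consumerConfig = {
  groupId: "example-unique-group-id", // placeholder
  retry: {
    // Same behavior as the default: always restart the consumer after a crash.
    restartOnFailure: async (_error) => true,
  },
};
```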

Expected behavior
The library retries the connection and re-joins the group indefinitely, without throwing an uncaught error.
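Whether retrying "indefinitely" happens depends on how the retrier spaces and counts its attempts. The sketch below models the randomized exponential backoff described in the KafkaJS retry documentation (initialRetryTime, factor, multiplier, maxRetryTime, retries). It is an approximation for reasoning about timing, not the library's implementation; initialRetryTime (300 ms), factor (0.2) and multiplier (2) are assumed to be the documented defaults, since the report does not set them. The random source is injectable so the schedule can be made deterministic.

```javascript
// Sketch of a randomized exponential backoff schedule, modeled on the
// KafkaJS retry documentation. Not the library's actual code.
function backoffSchedule(
  {
    initialRetryTime = 300, // assumed documented default (ms)
    factor = 0.2, // assumed documented default: +/-20% jitter
    multiplier = 2, // assumed documented default: delays roughly double
    maxRetryTime = 30000, // cap on a single delay (ms)
    retries = 5,
  } = {},
  random = Math.random // injectable for deterministic output
) {
  // Pick a value uniformly in [t * (1 - factor), t * (1 + factor)].
  const jitter = (t) => t * (1 - factor) + random() * (2 * factor * t);

  const delays = [];
  let delay = Math.min(jitter(initialRetryTime), maxRetryTime);
  for (let attempt = 0; attempt < retries; attempt++) {
    delays.push(Math.round(delay));
    // Next delay: jittered previous delay times the multiplier, capped.
    delay = Math.min(jitter(delay) * multiplier, maxRetryTime);
  }
  return delays;
}

// Reporter's settings (retries: 3, maxRetryTime: 30000), jitter pinned to its midpoint.
const delays = backoffSchedule({ retries: 3, maxRetryTime: 30000 }, () => 0.5);
console.log(delays); // [ 300, 600, 1200 ]
```

Under these assumptions, retries: 3 yields only about 2.1 s of cumulative backoff, and no single wait comes close to the 30 s maxRetryTime cap, so the retry budget is bounded by the retry count rather than by time.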

Observed behavior
An uncaught error marked "retriable: true" was thrown and crashed the application.
Logs:
2022-07-08T01:13:07.923Z ERROR [Connection] Response OffsetCommit(key: 8, version: 5) (meta:{"level":"error","logger":"kafkajs","broker":"<broker>:9092","clientId":"<clientID>","error":"This is not the correct coordinator for this group","correlationId":14179,"size":78})
2022-07-08T01:13:07.925Z WARN [Runner] The group is rebalancing, re-joining (meta: {"level":"warn","logger":"kafkajs","groupId":"<unique groupID>","memberId":"<memberID>","error":"This is not the correct coordinator for this group"})
Error thrown:
KafkaJSRequestTimeoutError: Request SyncGroup(key: 14, version:3) timed out
    at SocketRequest.timeoutRequest (node_modules/kafkajs/src/network/requestQueue/socketRequest.js:107:19)
    at node_modules/kafkajs/src/network/requestQueue/index.js:93:21
    at Map.forEach (<anonymous>)
    at Timeout._onTimeout (node_modules/kafkajs/src/network/requestQueue/index.js:91:23)
    at listOnTimeout (node:internal/timers:568:17)
    at processTimers (node:internal/timers:510:7) {
  retriable: true,
  helpUrl: undefined,
  broker: <broker>,
  correlationId: 3,
  createdAt: <epoch timestamp>,
  sentAt: <epoch timestamp>,
  pendingDuration: 0
}
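The error dump shows that the thrown object carries a name (KafkaJSRequestTimeoutError) and a retriable flag. A minimal sketch of how application code that does catch such an error might use those two fields to decide between retrying and giving up; the helper shouldRetry and the hand-built stand-in error are hypothetical, and real code would receive the error from KafkaJS rather than construct it.

```javascript
// Hypothetical helper: decide whether a caught KafkaJS error is worth retrying,
// using only the fields visible in the error dump (name, retriable).
function shouldRetry(error, attempt, maxAttempts) {
  if (attempt >= maxAttempts) return false; // retry budget exhausted
  return Boolean(error && error.retriable === true);
}

// Stand-in for the error from the report, built by hand for illustration only.
const timeoutError = Object.assign(
  new Error("Request SyncGroup(key: 14, version: 3) timed out"),
  { name: "KafkaJSRequestTimeoutError", retriable: true, correlationId: 3 }
);

console.log(timeoutError.name, shouldRetry(timeoutError, 0, 3)); // KafkaJSRequestTimeoutError true
```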

Environment:

  • KafkaJS v2.0.1
  • Ubuntu 21
  • Node.js v18.2
  • Kafka is hosted on Confluent Cloud.
  • A single broker is specified in the application.
@ModernTrollfare
Author

This seems to be resolved by the fixes referenced in #1382 and #1384.

Current observations on v2.1.0:

  • The retrier now catches errors thrown during reconnect attempts and logs a single line instead of crashing the whole application.
  • A crash during retry can be observed through the instrumentation event interface.
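The instrumentation interface referred to here is the consumer's on(event, listener) method; KafkaJS documents a consumer.events.CRASH event whose payload includes error, groupId and restart, and on() returns a function that removes the listener. A sketch of logging crashes that way follows. So that it runs without a broker, it uses a hypothetical stand-in built on Node's EventEmitter instead of a real consumer; the "consumer.crash" event name and payload shape in the stand-in are assumptions mirroring the documented event.

```javascript
const { EventEmitter } = require("node:events");

// Log consumer crashes via the instrumentation event interface.
// Returns the listener-remover function that `consumer.on` hands back.
function attachCrashLogger(consumer, log = console.error) {
  return consumer.on(consumer.events.CRASH, (event) => {
    const { error, groupId, restart } = event.payload;
    log(`consumer crashed (group ${groupId}, restart: ${restart}): ${error.name}: ${error.message}`);
  });
}

// Hypothetical stand-in for a KafkaJS consumer, so the sketch runs offline.
function createFakeConsumer() {
  const emitter = new EventEmitter();
  return {
    events: { CRASH: "consumer.crash" }, // assumed event name
    on(eventName, listener) {
      emitter.on(eventName, listener);
      return () => emitter.off(eventName, listener);
    },
    emit: (eventName, event) => emitter.emit(eventName, event), // test hook only
  };
}

const consumer = createFakeConsumer();
const lines = [];
const removeCrashLogger = attachCrashLogger(consumer, (line) => lines.push(line));
const crashError = Object.assign(
  new Error("Request SyncGroup(key: 14, version: 3) timed out"),
  { name: "KafkaJSRequestTimeoutError" }
);
consumer.emit(consumer.events.CRASH, {
  payload: { error: crashError, groupId: "example-group", restart: true },
});
removeCrashLogger();
console.log(lines[0]);
```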
