
Broker with Kerberos: some produce calls took 1 minute to finish #2587

Closed
1 of 7 tasks
hqthao-ts opened this issue Oct 24, 2019 · 2 comments
Comments


hqthao-ts commented Oct 24, 2019

I am trying to connect a producer to Kafka brokers with SASL. My Kafka cluster has 4 brokers and every topic has 10 partitions. These are the symptoms:

  • The first produce call is slow: it takes exactly 1 minute.
  • Subsequent messages are fast, but occasionally a call is slow again (and again takes exactly 1 minute).

The log I printed while a call was slow shows that the client periodically retries fetching metadata from the brokers with the reason "leader is down: re-query". This only happens while I am producing messages.

Below is the detailed log data:

%7|1571828967.830|QRYLEADER|rdkafka#producer-1| [thrd:main]: Topic sample-topic [0]: leader is down: re-query
%7|1571828967.830|QRYLEADER|rdkafka#producer-1| [thrd:main]: Topic sample-topic [4]: leader is down: re-query
%7|1571828967.830|QRYLEADER|rdkafka#producer-1| [thrd:main]: Topic sample-topic [8]: leader is down: re-query
%7|1571828967.830|METADATA|rdkafka#producer-1| [thrd:main]: Requesting metadata for 1/1 topics: refresh unavailable topics
%7|1571828967.830|METADATA|rdkafka#producer-1| [thrd:main]: sasl_plaintext://broker02:6667/1007: Request metadata for 1 topic(s): refresh unavailable topics
%7|1571828967.832|METADATA|rdkafka#producer-1| [thrd:main]: sasl_plaintext://broker02:6667/1007: ===== Received metadata (for 1 requested topics): refresh unavailable topics =====
%7|1571828967.832|METADATA|rdkafka#producer-1| [thrd:main]: sasl_plaintext://broker02:6667/1007: ClusterId: n_pK0xZ-R_27YY-WgGNKEQ, ControllerId: 1006
%7|1571828967.832|METADATA|rdkafka#producer-1| [thrd:main]: sasl_plaintext://broker02:6667/1007: 4 brokers, 1 topics
%7|1571828967.832|METADATA|rdkafka#producer-1| [thrd:main]: sasl_plaintext://broker02:6667/1007: Broker #0/4: broker02:6667 NodeId 1007
%7|1571828967.832|METADATA|rdkafka#producer-1| [thrd:main]: sasl_plaintext://broker02:6667/1007: Broker #1/4: broker04:6667 NodeId 1006
%7|1571828967.832|METADATA|rdkafka#producer-1| [thrd:main]: sasl_plaintext://broker02:6667/1007: Broker #2/4: broker03:6667 NodeId 1005
%7|1571828967.832|METADATA|rdkafka#producer-1| [thrd:main]: sasl_plaintext://broker02:6667/1007: Broker #3/4: broker01:6667 NodeId 1008
%7|1571828967.832|METADATA|rdkafka#producer-1| [thrd:main]: sasl_plaintext://broker02:6667/1007: Topic #0/1: sample-topic with 10 partitions
%7|1571828967.832|METADATA|rdkafka#producer-1| [thrd:main]: Topic sample-topic partition 0 Leader 1005
%7|1571828967.832|METADATA|rdkafka#producer-1| [thrd:main]: Topic sample-topic partition 1 Leader 1006
%7|1571828967.832|METADATA|rdkafka#producer-1| [thrd:main]: Topic sample-topic partition 2 Leader 1007
%7|1571828967.832|METADATA|rdkafka#producer-1| [thrd:main]: Topic sample-topic partition 3 Leader 1008
%7|1571828967.832|METADATA|rdkafka#producer-1| [thrd:main]: Topic sample-topic partition 4 Leader 1005
%7|1571828967.832|METADATA|rdkafka#producer-1| [thrd:main]: Topic sample-topic partition 5 Leader 1006
%7|1571828967.832|METADATA|rdkafka#producer-1| [thrd:main]: Topic sample-topic partition 6 Leader 1007
%7|1571828967.832|METADATA|rdkafka#producer-1| [thrd:main]: Topic sample-topic partition 7 Leader 1008
%7|1571828967.832|METADATA|rdkafka#producer-1| [thrd:main]: Topic sample-topic partition 8 Leader 1005
%7|1571828967.832|METADATA|rdkafka#producer-1| [thrd:main]: Topic sample-topic partition 9 Leader 1006
%7|1571828967.832|METADATA|rdkafka#producer-1| [thrd:main]: sasl_plaintext://broker02:6667/1007: 1/1 requested topic(s) seen in metadata
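(For reference, debug output like the above is produced by enabling librdkafka's debug contexts in the client configuration; the exact set of contexts shown here is illustrative:)

```properties
# librdkafka client configuration fragment (illustrative values)
debug=metadata,broker
```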

Based on these symptoms, my guess is:

  • There is some misconfiguration on the brokers or the client.
  • This misconfiguration causes multiple retries until a timeout (1 minute).
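For context, a typical librdkafka SASL/Kerberos producer configuration looks roughly like the following sketch (broker names, principal, and keytab path are placeholders, not taken from this report):

```properties
# Hypothetical librdkafka producer settings for SASL/GSSAPI (Kerberos)
bootstrap.servers=broker01:6667,broker02:6667
security.protocol=SASL_PLAINTEXT
sasl.mechanisms=GSSAPI
sasl.kerberos.service.name=kafka
sasl.kerberos.principal=client@EXAMPLE.COM
sasl.kerberos.keytab=/etc/security/keytabs/client.keytab
# librdkafka refreshes the Kerberos ticket by periodically running kinit
# (see sasl.kerberos.kinit.cmd); a slow kinit can block authentication
```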

Checklist

IMPORTANT: We will close issues where the checklist has not been completed.

Please provide the following information:

  • librdkafka version (release number or git tag): <REPLACE with e.g., v0.10.5 or a git sha. NOT "latest" or "current">
  • Apache Kafka version: <REPLACE with e.g., 0.10.2.3>
  • librdkafka client configuration: <REPLACE with e.g., message.timeout.ms=123, auto.reset.offset=earliest, ..>
  • Operating system: <REPLACE with e.g., Centos 5 (x64)>
  • Provide logs (with debug=.. as necessary) from librdkafka
  • Provide broker log excerpts
  • Critical issue
@edenhill
Contributor

Please fill out the checklist.

@hqthao-ts
Author

I have solved this issue. Here is the investigation.

TL;DR: the kinit command took a long time to run (exactly 1 minute).

The reason is:

  • The kinit query takes 1 minute because the dns_lookup_kdc flag in the krb5.conf file is set to true. Setting it to false solves the problem. Reference
  • While kinit runs, the state of the brokers moves to AUTH_LEGACY.
  • This causes the job that runs every second to try to query the topic metadata, but it cannot.
  • This loop stops when kinit finishes.
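The fix described above can be sketched as a krb5.conf change (the section layout is the standard MIT krb5 one; realm and path details are placeholders):

```ini
# /etc/krb5.conf
[libdefaults]
    # When true, the client issues DNS SRV/TXT lookups to locate KDCs;
    # slow or unanswered DNS queries can stall kinit for a long time.
    dns_lookup_kdc = false
```

Running `time kinit -kt /path/to/client.keytab client@EXAMPLE.COM` before and after the change is a quick way to confirm that the 1-minute delay disappears.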
