[Question] New Consumer+Producer randomly fails to access broker #2519
Comments
I don't see any errors in the logs unfortunately, so not much to go on. But it does seem a bit strange that the producer/consumer ids ("producer-NN") are increasing; they start at 1 for each process, so this might indicate that clients are in fact forked after creation, or that multiple clients (18??) are instantiated in the same process. To cut down on noise it would be interesting to see a run with just `broker` debugging enabled.
Yup, looks like no actual errors, but one process's Consumer is constantly stuck in the same state. I believe they're definitely trying to connect; see this excerpt.
I will set the debug to log only `broker` and retry... it might take a bit to reproduce, as I said.
There will be no instantiation at import time, and even if there were, it wouldn't explain instance ids starting at 13, so something is indeed fishy.
Is it possible that your container is running out of fds, or possibly threads?
Highly doubtful. Does Kafka impose any fd/thread limits besides the OS ones?
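For reference, a quick Linux-only way to check the OS limits in question from inside the container (the `/proc` path assumes Linux, which matches the Ubuntu 18.04 host mentioned below):

```python
import os
import resource

# Soft/hard ceilings on open file descriptors for this process.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"fd limit: soft={soft}, hard={hard}")

# Number of fds currently open by this process (Linux-only, via /proc).
print("open fds:", len(os.listdir("/proc/self/fd")))
```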
They don't start at 13; this log is not from the start of the entire application. It's only the logs of the processes that started together with the problematic 19163 process. Consider this scenario: the process we fork from creates Kafka clients of its own over the application's lifetime, and the forked children inherit that instance counter, so their first new client doesn't start at 1. (Still trying to reproduce the issue with the broker-only debug setting.)
@edenhill Returning with some more logs. This is the same scenario as described before; consider the following -
About the ids, I suggested what I think is happening in the previous comment. Do you believe the forking might be problematic? (Reminder: the process we fork from has Kafka objects of its own, but we don't use them in the forked processes; the forked processes create new Consumers+Producers, roughly as in the sketch below.)
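To make the lifecycle concrete, here is a minimal sketch of the pattern being described, with clients created only inside the child after the fork; the broker address, topic name, and uuid-based `group.id` generation are assumptions, not details from the issue:

```python
import multiprocessing
import uuid

from confluent_kafka import Consumer, Producer


def handle_data(topic):
    # Created inside the child, after the fork; nothing librdkafka-related
    # is (intentionally) inherited from the parent.
    producer = Producer({"bootstrap.servers": "localhost:9092"})
    consumer = Consumer({
        "bootstrap.servers": "localhost:9092",
        "group.id": str(uuid.uuid4()),   # brand-new group per process
        "auto.offset.reset": "latest",
    })
    consumer.subscribe([topic])
    # ... do the data handling ...
    consumer.close()
    producer.flush()


if __name__ == "__main__":
    # The parent has Kafka objects of its own (not shown) that the
    # children never touch.
    p = multiprocessing.Process(target=handle_data, args=("topic-im",))
    p.start()
    p.join()
```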
@kerenor23 Thank you for the logs. Unfortunately I can't find an explanation in the logs for why the producer can't connect to the broker at localhost:9092 while the consumer can.
In the broker logs there's nothing about the Consumer & Producer that don't connect, as if no requests are even made. |
Description
I have a Python app working with confluent-kafka 1.0.1.
Part of the app's flow is to regularly fork new processes for some data handling, closing them when done. Each of the new processes creates a Consumer and Producer.
Only randomly and only occasionally, one of the processes (not necessarily the same one, but there seems to be a preference for the one called `IM` in the logs I attach) will fail to connect to the broker. The broker is running locally in Docker.
Once a process has succeeded in connecting, it will continue to work correctly. Once it has failed, it will not recover, forcing us to stop this data handling processing and restart.
Settings
The Producer and Consumers are created in-process (i.e. they are not forked).
Each Producer's client config comes in one of two variants: a default one, or one tuned for minimal latency (a hedged sketch of the latter follows).
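The actual config blocks didn't survive in this extract; purely as a hypothetical illustration of the shape, a minimal-latency variant might disable batching via `linger.ms` (the specific keys and values here are assumptions):

```python
from confluent_kafka import Producer

# Hypothetical minimal-latency tuning -- not the issue's actual config.
# linger.ms=0 tells librdkafka to send messages immediately instead of
# batching them; bootstrap.servers matches the local broker from the issue.
producer = Producer({
    "bootstrap.servers": "localhost:9092",
    "linger.ms": 0,
})
```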
Each Consumer of each process is its own new consumer group (`group.id` is generated automatically), listening to a new topic designated to this Consumer alone. `auto.offset.reset` is set to `latest`. We rely on automatic topic creation for all our topics (a sketch of this setup follows).
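A sketch of the Consumer setup those settings describe; the `uuid4`-based group id and the topic name are assumptions standing in for however the app actually generates them:

```python
import uuid

from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": f"handler-{uuid.uuid4()}",  # generated automatically, new group each time
    "auto.offset.reset": "latest",
})
# Topic auto-creation happens broker-side (auto.create.topics.enable=true);
# subscribing to or producing to a not-yet-existing topic typically triggers it.
consumer.subscribe(["topic-for-this-consumer-only"])
```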
There is no guaranteed way to reproduce it; it seems to really happen randomly.
Logs
I'm adding a general log of all the processes; each process's Producer logs with `"debug": "broker,topic,msg"` and each Consumer logs with `"debug": "cgrp,topic,consumer"` (wired up roughly as in the sketch below). The relevant part is how all processes except the one with PID 19163 started correctly, and that specific one didn't. They were all started at the same time, running on the same machine (Ubuntu 18.04, btw), yet that one failed to access the local broker. Running the same scenario again, it might or might not work.
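For completeness, roughly how those debug contexts get wired up in confluent-kafka; the `debug` strings are the ones quoted above, while the logger setup and `group.id` placeholder are assumptions:

```python
import logging

from confluent_kafka import Consumer, Producer

logging.basicConfig(level=logging.DEBUG)
log = logging.getLogger("kafka-debug")

# Producer debug contexts as quoted above.
producer = Producer(
    {"bootstrap.servers": "localhost:9092", "debug": "broker,topic,msg"},
    logger=log,
)

# Consumer debug contexts as quoted above.
consumer = Consumer(
    {
        "bootstrap.servers": "localhost:9092",
        "group.id": "debug-group",  # placeholder
        "debug": "cgrp,topic,consumer",
    },
    logger=log,
)
```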
The broker logs don't show any request from this process.
Any ideas as to what to check/change would be appreciated.
Thanks!
(... and it goes on like that, of course)