AsyncProducer Consistency Question #613

Closed
rphillips opened this issue Feb 26, 2016 · 4 comments

Comments

@rphillips

Greetings,

Using sarama has been wonderful. My last question was about Snappy performance, and it has not been an issue for us. The AsyncProducer has been fantastic. However, I have a set of messages that need to be pushed to Kafka, and the function can't return until I know the messages have been successfully queued. I think I can do this with a channel on the Metadata of the message, and then read the Successes channel. Is this a good approach? I see an AddSet method that might be what I want, but this is not clear to me.
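
A minimal sketch of that approach might look like the following (the topic name is a placeholder, and it assumes no other goroutine is sending on this producer; otherwise the per-message Metadata channel is needed to tell batches apart):

```go
// Sketch only: block until every message in one batch has been accepted
// by Kafka, by counting acknowledgements off the AsyncProducer channels.
package main

import (
	"log"

	"github.com/Shopify/sarama"
)

// sendBatch assumes config.Producer.Return.Successes was set to true when
// the producer was built, so that Successes() actually delivers messages.
func sendBatch(producer sarama.AsyncProducer, values [][]byte) error {
	for _, v := range values {
		producer.Input() <- &sarama.ProducerMessage{
			Topic: "legacy-feed", // placeholder topic
			Value: sarama.ByteEncoder(v),
		}
	}

	var firstErr error
	for i := 0; i < len(values); i++ {
		select {
		case <-producer.Successes():
			// message safely queued in Kafka
		case err := <-producer.Errors():
			log.Printf("produce failed: %v", err)
			if firstErr == nil {
				firstErr = err
			}
		}
	}
	return firstErr
}
```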

@eapache
Contributor

eapache commented Feb 26, 2016

The way kafka works means there are a few related issues to deal with:

  • Should all the messages go to the same partition? Does it matter?
  • Is the order of the messages in this batch important?
  • What will your function do if some of the messages fail and some succeed?

Depending on the above, your best bet may be simply to add a SendMessages method to the SyncProducer. It already implements the trick you describe of putting a channel on the metadata of the message, but it currently only supports single messages.

If you need more precise control, you may have better luck constructing a manual ProduceRequest, fetching the raw Broker object from the Client, and using that.
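
A rough sketch of that lower-level path (assuming sarama's Client.Leader and Broker.Produce calls; topic, partition, and timeout here are placeholders):

```go
// Sketch only: build a ProduceRequest by hand and send it straight to the
// partition leader obtained from the Client.
package main

import "github.com/Shopify/sarama"

func produceDirectly(client sarama.Client, topic string, partition int32, values [][]byte) error {
	broker, err := client.Leader(topic, partition)
	if err != nil {
		return err
	}

	req := &sarama.ProduceRequest{
		RequiredAcks: sarama.WaitForAll, // wait until the full ISR has the batch
		Timeout:      10000,             // broker-side timeout, in milliseconds
	}
	for _, v := range values {
		req.AddMessage(topic, partition, &sarama.Message{Value: v})
	}

	resp, err := broker.Produce(req)
	if err != nil {
		return err
	}
	if block := resp.GetBlock(topic, partition); block != nil && block.Err != sarama.ErrNoError {
		return block.Err
	}
	return nil
}
```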

@rphillips
Author

To add more context, I am reading from a legacy scribe feed which needs an (OK or TRY_AGAIN) response before reading the next batch of messages. These batches of messages go to a subset of 64 partitions. The partition is derived from a key and the adler32 hash function, and this seems to work nicely so far. There is no way to correlate a batch to a partition up front, so I will have to determine the partition each message goes to on the fly. The order within a batch is not necessarily important, just that the messages arrive safely in Kafka.
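
For reference, a partitioner along those lines might look roughly like this (a sketch only; the constructor name is made up, and it assumes every message carries a key):

```go
// Sketch only: derive the partition from the message key via adler32,
// matching the scheme described above.
package main

import (
	"hash/adler32"

	"github.com/Shopify/sarama"
)

type adler32Partitioner struct{}

// NewAdler32Partitioner is a hypothetical constructor matching sarama's
// PartitionerConstructor signature, so it can be plugged in via
// config.Producer.Partitioner = NewAdler32Partitioner.
func NewAdler32Partitioner(topic string) sarama.Partitioner {
	return adler32Partitioner{}
}

func (adler32Partitioner) Partition(msg *sarama.ProducerMessage, numPartitions int32) (int32, error) {
	key, err := msg.Key.Encode() // assumes msg.Key is always set
	if err != nil {
		return -1, err
	}
	return int32(adler32.Checksum(key) % uint32(numPartitions)), nil
}

// RequiresConsistency tells sarama the key-to-partition mapping must stay
// stable, so it will not fall back to picking a random partition.
func (adler32Partitioner) RequiresConsistency() bool { return true }
```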

Thank you for the feedback.

@eapache
Contributor

eapache commented Feb 26, 2016

Adding SendMessages to the SyncProducer is probably your best bet then; you could just loop over SendMessage but that would probably be too slow.

You'll have to decide how to handle a "mixed" response though, when some messages succeed and some fail; it doesn't sound like scribe really supports that.
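
A minimal sketch of the calling side under that suggestion (assuming a SendMessages(msgs []*sarama.ProducerMessage) error method on the SyncProducer, which does not exist yet at this point, and assuming partial failures would come back as a sarama.ProducerErrors value):

```go
// Sketch only: map one scribe batch onto a hypothetical SendMessages call
// and translate the result into the OK / TRY_AGAIN reply scribe expects.
package main

import (
	"log"

	"github.com/Shopify/sarama"
)

func forwardScribeBatch(producer sarama.SyncProducer, values [][]byte) string {
	msgs := make([]*sarama.ProducerMessage, 0, len(values))
	for _, v := range values {
		msgs = append(msgs, &sarama.ProducerMessage{
			Topic: "legacy-feed", // placeholder topic
			Value: sarama.ByteEncoder(v),
		})
	}

	if err := producer.SendMessages(msgs); err != nil {
		// A "mixed" outcome would surface as a list of only the failed
		// messages; scribe has no partial-success reply, so the whole
		// batch is reported as TRY_AGAIN and retried.
		if errs, ok := err.(sarama.ProducerErrors); ok {
			log.Printf("%d of %d messages failed", len(errs), len(msgs))
		}
		return "TRY_AGAIN"
	}
	return "OK"
}
```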

Tangentially, you may want to look at https://github.com/garo/scribe2kafka.

@rphillips
Author

Thank you for the help. I'll look into writing a batch producer.

Note: we have been using the node-kafka client. It is not very performant on the consumer side, and we have been having issues with partition elections. Sarama is working great.
