
[BUG] Spark StreamNative Driver should not require Admin URL #113

Closed
vikram-narayan opened this issue Feb 15, 2023 · 7 comments
@vikram-narayan

Describe the bug
Pulsar admins do not allow consumers/producers to access the admin URL; consumers/producers use certificates for authentication/authorization instead. Teams using Spark therefore cannot take advantage of the StreamNative Spark driver due to a lack of access to the admin URL.
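
For context, the kind of configuration these teams want to run looks roughly like the sketch below: certificate-based authentication against the broker's binary endpoint only, with no admin endpoint involved. This is illustrative only; the hosts, topic, and cert paths are placeholders, and the pulsar.client.* pass-through options are assumed from the connector's client-option forwarding. Today the connector rejects this because no admin.url is supplied.

  // Illustrative sketch (placeholder hosts/paths): cert-based auth, no admin endpoint.
  val df = spark
    .readStream
    .format("pulsar")
    .option("service.url", "pulsar+ssl://broker.example.com:6651")
    .option("pulsar.client.authPluginClassName",
      "org.apache.pulsar.client.impl.auth.AuthenticationTls")
    .option("pulsar.client.authParams",
      "tlsCertFile:/path/to/client-cert.pem,tlsKeyFile:/path/to/client-key.pem")
    .option("topic", "persistent://tenant/ns/events")
    .load()   // fails today because the connector also requires admin.url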

To Reproduce
Steps to reproduce the behavior:

  1. Go to '...'
  2. Click on '....'
  3. Scroll down to '....'
  4. See error

Expected behavior
A clear and concise description of what you expected to happen.

Screenshots
If applicable, add screenshots to help explain your problem.

Additional context
Add any other context about the problem here.

@vikram-narayan vikram-narayan changed the title [BUG] [BUG] Spark StreamNative Driver should not require Admin URL Feb 15, 2023
@JTBS

JTBS commented Feb 27, 2023

I noticed this too. For internal workloads we can live with it, but clients want to send data to an externally exposed Pulsar cluster, and the current version expects an Admin URL to be provided.

The reasons this is required, as I understand it, are:

  • to validate the schema on the target topic;
  • to create one if it is not present;
  • to throw an error if the schema on the topic does not match the incoming schema provided;
  • I'm not sure if there are other reasons the client library expects the Admin URL.

Suggestion:
Can it at least be OPTIONAL? If the Admin URL is not provided, the above client-side checks are simply not done.
It's then up to the broker to accept or reject messages based on topic schema compatibility (a rough sketch of both sides follows below).
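
To make the trade-off concrete, here is a rough editorial sketch (not the connector's actual code) of the admin-side operations described above and of the broker-side alternative. The PulsarAdmin calls are from the standard Pulsar admin client; the host, topic, and namespace names are placeholders.

  import org.apache.pulsar.client.admin.PulsarAdmin

  // The kind of checks that require an admin (HTTP) endpoint today:
  val admin = PulsarAdmin.builder()
    .serviceHttpUrl("https://pulsar.example.com:8443")   // the Admin URL many producers cannot reach
    .build()

  // Look up the schema registered on the target topic (fails if none exists).
  val schema = admin.schemas().getSchemaInfo("persistent://tenant/ns/events")

  // The suggested alternative: skip such client-side checks and let the broker
  // enforce compatibility, e.g. by the cluster operator enabling schema validation
  // enforcement on the namespace (also an admin call, but run by the admins,
  // not by every producer/consumer).
  admin.namespaces().setSchemaValidationEnforced("tenant/ns", true)

  admin.close()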

How does reviewing and fixing issues in the library code work once something is approved as an issue?

  • Is there a team monitoring open issues and prioritizing them?
  • Or is the expectation that people who post issues contribute a patch themselves?

@nlu90
Collaborator

nlu90 commented Mar 2, 2023

@JTBS Yes, the Admin URL is needed for topic discovery, topic metadata query, schema management, and subscription/cursor management.

We have discussed internally several times that, to remove the dependency on the Admin URL cleanly, the Pulsar Client needs to be enhanced to perform the above tasks itself. This will require a Pulsar Improvement Proposal and some more time to go through the process and become available in a release.

The short-term workaround I'm thinking of is the same one you mentioned: removing the hard requirement on the Admin URL, with some functionality sacrificed.
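
For reference, the hard requirement looks like this today; under the workaround, leaving out admin.url would simply disable the admin-backed features listed above rather than failing at startup. A minimal sketch with placeholder hosts and topic:

  // Current behavior: both endpoints must be supplied.
  val df = spark
    .readStream
    .format("pulsar")
    .option("service.url", "pulsar+ssl://broker.example.com:6651")  // binary protocol endpoint
    .option("admin.url", "https://broker.example.com:8443")         // HTTP admin endpoint (the one at issue)
    .option("topic", "persistent://tenant/ns/events")
    .load()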

I'll keep you updated.

@nlu90 nlu90 self-assigned this Mar 2, 2023
@JTBS

JTBS commented Mar 2, 2023

@nlu90 Thank you for getting back to us.

@david-streamlio

Do you have any updates on this? For context, Verizon would like to have this by the end of April to avoid renewing a $200k contract with Confluent. So the sooner they have a timeline from us, the better off they are in the contract process. They are unable to terminate it if/until we can commit to having this.

@diyankov

diyankov commented Jun 6, 2023

@nlu90 I just heard back from Verizon on this feature request. Currently, they see the error below on some consumers. They are asking if we can advise them how to avoid this error.

"23/06/05 16:21:36 ERROR Executor: Exception in task 80.0 in stage 11.0 (TID 1360)
org.apache.pulsar.client.api.PulsarClientException$MemoryBufferIsFullError: Client memory buffer is full
at org.apache.pulsar.client.impl.ProducerImpl.canEnqueueRequest(ProducerImpl.java:860)
at org.apache.pulsar.client.impl.ProducerImpl.sendAsync(ProducerImpl.java:429)
at org.apache.pulsar.client.impl.ProducerImpl.internalSendAsync(ProducerImpl.java:323)
at org.apache.pulsar.client.impl.ProducerImpl.internalSendWithTxnAsync(ProducerImpl.java:395)
at org.apache.pulsar.client.impl.PartitionedProducerImpl.internalSendWithTxnAsync(PartitionedProducerImpl.java:276)
at org.apache.pulsar.client.impl.PartitionedProducerImpl.internalSendAsync(PartitionedProducerImpl.java:220)
at org.apache.pulsar.client.impl.TypedMessageBuilderImpl.sendAsync(TypedMessageBuilderImpl.java:101)
at org.apache.spark.sql.pulsar.PulsarRowWriter.sendRow(PulsarWriteTask.scala:193)
at org.apache.spark.sql.pulsar.PulsarWriteTask.execute(PulsarWriteTask.scala:40)
at org.apache.spark.sql.pulsar.PulsarSinks$.$anonfun$write$2(PulsarSinks.scala:153)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504)
at org.apache.spark.sql.pulsar.PulsarSinks$.$anonfun$write$1(PulsarSinks.scala:153)
at org.apache.spark.sql.pulsar.PulsarSinks$.$anonfun$write$1$adapted(PulsarSinks.scala:150)
at org.apache.spark.rdd.RDD.$anonfun$foreachPartition$2(RDD.scala:1011)
at org.apache.spark.rdd.RDD.$anonfun$foreachPartition$2$adapted(RDD.scala:1011)
at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2276)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:136)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)"

@nlu90
Collaborator

nlu90 commented Jun 6, 2023

@diyankov I'm checking with the platform team.

@nlu90
Collaborator

nlu90 commented Jun 8, 2023

@diyankov To avoid the issue, Spark job developers need to pass an additional configuration option to set the Pulsar client behavior:

spark
  .readStream
  .format("pulsar")
  .option("service.url", "pulsar+ssl://localhost:6651")
  .option("pulsar.client.blockIfQueueFull", "true")

Note the pulsar.client.blockIfQueueFull option on the last line of the code section above.
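
Since the stack trace above is raised on the write path (PulsarRowWriter.sendRow), the same pass-through option also needs to reach the sink when writing to Pulsar. A minimal sketch, assuming the sink accepts the same option names; the hosts, topic, and checkpoint path are placeholders:

  // Same option applied on the write side, where MemoryBufferIsFullError is thrown.
  df.writeStream
    .format("pulsar")
    .option("service.url", "pulsar+ssl://localhost:6651")
    .option("admin.url", "https://localhost:8443")              // still required by current releases
    .option("topic", "persistent://tenant/ns/events")
    .option("pulsar.client.blockIfQueueFull", "true")           // block instead of throwing when buffers fill
    .option("checkpointLocation", "/tmp/pulsar-sink-checkpoint")
    .start()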
