Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Document] Adding UDF by impl ScalarUDFImpl #9172

Merged
merged 4 commits into from
Feb 10, 2024
Merged

Conversation

yyy1000
Copy link
Contributor

@yyy1000 yyy1000 commented Feb 9, 2024

Which issue does this PR close?

Closes #9136.

Rationale for this change

See #9136

What changes are included in this PR?

Document update including adding a scalar udf by impl ScalarUDFImpl

Copy link
Contributor Author

@yyy1000 yyy1000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

create_udf will use SimpleScalarUDF::new, which is not deprecated. The deprecated one is ScalarUDF::new, sorry for my previous wrong understanding.
https://github.com/apache/arrow-datafusion/blob/c9e4b7b7c5c2c2e7b3ea01040a6edeafd151410c/datafusion/expr/src/expr_fn.rs#L982-L996


To register a Scalar UDF, you need to wrap the function implementation in a [`ScalarUDF`] struct and then register it with the `SessionContext`.
DataFusion provides the [`create_udf`] and helper functions to make this easier.
There is a lower level API with more functionality but is more complex, that is documented in [`advanced_udf.rs`].
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I doubt a little that whether my PR is necessary, it seems that there's already a updated link that helps user create scalarUDF by impl trait. 🤔 What do you think ? @alamb

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the value of this library guide is the surrounding text, so it seems ok to me to have the code replicated in two places

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @yyy1000 -- this is looking like a good improvement. I left some structural suggestions that may improve things

Overall I am beginning to wonder given how straightforward the trait seems to be, if we should simply remove mention of the crate_udf method from the docs

@tshauck any thoughts?


To register a Scalar UDF, you need to wrap the function implementation in a [`ScalarUDF`] struct and then register it with the `SessionContext`.
DataFusion provides the [`create_udf`] and helper functions to make this easier.
There is a lower level API with more functionality but is more complex, that is documented in [`advanced_udf.rs`].
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the value of this library guide is the surrounding text, so it seems ok to me to have the code replicated in two places

Ok(Arc::new(new_array))
}
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest ending the first code block here and then adding a sentence:

We now need to register the function with DataFusion so that it can be used in the context of a query.

(then rest of the example)

@yyy1000
Copy link
Contributor Author

yyy1000 commented Feb 9, 2024

Thanks for your review! @alamb and have applied the suggestions.
Yeah, I'm also thinking whether we should keep the old create_udf as a second way to adding udf or just remove that. :)
And after this PR, I think I can create a follow-up PR about Window UDF and Aggregate UDF.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again @yyy1000

@alamb alamb merged commit afb169c into apache:main Feb 10, 2024
4 checks passed
@yyy1000 yyy1000 deleted the doc-udf branch February 10, 2024 22:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Document: Enhance ScalarUDF document
2 participants