Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Minor: Improve documentation for AggregateUDFImpl::accumulator and AccumulatorArgs #9920

Merged
merged 7 commits into from
Apr 5, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 0 additions & 3 deletions datafusion/core/tests/user_defined/user_defined_aggregates.rs
Original file line number Diff line number Diff line change
Expand Up @@ -526,7 +526,6 @@ impl Accumulator for TimeSum {
let arr = arr.as_primitive::<TimestampNanosecondType>();

for v in arr.values().iter() {
println!("Adding {v}");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

drive by cleanups

self.sum += v;
}
Ok(())
Expand All @@ -538,7 +537,6 @@ impl Accumulator for TimeSum {
}

fn evaluate(&mut self) -> Result<ScalarValue> {
println!("Evaluating to {}", self.sum);
Ok(ScalarValue::TimestampNanosecond(Some(self.sum), None))
}

Expand All @@ -558,7 +556,6 @@ impl Accumulator for TimeSum {
let arr = arr.as_primitive::<TimestampNanosecondType>();

for v in arr.values().iter() {
println!("Retracting {v}");
self.sum -= v;
}
Ok(())
Expand Down
29 changes: 22 additions & 7 deletions datafusion/expr/src/function.rs
Original file line number Diff line number Diff line change
Expand Up @@ -38,18 +38,33 @@ pub type ScalarFunctionImplementation =
pub type ReturnTypeFunction =
Arc<dyn Fn(&[DataType]) -> Result<Arc<DataType>> + Send + Sync>;

/// Arguments passed to create an accumulator
/// [`AccumulatorArgs`] contains information about how an aggregate
/// function was called, including the types of its arguments and any optional
/// ordering expressions.
pub struct AccumulatorArgs<'a> {
// default arguments
/// the return type of the function
/// The return type of the aggregate function.
pub data_type: &'a DataType,
/// the schema of the input arguments
/// The schema of the input arguments
pub schema: &'a Schema,
/// whether to ignore nulls
/// Whether to ignore nulls.
///
/// SQL allows the user to specify `IGNORE NULLS`, for example:
///
/// ```sql
/// SELECT FIRST_VALUE(column1) IGNORE NULLS FROM t;
/// ```
pub ignore_nulls: bool,

// ordering arguments
/// the expressions of `order by`, if no ordering is required, this will be an empty slice
/// The expressions in the `ORDER BY` clause passed to this aggregator.
///
/// SQL allows the user to specify the ordering of arguments to the
/// aggregate using an `ORDER BY`. For example:
///
/// ```sql
/// SELECT FIRST_VALUE(column1 ORDER BY column2) FROM t;
/// ```
///
/// If no `ORDER BY` is specified, `sort_exprs`` will be empty.
pub sort_exprs: &'a [Expr],
}

Expand Down
12 changes: 10 additions & 2 deletions datafusion/expr/src/udaf.rs
Original file line number Diff line number Diff line change
Expand Up @@ -213,8 +213,8 @@ where
/// See [`advanced_udaf.rs`] for a full example with complete implementation and
/// [`AggregateUDF`] for other available options.
///
///
/// [`advanced_udaf.rs`]: https://github.com/apache/arrow-datafusion/blob/main/datafusion-examples/examples/advanced_udaf.rs
///
/// # Basic Example
/// ```
/// # use std::any::Any;
Expand Down Expand Up @@ -282,7 +282,8 @@ pub trait AggregateUDFImpl: Debug + Send + Sync {
/// Return a new [`Accumulator`] that aggregates values for a specific
/// group during query execution.
///
/// `acc_args`: the arguments to the accumulator. See [`AccumulatorArgs`] for more details.
/// acc_args: [`AccumulatorArgs`] contains information about how the
/// aggregate function was called.
fn accumulator(&self, acc_args: AccumulatorArgs) -> Result<Box<dyn Accumulator>>;

/// Return the fields used to store the intermediate state of this accumulator.
Expand Down Expand Up @@ -325,6 +326,13 @@ pub trait AggregateUDFImpl: Debug + Send + Sync {
/// If the aggregate expression has a specialized
/// [`GroupsAccumulator`] implementation. If this returns true,
/// `[Self::create_groups_accumulator]` will be called.
///
/// # Notes
///
/// Even if this function returns true, DataFusion will still use
/// `Self::accumulator` for certain queries, such as when this aggregate is
/// used as a window function or when there no GROUP BY columns in the
/// query.
fn groups_accumulator_supported(&self) -> bool {
false
}
Expand Down
Loading