Remove record_batch! macro once upstream updates #13037

Open
timsaucer opened this issue Oct 21, 2024 · 8 comments

Comments

@timsaucer
Contributor

Is your feature request related to a problem or challenge?

Follow on to apache/arrow-rs#6588

Once arrow-rs releases and we update datafusion, we can remove the record_batch! macro.

Describe the solution you'd like

Remove and use upstream method instead.

Describe alternatives you've considered

No response

Additional context

No response

@timsaucer timsaucer added the enhancement New feature or request label Oct 21, 2024
@buraksenn
Contributor

take

@buraksenn
Contributor

I will keep an eye on the upstream release and make the required change, if that is okay.

@timsaucer
Contributor Author

Sounds great! It will probably take a while as we have to wait for the upstream to make a release, and then DF to update to that release version.

@buraksenn
Contributor

Should this be done? I saw #12846. I can delete the create_array and record_batch macros and use the ones from arrow instead. Note that the two are not identical:

  • Datafusion: ($(($name: expr, $type: ident, $values: expr)),*)
  • Arrow: ($(($name: expr, $type: ident, [$($values: expr),*])),*)
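
To make the difference between the two matchers concrete, here is a minimal sketch that needs no Arrow dependency. The macro names `columns_from_expr!` and `columns_from_literal!` are hypothetical stand-ins, the `$type` argument is dropped, and the columns are plain `Vec<i64>` rather than Arrow arrays, so this only illustrates the matching behaviour, not either crate's real expansion.

```rust
/// DataFusion-style matcher (simplified): the values are one expression,
/// so a variable holding a Vec is accepted.
macro_rules! columns_from_expr {
    ($(($name: expr, $values: expr)),*) => {
        vec![$(($name.to_string(), Vec::<i64>::from($values))),*]
    };
}

/// Arrow-style matcher (simplified): the values must be written inline
/// as a bracketed, comma-separated list of expressions.
macro_rules! columns_from_literal {
    ($(($name: expr, [$($values: expr),*])),*) => {
        vec![$(($name.to_string(), vec![$($values as i64),*])),*]
    };
}

/// Builds the same single column through both matchers.
fn build_both_forms() -> (Vec<(String, Vec<i64>)>, Vec<(String, Vec<i64>)>) {
    let a_vals: Vec<i64> = (1..4).collect();
    let from_expr = columns_from_expr!(("a", a_vals));
    let from_literal = columns_from_literal!(("a", [1, 2, 3]));
    // `columns_from_literal!(("a", a_vals))` would be rejected at compile
    // time: the matcher expects `[` where it finds the identifier `a_vals`.
    (from_expr, from_literal)
}
```

The expression form can take values computed at runtime, while the bracketed form only accepts elements spelled out at the call site.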

@timsaucer @alamb ?

@alamb
Contributor

alamb commented Dec 9, 2024

Should this be done? I saw #12846. I can delete the create_array and record_batch macros and use the ones from arrow instead. Note that the two are not identical:

  • Datafusion: ($(($name: expr, $type: ident, $values: expr)),*)
  • Arrow: ($(($name: expr, $type: ident, [$($values: expr),*])),*)

@timsaucer @alamb ?

I recommend:

  1. Switch to use the upstream arrow one
  2. Deprecate (don't remove) the API in datafusion, per https://datafusion.apache.org/library-user-guide/api-health.html
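
As a rough illustration of step 2, a macro can stay available while steering callers to its replacement by carrying a `#[deprecated]` attribute and forwarding to the new macro. The names `upstream_sum!` and `legacy_sum!` below are hypothetical and stand in for the arrow and datafusion macros; this is only a sketch of the pattern, not the actual change.

```rust
/// Hypothetical stand-in for the macro that now lives upstream.
macro_rules! upstream_sum {
    ($($v: expr),*) => { 0i64 $(+ $v)* };
}

/// Hypothetical stand-in for the local macro: kept for compatibility,
/// marked deprecated, and implemented by forwarding to the upstream one.
#[deprecated(note = "use `upstream_sum!` instead")]
macro_rules! legacy_sum {
    ($($v: expr),*) => { upstream_sum!($($v),*) };
}

/// Existing callers keep compiling; without the `allow`, this call site
/// is expected to produce a deprecation warning.
#[allow(deprecated)]
fn call_legacy() -> i64 {
    legacy_sum!(1, 2, 3)
}
```

Forwarding only works if the upstream macro accepts every input the old one did, which is the open question in this thread.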

@buraksenn
Contributor

Will do as recommended. Thanks @alamb

@buraksenn
Contributor

I took another look at this and tried to make the change, but the issue is that the macro in arrow only accepts array literals such as [1,2,3], whereas the datafusion macro also accepts vectors. One use of the datafusion macro is in this function:

    let batches = vec![
        create_record_batch(1, 5),
        create_record_batch(6, 1),
        create_record_batch(7, 5),
    ];

    fn create_record_batch(start_value: i32, num_values: usize) -> RecordBatch {
        let end_value = start_value + num_values as i32;
        let a_vals: Vec<i32> = (start_value..end_value).collect();
        let b_vals: Vec<f64> = a_vals.iter().map(|v| *v as f64).collect();

        record_batch!(("a", Int32, a_vals), ("b", Float64, b_vals)).unwrap()
    }

As far as I can tell, the macro in the arrow crate cannot be used this way, because it will not take a vector variable in place of the bracketed literal. That's why I think we could update the macro in arrow to match the one in datafusion and then deprecate it here. I'm not a Rust expert, though, so I'm not sure about this.

Arrow: https://github.com/apache/arrow-rs/blob/c4dbf0d8af6ca5a19b8b2ea777da3c276807fc5e/arrow-array/src/record_batch.rs#L153-L174
Datafusion:

    macro_rules! record_batch {
        ($(($name: expr, $type: ident, $values: expr)),*) => {
            {
                let schema = std::sync::Arc::new(arrow_schema::Schema::new(vec![
                    $(
                        arrow_schema::Field::new($name, arrow_schema::DataType::$type, true),
                    )*
                ]));
                let batch = arrow_array::RecordBatch::try_new(
                    schema,
                    vec![$(
                        $crate::create_array!($type, $values),
                    )*]
                );
                batch
            }
        }
    }
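
One possible direction, sketched here with plain `Vec<i64>` columns instead of Arrow arrays, is a macro with two arms: the first keeps the bracketed-literal form and the second falls back to an arbitrary expression. The name `int_columns!` and the helper `create_columns` are hypothetical. Whether the real arrow macro could be extended this way is an assumption, since it depends on which conversions the Arrow array types provide.

```rust
/// Hypothetical macro that accepts both call styles.
macro_rules! int_columns {
    // Arm 1: bracketed literal, e.g. ("a", [1, 2, 3]).
    ($(($name: expr, [$($values: expr),*])),*) => {
        vec![$(($name.to_string(), vec![$($values as i64),*])),*]
    };
    // Arm 2: any expression convertible into Vec<i64>, e.g. ("a", a_vals).
    // Tried only when arm 1 does not match.
    ($(($name: expr, $values: expr)),*) => {
        vec![$(($name.to_string(), Vec::<i64>::from($values))),*]
    };
}

/// Mirrors the shape of `create_record_batch` above: the values are
/// computed at runtime, so only the expression arm can accept them.
fn create_columns(start_value: i64, num_values: usize) -> Vec<(String, Vec<i64>)> {
    let end_value = start_value + num_values as i64;
    let a_vals: Vec<i64> = (start_value..end_value).collect();
    int_columns!(("a", a_vals))
}
```

With this shape, `int_columns!(("a", [1, 2, 3]))` and `create_columns(1, 3)` build the same column, so existing literal call sites would not need to change.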

cc @timsaucer @alamb

@alamb
Contributor

alamb commented Dec 15, 2024

Thanks for looking at this @buraksenn

I thought we had more or less ported the macro from DataFusion to Arrow, so the difference is surprising, but I haven't had a chance to look into it in detail.
