
fix: schema metadata retrieval when listing parquet table #9134

Merged: 3 commits, merged Feb 6, 2024


brayanjuls (Contributor)

Which issue does this PR close?

Closes #9081.

Rationale for this change

What changes are included in this PR?

Fixes schema metadata retrieval when listing a Parquet table, as explained in the issue: `ctx.read_parquet` and `ctx.register_parquet` did not load the schema metadata.
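For readers new to the issue, here is a minimal std-only sketch of the behavior the fix targets. When several Parquet files are listed as one table, each file's schema-level key/value metadata should end up on the table schema. In this sketch the per-file maps are unioned, and a key that appears in more than one file must carry the same value. That conflict rule is an assumption modeled on arrow-rs's `Schema::try_merge` and is not taken from this PR. `Metadata` and `merge_metadata` are hypothetical names, not DataFusion APIs. Splitting the test's expected entries across two files is also an assumption.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a schema's key/value metadata map
/// (not DataFusion's or arrow-rs's API).
type Metadata = HashMap<String, String>;

/// Union the metadata of several per-file schemas into one table-level map.
/// A key present in more than one file must have the same value everywhere;
/// otherwise the merge fails. This conflict rule is an assumption modeled on
/// arrow-rs `Schema::try_merge`.
fn merge_metadata(per_file: &[Metadata]) -> Result<Metadata, String> {
    let mut merged = Metadata::new();
    for metadata in per_file {
        for (key, value) in metadata {
            // Reject a key whose value differs from one seen in an earlier file.
            if let Some(existing) = merged.get(key) {
                if existing != value {
                    return Err(format!(
                        "conflicting metadata for key '{key}': '{existing}' vs '{value}'"
                    ));
                }
            }
            merged.insert(key.clone(), value.clone());
        }
    }
    Ok(merged)
}

fn main() {
    // Assumed layout: two files, each contributing one of the entries the
    // test expects to see on the table schema.
    let file1 = Metadata::from([("foo".to_string(), "bar".to_string())]);
    let file2 = Metadata::from([("foo2".to_string(), "baz".to_string())]);

    let merged = merge_metadata(&[file1, file2]).expect("no conflicting keys");
    // HashMap iteration order is unspecified, so the entries may print in either order.
    println!("{merged:?}");
}
```

Rejecting conflicting values avoids having to pick one file's value for a shared key.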

Are these changes tested?

Yes.

Are there any user-facing changes?

github-actions bot added the core (Core DataFusion crate) label on Feb 5, 2024
alamb (Contributor) left a comment

Thank you so much @brayanjuls -- this is a really nice contribution. Excellent work, and on a first PR!

datafusion/core/tests/parquet/schema.rs
@@ -148,6 +148,10 @@ async fn schema_merge_can_preserve_metadata() {
        .read_parquet(&table_path, options.clone())
        .await
        .unwrap();

    let actual = df.schema().metadata();

👍 -- I verified this test fails without the code change in this PR.


thread 'parquet::schema::schema_merge_can_preserve_metadata' panicked at datafusion/core/tests/parquet/schema.rs:153:5:
assertion `left == right` failed
  left: {}
 right: {"foo2": "baz", "foo": "bar"}
stack backtrace:
   0:        0x105d8c564 - std::backtrace_rs::backtrace::libunwind::trace::h2966c6fbfac9d426
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/../../backtrace/src/backtrace/libunwind.rs:104:5
   1:        0x105d8c564 - std::backtrace_rs::backtrace::trace_unsynchronized::h8a5f4aefe890b7c5
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
   2:        0x105d8c564 - std::sys_common::backtrace::_print_fmt::h7574dd98fd39c257
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/sys_common/backtrace.rs:67:5
   3:        0x105d8c564 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h298c9ab285ff3934
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/sys_common/backtrace.rs:44:22
   4:        0x105dad420 - core::fmt::rt::Argument::fmt::hf9661447f7b99899
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/fmt/rt.rs:142:9
   5:        0x105dad420 - core::fmt::write::h4e276abdb6d0c2a1
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/fmt/mod.rs:1120:17
   6:        0x105d8a198 - std::io::Write::write_fmt::h33fe9536f43329b4
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/io/mod.rs:1762:15
   7:        0x105d8c3a0 - std::sys_common::backtrace::_print::h09e653c6686dbd70
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/sys_common/backtrace.rs:47:5
   8:        0x105d8c3a0 - std::sys_common::backtrace::print::hd8bd9ecab1f94b94
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/sys_common/backtrace.rs:34:9
   9:        0x105d8de58 - std::panicking::default_hook::{{closure}}::h520eeb743fc98fb4
  10:        0x105d8dafc - std::panicking::default_hook::ha6550ffe49b63df1
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panicking.rs:289:9
  11:        0x1023f7af0 - <alloc::boxed::Box<F,A> as core::ops::function::Fn<Args>>::call::hf67fcbad212018fd
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/alloc/src/boxed.rs:2021:9
  12:        0x1023f7af0 - test::test_main::{{closure}}::h2e290fb4d310d61e
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/test/src/lib.rs:138:21
  13:        0x105d8e3b8 - <alloc::boxed::Box<F,A> as core::ops::function::Fn<Args>>::call::hc816f01a75473565
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/alloc/src/boxed.rs:2021:9
  14:        0x105d8e3b8 - std::panicking::rust_panic_with_hook::hddb0e884a202de7c
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panicking.rs:783:13
  15:        0x105d8e180 - std::panicking::begin_panic_handler::{{closure}}::hd2798398a2fd9077
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panicking.rs:657:13
  16:        0x105d8c9cc - std::sys_common::backtrace::__rust_end_short_backtrace::h9201cc364dbb8a23
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/sys_common/backtrace.rs:170:18
  17:        0x105d8df1c - rust_begin_unwind
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panicking.rs:645:5
  18:        0x105eb6a5c - core::panicking::panic_fmt::h4d5168028d4c43c7
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/panicking.rs:72:14
  19:        0x105eb6e2c - core::panicking::assert_failed_inner::h253abc89d7bdb812
  20:        0x105db6ec8 - core::panicking::assert_failed::ha3d67793fd6ac036
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/panicking.rs:279:5
  21:        0x1023ccc38 - parquet_exec::parquet::schema::schema_merge_can_preserve_metadata::{{closure}}::h2a57fa1a2fffd163
                               at /Users/andrewlamb/Software/arrow-datafusion/datafusion/core/tests/parquet/schema.rs:153:5
  22:        0x1022e7d44 - <core::pin::Pin<P> as core::future::future::Future>::poll::he30aaf5a814bbae6
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/future/future.rs:125:9
  23:        0x1022e7d90 - <core::pin::Pin<P> as core::future::future::Future>::poll::heee45a717bd8ac8d
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/future/future.rs:125:9
  24:        0x1022e4940 - tokio::runtime::scheduler::current_thread::CoreGuard::block_on::{{closure}}::{{closure}}::{{closure}}::h28dc800d4b433bc8
                               at /Users/andrewlamb/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/scheduler/current_thread/mod.rs:665:57
  25:        0x1022e487c - tokio::runtime::coop::with_budget::he294b2aa4876e430
                               at /Users/andrewlamb/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/coop.rs:107:5
  26:        0x1022e487c - tokio::runtime::coop::budget::hb07f2a42eb506b4d
                               at /Users/andrewlamb/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/coop.rs:73:5
  27:        0x1022e487c - tokio::runtime::scheduler::current_thread::CoreGuard::block_on::{{closure}}::{{closure}}::h47c1f9e30f41c28a
                               at /Users/andrewlamb/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/scheduler/current_thread/mod.rs:665:25
  28:        0x1022e37d4 - tokio::runtime::scheduler::current_thread::Context::enter::h3ddee14a0f9df325
                               at /Users/andrewlamb/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/scheduler/current_thread/mod.rs:410:19
  29:        0x1022e42c4 - tokio::runtime::scheduler::current_thread::CoreGuard::block_on::{{closure}}::h5ad3dc5f4b7fc31b
                               at /Users/andrewlamb/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/scheduler/current_thread/mod.rs:664:36
  30:        0x1022e40c8 - tokio::runtime::scheduler::current_thread::CoreGuard::enter::{{closure}}::h2d6bee10346ed1bb
                               at /Users/andrewlamb/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/scheduler/current_thread/mod.rs:743:68
  31:        0x102280c24 - tokio::runtime::context::scoped::Scoped<T>::set::h869ac8f81efcfb5f
                               at /Users/andrewlamb/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/context/scoped.rs:40:9
  32:        0x1023b5da4 - tokio::runtime::context::set_scheduler::{{closure}}::h094bd651ff22ca23
                               at /Users/andrewlamb/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/context.rs:176:26
  33:        0x1022e1770 - std::thread::local::LocalKey<T>::try_with::ha191c29a90026bda
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/thread/local.rs:270:16
  34:        0x1022e0c3c - std::thread::local::LocalKey<T>::with::h337b39da41917070
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/thread/local.rs:246:9
  35:        0x1023b5d60 - tokio::runtime::context::set_scheduler::h6c579318719b164f
                               at /Users/andrewlamb/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/context.rs:176:9
  36:        0x1022e3f08 - tokio::runtime::scheduler::current_thread::CoreGuard::enter::h02326bb925183dcd
                               at /Users/andrewlamb/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/scheduler/current_thread/mod.rs:743:27
  37:        0x1022e40fc - tokio::runtime::scheduler::current_thread::CoreGuard::block_on::he7bacd175479e86e
                               at /Users/andrewlamb/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/scheduler/current_thread/mod.rs:652:19
  38:        0x1022e2f00 - tokio::runtime::scheduler::current_thread::CurrentThread::block_on::{{closure}}::h2ef518d8229a854d
                               at /Users/andrewlamb/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/scheduler/current_thread/mod.rs:175:28
  39:        0x10233d5e0 - tokio::runtime::context::runtime::enter_runtime::h6b94c4c8c06baf94
                               at /Users/andrewlamb/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/context/runtime.rs:65:16
  40:        0x1022e2e40 - tokio::runtime::scheduler::current_thread::CurrentThread::block_on::he59b2afc6287178d
                               at /Users/andrewlamb/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/scheduler/current_thread/mod.rs:167:9
  41:        0x1022836e0 - tokio::runtime::runtime::Runtime::block_on::hd53481abe1c750e8
                               at /Users/andrewlamb/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/runtime.rs:348:47
  42:        0x102218c98 - parquet_exec::parquet::schema::schema_merge_can_preserve_metadata::hb03bbe9ac10a526f
                               at /Users/andrewlamb/Software/arrow-datafusion/datafusion/core/tests/parquet/schema.rs:178:5
  43:        0x1023cbdc8 - parquet_exec::parquet::schema::schema_merge_can_preserve_metadata::{{closure}}::h250d6769061b5f05
                               at /Users/andrewlamb/Software/arrow-datafusion/datafusion/core/tests/parquet/schema.rs:106:46
  44:        0x102327174 - core::ops::function::FnOnce::call_once::h2e353b110aaa5edf
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/ops/function.rs:250:5
  45:        0x1023fbf70 - core::ops::function::FnOnce::call_once::he7c46f18244e22ba
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/ops/function.rs:250:5
  46:        0x1023fbf70 - test::__rust_begin_short_backtrace::hf05961ebc6f8d3a3
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/test/src/lib.rs:628:18
  47:        0x1023faf3c - test::run_test_in_process::{{closure}}::h3be023be3e470e24
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/test/src/lib.rs:651:60
  48:        0x1023faf3c - <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once::h045822bb433f4fa3
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/panic/unwind_safe.rs:272:9
  49:        0x1023faf3c - std::panicking::try::do_call::he686b452df9e6b8a
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panicking.rs:552:40
  50:        0x1023faf3c - std::panicking::try::hc2f6216ead8eae6f
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panicking.rs:516:19
  51:        0x1023faf3c - std::panic::catch_unwind::h02265fe17265fda2
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panic.rs:142:14
  52:        0x1023faf3c - test::run_test_in_process::h5250a48154d3e8a7
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/test/src/lib.rs:651:27
  53:        0x1023faf3c - test::run_test::{{closure}}::h26897ddc37b2966c
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/test/src/lib.rs:574:43
  54:        0x1023ce4c8 - test::run_test::{{closure}}::h5c8976010d262547
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/test/src/lib.rs:602:41
  55:        0x1023ce4c8 - std::sys_common::backtrace::__rust_begin_short_backtrace::h14e0bf2ff6bde926
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/sys_common/backtrace.rs:154:18
  56:        0x1023d2cbc - std::thread::Builder::spawn_unchecked_::{{closure}}::{{closure}}::h41c404ebec5fd0f9
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/thread/mod.rs:529:17
  57:        0x1023d2cbc - <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once::h9bea6ba67aceee28
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/panic/unwind_safe.rs:272:9
  58:        0x1023d2cbc - std::panicking::try::do_call::h681e7ac0e901b900
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panicking.rs:552:40
  59:        0x1023d2cbc - std::panicking::try::hc0aa288ce3642911
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panicking.rs:516:19
  60:        0x1023d2cbc - std::panic::catch_unwind::he941ae36032548bf
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panic.rs:142:14
  61:        0x1023d2cbc - std::thread::Builder::spawn_unchecked_::{{closure}}::h03cf06ae2f75ecc3
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/thread/mod.rs:528:30
  62:        0x1023d2cbc - core::ops::function::FnOnce::call_once{{vtable.shim}}::h9410c0b28133a970
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/ops/function.rs:250:5
  63:        0x105d93140 - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::he407efbeff8db698
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/alloc/src/boxed.rs:2007:9
  64:        0x105d93140 - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::h15f2f0818ccedd2c
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/alloc/src/boxed.rs:2007:9
  65:        0x105d93140 - std::sys::unix::thread::Thread::new::thread_start::h9b6324e2391e6ebb
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/sys/unix/thread.rs:108:17
  66:        0x186126034 - __pthread_joiner_wake
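The `left: {}` in the assertion above means the table schema came back with no metadata at all, even though the files carry it. One way that can happen is building the table schema anew from its fields, for example while appending extra columns, because a freshly built schema starts with an empty metadata map. This mechanism is assumed for illustration and is not taken from the PR's diff. The `Schema` struct, the column names, and both helper functions are hypothetical std-only stand-ins, not arrow-rs or DataFusion types.

```rust
use std::collections::HashMap;

/// Hypothetical, heavily simplified schema: column names plus schema-level
/// key/value metadata. Not the arrow-rs `Schema` type.
#[derive(Debug)]
struct Schema {
    fields: Vec<String>,
    metadata: HashMap<String, String>,
}

impl Schema {
    /// Build a schema from fields alone; the metadata map starts out empty.
    fn from_fields(fields: Vec<String>) -> Self {
        Self {
            fields,
            metadata: HashMap::new(),
        }
    }
}

/// Lossy pattern: append extra columns by rebuilding the schema from its
/// fields, which silently drops the file schema's metadata.
fn extend_dropping_metadata(file_schema: &Schema, extra: &[&str]) -> Schema {
    let mut fields = file_schema.fields.clone();
    fields.extend(extra.iter().map(|name| name.to_string()));
    Schema::from_fields(fields)
}

/// Preserving pattern: same fields, but the metadata is carried across.
fn extend_preserving_metadata(file_schema: &Schema, extra: &[&str]) -> Schema {
    let mut schema = extend_dropping_metadata(file_schema, extra);
    schema.metadata = file_schema.metadata.clone();
    schema
}

fn main() {
    // Hypothetical file schema carrying the metadata the test expects.
    let file_schema = Schema {
        fields: vec!["a".to_string()],
        metadata: HashMap::from([
            ("foo".to_string(), "bar".to_string()),
            ("foo2".to_string(), "baz".to_string()),
        ]),
    };

    let lossy = extend_dropping_metadata(&file_schema, &["part"]);
    let kept = extend_preserving_metadata(&file_schema, &["part"]);

    println!("left:  {:?}", lossy.metadata); // prints "left:  {}"
    println!("right: {:?}", kept.metadata); // both entries, in unspecified order
}
```

The two helpers differ only in whether `file_schema.metadata` is copied across. In arrow-rs the analogous choice is whether a rebuilt schema is given the original metadata, for example via `Schema::new_with_metadata`.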

alamb (Contributor) commented on Feb 5, 2024

I think you can fix the failing CI check by running cargo fmt locally and checking in the result. https://github.com/apache/arrow-datafusion/actions/runs/7787359914/job/21234595802?pr=9134

Jefffrey (Contributor) left a comment

Nice work 👍

alamb merged commit 9669520 into apache:main on Feb 6, 2024 (22 checks passed)
alamb (Contributor) commented on Feb 6, 2024

Thanks again @brayanjuls and @Jefffrey for the review

Labels: core (Core DataFusion crate)

Successfully merging this pull request may close these issues:
ctx.read_parquet and ctx.register_parquet don't load schema metadata

3 participants