-
Notifications
You must be signed in to change notification settings - Fork 881
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Parquet column writer Dictionary(_, Decimal128)
and Dictionary(_, Decimal256)
#6987
Conversation
b2ad521
to
8f0c20c
Compare
8f0c20c
to
f3c08fc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me, but would appreciate a second opinion. Nice catch, thanks!
cc @alamb
@@ -270,6 +270,59 @@ where | |||
|
|||
Arc::new(array) as ArrayRef | |||
} | |||
ArrowType::Dictionary(_, value_type) => match value_type.as_ref() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks correct to me. It would be nice if we could get some code reuse here.
@@ -2670,6 +2712,52 @@ mod tests { | |||
one_column_roundtrip_with_schema(Arc::new(d), schema); | |||
} | |||
|
|||
#[test] | |||
fn arrow_writer_decimal128_dictionary() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Verified that tests fail if the read changes are reverted.
} | ||
.with_precision_and_scale(*p, *s)?; | ||
|
||
arrow_cast::cast(&array, target_type)? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Taking a second look, I'm curious why the cast here rather than creating the ArrayRef
directly as is done above? Will this have any performance implications?
Edit: ah, ok. We need to cast to dictionary. Disregard 😅
Dictionary(_, Decimal128)
and Dictionary(_, Decimal256)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which issue does this PR close?
Closes #.
Rationale for this change
Currently Parquet reads/writes for Arrow
Dictionary(Decimal)
type where Decimal is INT-based is falling back toarrow_cast::cast
, which leads to losing precision.What changes are included in this PR?
For
Dictionary(Decimal)
type where decimal is int-based, writer and reader now performarrow_cast::cast
only for dictionary packing/unpacking, while values are only casted to required integer type without being modified, similar to how regularDecimal
works.Are there any user-facing changes?
No