Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Drop timezone in from_arrow #3392

Merged
merged 2 commits into from
Jan 31, 2025

Conversation

martindurant
Copy link
Contributor

Awkward does not support timezones in timestamp types, because that's how things are in numpy.

There is not an easy way to remove arrow timezones, so I am proposing that we simply drop them, and document this well in the from_arrow text.

This came in the context of akimbo.spark, since spark always localises times. This PR essentially undoes this action. I am open to other solutions.

Copy link

codecov bot commented Jan 31, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 82.27%. Comparing base (b749e49) to head (329d53c).
Report is 264 commits behind head on main.

Additional details and impacted files
Files with missing lines Coverage Δ
src/awkward/_connect/pyarrow/conversions.py 91.50% <100.00%> (ø)
src/awkward/operations/ak_from_arrow.py 100.00% <ø> (ø)
src/awkward/operations/ak_from_arrow_schema.py 100.00% <ø> (+27.27%) ⬆️

... and 170 files with indirect coverage changes

@ianna ianna requested a review from jpivarski January 31, 2025 15:50
Copy link
Member

@jpivarski jpivarski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree about dropping timezones and being explicit about it in the documentation (though this PR doesn't change any documentation). I assume that you're using

getattr(storage_type, "tz", None) is None

rather than

hasattr(storage_type, "tz")

because the tz attribute might or might not exist and might or might not be None. Even if that's not the case now, it sounds like a good way to defensively future-proof it.

@martindurant
Copy link
Contributor Author

the tz attribute might or might not exist and might or might not be None

Right. Only time types can have the attribute at all, but it can be None or a string.

I'll add doc changes and a test to show it works.

Question: I needed to change two locations to make my spark workflow happy. What is the difference between how these are reached - is it buffer versus schema conversion or something else?

@jpivarski
Copy link
Member

First, only popbuffers existed; it did all the work of actually converting data. Then we found that we needed to predict what Form an Arrow Type would produce without actually converting data, and the most expedient way to do that was to duplicate the popbuffers infrastructure with a version that works on Forms, rather than Contents.

It doesn't make sense to merge them in a way that does two recursive descents through the tree: when you want to convert data, you don't need to do anything with Forms, so it doesn't buy you anything to do form_popbuffers first and then something else for the Contents. They could perhaps be merged into a single function that might operate on Form or might operate on Content, but that function would be more complicated, either trying to duck-type things to let both Form and Content through or have a lot of extra if paths for the two cases. I think this is a case where "Don't Repeat Yourself" gives way to "okay, just two repetitions, one to focus on Form, another to focus on Content; but no more!"

@martindurant
Copy link
Contributor Author

is it buffer versus schema conversion or something else?

"yes" :)

I added the doc and test. The test just makes sure there is no error. Either of the from_ would have failed without the change in conversions.py .

@martindurant
Copy link
Contributor Author

All green

@jpivarski jpivarski merged commit 7cbca98 into scikit-hep:main Jan 31, 2025
41 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants