Support Time32 and Time64 for Type Coercion #4104
Closed
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This PR aims at extending the Type Coercion optimization rule from Datafusion to support time types (Time32 and Time64), which is required for where clauses involving fields of these types.
The main changes can be found at datafusion/expr/src/type_coercion/binary.rs, in the method temporal_coercion, which is adapted to ensure that, when getting a pair of types with a string and a time type, the later is taken as coerced type. Given that both Time32 and Time64 have an argument to specify the TimeUnit, one must be careful. In particular, the implementation must be consistent with the TimeUnits supported by Arrow, which include Second and Millisecond for Time32, and Microsecond and Nanosecond for Time64. These four cases are taken into account in temporal_coercion by calling a function check_time_unit that checks if the TimeUnit is consistent with the type and returns an option to it. The changes in binary.rs are checked in new tests, which are all passing.
These modifications required a bunch of adaptations to be implemented in the datafusion code so as to consistently support Time32(TimeUnit::Second), Time32(TimeUnit::Millisecond), Time64(TimeUnit::Microsecond) and Time64(TimeUnit::Nanosecond). The changes are straightforward in general, except for the ones in the proto crate. I chose not to change the datafusion.proto file, which only supports a generic Time64 type, with no unit specification, and does not support Time32 at all. This is not a problem in itself, however the datafusion/proto/src/to_proto.rs and datafusion/proto/src/from_proto.rs files have to be adapted. Right now, the proto Time64 translates into a Time64Nanosecond, as specified in the datafusion/proto/src/from_proto.rs file. On the other hand, a proto Time64 can be obtained from all four Time64Nanosecond, Time64Microsecond, Time32Millisecond and Time32Second scalar value types, as implemented in datafusion/proto/src/to_proto.rs. This makes sense to me, but I am not sure it is the best solution. Perhaps we could change the datafusion.proto?