Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
perf: directly create projection instead of using DataFrame::with_col…
…umn (#2222) # Description `DataFrame::with_column` performs a linear operation in the number of columns to append on an existing column, checking that nothing collides. On top of this once the projection a normalization step (also linear in number of columns) is performed before returning the dataframe. For a merge where we are performing a `when_matched_update_all` type operation on wide tables (100+ columns), this is in effect a `2*N^2` operation as we were adding the remapped case columns one at a time with `with_column` and then remapping it. This PR uses `project` directly to construct the logical plan. We don't need any of the special checking for name clashes or windowing that `with_column` provides and we discard it immediately down to an unoptimized logical plan anyway, so this produces no change to schema - just a much more compact logical plan. This reduces an example merge I had from taking 5+ minutes to just optimize the table, down to about 13 seconds including the merge.
- Loading branch information