reindex is very slow with small chunksizes #10054

Open
5 tasks done
Illviljan opened this issue Feb 16, 2025 · 2 comments
Labels
bug, needs triage (issue that has not been reviewed by an xarray team member), topic-lazy array, topic-performance

Comments

@Illviljan
Contributor

Illviljan commented Feb 16, 2025

What happened?

The time Dataset.reindex takes to build its lazy result appears to grow roughly linearly with the size of the indexer.

What did you expect to happen?

Lazy reindexing should take close to constant time, regardless of the indexer size.

Minimal Complete Verifiable Example

import numpy as np
import dask.array as da

import xarray as xr

ds = xr.Dataset(
    data_vars={
        "variable_name": ("time", da.from_array(
            np.array(["test"], dtype=str), chunks=(1,)
        ))
    },
    coords={"time": ("time", np.array([0]))}
)

%timeit ds.reindex(time=np.linspace(0, 10, 50), method="nearest")
# 8.72 ms ± 148 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit ds.reindex(time=np.linspace(0, 10, 100), method="nearest")
# 16.3 ms ± 424 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit ds.reindex(time=np.linspace(0, 10, 1000), method="nearest")
# 152 ms ± 1.6 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
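
Outside IPython, the same measurement can be sketched with the standard-library timeit module. The helper names make_dataset and describe_reindex are hypothetical, introduced only for this illustration. Besides the wall time, the sketch reports how many chunks and graph tasks the lazy result has, which may help show where the time goes; no particular numbers are claimed here since they depend on the installed dask version.

```python
import timeit

import dask.array as da
import numpy as np
import xarray as xr


def make_dataset() -> xr.Dataset:
    """Build the one-element, chunk-size-1 string dataset from the report."""
    data = da.from_array(np.array(["test"], dtype=str), chunks=(1,))
    return xr.Dataset(
        data_vars={"variable_name": ("time", data)},
        coords={"time": ("time", np.array([0]))},
    )


def describe_reindex(n: int, repeats: int = 3) -> dict:
    """Time a lazy nearest-neighbour reindex onto n target labels.

    Hypothetical helper: returns the best wall time together with the
    chunk count and task count of the resulting dask array.
    """
    ds = make_dataset()
    target = np.linspace(0, 10, n)

    def run() -> xr.Dataset:
        return ds.reindex(time=target, method="nearest")

    # Best of a few single runs; nothing is computed, only the graph is built.
    seconds = min(timeit.repeat(run, number=1, repeat=repeats))
    lazy = run()["variable_name"].data
    return {
        "n": n,
        "seconds": seconds,
        "n_chunks": lazy.npartitions,
        "n_tasks": len(lazy.__dask_graph__()),
    }


if __name__ == "__main__":
    for n in (50, 100, 1000):
        print(describe_reindex(n))
```

If the chunk and task counts grow with n, the cost is in building a larger dask graph rather than in xarray's own indexing logic.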

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.
  • Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

Anything else we need to know?

This case shows up, for example, when using ds.interp on a dataset that contains string variables.

Environment

INSTALLED VERSIONS

commit: None
python: 3.12.4 | packaged by conda-forge | (main, Jun 17 2024, 10:04:44) [MSC v.1940 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 58 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en
LOCALE: ('Swedish_Sweden', '1252')
libhdf5: 1.14.3
libnetcdf: 4.9.2

xarray: 2024.7.1.dev363+g99426cbb.d20240904
pandas: 2.2.2
numpy: 2.2.1
scipy: 1.14.1
netCDF4: 1.7.1
pydap: 3.5
h5netcdf: 1.3.0
h5py: 3.11.0
zarr: 2.18.2
cftime: 1.6.4
nc_time_axis: 1.4.1
iris: 3.9.0
bottleneck: 1.4.0
dask: 2024.11.2
distributed: 2024.11.2
matplotlib: 3.9.2
cartopy: 0.23.0
seaborn: 0.13.2
numbagg: None
fsspec: 2024.6.1
cupy: None
pint: None
sparse: None
flox: 0.9.10
numpy_groupies: 0.11.2
setuptools: 73.0.1
pip: 24.2
conda: None
pytest: 8.3.2
mypy: 1.14.1
IPython: 8.27.0
sphinx: 8.0.2

@Illviljan added the bug, needs triage, topic-performance, and topic-lazy array labels Feb 16, 2025
@dcherian
Contributor

This is all in dask. Can you open an issue there please?

@Illviljan
Contributor Author

There does not seem to be much interest in fixing this on the dask side.
I got here through ds.interp. Maybe xarray should have a separate branch that broadcasts along the size-1 dimension instead of indexing?
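
A minimal sketch of what such a broadcasting branch could look like from user code, not a proposal for xarray's internals. The helper name reindex_size_one_nearest is hypothetical. It assumes that the dimension has length 1, that every data variable spans that dimension, and that no tolerance is used, so every target label maps to the single existing element. Other coordinates along the dimension are dropped, and for multi-dimensional variables the new dimension ends up first.

```python
import dask.array as da
import numpy as np
import xarray as xr


def reindex_size_one_nearest(ds: xr.Dataset, dim: str, new_index) -> xr.Dataset:
    """Hypothetical helper: nearest reindex along a length-1 dimension.

    With one source label and no tolerance, method="nearest" selects the
    same element for every target label, so broadcasting that element
    gives the same values as ds.reindex({dim: new_index}, method="nearest").
    Assumes every data variable in ds has the dimension dim.
    """
    if ds.sizes[dim] != 1:
        raise ValueError(f"{dim!r} must have length 1, got {ds.sizes[dim]}")
    # Drop the length-1 dimension and its coordinate, then re-create the
    # dimension with the new labels; the data is broadcast, not indexed.
    squeezed = ds.isel({dim: 0}, drop=True)
    return squeezed.expand_dims({dim: np.asarray(new_index)})


# Usage with the dataset from the report.
ds = xr.Dataset(
    data_vars={
        "variable_name": (
            "time",
            da.from_array(np.array(["test"], dtype=str), chunks=(1,)),
        )
    },
    coords={"time": ("time", np.array([0]))},
)
new_time = np.linspace(0, 10, 50)
fast = reindex_size_one_nearest(ds, "time", new_time)
slow = ds.reindex(time=new_time, method="nearest")
```

Because the data is broadcast rather than indexed element by element, the construction cost should not depend on the chunk size of the original array.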
