Saving an extracted netCDF file is far too time consuming #10061
Comments
Thanks for opening your first issue here at xarray! Be sure to follow the issue template!
The difference in timing between your two cases is due to xarray evaluating operations lazily: it waits as long as possible before loading any data. The same work is being done in both cases, though, which is why both examples take a similar total time to execute. From your example, it's not clear whether there is some other reason why it takes 50 seconds to evaluate. How big is your data file?
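To make the lazy-evaluation point concrete, here is a minimal sketch that times the load and the save as separate steps. It runs on a small synthetic file, not the reporter's data; the variable names `VAR_A`, `VAR_B`, `VAR_C`, the array sizes, and the `timed` helper are all assumptions made up for illustration.

```python
import os
import tempfile
import time

import numpy as np
import xarray as xr


def timed(label, func):
    """Run func(), print how long it took, and return (result, seconds)."""
    start = time.perf_counter()
    result = func()
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.3f} s")
    return result, elapsed


# Synthetic stand-in for the real file (names and sizes are hypothetical).
rng = np.random.default_rng(0)
source = xr.Dataset(
    {
        name: (("N_PROF", "N_LEVELS"), rng.random((2000, 400)))
        for name in ("VAR_A", "VAR_B", "VAR_C")
    }
)

tmpdir = tempfile.mkdtemp()
src_path = os.path.join(tmpdir, "source.nc")
out_path = os.path.join(tmpdir, "subset.nc")
source.to_netcdf(src_path)

with xr.open_dataset(src_path) as ds:
    # Picking variables and slicing only builds a lazy view; nothing is read yet.
    subset = ds[["VAR_A", "VAR_B"]].isel(N_PROF=slice(0, 500))

    # Force the read here so its cost is not silently folded into the save.
    loaded, t_load = timed("load", subset.load)

    # With the data already in memory, this measures writing only.
    _, t_save = timed("save", lambda: loaded.to_netcdf(out_path))

# Re-open the output to confirm what was written.
with xr.open_dataset(out_path) as check:
    result_shape = dict(check.sizes)
```

Calling `.load()` explicitly does not make the total any faster; it only moves the read cost to a place where it can be measured on its own.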
The data file is about 80M and this is the case for all the data files I have.
Unfortunately I'm not sure there's much we can do from here without much more info. If you want to debug why it takes longer than you expect, you can try varying the problem: change the amount of data that is queried from that file, try making a smaller sample, etc. (the attribution of load and save was a good step...)
You could also try calling
What happened?
time extracting area: 10.947124004364014
time saving: 487.40945315361023
time extracting area: 514.2160515785217
time saving: 0.14027118682861328
I am trying to cut down my netCDF files because they are too big. From the 50 variables in one file, I extract the 7 most important ones, and saving the result takes an unexpectedly long time. What is weird is that if I use the where method (2nd image), extracting the xarray object is slow but saving is quick. If I use .sel (1st image), it is the opposite. I don't know what to do.
My file has the dimensions N_PROF: 54212, N_CALIB: 1, N_PARAM: 5, N_LEVELS: 400, N_HISTORY: 0, but has no coordinates.
Thank you!
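For readers comparing the two approaches described above, the sketch below contrasts `where(..., drop=True)` with positional selection through `isel` on an in-memory dataset. The original code was only posted as images, so this is not a reconstruction of it: the variable names `LATITUDE` and `VAR_A`, the sizes, and the latitude condition are assumptions chosen for illustration, and whether this difference accounts for the reported timings is not established here.

```python
import numpy as np
import xarray as xr

# Small in-memory dataset with dimensions but no coordinates
# (names, sizes, and the condition below are hypothetical).
rng = np.random.default_rng(0)
n_prof, n_levels = 1000, 40
ds = xr.Dataset(
    {
        "LATITUDE": (("N_PROF",), rng.uniform(-90, 90, n_prof)),
        "VAR_A": (("N_PROF", "N_LEVELS"), rng.random((n_prof, n_levels))),
    }
)

# A boolean condition along N_PROF.
mask = ds["LATITUDE"] > 0

# where() needs the values of the condition, so it has to evaluate the mask;
# drop=True then removes the profiles where the condition is False.
by_where = ds.where(mask, drop=True)

# Alternative: turn the mask into integer positions once and index with them.
keep = np.flatnonzero(mask.values)
by_isel = ds.isel(N_PROF=keep)

# Both routes keep the same profiles in this example.
same = np.array_equal(by_where["VAR_A"].values, by_isel["VAR_A"].values)
print(dict(by_isel.sizes), same)
```

Without `drop=True`, `where` keeps the original shape and fills the unselected entries with NaN, so the result is no smaller than the input.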
What did you expect to happen?
No response
Minimal Complete Verifiable Example
MVCE confirmation
Relevant log output
Anything else we need to know?
No response
Environment
INSTALLED VERSIONS
commit: None
python: 3.12.5 | packaged by conda-forge | (main, Aug 8 2024, 18:36:51) [GCC 12.4.0]
python-bits: 64
OS: Linux
OS-release: 4.19.0-27-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: fr_FR.UTF-8
LOCALE: ('C', 'UTF-8')
libhdf5: 1.14.3
libnetcdf: 4.9.2
xarray: 2024.2.0
pandas: 2.2.2
numpy: 1.26.4
scipy: 1.14.1
netCDF4: 1.7.1
pydap: None
h5netcdf: 1.3.0
h5py: 3.11.0
Nio: None
zarr: 2.18.3
cftime: 1.6.4
nc_time_axis: 1.4.1
iris: 3.10.0
bottleneck: 1.4.0
dask: 2024.9.0
distributed: 2024.9.0
matplotlib: 3.8.4
cartopy: 0.23.0
seaborn: 0.13.2
numbagg: None
fsspec: 2024.9.0
cupy: None
pint: 0.24.3
sparse: 0.15.4
flox: None
numpy_groupies: None
setuptools: 73.0.1
pip: 24.2
conda: 24.7.1
pytest: 8.3.3
mypy: None
IPython: 8.27.0
sphinx: 7.4.7