
Groupby does not allow grouping of lat-lon pair #4732

Closed
alterapars opened this issue Dec 24, 2020 · 4 comments

Comments

@alterapars

What happened:
xarray raises an error when grouping on a latitude/longitude pair.

What you expected to happen:
The groupby method runs without error when given a latitude/longitude pair.

Minimal Complete Verifiable Example:

import numpy as np
import xarray as xr

xr_dataframe = xr.tutorial.load_dataset("air_temperature")

xr_dataframe.air.groupby('lon', 'lat')

raises the following error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-33-e32cb3023749> in <module>
----> 1 ds.air.groupby('lon', 'lat')
      2 

~\AppData\Local\Continuum\anaconda3\envs\xarray\lib\site-packages\xarray\core\common.py in groupby(self, group, squeeze, restore_coord_dims)
    690         # backward-compat issues.
    691         if not isinstance(squeeze, bool):
--> 692             raise TypeError(
    693                 f"`squeeze` must be True or False, but {squeeze} was supplied"
    694             )

TypeError: `squeeze` must be True or False, but lat was supplied

If I pass squeeze explicitly:

xr_dataframe.air.groupby('lon', 'lat', squeeze=False).sum()

it raises a different error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-27-988979b87a81> in <module>
----> 1 MODIS_burned_area.burned_area.groupby('lon', 'lat', squeeze=False).sum()

TypeError: groupby() got multiple values for argument 'squeeze'
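Both errors come from how Python binds positional arguments. The sketch below uses a hypothetical stand-in function (not xarray code) that only mirrors the parameter order shown in the traceback, groupby(self, group, squeeze, restore_coord_dims): the second positional argument lands in `squeeze`, and adding `squeeze=` on top gives that parameter two values.

```python
# Hypothetical stand-in mirroring only the parameter order from the traceback.
# It is NOT xarray's implementation; it just reproduces the argument binding.
def groupby(group, squeeze=True, restore_coord_dims=None):
    if not isinstance(squeeze, bool):
        raise TypeError(f"`squeeze` must be True or False, but {squeeze} was supplied")
    return group

# 1) 'lat' is bound to `squeeze`, not treated as a second group name.
try:
    groupby("lon", "lat")
except TypeError as err:
    first_error = str(err)

# 2) 'lat' already fills `squeeze` positionally, so squeeze=False is a second
#    value for the same parameter; Python rejects the call before the body runs.
try:
    groupby("lon", "lat", squeeze=False)
except TypeError as err:
    second_error = str(err)

print(first_error)   # `squeeze` must be True or False, but lat was supplied
print(second_error)  # groupby() got multiple values for argument 'squeeze'
```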

Passing a list of names, as in the pandas notation:

xr_dataframe.groupby(['lon', 'lat']).sum()

raises the following error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-37-4ae9ac839e02> in <module>
----> 1 ds.groupby(['lon', 'lat'], squeeze=False).sum()

~\AppData\Local\Continuum\anaconda3\envs\xarray\lib\site-packages\xarray\core\common.py in groupby(self, group, squeeze, restore_coord_dims)
    694             )
    695 
--> 696         return self._groupby_cls(
    697             self, group, squeeze=squeeze, restore_coord_dims=restore_coord_dims
    698         )

~\AppData\Local\Continuum\anaconda3\envs\xarray\lib\site-packages\xarray\core\groupby.py in __init__(self, obj, group, squeeze, grouper, bins, restore_coord_dims, cut_kwargs)
    308         if not isinstance(group, (DataArray, IndexVariable)):
    309             if not hashable(group):
--> 310                 raise TypeError(
    311                     "`group` must be an xarray.DataArray or the "
    312                     "name of an xarray variable or dimension."

TypeError: `group` must be an xarray.DataArray or the name of an xarray variable or dimension.Received ['lon', 'lat'] instead.

Describe alternatives you've considered:

In the meantime, I use the pandas groupby method, which works with lat-lon pairs.
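A minimal sketch of that pandas workaround, assuming a small synthetic (time, lat, lon) array in place of the tutorial "air_temperature" data (the values, coordinates and the name `air` are made up so the example runs offline). It flattens the array to a long DataFrame, groups on the lat/lon index levels, and converts the per-cell result back to a 2-D DataArray.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical stand-in for xr_dataframe.air: random values on a tiny grid.
rng = np.random.default_rng(0)
air = xr.DataArray(
    rng.normal(280.0, 5.0, size=(10, 3, 4)),
    dims=("time", "lat", "lon"),
    coords={
        "time": pd.date_range("2013-01-01", periods=10),
        "lat": [15.0, 45.0, 75.0],
        "lon": [200.0, 230.0, 260.0, 290.0],
    },
    name="air",
)

# Long DataFrame indexed by (time, lat, lon) with a single "air" column.
df = air.to_dataframe()

# Group on the lat/lon index levels: one group per grid cell.
std_per_cell = df.groupby(level=["lat", "lon"])["air"].std()

# Series with a (lat, lon) MultiIndex -> 2-D DataArray again.
std_map = std_per_cell.to_xarray()
print(std_map.dims)  # ('lat', 'lon')
```

One detail worth knowing when comparing against xarray results: pandas `std` defaults to `ddof=1`, while xarray's `std` defaults to `ddof=0`, so `air.std("time", ddof=1)` is the matching xarray call.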

Environment:

INSTALLED VERSIONS
------------------
commit: None
python: 3.9.1 | packaged by conda-forge | (default, Dec 21 2020, 22:05:45) [MSC v.1916 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 142 Stepping 12, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: English_United States.1252
libhdf5: 1.10.6
libnetcdf: 4.7.3

xarray: 0.16.2
pandas: 1.1.5
numpy: 1.19.4
scipy: 1.5.3
netCDF4: 1.5.5.1
pydap: None
h5netcdf: None
h5py: 3.1.0
Nio: None
zarr: None
cftime: 1.3.0
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2020.12.0
distributed: 2020.12.0
matplotlib: 3.3.3
cartopy: None
seaborn: 0.11.1
numbagg: None
pint: None
setuptools: 49.6.0.post20201009
pip: 20.3.3
conda: None
pytest: None
IPython: 7.19.0
sphinx: None

@mathause
Collaborator

Yes, this is a long-standing issue, see #1056. What may work is to use groupby with a MultiIndex: da.stack(latlon=("lat", "lon")).groupby("latlon").

However, what is your use case? Every lat/lon pair is distinct, so grouping by it wouldn't actually do anything (it would sum over the single data value in each group). Unless I am missing something, which I probably am.
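A minimal sketch of the stacking step from that suggestion, on a hypothetical synthetic (time, lat, lon) array rather than the tutorial data. It deliberately stops short of `.groupby("latlon")` and reduces over "time" directly, which shows why each lat/lon pair is its own group: every entry of the stacked dimension is one grid cell's time series.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical small (time, lat, lon) array standing in for the tutorial data.
rng = np.random.default_rng(0)
air = xr.DataArray(
    rng.normal(280.0, 5.0, size=(10, 3, 4)),
    dims=("time", "lat", "lon"),
    coords={
        "time": pd.date_range("2013-01-01", periods=10),
        "lat": [15.0, 45.0, 75.0],
        "lon": [200.0, 230.0, 260.0, 290.0],
    },
    name="air",
)

# Collapse lat and lon into one MultiIndex dimension: one entry per pair.
stacked = air.stack(latlon=("lat", "lon"))
print(stacked.dims)             # ('time', 'latlon')
print(stacked.sizes["latlon"])  # 12 (3 lats x 4 lons)

# A per-pair statistic is a reduction over "time"; unstack restores (lat, lon).
per_cell_std = stacked.std("time").unstack("latlon")
```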

@alterapars
Author

alterapars commented Dec 25, 2020

Thank you @mathause, I see. My use cases are:

  • time series analysis at each location
  • visualization of e.g. standard deviation over time for each grid cell on my regional dataset

@mathause
Collaborator

You should be able to do this without groupby. If you have a dataset with dimensions time x lat x lon, you can calculate the mean over time at each location using xr_dataframe.air.mean("time"), or, if you want the mean at each time, xr_dataframe.air.mean(("lon", "lat")). The visualisation should work via xr_dataframe.air.std("time").plot().
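A short sketch of those reductions, assuming a hypothetical synthetic array `air` in place of xr_dataframe.air so it runs without downloading the tutorial file. The dimension names passed to each reduction decide what is left over: reducing "time" leaves a (lat, lon) map, reducing ("lon", "lat") leaves a time series.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical (time, lat, lon) array standing in for xr_dataframe.air.
rng = np.random.default_rng(0)
air = xr.DataArray(
    rng.normal(280.0, 5.0, size=(10, 3, 4)),
    dims=("time", "lat", "lon"),
    coords={
        "time": pd.date_range("2013-01-01", periods=10),
        "lat": [15.0, 45.0, 75.0],
        "lon": [200.0, 230.0, 260.0, 290.0],
    },
    name="air",
)

# Mean over time at each grid cell -> dims (lat, lon).
mean_per_cell = air.mean("time")
# Spatial mean at each time step -> dims (time,).
mean_per_time = air.mean(("lon", "lat"))
# Standard deviation over time at each grid cell -> dims (lat, lon).
# std_per_cell.plot() would draw it as a map (needs matplotlib installed).
std_per_cell = air.std("time")

print(mean_per_cell.dims)  # ('lat', 'lon')
print(mean_per_time.dims)  # ('time',)
```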

@mathause
Collaborator

I'm closing this as a duplicate of #1056. Feel free to re-open if you have more questions on this.
