Grouping with multiple levels #1569
Comments
We should clarify: we support a subset of the pandas API. The |
Makes sense. Just needs a doc update, then. What's the preferred means to contribute doc (including little edits like this)? Pull requests? |
The "Edit on GitHub" link at the top right of doc page is probably the most convenient way for primary doc pages. It should walk you through making a pull request if you hit the "Edit this file" icon (next to the trash bin) at the top right of the file on GitHub. To clarify on |
Are we making silent mistakes at the moment?
In [1]: import xarray as xr
In [2]: import pandas as pd
In [4]: import numpy as np
In [5]: ds=xr.Dataset(dict(a=(('z',), np.ones(10))), coords=dict(b=(('z'),np.arange(2).repeat(5)), c=(('z'),np.arange(5).repeat(2))))
In [6]: ds
Out[6]:
<xarray.Dataset>
Dimensions: (z: 10)
Coordinates:
b (z) int64 0 0 0 0 0 1 1 1 1 1
c (z) int64 0 0 1 1 2 2 3 3 4 4
Dimensions without coordinates: z
Data variables:
a (z) float64 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
In [7]: ds.set_index(bc=['b','c']).groupby('bc').sum()
Out[7]:
<xarray.Dataset>
Dimensions: (bc: 6)
Coordinates:
* bc (bc) MultiIndex
- bc_level_0 (bc) int64 0 0 0 1 1 1
- bc_level_1 (bc) int64 0 1 2 2 3 4
Data variables:
a (bc) float64 10.0 10.0 10.0 10.0 10.0 10.0 # <--- this is wrong?
With a
In [19]: ds['a'].set_index(bc=['b','c'])
Out[19]:
<xarray.DataArray 'a' (z: 10)>
array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])
Coordinates:
* bc (bc) MultiIndex
- b (bc) int64 0 0 0 0 0 1 1 1 1 1
- c (bc) int64 0 0 1 1 2 2 3 3 4 4
Dimensions without coordinates: z
In [20]: ds['a'].set_index(bc=['b','c']).groupby('bc')
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-20-82bf6782525c> in <module>()
----> 1 ds['a'].set_index(bc=['b','c']).groupby('bc')
/usr/local/lib/python3.6/site-packages/xarray/core/common.py in groupby(self, group, squeeze)
472 core.groupby.DatasetGroupBy
473 """
--> 474 return self._groupby_cls(self, group, squeeze=squeeze)
475
476 def groupby_bins(self, group, bins, right=True, labels=None, precision=3,
/usr/local/lib/python3.6/site-packages/xarray/core/groupby.py in __init__(self, obj, group, squeeze, grouper, bins, cut_kwargs)
213 group_dim, = group.dims
214
--> 215 expected_size = obj.sizes[group_dim]
216 if group.size != expected_size:
217 raise ValueError('the group variable\'s length does not '
/usr/local/lib/python3.6/site-packages/xarray/core/utils.py in __getitem__(self, key)
306
307 def __getitem__(self, key):
--> 308 return self.mapping[key]
309
310 def __iter__(self):
KeyError: 'bc' |
I have a solution that is not very flexible, but it works:
import numpy as np
import pandas as pd
arrays = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo']),
          np.array(['one', 'two', 'one', 'two', 'one', 'two'])]
da = pd.DataFrame(np.random.randn(6, 4)).to_xarray().to_array()
da.coords['index'] = pd.MultiIndex.from_arrays(arrays, names=['first', 'second'])
da
Groupby on levels:
def multi_groupby(data, levels):
    return (((k0,k1),v1) for k0,v0 in data.groupby(levels[0]) for k1,v1 in v0.groupby(levels[1]))
for k,v in multi_groupby(da,['second','variable']):
    print(k,v)
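The nested-groupby idea above can also be sketched on the `b`/`c` example used earlier in this thread. This is only an illustrative sketch, not an xarray API: `nested_groupby` is a hypothetical helper name, and it assumes both grouping variables are coordinates along the same dimension.

```python
import numpy as np
import xarray as xr

# Same layout as the Dataset earlier in the thread: one dimension "z"
# with two non-index coordinates, "b" and "c".
da = xr.DataArray(
    np.ones(10),
    dims="z",
    coords={
        "b": ("z", np.arange(2).repeat(5)),
        "c": ("z", np.arange(5).repeat(2)),
    },
    name="a",
)


def nested_groupby(data, names):
    """Hypothetical helper: yield ((key0, key1), subset) for two grouping variables."""
    first, second = names
    for k0, v0 in data.groupby(first):
        # Each subset keeps its coordinates, so it can be grouped again.
        for k1, v1 in v0.groupby(second):
            yield (k0, k1), v1


# Reduce each (b, c) group by hand and collect the results in a plain dict.
sums = {
    (int(kb), int(kc)): float(group.sum())
    for (kb, kc), group in nested_groupby(da, ["b", "c"])
}
print(sums)
```

Because every element of `a` is 1.0, each sum equals the number of points in that (b, c) pair, which makes it easy to check the result against the coordinates by eye.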
|
The example above is now fixed:
In [1]: ds=xr.Dataset(dict(a=(('z',), np.ones(10))), coords=dict(b=(('z'),np.arange(2).repeat(5)), c=(('z'),np.arange(5).repeat(2))))
...:
In [2]: ds.set_index(bc=['b','c']).groupby('bc').sum()
Out[2]:
<xarray.Dataset>
Dimensions: (bc: 6, z: 10)
Coordinates:
* bc (bc) object MultiIndex
* b (bc) int64 0 0 0 1 1 1
* c (bc) int64 0 1 2 2 3 4
Dimensions without coordinates: z
Data variables:
a (bc, z) float64 1.0 1.0 1.0 1.0 1.0 1.0 ... 1.0 1.0 1.0 1.0 1.0 1.0 # <- correct |
http://xarray.pydata.org/en/stable/groupby.html says:
but when I supply the level keyword argument as described at https://pandas.pydata.org/pandas-docs/stable/groupby.html#groupby-with-multiindex, I get:
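For reference, the pandas behaviour referred to above (the `level` keyword of `groupby` on a MultiIndex) can be sketched in pandas itself as follows. The data is made up for illustration and reuses the `b`/`c` layout from the comments; it shows pandas only and says nothing about what xarray accepts.

```python
import numpy as np
import pandas as pd

# Two-level index mirroring the b/c coordinates used in the comments.
index = pd.MultiIndex.from_arrays(
    [np.arange(2).repeat(5), np.arange(5).repeat(2)],
    names=["b", "c"],
)
s = pd.Series(np.ones(10), index=index, name="a")

# Group by a single level of the MultiIndex.
by_b = s.groupby(level="b").sum()

# Group by both levels at once.
by_bc = s.groupby(level=["b", "c"]).sum()

print(by_b)
print(by_bc)
```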