
Grouping with multiple levels #1569

Closed
jjpr-mit opened this issue Sep 12, 2017 · 6 comments

@jjpr-mit

http://xarray.pydata.org/en/stable/groupby.html says:

xarray supports “group by” operations with the same API as pandas

but when I supply the level keyword argument as described at https://pandas.pydata.org/pandas-docs/stable/groupby.html#groupby-with-multiindex, I get:

TypeError                                 Traceback (most recent call last)
<ipython-input-12-566fc67c0151> in <module>()
----> 1 hvm_it_v6_obj = hvm_it_v6.groupby(level=["category","obj"]).mean(dim="presentation")
      2 hvm_it_v6_obj

TypeError: groupby() got an unexpected keyword argument 'level'

@shoyer
Member

shoyer commented Sep 12, 2017

We should clarify: we support only a subset of the pandas API. The level argument and grouping by multiple variables at once are not yet supported.
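Until `level=` is supported, one possible workaround (a sketch, not an xarray feature) is to round-trip through pandas. `DataArray.to_series()` turns a MultiIndexed dimension into a pandas MultiIndex, pandas does the `groupby(level=...)`, and `Series.to_xarray()` converts the result back. The toy `b`/`c` coordinates below are an assumption borrowed from the example later in this thread.

```python
import numpy as np
import xarray as xr

# Toy data: variable "a" along "z" with two integer coordinates b and c
ds = xr.Dataset(
    {"a": ("z", np.arange(10.0))},
    coords={"b": ("z", np.arange(2).repeat(5)), "c": ("z", np.arange(5).repeat(2))},
)
# Make b and c the levels of a MultiIndex on the existing dimension "z"
da = ds.set_index(z=["b", "c"])["a"]

# pandas accepts level= directly, so do the grouping there
series = da.to_series()  # index: pandas MultiIndex with levels (b, c)
means = series.groupby(level=["b", "c"]).mean()

# Back to xarray: each level becomes its own dimension (b, c);
# (b, c) combinations that never occur are filled with NaN
result = means.to_xarray()
print(result)
```

Going through pandas works on a 1-D series, so coordinates that are not part of the index are not carried along; this suits quick aggregations rather than full-featured Dataset workflows.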

@jjpr-mit
Author

Makes sense. Just needs a doc update, then.

What's the preferred way to contribute documentation (including little edits like this)? Pull requests?

@shoyer
Member

shoyer commented Sep 12, 2017

The "Edit on GitHub" link at the top right of each doc page is probably the most convenient route for the primary doc pages. It should walk you through making a pull request if you click the "Edit this file" icon (next to the trash bin) at the top right of the file on GitHub.

To clarify on level: it isn't supported, but you can instead pass the name of a MultiIndex level directly as the group name. Passing multiple group arguments is truly not supported yet (#324).
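A minimal sketch of that suggestion, assuming a recent xarray in which MultiIndex levels are exposed as coordinates: build a MultiIndex on `z` with `set_index`, then pass a level name such as `"b"` where pandas would take `level="b"`. The toy data is the same `b`/`c` example used elsewhere in this thread, with `a` changed to `0..9` so the means differ.

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {"a": ("z", np.arange(10.0))},
    coords={"b": ("z", np.arange(2).repeat(5)), "c": ("z", np.arange(5).repeat(2))},
)
# b and c become the levels of a MultiIndex on the dimension "z"
indexed = ds.set_index(z=["b", "c"])

# Analogous to pandas' groupby(level="b"): the level name is passed as the group
by_b = indexed.groupby("b").mean()
print(by_b["a"])  # one mean per value of b, along a new "b" dimension
```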

@max-sixty
Collaborator

Are we making silent mistakes at the moment?

In [1]: import xarray as xr
In [2]: import pandas as pd

In [4]: import numpy as np

In [5]: ds=xr.Dataset(dict(a=(('z',), np.ones(10))), coords=dict(b=(('z'),np.arange(2).repeat(5)), c=(('z'),np.arange(5).repeat(2))))

In [6]: ds
Out[6]:
<xarray.Dataset>
Dimensions:  (z: 10)
Coordinates:
    b        (z) int64 0 0 0 0 0 1 1 1 1 1
    c        (z) int64 0 0 1 1 2 2 3 3 4 4
Dimensions without coordinates: z
Data variables:
    a        (z) float64 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0

In [7]: ds.set_index(bc=['b','c']).groupby('bc').sum()
Out[7]:
<xarray.Dataset>
Dimensions:     (bc: 6)
Coordinates:
  * bc          (bc) MultiIndex
  - bc_level_0  (bc) int64 0 0 0 1 1 1
  - bc_level_1  (bc) int64 0 1 2 2 3 4
Data variables:
    a           (bc) float64 10.0 10.0 10.0 10.0 10.0 10.0   # <--- this is wrong?

With a DataArray, we do raise:

In [19]: ds['a'].set_index(bc=['b','c'])
Out[19]:
<xarray.DataArray 'a' (z: 10)>
array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])
Coordinates:
  * bc       (bc) MultiIndex
  - b        (bc) int64 0 0 0 0 0 1 1 1 1 1
  - c        (bc) int64 0 0 1 1 2 2 3 3 4 4
Dimensions without coordinates: z

In [20]: ds['a'].set_index(bc=['b','c']).groupby('bc')
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-20-82bf6782525c> in <module>()
----> 1 ds['a'].set_index(bc=['b','c']).groupby('bc')

/usr/local/lib/python3.6/site-packages/xarray/core/common.py in groupby(self, group, squeeze)
    472         core.groupby.DatasetGroupBy
    473         """
--> 474         return self._groupby_cls(self, group, squeeze=squeeze)
    475
    476     def groupby_bins(self, group, bins, right=True, labels=None, precision=3,

/usr/local/lib/python3.6/site-packages/xarray/core/groupby.py in __init__(self, obj, group, squeeze, grouper, bins, cut_kwargs)
    213         group_dim, = group.dims
    214
--> 215         expected_size = obj.sizes[group_dim]
    216         if group.size != expected_size:
    217             raise ValueError('the group variable\'s length does not '

/usr/local/lib/python3.6/site-packages/xarray/core/utils.py in __getitem__(self, key)
    306
    307     def __getitem__(self, key):
--> 308         return self.mapping[key]
    309
    310     def __iter__(self):

KeyError: 'bc'
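One way to check the Dataset result in Out[7] for silent mistakes is to compute the expected per-(b, c) sums with pandas and compare. This sketch rebuilds the same `ds` and uses `Dataset.to_dataframe()`, which keeps the non-index coordinates `b` and `c` as columns. Since every value of `a` is 1.0, each group sum is just the group size, and together they must add up to 10 rather than the 60 implied by Out[7].

```python
import numpy as np
import xarray as xr

# Same construction as In [5] above
ds = xr.Dataset(
    dict(a=(("z",), np.ones(10))),
    coords=dict(b=(("z",), np.arange(2).repeat(5)), c=(("z",), np.arange(5).repeat(2))),
)

# Reference result from pandas: group rows by their (b, c) pair and sum "a"
df = ds.to_dataframe()  # columns include a, b and c; index is z
expected = df.groupby(["b", "c"])["a"].sum()
print(expected)

# The group sums must add back up to the ungrouped total
assert expected.sum() == ds["a"].sum().item()
```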

@ray306

ray306 commented Apr 30, 2019

I have a solution that is not very flexible, but it works:
Data:

import numpy as np
import pandas as pd
arrays = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo']),
          np.array(['one', 'two', 'one', 'two', 'one', 'two'])]
da = pd.DataFrame(np.random.randn(6, 4)).to_xarray().to_array()
da.coords['index'] = pd.MultiIndex.from_arrays(arrays, names=['first', 'second'])
da
<xarray.DataArray (variable: 4, index: 6)>
array([[ 0.379189,  1.082292, -2.073478, -0.84626 , -1.529927, -0.837407],
       [-0.267983, -0.2516  , -1.016653, -0.085762, -0.058382, -0.667891],
       [-0.013488, -0.855332, -0.038072, -0.385211, -2.149742, -0.304361],
       [ 1.749561, -0.606031,  1.914146,  1.6292  , -0.515519,  1.996283]])
Coordinates:
  * index     (index) MultiIndex
  - first     (index) object 'bar' 'bar' 'baz' 'baz' 'foo' 'foo'
  - second    (index) object 'one' 'two' 'one' 'two' 'one' 'two'
  * variable  (variable) int32 0 1 2 3

Groupby on levels:

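def multi_groupby(data, levels):
    # Group by the first level, then group each subset by the second level,
    # yielding ((key0, key1), subset) pairs
    for k0, v0 in data.groupby(levels[0]):
        for k1, v1 in v0.groupby(levels[1]):
            yield (k0, k1), v1
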
for k,v in multi_groupby(da,['second','variable']):
    print(k,v)
('one', 0) <xarray.DataArray (index: 3)>
array([ 0.379189, -2.073478, -1.529927])
Coordinates:
  * index     (index) MultiIndex
  - first     (index) object 'bar' 'baz' 'foo'
  - second    (index) object 'one' 'one' 'one'
    variable  int32 0
('one', 1) <xarray.DataArray (index: 3)>
array([-0.267983, -1.016653, -0.058382])
Coordinates:
  * index     (index) MultiIndex
  - first     (index) object 'bar' 'baz' 'foo'
  - second    (index) object 'one' 'one' 'one'
    variable  int32 1
('one', 2) <xarray.DataArray (index: 3)>
array([-0.013488, -0.038072, -2.149742])
Coordinates:
  * index     (index) MultiIndex
  - first     (index) object 'bar' 'baz' 'foo'
  - second    (index) object 'one' 'one' 'one'
    variable  int32 2
('one', 3) <xarray.DataArray (index: 3)>
array([ 1.749561,  1.914146, -0.515519])
Coordinates:
  * index     (index) MultiIndex
  - first     (index) object 'bar' 'baz' 'foo'
  - second    (index) object 'one' 'one' 'one'
    variable  int32 3
('two', 0) <xarray.DataArray (index: 3)>
array([ 1.082292, -0.84626 , -0.837407])
Coordinates:
  * index     (index) MultiIndex
  - first     (index) object 'bar' 'baz' 'foo'
  - second    (index) object 'two' 'two' 'two'
    variable  int32 0
('two', 1) <xarray.DataArray (index: 3)>
array([-0.2516  , -0.085762, -0.667891])
Coordinates:
  * index     (index) MultiIndex
  - first     (index) object 'bar' 'baz' 'foo'
  - second    (index) object 'two' 'two' 'two'
    variable  int32 1
('two', 2) <xarray.DataArray (index: 3)>
array([-0.855332, -0.385211, -0.304361])
Coordinates:
  * index     (index) MultiIndex
  - first     (index) object 'bar' 'baz' 'foo'
  - second    (index) object 'two' 'two' 'two'
    variable  int32 2
('two', 3) <xarray.DataArray (index: 3)>
array([-0.606031,  1.6292  ,  1.996283])
Coordinates:
  * index     (index) MultiIndex
  - first     (index) object 'bar' 'baz' 'foo'
  - second    (index) object 'two' 'two' 'two'
    variable  int32 3
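The two-level helper above generalises to any number of grouping names by recursing on the remaining names. `multi_groupby_n` below is a hypothetical helper, not part of xarray, shown on the toy `b`/`c` coordinates from earlier in the thread. Like the original, it groups one name at a time, so the number of `groupby` calls grows with the number of nested groups.

```python
import numpy as np
import xarray as xr


def multi_groupby_n(data, names):
    """Yield (keys, subset) pairs, grouping by each name in turn.

    Hypothetical helper: a recursive version of the two-level multi_groupby.
    """
    if not names:
        # No names left: the current subset is a leaf group
        yield (), data
        return
    for key, subset in data.groupby(names[0]):
        # Group this subset by the remaining names and prepend the current key
        for keys, leaf in multi_groupby_n(subset, names[1:]):
            yield (key,) + keys, leaf


ds = xr.Dataset(
    {"a": ("z", np.arange(10.0))},
    coords={"b": ("z", np.arange(2).repeat(5)), "c": ("z", np.arange(5).repeat(2))},
)
for keys, group in multi_groupby_n(ds, ["b", "c"]):
    print(tuple(int(k) for k in keys), float(group["a"].sum()))
```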

@max-sixty
Collaborator

The example above is now fixed:

In [1]: ds=xr.Dataset(dict(a=(('z',), np.ones(10))), coords=dict(b=(('z'),np.arange(2).repeat(5)), c=(('z'),np.arange(5).repeat(2))))
   ...:

In [2]: ds.set_index(bc=['b','c']).groupby('bc').sum()
Out[2]:
<xarray.Dataset>
Dimensions:  (bc: 6, z: 10)
Coordinates:
  * bc       (bc) object MultiIndex
  * b        (bc) int64 0 0 0 1 1 1
  * c        (bc) int64 0 1 2 2 3 4
Dimensions without coordinates: z
Data variables:
    a        (bc, z) float64 1.0 1.0 1.0 1.0 1.0 1.0 ... 1.0 1.0 1.0 1.0 1.0 1.0   # <- correct


5 participants