TypeError: meta must be an instance of an Awkward Array, not <class 'module'> triggered by fill_flattened for a dask-histogram and dask-awkward arrays #531

Open
NJManganelli opened this issue Jul 29, 2024 · 2 comments
@NJManganelli

Calling the fill_flattened method of a hist.dask.Hist object with dask-awkward inputs triggers the error shown below. A concrete Hist filled with the computed awkward arrays succeeds. (The reported version may be inaccurate: I installed most of the scikit-hep packages as development versions, all from GitHub approximately 3 weeks ago.)

>>> import dask_awkward as dak
>>> import awkward as ak
>>> import hist
>>> from hist.dask import Hist
>>> d = dak.from_awkward(ak.Array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]), npartitions=2)
>>> e = d.compute()
>>> e
<Array [0.1, 0.2, 0.3, 0.4, 0.5, ..., 0.7, 0.8, 0.9, 1] type='10 * float64'>
>>> dd = dak.from_awkward(ak.Array([[1, 2], [3], [4], [5,6, 7], [8, 4], [2, 3], [1], [], [5, 9, 10], [5, 1]]), npartitions=2)
>>> ak.num(d.compute(), axis=0)
array(10)
>>> ak.num(dd.compute(), axis=0)
array(10)
>>> hd = Hist(hist.axis.Regular(10, 0, 10), hist.storage.Weight())
>>> hd.fill_flattened(dd, weight=d)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/nmangane/scikit-hep-dev/hist/src/hist/basehist.py", line 274, in fill_flattened
    destructured = interop.destructure(arg)
                   ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/nmangane/scikit-hep-dev/hist/src/hist/interop.py", line 65, in destructure
    for module in find_histogram_modules(obj):
  File "/Users/nmangane/scikit-hep-dev/hist/src/hist/interop.py", line 48, in find_histogram_modules
    yield arg._histogram_module_
          ^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/nmangane/scikit-hep-dev/dask-awkward/src/dask_awkward/lib/core.py", line 1596, in __getattr__
    return self.map_partitions(
           ^^^^^^^^^^^^^^^^^^^^
  File "/Users/nmangane/scikit-hep-dev/dask-awkward/src/dask_awkward/lib/core.py", line 1649, in map_partitions
    return map_partitions(func, self, *args, traverse=traverse, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/nmangane/scikit-hep-dev/dask-awkward/src/dask_awkward/lib/core.py", line 2168, in map_partitions
    return _map_partitions(
           ^^^^^^^^^^^^^^^^
  File "/Users/nmangane/scikit-hep-dev/dask-awkward/src/dask_awkward/lib/core.py", line 2036, in _map_partitions
    return new_array_object(
           ^^^^^^^^^^^^^^^^^
  File "/Users/nmangane/scikit-hep-dev/dask-awkward/src/dask_awkward/lib/core.py", line 1849, in new_array_object
    raise TypeError(
TypeError: meta must be an instance of an Awkward Array, not <class 'module'>.

>>> he = hist.hist.Hist(hist.axis.Regular(10, 0, 10), hist.storage.Weight())
>>> he.fill_flattened(dd.compute(), weight=d.compute())
Hist(Regular(10, 0, 10, label='Axis 0'), storage=Weight()) # Sum: WeightedSum(value=8.6, variance=5.96) (WeightedSum(value=9.5, variance=6.77) with flow)
>>> import dask_histogram
>>> dask_histogram.__version__
'2024.3.0'
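
For readers unfamiliar with what the eager call above computes, the following is a minimal pure-Python sketch of the "broadcast, then flatten" idea, using the same values and weights as the session. The helper names `broadcast_and_flatten` and `weighted_sums` are hypothetical and only illustrate the arithmetic; they are not how hist or awkward implement it.

```python
def broadcast_and_flatten(values, weights):
    """Repeat each per-event weight once per entry of the jagged row, then flatten."""
    flat_values, flat_weights = [], []
    for entries, weight in zip(values, weights):
        for value in entries:
            flat_values.append(value)
            flat_weights.append(weight)
    return flat_values, flat_weights


def weighted_sums(flat_values, flat_weights, start, stop, flow):
    """Sum of weights and sum of squared weights, optionally including out-of-range entries."""
    total = 0.0
    variance = 0.0
    for value, weight in zip(flat_values, flat_weights):
        if flow or start <= value < stop:
            total += weight
            variance += weight * weight
    return total, variance


# Same data as the jagged array `dd` and the weights `d` in the report.
values = [[1, 2], [3], [4], [5, 6, 7], [8, 4], [2, 3], [1], [], [5, 9, 10], [5, 1]]
weights = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]

flat_values, flat_weights = broadcast_and_flatten(values, weights)
in_range = weighted_sums(flat_values, flat_weights, 0, 10, flow=False)
with_flow = weighted_sums(flat_values, flat_weights, 0, 10, flow=True)
```

Up to floating-point rounding, `in_range` and `with_flow` agree with the value and variance pairs printed by the eager histogram above; the single entry equal to 10 falls outside the half-open range [0, 10) and is only counted with flow.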
@agoose77 agoose77 self-assigned this Jul 30, 2024
@agoose77
Collaborator

agoose77 commented Jul 30, 2024

This error arises because dask-awkward does not implement support for hist's fill_flattened method.
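
The failure path in the traceback (attribute lookup, then `map_partitions`, then a meta type check) can be modelled with a small, self-contained sketch. The classes `EagerArray` and `LazyArray` below are hypothetical stand-ins, not dask-awkward's actual code; the assumption is only that the lazy collection forwards unknown attributes to a small eager "meta" object and requires the result to be an array.

```python
import types


class EagerArray:
    """Hypothetical eager array that also exposes a helper module as an attribute."""

    def __init__(self, data):
        self.data = list(data)

    @property
    def doubled(self):
        # An ordinary array-valued attribute.
        return EagerArray(2 * x for x in self.data)

    @property
    def _histogram_module_(self):
        # Returns a module-like object rather than array data.
        return types.ModuleType("fake_hist_helpers")


class LazyArray:
    """Hypothetical lazy wrapper that evaluates unknown attributes on its meta."""

    def __init__(self, meta):
        self._meta = meta

    def __getattr__(self, name):
        # Only reached for attributes not found normally on LazyArray.
        if name == "_meta" or name.startswith("__"):
            raise AttributeError(name)
        result = getattr(self._meta, name)
        if not isinstance(result, EagerArray):
            raise TypeError(
                f"meta must be an instance of EagerArray, not {type(result)}."
            )
        return LazyArray(result)


lazy = LazyArray(EagerArray([1, 2, 3]))
ok = lazy.doubled  # array-valued attribute: wrapped lazily without error

try:
    lazy._histogram_module_
    message = None
except TypeError as exc:
    message = str(exc)
```

In this model the array-valued attribute is wrapped without complaint, while the module-valued one fails the type check with a message of the same shape as the one reported.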

A rough implementation looks like:

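import awkward as ak
import dask.array as da
import dask_awkward as dak


class _DaskAwkwardHistModule:
    @staticmethod
    def unpack(array):
        # Arrays without record fields have nothing to unpack.
        if not ak.fields(array):
            return None
        return dict(zip(ak.fields(array), ak.unzip(array)))

    @staticmethod
    def broadcast_and_flatten(args):
        # Coerce every argument to a dask-awkward collection before broadcasting.
        new_args = []
        for arg in args:
            if isinstance(arg, dak.Array):
                new_args.append(arg)
            elif isinstance(arg, da.Array):
                new_args.append(dak.from_dask_array(arg))
            else:
                new_args.append(dak.from_awkward(ak.Array(arg, backend="numpy")))
        # Record arrays are not handled by this rough version.
        assert not any(x.fields for x in new_args)
        return tuple(
            ak.flatten(x, axis=None) for x in ak.broadcast_arrays(*new_args)
        )


# Attach the module under the attribute that hist looks up in the traceback above.
dak.Array._histogram_module_ = _DaskAwkwardHistModule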

I'm not sure how smart that will be with non-equally-partitioned collections.

@martindurant
Collaborator

@agoose77, were you thinking of submitting that? We could put a restriction on partitioning, with a clear error message when the inputs don't satisfy it.
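
One possible shape for such a restriction is sketched below. The function name `check_partition_compatibility` is hypothetical and this is not an existing dask-awkward API. It assumes that lazy collections expose `npartitions` and a `divisions` tuple in which `None` marks unknown boundaries, and it treats anything without `npartitions` as an eager input that needs no check.

```python
from types import SimpleNamespace


def check_partition_compatibility(*collections):
    """Raise ValueError if lazy inputs are visibly partitioned differently.

    Returns the shared partition count, or None if no input is lazy.
    """
    lazy = [c for c in collections if getattr(c, "npartitions", None) is not None]
    counts = {c.npartitions for c in lazy}
    if len(counts) > 1:
        raise ValueError(
            "fill_flattened requires identically partitioned inputs, "
            f"but got partition counts {sorted(counts)}"
        )
    # Compare boundaries only where they are fully known (assumption: None = unknown).
    known = {
        tuple(c.divisions)
        for c in lazy
        if getattr(c, "divisions", None) is not None and None not in c.divisions
    }
    if len(known) > 1:
        raise ValueError(
            "fill_flattened requires identically partitioned inputs, "
            f"but got differing divisions {sorted(known)}"
        )
    return counts.pop() if counts else None


# Stand-ins for lazy collections; real inputs would be dask-awkward arrays.
a = SimpleNamespace(npartitions=2, divisions=(0, 5, 10))
b = SimpleNamespace(npartitions=2, divisions=(0, 5, 10))
c = SimpleNamespace(npartitions=3, divisions=(None, None, None, None))

same = check_partition_compatibility(a, b, [1.0, 2.0])

try:
    check_partition_compatibility(a, c)
    error = None
except ValueError as exc:
    error = str(exc)
```

Equal partition counts do not guarantee equal partition lengths, so a check like this can only reject obvious mismatches when divisions are unknown.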

NJManganelli pushed a commit to NJManganelli/coffea that referenced this issue Feb 6, 2025
NJManganelli pushed a commit to NJManganelli/coffea that referenced this issue Feb 17, 2025
…edSelection class

Adjust result object to not break unpacking of cutflow result, restoring one test; need dask-contrib/dask-awkward#531

Remove unnecessary return Cutflow

Add boolean_masks_to_categorical_integers for optimization of histogram fill in cutflow

WIP add the optimized version of filling cutflows, with a testable v2 variable temporarily, and new categorical option for additional axis (needs documentation and example)

Rename slice to slc

WIP weighted cutflow with commonmask, categorical axes, plot_vars still needs much work and cleanup

Untested plot_vars function with commonmask and categorical support, using broadcast_arrays func

Fixes for original tests to pass

WIP - fixes from tests, expanded print header

revert change to masks in compute for npz

Fix commonmask handling of initial weighted event counts

Some logic fixes for weights in npz file

Fix for v2 eager mode

Handle delayed cutflow to_npz pathway, Weights does not have a compute method

black src/coffea/analysis_tools.py