rdataframe to awkward #1448

ianna · 2022-04-27T14:11:51Z

No description provided.

codecov · 2022-04-27T14:23:55Z

Codecov Report

Merging #1448 (db498f1) into main (de7cae8) will decrease coverage by 0.57%.
The diff coverage is 67.67%.

| Impacted Files | Coverage Δ |
|---|---|
| src/awkward/_v2/_connect/cling.py | `26.82% <0.00%> (ø)` |
| src/awkward/_v2/_connect/numexpr.py | `88.40% <ø> (ø)` |
| ...awkward/_v2/_connect/rdataframe/from_rdataframe.py | `0.00% <0.00%> (ø)` |
| ...c/awkward/_v2/_connect/rdataframe/to_rdataframe.py | `0.00% <0.00%> (ø)` |
| src/awkward/_v2/operations/ak_backend.py | `10.00% <0.00%> (ø)` |
| src/awkward/_v2/operations/ak_from_arrow.py | `100.00% <ø> (ø)` |
| src/awkward/_v2/operations/ak_from_arrow_schema.py | `50.00% <ø> (ø)` |
| src/awkward/_v2/operations/ak_from_buffers.py | `86.13% <ø> (ø)` |
| src/awkward/_v2/operations/ak_from_cupy.py | `50.00% <ø> (ø)` |
| src/awkward/_v2/operations/ak_from_jax.py | `50.00% <ø> (ø)` |
... and 113 more

ianna · 2022-04-29T08:05:05Z

@jpivarski - would the following guarantee that the result does not go away? It's from ROOT's src/bindings/pyroot_legacy/_rdf_utils.py:

import numpy


class ndarray(numpy.ndarray):
    """
    A wrapper class that inherits from numpy.ndarray and allows to attach the
    result pointer of the `Take` action in an `RDataFrame` event loop to the
    collection of values returned by that action. See
    https://docs.scipy.org/doc/numpy/user/basics.subclassing.html for more
    information on subclassing numpy arrays.
    """
    def __new__(cls, numpy_array, result_ptr):
        """
        Dunder method invoked at the creation of an instance of this class. It
        creates a numpy array with an `RResultPtr` as an additional
        attribute.
        """
        obj = numpy.asarray(numpy_array).view(cls)
        obj.result_ptr = result_ptr
        return obj

    def __array_finalize__(self, obj):
        """
        Dunder method that fills in the instance default `result_ptr` value.
        """
        if obj is None: return
        self.result_ptr = getattr(obj, "result_ptr", None)
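
A minimal check of what this subclass does guarantee, assuming only NumPy is installed: the attached object travels with slices through `__array_finalize__`, so it stays alive as long as any view does. `FakeResultPtr` is a hypothetical placeholder for the `RResultPtr`; no ROOT is involved, and the lifetime observations rely on CPython reference counting.

```python
import gc
import weakref

import numpy


class ndarray(numpy.ndarray):
    # Same logic as the ROOT wrapper quoted above, repeated so this runs alone.
    def __new__(cls, numpy_array, result_ptr):
        obj = numpy.asarray(numpy_array).view(cls)
        obj.result_ptr = result_ptr
        return obj

    def __array_finalize__(self, obj):
        if obj is None:
            return
        self.result_ptr = getattr(obj, "result_ptr", None)


class FakeResultPtr:
    """Hypothetical placeholder for the RResultPtr returned by Take."""


owner = FakeResultPtr()
owner_alive = weakref.ref(owner)  # observe the lifetime without extending it

wrapped = ndarray(numpy.arange(5), owner)
tail = wrapped[2:]  # slicing calls __array_finalize__ with obj=wrapped

del owner, wrapped  # only the slice is left
gc.collect()
kept_by_slice = owner_alive() is not None
same_owner = tail.result_ptr is owner_alive()

del tail  # last array referring to the owner goes away
gc.collect()
released = owner_alive() is None
```

This only shows that the array keeps the pointer alive; it says nothing about whether a data source reading from the array keeps the array alive.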

jpivarski · 2022-04-29T17:12:36Z

That class doesn't do anything to attach itself to an RDataSource, but presumably NumPy arrays require special handling because np.ndarray is an extension type. By subclassing it, they've made an equivalent of np.ndarray that works around any issues due to np.ndarray not being an ordinary Python type. Our arrays and Lookup are ordinary Python classes.

But you're right that RNumpyDS is going to have exactly the same problem that we do: the data is owned by a NumPy object, so that NumPy object has to be kept in scope as long as the RNumpyDS is. Here's the pointer:

https://github.com/root-project/root/blob/4f9759c604d05a8304bf0b55f5738a479441e151/bindings/pyroot/pythonizations/inc/RNumpyDS.hxx#L60-L63

and maybe that PyObject* is actually an ndarray subclass, rather than NumPy's own np.ndarray, for technical reasons, but what's important for us is that they do hold such an object.

RNumpyDS calls Py_INCREF and Py_DECREF on fPyRVecs with no special handling for the Python GIL:

https://github.com/root-project/root/blob/4f9759c604d05a8304bf0b55f5738a479441e151/bindings/pyroot/pythonizations/inc/RNumpyDS.hxx#L129-L150

There are no other references to fPyRVecs in the RNumpyDS implementation. If this is the right way to do it, then we should do the same. Our RDataSources run under the same conditions as RNumpyDS; we should be doing the same memory management.

If this is right, then I think that means that ROOT functions that might create or delete RNumpyDS instances are holding the Python GIL. That surprises me, since it would prevent speed-ups in parallel processing. Maybe it's okay because the parallel processing only happens in a fork-join pattern in which the RNumpyDS is never created or destroyed?

~~It's still worth asking~~ As it turns out, @etejedor has already answered my question here: #1446 (comment). I'm just getting the messages out of order. Okay, so it is safe to Py_INCREF and Py_DECREF freely in the constructor and destructor, without any special GIL-handling, because they aren't called from __release_gil__ functions.
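
The ownership pattern discussed above can be sketched in pure Python. This is an analogy under stated assumptions, not the RNumpyDS code: `Buffer` and `DataSource` are hypothetical names, storing a reference in the constructor stands in for `Py_INCREF`, and dropping it when the object is destroyed stands in for `Py_DECREF`. It relies on CPython reference counting.

```python
import gc
import weakref


class Buffer:
    """Hypothetical stand-in for the Python object that owns the data."""

    def __init__(self, values):
        self.values = list(values)


class DataSource:
    """Hypothetical stand-in for a data source that reads from a Buffer."""

    def __init__(self, owner):
        # Holding the reference is the Python-level analogue of Py_INCREF.
        self._owner = owner

    def read(self, i):
        return self._owner.values[i]


buffer = Buffer([1.1, 2.2, 3.3])
alive = weakref.ref(buffer)  # observe the lifetime without extending it

source = DataSource(buffer)
del buffer  # the caller's name goes out of scope
gc.collect()
still_alive_with_source = alive() is not None
first_value = source.read(0)

del source  # analogue of Py_DECREF in the destructor
gc.collect()
alive_after_source_deleted = alive() is not None
```

As long as the data source holds the reference, the caller is free to drop its own; the buffer is released only after the data source itself is gone.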

ianna · 2022-05-04T14:08:52Z

@jpivarski - it looks like the failing pre-commit is unrelated to this PR:

flake8...................................................................Failed
- hook id: flake8
- exit code: 1

src/awkward/config.py:99:5: T201 print found.
src/awkward/_v2/_connect/numba/arrayview.py:16:9: T201 print found.
src/awkward/_v2/_connect/numba/arrayview.py:17:9: T201 print found.
src/awkward/_connect/_numba/arrayview.py:20:9: T201 print found.
src/awkward/_connect/_numba/arrayview.py:21:9: T201 print found.

jpivarski · 2022-05-04T15:04:51Z

> src/awkward/config.py:99:5: T201 print found.

We should get rid of anything related to "config.py". It's for compiling dependents, which is something we're giving up on in v2.

> ```shell
> src/awkward/_v2/_connect/numba/arrayview.py:16:9: T201 print found.
> src/awkward/_v2/_connect/numba/arrayview.py:17:9: T201 print found.
> src/awkward/_connect/_numba/arrayview.py:20:9: T201 print found.
> src/awkward/_connect/_numba/arrayview.py:21:9: T201 print found.
> ```

There shouldn't be print statements in these files. Remove them.

How did they get there and why is this showing up now? I have no idea.

ianna · 2022-05-04T17:08:52Z

> src/awkward/config.py:99:5: T201 print found.
>
> We should get rid of anything related to "config.py". It's for compiling dependents, which is something we're giving up on on v2.
>
> src/awkward/_v2/_connect/numba/arrayview.py:16:9: T201 print found.
> src/awkward/_v2/_connect/numba/arrayview.py:17:9: T201 print found.
> src/awkward/_connect/_numba/arrayview.py:20:9: T201 print found.
> src/awkward/_connect/_numba/arrayview.py:21:9: T201 print found.
>
> There shouldn't be print statements in these files. Remove them.
>
> How did they get there and why is this showing up now? I have no idea.

I think the print here is not ignored anymore:

def code_to_function(code, function_name, externals=None, debug=False):
    if debug:
        print("################### " + function_name)  # noqa: T001
        print(code)  # noqa: T001

jpivarski · 2022-05-04T17:12:42Z

Apparently, what happened is that the number changed from T001 to T201. I would have thought that those numbers wouldn't change—why else would they be short codes like that? Oh well. Since this might be run under different versions of flake8-print, it's probably safest to ignore both:

# noqa: T001,T201

I don't know what the new "T001" is, but we're only losing the protection of this test on a few isolated lines, so it's probably fine.
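
A sketch of how the debug prints might look with both codes suppressed. Only the `if debug:` block comes from the snippet quoted earlier in this thread; the rest of the function body (executing the code in a namespace and returning the named function) is an assumption added to make the example runnable.

```python
def code_to_function(code, function_name, externals=None, debug=False):
    if debug:
        # Suppress both the old (T001) and new (T201) flake8-print codes.
        print("################### " + function_name)  # noqa: T001,T201
        print(code)  # noqa: T001,T201

    # Assumed remainder: run the source and pull the function out by name.
    namespace = {} if externals is None else dict(externals)
    exec(code, namespace)
    return namespace[function_name]


add = code_to_function("def add(x, y):\n    return x + y\n", "add")
result = add(1, 2)
```

Listing both codes in a single `noqa` comment means the line is exempt whichever numbering the installed flake8-print version reports.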


ianna · 2022-05-17T14:05:12Z

replaced by #1474

ianna marked this pull request as draft April 27, 2022 14:11

jpivarski mentioned this pull request Apr 29, 2022

awkward rdataframe source tests #1446

Merged

ianna added 20 commits May 4, 2022 16:00

rdataframe to awkward

3ba7f5c

start on from_rdataframe function

4ce861c

convert std::vectors to ak.Arrays

f37404a

std::vectors of primitive types as records

3e9f8cd

add nested types test

84c2814

add is_iterable type check

3e91c3c

add a nested rvec test

dac3211

add to and from test, complex numbers test

3431c68

wrap as array

92e047b

pre-commit fixes

1a786ac

check if the array content is contiguous

62420ec

formatting fixes

c517245

add recursive function

8120682

pre-commit fixes

72ae485

cleanup

2663856

cleanup

c5b4cfa

pre-commit fixes

08e0f2b

add nested list offset arrays test

9571e97

add a test for a variant

531f8b2

pre-commit fixes

9e1ac32

ianna force-pushed the ianna/rdataframe-to-awkward branch from 692b33d to 9e1ac32 Compare May 4, 2022 14:01

ianna and others added 26 commits May 17, 2022 12:15

check if the array content is contiguous

2b440cd

formatting fixes

6e1d281

add recursive function

73de4d7

pre-commit fixes

8408cd1

cleanup

1d8a3a3

cleanup

20105c9

pre-commit fixes

e84042b

add nested list offset arrays test

b56362b

add a test for a variant

1540710

pre-commit fixes

5fc337c

fix complex type test

8f64215

Trivial change to try to trigger CI tests.

162566f

Another trivial change to try to trigger CI tests.

16dc248

Once more just to be sure, now that I'm done messing with the settings.

00adb8c

pass cpp pointer directly to Array constructor

99a52fa

rdataframe to awkward

ed469bb

Another trivial change to try to trigger CI tests.

3b5de00

Once more just to be sure, now that I'm done messing with the settings.

3194fce

enable one more test

01499cf

rebase

47d91f9

Another trivial change to try to trigger CI tests.

714590d

Once more just to be sure, now that I'm done messing with the settings.

d46bfff

Another trivial change to try to trigger CI tests.

53d26dd

Once more just to be sure, now that I'm done messing with the settings.

eb51c1a

Merge branch 'ianna/rdataframe-to-awkward' of https://github.com/scik…

ef647dd

…it-hep/awkward into ianna/rdataframe-to-awkward

fix merge conflict

db498f1

ianna mentioned this pull request May 17, 2022

rdataframe to awkward #1474

Merged

ianna closed this May 17, 2022

jpivarski deleted the ianna/rdataframe-to-awkward branch September 23, 2022 00:49
