Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Image -> Arrow support #8330

Open
wants to merge 50 commits into
base: main
Choose a base branch
from
Open

Image -> Arrow support #8330

wants to merge 50 commits into from

Conversation

wiredfool
Copy link
Member

@wiredfool wiredfool commented Aug 25, 2024

Partial implementation of #8329.

This is for discussion -- there are a number of issues.

Changes proposed in this pull request:

  • Implements basic __arrow_c_array__ and __arrow_c_schema__ support for reading Pillow images in Arrow capable libraries.
  • Implements basic arrow array exporting object -> Pillow image. This supports reading arrays in exactly the same formats that we export from Pillow.
  • Adds a way to force the internal allocator to use single blocks for images. Previously there is core.new_block, but internal operations call ImageNewDirty internally, which only calls the Arena Allocator.
  • Adds C level read-only flag. Currently only exported to python layer, and overrides python read_only.

Issues:

  • Currently fails on large images.
    • Returns a ValueError if the image is allocated in multiple blocks
  • There's extraneous stuff in Arrow.c that should be cleaned up.
  • Some changes in Storage.c aren't required.
  • Unclear if we're exporting the most useful version of the data.

Todo:

  • Alternate binary installable arrow lib for testing, or loopback only testing on troublesome platforms if we can't find one that works.
  • Fix Windows Segfault. I don't have a windows env here, so either I need to figure out a CI based version or I'll need an assist.
    • Windows tests now appear to be passing on 3.10->3.13, with 3.9, pypy3.10 and 3.14 failing on arrow install.
  • Python accessible version (r/w, likely) of the schema capsule and possibly array metadata. Partially for testing, partially for sanity checking at the python layer before passing into the C portion.
  • Fix the UNDONEs in the code, especially around error handling.
    • make sure we're raising an error if we don't accept the array format.
  • Large image support or error out.
  • Wire up enforcement of c level read_only flag
  • Docs

For Review:

  • What CVE is in here that I'm not seeing?
  • There's some pretty serious mucking with the object lifetimes, specifically there's now effectively a refcount on the core C image, so it's lifetime is not limited to the python object's lifetime. I think this is required, but it needs review.
  • I'm not 100% on how so much of the checking is in Storage.c, vs somewhere closer to the surface.
  • The core Imaging item now stores a reference to the array pycapsule, which requires PyINCREF and PyDECREF in Storage.c. This feels like it's finally breaking the wall of libImaging not relying on python headers.

* Fixed format, only for 4 channel images
* structs can't be encoded this way, they need to have one child array
per struct member.
* i.e. struct of arrays, rather than an array of structs.
* basic safety included
* only respecting the types we emit
@wiredfool wiredfool force-pushed the arrow branch 3 times, most recently from c10beff to 97eb7c0 Compare January 21, 2025 21:41
@wiredfool
Copy link
Member Author

Added basic fromarrow support, covering exactly the formats that we emit, and testing using round trip support.

Windows is failing, so that's something that's going to need digging. And somehow codecov is failing before most of the runs have gone, so not sure what's up there.

@wiredfool
Copy link
Member Author

Windows AMD64 is failing on pillow arrow tests as well as the python version windows matrix:

Tests/test_arrow.py::test_to_array[L-dtype0-None] DEBUG:   25+ if ( >>>> !$?) { exit $LASTEXITCODE }

Windows x86, Windows pypy, manylinux wheels and MacOS Arm64 and 10.15 X86_64 are failing on pyarrow install/compilation, instead of installing the prebuilt wheel. (I'm not sure why MacOS Arm64 is failing, as that's my dev platform). We're going to have to look at either compile time prerequisites, a lighter library with lots of binaries, or potentially skipping tests if pyarrow doesn't compile.

wiredfool and others added 4 commits January 25, 2025 13:42
Co-authored-by: Andrew Murray <[email protected]>
Co-authored-by: Andrew Murray <[email protected]>
Co-authored-by: Andrew Murray <[email protected]>
Co-authored-by: Andrew Murray <[email protected]>
@wiredfool
Copy link
Member Author

We're down to lint here, and I'm not sure how to make mypy narrow it's type. Testing Locally with this code:

            if mask:
                for ix, elt in enumerate(mask):
                    assert isinstance(px[x, y], tuple) # type forcing
                    assert px[x, y][ix] == arr[y * img.width + x].as_py()[elt]

mypy fails. It's apparently supposed to be able to use asserts to narrow the type.

Tests/test_pyarrow.py:29: error: Value of type "Union[float, tuple[int, ...]]" is not indexable  [index]
                        assert px[x, y][ix] == arr[y * img.width + x].as_py()[elt]
                               ^~~~~~~~~~~~
Found 1 error in 1 file (checked 296 source files)
mypy: exit 1 (3.36 seconds) /Users/erics/src/Pillow> mypy conftest.py selftest.py setup.py docs src winbuild Tests pid=99747
  mypy: FAIL code 1 (3.40=setup[0.04]+cmd[3.36] seconds)
  evaluation failed :( (3.48 seconds)

It's also unable to narrow this:

            if mask:
                for ix, elt in enumerate(mask):
                    # type narrowing, only required for mypy
                    if isinstance(px[x, y], tuple):
                        assert px[x, y][ix] == arr[y * img.width + x].as_py()[elt]

@wiredfool
Copy link
Member Author

Looks like mypy can only handle type narrowing of a variable, not an expression, even though it knows the value of the expression.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants