(feat): `read_lazy` for whole `AnnData` lazy-loading + `xarray` reading + `read_elem_as_dask` -> `read_elem_lazy` #1247

ilan-gold · 2023-11-30T12:51:54Z (edited by ivirshup)

This PR is a lighter-weight version of #947 that uses the original `AnnData` class to hold `obs` and `var` as `xr.Dataset` objects.
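To illustrate the idea of an `AnnData` whose `obs` and `var` are loaded lazily, here is a minimal, stdlib-only sketch. Column names are available up front, and each column's data is read only on first access and then cached. `LazyFrame` and the loader functions are hypothetical stand-ins: the PR itself uses `xr.Dataset` for this role, and real reads would come from an HDF5 or Zarr store rather than from Python callables.

```python
from collections.abc import Callable, Mapping
from typing import Any


class LazyFrame:
    """Hypothetical stand-in for a lazily loaded obs/var table.

    Each column is registered as a zero-argument loader; the loader runs
    only on first access, and its result is cached for later accesses.
    """

    def __init__(self, loaders: Mapping[str, Callable[[], list[Any]]]) -> None:
        self._loaders = dict(loaders)
        self._cache: dict[str, list[Any]] = {}

    @property
    def columns(self) -> list[str]:
        # Column names are known without reading any column data.
        return list(self._loaders)

    @property
    def loaded(self) -> list[str]:
        # Names of the columns that have actually been read so far.
        return list(self._cache)

    def __getitem__(self, key: str) -> list[Any]:
        if key not in self._cache:
            self._cache[key] = self._loaders[key]()
        return self._cache[key]


reads: list[str] = []


def load_cell_type() -> list[str]:
    reads.append("cell_type")  # record every real read
    return ["B", "T", "B"]


obs = LazyFrame({"cell_type": load_cell_type, "n_genes": lambda: [100, 250, 80]})
assert obs.loaded == []      # nothing has been read yet
first = obs["cell_type"]     # triggers the loader
second = obs["cell_type"]    # served from the cache
```

Caching after the first read keeps repeated access cheap; a real implementation would also need to handle slicing without materializing whole columns.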

Closes #951 (Dask and Zarr not loading obsp and obsm from remote s3) and #981 (lazy dataframes in .obs and .var with backed="r" mode).
Tests added
Release note added (or unnecessary)

codecov · 2023-12-07T16:24:55Z

Codecov Report

Attention: Patch coverage is 91.95876% with 39 lines in your changes missing coverage. Please review.

Project coverage is 84.17%. Comparing base (935cdf2) to head (b46ae71).
Report is 1 commit behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/anndata/experimental/backed/_lazy_arrays.py | 94.17% | 6 Missing ⚠️ |
| src/anndata/tests/helpers.py | 73.91% | 6 Missing ⚠️ |
| src/anndata/_core/storage.py | 50.00% | 5 Missing ⚠️ |
| src/anndata/experimental/backed/_io.py | 90.00% | 5 Missing ⚠️ |
| src/anndata/experimental/backed/_xarray.py | 92.53% | 5 Missing ⚠️ |
| src/anndata/_io/specs/lazy_methods.py | 93.93% | 4 Missing ⚠️ |
| src/anndata/_core/index.py | 66.66% | 3 Missing ⚠️ |
| src/anndata/experimental/backed/_compat.py | 84.21% | 3 Missing ⚠️ |
| src/anndata/_io/specs/registry.py | 88.23% | 2 Missing ⚠️ |

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main    #1247      +/-   ##
==========================================
- Coverage   86.11%   84.17%   -1.94%
==========================================
  Files          40       45       +5
  Lines        6242     6671     +429
==========================================
+ Hits         5375     5615     +240
- Misses        867     1056     +189
```
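The Codecov figures are internally consistent: line coverage is hits divided by tracked lines, and misses are the remainder. A quick check of that arithmetic, using only the numbers in the report. The helper `pct` is hypothetical, and rounding to two decimals is an assumption about how Codecov displays percentages.

```python
def pct(hits: int, lines: int) -> float:
    """Line coverage in percent, rounded to two decimals."""
    return round(100 * hits / lines, 2)


# (hits, tracked lines) from the coverage diff.
base = pct(5375, 6242)  # main
head = pct(5615, 6671)  # this PR
delta = round(head - base, 2)

# Misses are the tracked lines that were not hit.
base_misses = 6242 - 5375
head_misses = 6671 - 5615

# Patch coverage: 39 uncovered lines at 91.95876% implies the patch
# touched about 39 / (1 - 0.9195876) lines in total.
patch_lines = round(39 / (1 - 0.9195876))
patch_pct = round(100 * (patch_lines - 39) / patch_lines, 5)

# The per-file "Missing" counts in the missing-lines table sum to the same 39.
per_file_missing = [6, 6, 5, 5, 5, 4, 3, 3, 2]

print(base, head, delta, base_misses, head_misses, patch_lines, patch_pct)
```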

| Files with missing lines | Coverage Δ | |
|---|---|---|
| src/anndata/_core/aligned_df.py | `95.83% <100.00%> (+0.18%)` | ⬆️ |
| src/anndata/_core/anndata.py | `82.65% <100.00%> (+0.04%)` | ⬆️ |
| src/anndata/_core/merge.py | `85.14% <100.00%> (-8.75%)` | ⬇️ |
| src/anndata/_core/views.py | `85.40% <100.00%> (-5.35%)` | ⬇️ |
| src/anndata/_io/specs/__init__.py | `100.00% <ø> (ø)` | |
| src/anndata/_io/specs/methods.py | `88.36% <100.00%> (-0.41%)` | ⬇️ |
| src/anndata/_io/zarr.py | `83.75% <100.00%> (+0.20%)` | ⬆️ |
| src/anndata/_types.py | `86.11% <100.00%> (+0.81%)` | ⬆️ |
| src/anndata/experimental/__init__.py | `100.00% <100.00%> (ø)` | |
| src/anndata/experimental/backed/__init__.py | `100.00% <100.00%> (ø)` | |

... and 10 more

... and 3 files with indirect coverage changes

ilan-gold · 2024-07-23T13:05:18Z

@ivirshup @flying-sheep Not really looking for a thorough code review at the moment, more of a look at the structure of what we are exporting. The big changes are:

- `read_elem_as_dask` -> `read_elem_lazy`: becomes a more general method focused on reading `obs` and `var` lazily as `xarray` objects (as a side effect, it can now also read categoricals, nullables, etc.)
- `read_backed` is exported as a one-stop shop for reading everything backed at once, where possible
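The generalization of `read_elem_lazy` described above amounts to dispatching on each element's encoding type and returning something that defers the actual read. A minimal sketch of that pattern, where `read_lazy_sketch`, `register`, and `LAZY_READERS` are hypothetical names: this is not anndata's registry, elements are modeled as plain dicts, and real readers would return dask or xarray objects rather than zero-argument callables. Further encodings such as categoricals or nullable arrays would plug in as additional registered readers.

```python
from collections.abc import Callable, Mapping
from typing import Any

# Hypothetical model of an on-disk element: arrays plus an "attrs" mapping
# carrying the element's "encoding-type".
Element = Mapping[str, Any]
Thunk = Callable[[], list[Any]]

LAZY_READERS: dict[str, Callable[[Element], Thunk]] = {}


def register(encoding_type: str):
    """Register a lazy reader for one encoding type."""
    def decorator(func: Callable[[Element], Thunk]) -> Callable[[Element], Thunk]:
        LAZY_READERS[encoding_type] = func
        return func
    return decorator


def read_lazy_sketch(elem: Element) -> Thunk:
    """Pick a reader by encoding type; nothing is read until the thunk runs."""
    encoding = elem["attrs"]["encoding-type"]
    try:
        reader = LAZY_READERS[encoding]
    except KeyError:
        raise NotImplementedError(f"no lazy reader for {encoding!r}") from None
    return reader(elem)


@register("array")
@register("string-array")
def _read_plain(elem: Element) -> Thunk:
    # Plain arrays: copy the stored data only when the thunk is called.
    return lambda: list(elem["data"])
```

Keeping dispatch separate from reading means a new encoding only needs one registered function, without touching the entry point.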

Do we want this way of doing things? Or is there some other route?

Separately, are the changes made to the core acceptable? After that, I think we can look into the specifics of the code I added. Or you can review that now, but I'd rather get big changes out of the way first.

ilan-gold · 2024-07-23T13:05:59Z

I will continue to make little changes to clean things up (this is still a draft!), but I think this structure is the way I would go. Maybe you have different ideas, though!


flying-sheep · 2025-02-20T12:41:53Z

`tests/lazy/test_concat.py`

```diff
+from anndata.experimental import read_lazy
+from anndata.tests.helpers import assert_equal, gen_adata
+
+from .conftest import ANNDATA_ELEMS, get_key_trackers_for_columns_on_axis
```


Wait, I thought this doesn't work. Did they change that?

ilan-gold · 2025-02-20

What didn't work? Importing from `conftest`?
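In plain Python, `from .conftest import ...` is an ordinary relative import: it works whenever the test module is imported as a submodule of a package. Whether pytest imports test files that way depends on its import mode and on the presence of `__init__.py` files, so the snippet below only demonstrates the Python side. It builds a throwaway package mirroring the import in `tests/lazy/test_concat.py`; the package name `lazy_demo_pkg` and the value of `ANNDATA_ELEMS` are hypothetical placeholders.

```python
import importlib
import sys
import tempfile
from pathlib import Path
from types import ModuleType


def import_with_relative_conftest() -> ModuleType:
    """Create a temporary package and import a test module from it."""
    root = Path(tempfile.mkdtemp())
    pkg = root / "lazy_demo_pkg"
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")  # make it a regular package
    # Placeholder value; the real conftest defines its own constants.
    (pkg / "conftest.py").write_text("ANNDATA_ELEMS = ('obs', 'var')\n")
    # Same shape of import as in the reviewed test file.
    (pkg / "test_concat.py").write_text("from .conftest import ANNDATA_ELEMS\n")

    sys.path.insert(0, str(root))
    importlib.invalidate_caches()
    try:
        return importlib.import_module("lazy_demo_pkg.test_concat")
    finally:
        sys.path.remove(str(root))


test_module = import_with_relative_conftest()
print(test_module.ANNDATA_ELEMS)
```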


ilan-gold mentioned this pull request Nov 30, 2023: Dask and Zarr not loading obsp and obsm from remote s3 #951 (Open)

ilan-gold mentioned this pull request Jan 31, 2024: lazy dataframes in .obs and .var with backed="r" mode #981 (Open)

ilan-gold added this to the 0.11.0 milestone Jul 2, 2024

ilan-gold self-assigned this Jul 2, 2024

ilan-gold added the skip-gpu-ci label Jul 5, 2024

ilan-gold force-pushed the ig/xarray_compat branch from 68fcd2b to 6165f07 July 5, 2024 14:16

ilan-gold changed the base branch from main to ig/read_dask_elem July 9, 2024 15:44

ilan-gold mentioned this pull request Jul 10, 2024: (feat): read_elem_as_dask method #1469 (Merged)

flying-sheep and others added 11 commits July 22, 2024 16:26

- add wrapper (dce9f07)
- move into type checking (2725ef2)
- (fix): small type fxes (ffe89f0)
- Merge branch 'main' into ig/read_dask_elem (6cb231e)
- Merge branch 'ig/read_dask_elem' into ig/xarray_compat (029add9)
- block info types (75a64fc)
- simplify (3f734fe)
- rename (c4c2356)
- simplify more (cc67a9b)
- (fix): migrate to use read_elem infrastructure (fcb1763)
- Merge branch 'ig/read_dask_elem' into ig/xarray_compat (adcd48a)

Base automatically changed from ig/read_dask_elem to main July 23, 2024 08:39

ilan-gold added 5 commits July 23, 2024 10:45

- Merge branch 'main' into ig/xarray_compat (2a72ec0)
- (fix): no first access of categories (4c659a1)
- (fix): last small cleanups (d3a811a)
- (fix): try not runnign xarray tests (e852a74)
- (fix): oops! forgot one test to mark! (8c92a41)
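The "(fix): no first access of categories" commit suggests that a lazily read categorical column should not touch its categories until values are actually needed. A minimal sketch of that behavior, assuming anndata's on-disk categorical layout of integer codes plus a separate categories array, with `-1` marking a missing value. `LazyCategorical` is a hypothetical class, not the PR's implementation.

```python
from collections.abc import Callable, Sequence
from typing import Optional


class LazyCategorical:
    """Hypothetical lazy categorical column: codes up front, categories on demand."""

    def __init__(self, codes: Sequence[int], load_categories: Callable[[], Sequence[str]]) -> None:
        self._codes = list(codes)
        self._load_categories = load_categories
        self._categories: Optional[list[str]] = None

    def __len__(self) -> int:
        # The length comes from the codes alone; categories stay unread.
        return len(self._codes)

    @property
    def categories(self) -> list[str]:
        if self._categories is None:  # first access triggers the read
            self._categories = list(self._load_categories())
        return self._categories

    def to_list(self) -> list[Optional[str]]:
        cats = self.categories
        # Code -1 follows the pandas convention for a missing value.
        return [None if code == -1 else cats[code] for code in self._codes]


category_reads: list[str] = []


def load_categories() -> list[str]:
    category_reads.append("categories")  # record each real read
    return ["B", "T"]


col = LazyCategorical([0, 1, -1, 0], load_categories)
n = len(col)                    # does not read categories
reads_before = len(category_reads)
values = col.to_list()          # first read of categories
values_again = col.to_list()    # categories are cached now
```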

ilan-gold requested review from ivirshup and flying-sheep July 23, 2024 13:03

ilan-gold and others added 25 commits February 17, 2025 17:21

- (chore): add comment about processes (802fafb)
- (feat): add nullable-string support (f314da7)
- Merge branch 'ig/xarray_compat' of github.com:scverse/anndata into ig/xarray_compat (7769f85)
- (fix): test name (93ef3f1)
- (chore): add a doc string (00cc32d)
- (fix): clarify categories read behavior (aa79243)
- (chore): doc (b304edc)
- (chore): ellipses for shortened docstring test (7d03d36)
- Apply suggestions from code review (9f143e5)
  Co-authored-by: Philipp A.
- Merge branch 'ig/dask_bound' into ig/xarray_compat (6f149df)
- centralize dev setup (b07d225)
- "useEditorconfig": true (908cb98)
- Merge branch 'main' into ig/xarray_compat (1cece74)
- Merge branch 'pa/centralize-dev-setup' into ig/xarray_compat (ff928cf)
- (fix): dont coerce awkward (d14ffa9)
- (fix): materialize dask array (3733dba)
- (feat): finish handling of nullable-string-array (c58f0cc)
- (fix): silence warning by using StringDtype directly instead of `string` (19973b0)
- (refactor): simplify xarray dim -- pandas index interplay (d7435c0)
- (chore): rename test files (5317718)
- (refactor): clarify test_concat_to_memory_var (ac144f9)
- (fix): throw away zeros for numpy-backed dask array (7728063)
- Merge branch 'main' into ig/xarray_compat (29b5914)
- (chore): loosen restriction on merged part (b8080f9)
- Merge branch 'ig/xarray_compat' of github.com:scverse/anndata into ig/xarray_compat (99acfc5)
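Nullable columns such as the `nullable-string-array` handled in these commits can be decoded from a `values` array plus a boolean `mask` in which `True` marks a missing entry. anndata's nullable integer and boolean encodings use that layout; treating the string variant the same way is an assumption here. Decoding a lazily requested window only needs the matching slice of both arrays. A stdlib sketch with a hypothetical helper `decode_nullable`:

```python
from collections.abc import Sequence
from typing import Optional, TypeVar

T = TypeVar("T")


def decode_nullable(
    values: Sequence[T], mask: Sequence[bool], window: slice = slice(None)
) -> list[Optional[T]]:
    """Combine values with a missing-value mask, reading only ``window``."""
    if len(values) != len(mask):
        raise ValueError("values and mask must have the same length")
    # Slicing first mimics a lazy read that touches only the requested rows.
    return [None if missing else value for value, missing in zip(values[window], mask[window])]


values = ["a", "", "c", "d"]  # the masked slot holds a placeholder
mask = [False, True, False, False]
full = decode_nullable(values, mask)
tail = decode_nullable(values, mask, slice(1, 3))
```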

flying-sheep reviewed Feb 20, 2025


ilan-gold added 4 commits February 20, 2025 14:18

- (feat): drop requirement that indices must match (1b9fab3)
- (fix): mark tests properly (4e6bb60)
- (fix): string dtype issues (f25afa1)
- (fix): ensure dummy indices are string typed instead of range for indexing ops (b46ae71)
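The last commit makes dummy indices string-typed rather than a range. One plausible reason this matters for indexing: once rows are subset, labels and positions no longer agree, and an integer label such as `4` could be read as either. A stdlib sketch of the distinction; `dummy_index` and `label_positions` are hypothetical helpers, not anndata functions.

```python
def dummy_index(n: int) -> list[str]:
    """String labels "0".."n-1" as a stand-in for a real obs/var index."""
    return [str(i) for i in range(n)]


def label_positions(index: list[str], labels: list[str]) -> list[int]:
    """Translate labels to positions; unknown labels raise KeyError."""
    lookup = {label: pos for pos, label in enumerate(index)}
    return [lookup[label] for label in labels]


index = dummy_index(5)
subset = [index[i] for i in (2, 4)]          # keep rows 2 and 4
positions = label_positions(subset, ["4"])   # label "4" now sits at position 1
```

With string labels, `"4"` is unambiguously a label, while the integer `4` would be out of range as a position in the two-row subset.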
