(feat): read_lazy for whole AnnData lazy-loading + xarray reading + read_elem_as_dask -> read_elem_lazy #1247

Open: wants to merge 464 commits into main

Changes from 250 commits

Commits (464)
ffe89f0
(fix): small type fixes
ilan-gold Jul 22, 2024
6cb231e
Merge branch 'main' into ig/read_dask_elem
ilan-gold Jul 22, 2024
029add9
Merge branch 'ig/read_dask_elem' into ig/xarray_compat
ilan-gold Jul 22, 2024
75a64fc
block info types
flying-sheep Jul 22, 2024
3f734fe
simplify
flying-sheep Jul 22, 2024
c4c2356
rename
flying-sheep Jul 22, 2024
cc67a9b
simplify more
flying-sheep Jul 22, 2024
fcb1763
(fix): migrate to use `read_elem` infrastructure
ilan-gold Jul 23, 2024
adcd48a
Merge branch 'ig/read_dask_elem' into ig/xarray_compat
ilan-gold Jul 23, 2024
2a72ec0
Merge branch 'main' into ig/xarray_compat
ilan-gold Jul 23, 2024
4c659a1
(fix): no first access of categories
ilan-gold Jul 23, 2024
d3a811a
(fix): last small cleanups
ilan-gold Jul 23, 2024
e852a74
(fix): try not running `xarray` tests
ilan-gold Jul 23, 2024
8c92a41
(fix): oops! forgot one test to mark!
ilan-gold Jul 23, 2024
47be954
Merge branch 'main' into ig/xarray_compat
ilan-gold Aug 5, 2024
55f706f
Update pyproject.toml
ilan-gold Aug 6, 2024
6fa97f0
(fix): change unused category function from method to function
ilan-gold Aug 6, 2024
9e2e21d
Merge branch 'main' into ig/xarray_compat
ilan-gold Aug 6, 2024
eb1237c
(fix): actually track keys instead of relying on `defaultdict` behavior
ilan-gold Aug 6, 2024
6724c62
(chore): test unconsolidated warning
ilan-gold Aug 6, 2024
53796a0
Update pyproject.toml
ilan-gold Aug 6, 2024
076b92f
(fix): use `test-full`/`test`
ilan-gold Aug 6, 2024
036ff3f
(fix): typing for `_gen_dataframe`
ilan-gold Aug 6, 2024
9415a14
(chore): improved comments for `Dataset2D`
ilan-gold Aug 6, 2024
b5dfaac
(fix): `iloc` is an `attr` not a `meth`
ilan-gold Aug 6, 2024
d45a2ce
Merge branch 'main' into ig/xarray_compat
ilan-gold Aug 13, 2024
cff41c4
(fix): release notes
ilan-gold Aug 13, 2024
3ccbfaf
(fix): `zarr` doc in `read_backed`
ilan-gold Aug 13, 2024
ff4d487
(fix): docs string
ilan-gold Aug 13, 2024
ed8fedf
(fix): wording in release note
ilan-gold Aug 13, 2024
e0e8891
Merge branch 'main' into ig/xarray_compat
flying-sheep Aug 26, 2024
3325f38
(chore): move `_remove_unused_categories` to static method
ilan-gold Aug 27, 2024
528026f
(chore): use one `isinstance` call in `coerce_arrays`
ilan-gold Aug 27, 2024
aa0d161
(chore): clean up `read_dataframe`
ilan-gold Aug 27, 2024
41e3038
(chore): handle case where `chunks` is not needed
ilan-gold Aug 27, 2024
dc5c6e6
(chore): make reusable `LazyDataStructures`
ilan-gold Aug 27, 2024
4edd279
(chore): use `Path.suffix`
ilan-gold Aug 27, 2024
969c6af
(chore): `msg` for `warnings`
ilan-gold Aug 27, 2024
2a31ab8
(chore): remove erroneous `Union` in `TypeVar`
ilan-gold Aug 27, 2024
2521ff8
(fix): use `cached_property` for accessing `dtype` + test
ilan-gold Aug 27, 2024
628f9fc
(refactor): use `cached_property` for `categories`
ilan-gold Aug 27, 2024
ff9412a
(refactor): use guard clause in `__getitem__` better
ilan-gold Aug 27, 2024
36d57be
(chore): type `get_index_dim`
ilan-gold Aug 27, 2024
51610b1
(fix): `shape` return type
ilan-gold Aug 27, 2024
ba8d147
(refactor): `_subset` guard clause
ilan-gold Aug 27, 2024
2cf1262
(fix): use `Counter`
ilan-gold Aug 27, 2024
b1feb6f
(refactor): `fix_known_differences` usage of `as_type`
ilan-gold Aug 27, 2024
02741f5
Merge branch 'main' into ig/xarray_compat
ilan-gold Aug 27, 2024
24e8970
Merge branch 'main' into ig/xarray_compat
ilan-gold Aug 28, 2024
ab3e718
(chore): fragment
ilan-gold Aug 28, 2024
4412710
(chore): fix the generic problem
ilan-gold Aug 28, 2024
d3401b2
(chore): clean up tests
ilan-gold Aug 28, 2024
ef0bbf3
Merge branch 'main' into ig/xarray_compat
flying-sheep Aug 30, 2024
3b6d194
Update tests/test_read_backed_experimental.py
ilan-gold Aug 30, 2024
4587fec
Merge branch 'main' into ig/xarray_compat
flying-sheep Aug 30, 2024
97eace5
(fix): should -> shall
ilan-gold Sep 2, 2024
58654a8
Apply suggestions from code review
ilan-gold Sep 2, 2024
67af64f
(fix): `_gen_xarray_dict_iterator_from_elems` -> `_gen_xarray_dict_it…
ilan-gold Sep 2, 2024
bfc2e73
(feat): indexing with `DataArray`
ilan-gold Sep 2, 2024
b801d35
Merge branch 'main' into ig/xarray_compat
ilan-gold Sep 2, 2024
a1d0b89
(fix): check h5 store
ilan-gold Sep 4, 2024
160b522
Merge branch 'main' into ig/xarray_compat
ilan-gold Sep 4, 2024
5f80b61
(fix): check `DataArray` closer
ilan-gold Sep 4, 2024
8ce6409
(fix): clean up `api.md` from merge
ilan-gold Sep 4, 2024
f9ef9f0
(fix): remove `read_elem_as_dask` docs reference
ilan-gold Sep 4, 2024
5e69a50
(chore): add notebooks/read_backed_experimental
ilan-gold Sep 4, 2024
1540d27
(chore): update notebooks
ilan-gold Sep 4, 2024
371fc2b
(refactor): set `pytestmark` at the top
ilan-gold Sep 18, 2024
5cb2d8d
(chore): clarify comment
ilan-gold Sep 18, 2024
b86ee6b
(refactor): add `assert_access_count` method for `AccessTrackingStore`
ilan-gold Sep 18, 2024
227a3c6
(refactor): `read_backed`->`read_lazy`
ilan-gold Sep 18, 2024
3debf9b
(fix): actually `read_backed` -> `read_lazy`
ilan-gold Sep 18, 2024
4b48988
(chore): time to require `aiohttp` `fsspec` and `zarr` and `requests`
ilan-gold Sep 18, 2024
28c95a7
Merge branch 'main' into ig/xarray_compat
ilan-gold Sep 18, 2024
cec633a
(chore): update notebook
ilan-gold Sep 18, 2024
97aff04
Merge branch 'main' into ig/xarray_compat
ilan-gold Sep 18, 2024
bf710d0
(fix): actually only read `index` once
ilan-gold Sep 20, 2024
1dfebde
(chore): add `concat` test
ilan-gold Sep 23, 2024
d355ed0
(feat): add `columns` compat
ilan-gold Sep 23, 2024
cfae08a
(fix): type of subset
ilan-gold Sep 23, 2024
1c46ec6
(fix): `MaskedArray` `dtype`
ilan-gold Sep 23, 2024
f491724
(fix): add `index.setter`
ilan-gold Sep 23, 2024
411bd91
(chore): add `concat` compat for xarray
ilan-gold Sep 23, 2024
7f89eb3
(fix): refactor concat for in-memory
ilan-gold Sep 23, 2024
d77ba37
(chore): add rest of test
ilan-gold Sep 23, 2024
096f2c6
(feat): allow for concat with masked type using dask
ilan-gold Sep 30, 2024
3c7c627
(refactor): own function for concat xarray
ilan-gold Sep 30, 2024
74b1940
(fix): add basic off-axis mapping without reading in i.e just an index
ilan-gold Sep 30, 2024
04206b8
(feat): add merge for alt annot
ilan-gold Sep 30, 2024
a7cbbd8
(fix): notebook
ilan-gold Sep 30, 2024
36bc262
Merge branch 'main' into ig/xarray_compat
ilan-gold Sep 30, 2024
af4520a
(fix): NaS in sparse dask
ilan-gold Sep 30, 2024
fa7358f
(chore): add `X` tracker
ilan-gold Sep 30, 2024
4db8c1b
(chore): add more robust matrix type tests
ilan-gold Sep 30, 2024
eefbee6
(fix): ok now notebooks?
ilan-gold Sep 30, 2024
39f2838
(feat): fix additional index load
ilan-gold Oct 1, 2024
dcca711
(feat): no-load index
ilan-gold Oct 1, 2024
d3121b7
(fix): only use range indices for `{obs,var}`
ilan-gold Oct 1, 2024
b25e8ba
(chore): add range index testing
ilan-gold Oct 1, 2024
870a4f2
(fix): ensure `{obs,var}_names` always exists on dataset
ilan-gold Oct 1, 2024
c2681e0
(fix): only use `use_range_index` on `{obs,var}`
ilan-gold Oct 1, 2024
e7a915b
(fix): don't check uniqueness to prevent index load + rename variable
ilan-gold Oct 1, 2024
f93499d
(fix): check for presence of indexing key
ilan-gold Oct 1, 2024
63a4515
always return `index` object
ilan-gold Oct 1, 2024
b1c8c22
(fix): remove unnecessary check?
ilan-gold Oct 1, 2024
d082a35
(chore): update notebook
ilan-gold Oct 1, 2024
e81d155
(fix): docstring class
ilan-gold Oct 1, 2024
c8c5271
Merge branch 'main' into ig/xarray_compat
ilan-gold Oct 1, 2024
c4d0146
(fix): explicit 1d chunking for `concat`
ilan-gold Oct 1, 2024
b34ac0a
(chore): change `concat` test name
ilan-gold Oct 1, 2024
1f4ab92
(chore): rename test file
ilan-gold Oct 1, 2024
d25f559
(chore): rename notebook
ilan-gold Oct 1, 2024
c661d39
(chore): clarify `load_annotation_index`
ilan-gold Oct 1, 2024
bff63cb
(fix): allow concatenation along arbitrary index + indexing test
ilan-gold Oct 1, 2024
784ea9b
(fix): notebook path in docs
ilan-gold Oct 1, 2024
9b2d9a3
Merge branch 'main' into ig/xarray_compat
ilan-gold Oct 1, 2024
489cc8d
(fix): don't copy column in dataset concat
ilan-gold Oct 2, 2024
24f11be
(fix): actually test h5
ilan-gold Oct 2, 2024
57bcfd9
(feat): add h5ad concat support
ilan-gold Oct 2, 2024
8b07f43
(fix): add docstring example
ilan-gold Oct 2, 2024
408d62a
Merge branch 'ig/xarray_compat' of github.com:scverse/anndata into ig…
ilan-gold Oct 2, 2024
cb125bf
(fix): catch warnings
ilan-gold Oct 2, 2024
e14f53f
(fix): format
ilan-gold Oct 2, 2024
3c5641c
(fix): threaded tests annotation index
ilan-gold Oct 2, 2024
dbe09ca
(fix): remove xarray test from minimum deps
ilan-gold Oct 2, 2024
67fc546
(fix): skip experimental backed tests if xarray not installed
ilan-gold Oct 2, 2024
41a9335
(chore): remove todo
ilan-gold Oct 2, 2024
58e595b
(feat): add `uns` reading
ilan-gold Oct 4, 2024
8875374
(feat): add `Raw` reading + tests
ilan-gold Oct 4, 2024
4c991d4
(chore): make `test-full` shorter
ilan-gold Oct 4, 2024
50cdc66
(fix): stricter type checking
ilan-gold Oct 4, 2024
eb881a9
(fix): dtype casting for concat
ilan-gold Oct 4, 2024
fe1f0a6
(chore): separate into two cleaner unit tests
ilan-gold Oct 4, 2024
c0c0c6c
(fix): typing of `make_xarray_extension_dtypes_dask`
ilan-gold Oct 4, 2024
fc72011
(chore): remove comment
ilan-gold Oct 4, 2024
562817d
(fix): clean up compat
ilan-gold Oct 4, 2024
9ef7cf5
(chore): xarray raises import error for `read_lazy`
ilan-gold Oct 4, 2024
69b6cc1
(fix): ignore some uncovered lines
flying-sheep Oct 14, 2024
3dfefc4
use existing warn import
flying-sheep Oct 14, 2024
1eb440e
merge
ilan-gold Oct 16, 2024
fe77a5c
(chore): add awkward `nitpick_ignore` comment
ilan-gold Oct 16, 2024
a654421
(refactor): use generator for new datasets
ilan-gold Oct 16, 2024
015bdca
(chore): docs + types in `merge.py`
ilan-gold Oct 16, 2024
7201bad
(refactor): use set for `index_name` in `merge.py`
ilan-gold Oct 16, 2024
5d813c0
(refactor): comprehension for `{alt_}annotations_in_memory`
ilan-gold Oct 16, 2024
b50e8ad
(chore): types in `lazy_methods.py`
ilan-gold Oct 16, 2024
3ca669c
(chore): `lazy_methods.py` index handling made clearer
ilan-gold Oct 16, 2024
7ea20df
(chore): move comment
ilan-gold Oct 16, 2024
91fdb90
(chore): dedupe `read_params` usage
ilan-gold Oct 16, 2024
a71dad8
(chore): `**kwargs` usage doesn't affect call when empty
ilan-gold Oct 16, 2024
0ac61c6
(fix): clean up `_lazy_arrays.py` typing
ilan-gold Oct 16, 2024
af5c2fe
(fix): no assert, raise ValueError
ilan-gold Oct 16, 2024
edac279
(fix): use `get` instead of membership check + no in-place
ilan-gold Oct 16, 2024
cdd9b89
(chore): pytest mark to beginning + `diskfmt` -> `diskfmt` + thread s…
ilan-gold Oct 16, 2024
e07426a
(fix): remove resetting key trackers
ilan-gold Oct 16, 2024
8278e0f
(fix): adata only has 4 chunks in test, update comment
ilan-gold Oct 16, 2024
30d1bb1
(chore): better use arrange-act-assert
ilan-gold Oct 16, 2024
509af7f
(chore): ids for boolean params
ilan-gold Oct 16, 2024
58122a1
(chore): contextlib + better assert objects
ilan-gold Oct 16, 2024
e6fea74
(chore): refactor concatenation for arrange-act-assert
ilan-gold Oct 16, 2024
bd509a1
merge again?
ilan-gold Oct 16, 2024
62cda13
(fix): notebook submodule
ilan-gold Oct 16, 2024
4e1a1f6
(fix): use `find_spec` pattern
ilan-gold Oct 17, 2024
a242dea
(chore): re-insert types for `AccessTrackingStore`
ilan-gold Oct 17, 2024
07caf93
(chore): dedent docstrings
ilan-gold Oct 17, 2024
8fd1fa0
(chore): raise error if slots have changed on `ZarrOrHDF5Wrapper`
ilan-gold Oct 17, 2024
94cf8ea
(fix): add slots to please xarray
ilan-gold Oct 17, 2024
f01818a
Merge branch 'main' into ig/xarray_compat
ilan-gold Oct 17, 2024
aca24db
Merge branch 'main' into ig/xarray_compat
flying-sheep Oct 17, 2024
2c082bf
(chore): remove redefinition
ilan-gold Oct 17, 2024
81c5fb9
(refactor): reuse join type
ilan-gold Oct 18, 2024
bb49dd2
(fix): mixed type dataframe merging
ilan-gold Oct 18, 2024
942661f
(fix): condition for going to memory in mixed typing
ilan-gold Oct 21, 2024
99219c6
(refactor): mixed type helper function
ilan-gold Oct 21, 2024
98197fe
(fix): try linking to dask/awkward in docs build
ilan-gold Oct 21, 2024
752e02b
(fix): awkward array docs
ilan-gold Oct 21, 2024
2a38900
(chore): `ValueError` -> `AssertionError`
ilan-gold Oct 25, 2024
cc40369
(fix): clean up `_lazy_arrays.py`
ilan-gold Oct 25, 2024
a807673
(fix): `ValueError`->`KeyError` for store
ilan-gold Oct 25, 2024
852ab20
(chore): add note about `unify_extension_dtypes`
ilan-gold Oct 25, 2024
310191c
(chore): add ids
ilan-gold Oct 25, 2024
1c15b70
Apply suggestions from code review
ilan-gold Oct 31, 2024
b96bd55
(fix): move all changes from anndata_elem
ilan-gold Oct 31, 2024
a663c5d
Merge branch 'main' into ig/xarray_compat
ilan-gold Oct 31, 2024
d1fce7e
(fix): `read_elem_as_dask`->`read_elem_lazy`
ilan-gold Oct 31, 2024
52b6a01
(chore): refactor `test_read_lazy` fixtures
ilan-gold Oct 31, 2024
a796d9b
Update tests/test_read_lazy.py
ilan-gold Oct 31, 2024
be786d0
Merge branch 'ig/xarray_compat' of github.com:scverse/anndata into ig…
ilan-gold Oct 31, 2024
e48377a
(chore): restore types
ilan-gold Oct 31, 2024
90f6d77
(fix): do `randint`
ilan-gold Oct 31, 2024
8e29713
(chore): remove slots check
ilan-gold Oct 31, 2024
f13bfb4
(fix): return read_lazy
ilan-gold Oct 31, 2024
1643da6
(fix): concating with hdf5 and cluster obviates need for locks works …
ilan-gold Nov 6, 2024
5013823
(fix): add warning catch
ilan-gold Nov 7, 2024
4c00216
Merge branch 'main' into ig/xarray_compat
ilan-gold Nov 7, 2024
8b95aff
Merge branch 'main' into ig/xarray_compat
ilan-gold Nov 28, 2024
dee82a2
(chore): good chunk size
ilan-gold Nov 28, 2024
843cae8
(chore): remove duplicated line
ilan-gold Nov 28, 2024
fa41e35
(chore): use a simpler read mechanism for remote io
ilan-gold Nov 28, 2024
1ab575c
(fix): docs
ilan-gold Nov 28, 2024
0c777ef
(chore): docs note
ilan-gold Nov 28, 2024
fc13a2f
Merge branch 'main' into ig/xarray_compat
ilan-gold Nov 28, 2024
bfa5fc9
Merge branch 'main' into ig/xarray_compat
ilan-gold Jan 27, 2025
0b93086
(fix): missing element shape
ilan-gold Jan 27, 2025
a045853
(fix): correct shape for empty element in concat with dask
ilan-gold Jan 27, 2025
c9253a9
(chore): add test
ilan-gold Jan 27, 2025
8b2e208
(fix): handling of off axis size when no-dask array
ilan-gold Jan 28, 2025
edbd7bc
Merge branch 'main' into ig/xarray_compat
ilan-gold Jan 29, 2025
207a89c
Merge branch 'ig/fix_missing_elem_shape_dask' into ig/xarray_compat
ilan-gold Jan 29, 2025
7166a3c
Merge branch 'ig/xarray_compat' of github.com:scverse/anndata into ig…
ilan-gold Jan 29, 2025
61babe8
(fix): ruff
ilan-gold Jan 29, 2025
b643d9a
Merge branch 'main' into ig/xarray_compat
ilan-gold Feb 4, 2025
6db4255
Merge branch 'main' into ig/xarray_compat
ilan-gold Feb 17, 2025
42baa7a
(fix): add `dask` bound for sparse issue
ilan-gold Feb 17, 2025
3adde1a
(chore): relnote
ilan-gold Feb 17, 2025
e225caa
(fix): undo scientific python changes
ilan-gold Feb 17, 2025
021d8f9
(fix): no `mod` dask
ilan-gold Feb 17, 2025
aa62f5f
Apply suggestions from code review
ilan-gold Feb 17, 2025
df2e518
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Feb 17, 2025
802fafb
(chore): add comment about processes
ilan-gold Feb 17, 2025
f314da7
(feat): add `nullable-string` support
ilan-gold Feb 17, 2025
7769f85
Merge branch 'ig/xarray_compat' of github.com:scverse/anndata into ig…
ilan-gold Feb 17, 2025
93ef3f1
(fix): test name
ilan-gold Feb 17, 2025
00cc32d
(chore): add a doc string
ilan-gold Feb 17, 2025
aa79243
(fix): clarify `categories` read behavior
ilan-gold Feb 17, 2025
b304edc
(chore): doc
ilan-gold Feb 17, 2025
7d03d36
(chore): ellipses for shortened docstring test
ilan-gold Feb 17, 2025
9f143e5
Apply suggestions from code review
ilan-gold Feb 17, 2025
6f149df
Merge branch 'ig/dask_bound' into ig/xarray_compat
ilan-gold Feb 18, 2025
b07d225
centralize dev setup
flying-sheep Feb 19, 2025
908cb98
"useEditorconfig": true
flying-sheep Feb 19, 2025
1cece74
Merge branch 'main' into ig/xarray_compat
ilan-gold Feb 19, 2025
ff928cf
Merge branch 'pa/centralize-dev-setup' into ig/xarray_compat
ilan-gold Feb 19, 2025
d14ffa9
(fix): dont coerce awkward
ilan-gold Feb 19, 2025
3733dba
(fix): materialize dask array
ilan-gold Feb 19, 2025
c58f0cc
(feat): finish handling of `nullable-string-array`
ilan-gold Feb 19, 2025
19973b0
(fix): silence warning by using `StringDtype` directly instead of `st…
ilan-gold Feb 19, 2025
d7435c0
(refactor): simplify xarray `dim` -- pandas `index` interplay
ilan-gold Feb 19, 2025
5317718
(chore): rename test files
ilan-gold Feb 19, 2025
ac144f9
(refactor): clarify `test_concat_to_memory_var`
ilan-gold Feb 19, 2025
7728063
(fix): throw away zeros for numpy-backed dask array
ilan-gold Feb 19, 2025
29b5914
Merge branch 'main' into ig/xarray_compat
ilan-gold Feb 19, 2025
b8080f9
(chore): loosen restriction on merged part
ilan-gold Feb 20, 2025
99acfc5
Merge branch 'ig/xarray_compat' of github.com:scverse/anndata into ig…
ilan-gold Feb 20, 2025
1b9fab3
(feat): drop requirement that indices must match
ilan-gold Feb 20, 2025
4e6bb60
(fix): mark tests properly
ilan-gold Feb 20, 2025
f25afa1
(fix): string dtype issues
ilan-gold Feb 20, 2025
b46ae71
(fix): ensure dummy indices are string typed instead of range for ind…
ilan-gold Feb 24, 2025
f0c182d
(chore): raise `TypeError` not `ValueError`
ilan-gold Feb 25, 2025
51021dd
Merge branch 'main' into ig/xarray_compat
ilan-gold Feb 25, 2025
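The commit log mentions an `AccessTrackingStore` with an `assert_access_count` method, a switch to `Counter` for key tracking, and `cached_property` for `dtype` and `categories` so that categories are not read on first access. The following is a minimal sketch of how such pieces can combine to test that a lazily read column touches the store only once. It is not the PR's test code: the dict-backed store, the `LazyCategorical` class, and the `"<name>/categories"` key layout are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from collections.abc import Iterator, MutableMapping
from functools import cached_property
from typing import Any


class AccessTrackingStore(MutableMapping):
    """Hypothetical dict-backed store that counts how often each key is read."""

    def __init__(self, data: dict[str, Any] | None = None) -> None:
        self._data = dict(data or {})
        self._access_count: Counter[str] = Counter()  # key -> number of reads

    def __getitem__(self, key: str) -> Any:
        self._access_count[key] += 1  # count every read, including repeats
        return self._data[key]

    def __setitem__(self, key: str, value: Any) -> None:
        self._data[key] = value

    def __delitem__(self, key: str) -> None:
        del self._data[key]

    def __iter__(self) -> Iterator[str]:
        return iter(self._data)

    def __len__(self) -> int:
        return len(self._data)

    def assert_access_count(self, key: str, count: int) -> None:
        # Explicit raise, so the check also runs under `python -O`.
        actual = self._access_count[key]
        if actual != count:
            msg = f"Expected {count} read(s) of {key!r}, got {actual}"
            raise AssertionError(msg)


class LazyCategorical:
    """Hypothetical lazy categorical column whose categories live in a store."""

    def __init__(self, store: MutableMapping, name: str) -> None:
        self._store = store
        self._name = name  # constructing the column reads nothing

    @cached_property
    def categories(self) -> list[str]:
        # The first access reads from the store; later accesses use the cache.
        return list(self._store[f"{self._name}/categories"])


store = AccessTrackingStore({"cell_type/categories": ["B", "T"]})
column = LazyCategorical(store, "cell_type")
store.assert_access_count("cell_type/categories", 0)
assert column.categories == ["B", "T"]
assert column.categories == ["B", "T"]
store.assert_access_count("cell_type/categories", 1)
```

Counting reads in the store itself, rather than in the column, means the same assertion helper can be reused for any lazily read element.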
8 changes: 6 additions & 2 deletions .azure-pipelines.yml
@@ -49,7 +49,7 @@ jobs:
     path: $(uv_cache_dir)
   displayName: Cache pip packages

-- script: uv pip install --system --compile "anndata[dev,test] @ ." -c ci/constraints.txt
+- script: uv pip install --system --compile "anndata[dev,test-full] @ ." -c ci/constraints.txt
   displayName: "Install dependencies"
   condition: eq(variables['DEPENDENCIES_VERSION'], 'latest')

@@ -61,7 +61,7 @@ jobs:
   displayName: "Install minimum dependencies"
   condition: eq(variables['DEPENDENCIES_VERSION'], 'minimum')

-- script: uv pip install -v --system --compile --pre "anndata[dev,test] @ ." -c ci/constraints.txt
+- script: uv pip install -v --system --compile --pre "anndata[dev,test-full] @ ." -c ci/constraints.txt
   displayName: "Install dependencies release candidates"
   condition: eq(variables['DEPENDENCIES_VERSION'], 'pre-release')

@@ -72,6 +72,10 @@ jobs:
   displayName: "PyTest"
   condition: eq(variables['TEST_TYPE'], 'standard')

+- script: pytest
+  displayName: "PyTest (minimum)"
+  condition: eq(variables['DEPENDENCIES_VERSION'], 'minimum')
+
 - script: pytest --cov --cov-report=xml --cov-context=test
   displayName: "PyTest (coverage)"
   condition: eq(variables['TEST_TYPE'], 'coverage')
1 change: 1 addition & 0 deletions .gitignore
@@ -7,6 +7,7 @@ __pycache__/
 /*cache/
 /node_modules/
 /data/
+/venv/

 # Distribution / packaging
 /dist/
6 changes: 5 additions & 1 deletion docs/api.md
@@ -131,7 +131,8 @@ Low level methods for reading and writing elements of an {class}`AnnData` object
 .. autosummary::
    :toctree: generated/

-   experimental.read_elem_as_dask
+   experimental.read_elem_lazy
+   experimental.read_lazy
 ```

 Utilities for customizing the IO process:
@@ -156,6 +157,9 @@ Types used by the former:
    experimental.ReadCallback
    experimental.WriteCallback
    experimental.StorageType
+   experimental.backed._lazy_arrays.MaskedArray
+   experimental.backed._lazy_arrays.CategoricalArray
+   experimental.backed._xarray.Dataset2D
 ```

 ## Errors and warnings
2 changes: 1 addition & 1 deletion docs/release-notes/0.11.0.md
@@ -40,7 +40,7 @@ Release candidates:
 - {guilabel}`rc1` Allow `axis` parameter of e.g. {func}`anndata.concat` to accept `'obs'` and `'var'` {user}`flying-sheep` ({pr}`1244`)
 - {guilabel}`rc1` Add `settings` object with methods for altering internally-used options, like checking for uniqueness on `obs`' index {user}`ilan-gold` ({pr}`1270`)
 - {guilabel}`rc1` Add {attr}`~anndata.settings.remove_unused_categories` option to {attr}`anndata.settings` to override current behavior {user}`ilan-gold` ({pr}`1340`)
-- {guilabel}`rc1` Add {func}`~anndata.experimental.read_elem_as_dask` function to handle i/o with sparse and dense arrays {user}`ilan-gold` ({pr}`1469`)
+- {guilabel}`rc1` Add `~anndata.experimental.read_elem_as_dask` function to handle i/o with sparse and dense arrays {user}`ilan-gold` ({pr}`1469`)
 - {guilabel}`rc1` Add ability to convert strings to categoricals on write in {meth}`~anndata.AnnData.write_h5ad` and {meth}`~anndata.AnnData.write_zarr` via `convert_strings_to_categoricals` parameter {user}` falexwolf` ({pr}`1474`)
 - {guilabel}`rc1` Add {attr}`~anndata.settings.check_uniqueness` option to {attr}`anndata.settings` to override current behavior {user}`ilan-gold` ({pr}`1507`)
 - {guilabel}`rc1` Add functionality to write from GPU {class}`dask.array.Array` to disk {user}`ilan-gold` ({pr}`1550`)
2 changes: 1 addition & 1 deletion docs/release-notes/0.11.1.md
@@ -4,5 +4,5 @@
 ### Bug fixes

 - Remove upper pin on `dask` and exclude versions broken with sparse indexing {user}`ilan-gold` ({pr}`1725`)
-- Fix chunking with -1 in `chunks` argument of {func}`~anndata.experimental.read_elem_as_dask` {user}`ilan-gold` ({pr}`1743`)
+- Fix chunking with -1 in `chunks` argument of `~anndata.experimental.read_elem_as_dask` {user}`ilan-gold` ({pr}`1743`)
 - Fix `cupy<0.13` imports in non-gpu environments {user}`ilan-gold` ({pr}`1754`)
1 change: 1 addition & 0 deletions docs/release-notes/1247.feature.md
@@ -0,0 +1 @@
+Add {func}`~anndata.experimental.read_elem_lazy` (in place of `read_elem_as_dask`) to handle backed dataframes, sparse arrays, and dense arrays, as well as a {func}`~anndata.experimental.read_lazy` to handle reading in as much of the on-disk data as possible to produce a {class}`~anndata.AnnData` object {user}`ilan-gold`
1 change: 1 addition & 0 deletions docs/tutorials/index.md
@@ -14,4 +14,5 @@ notebooks/anncollection-annloader
 notebooks/anndata_dask_array
 notebooks/awkward-arrays
 notebooks/{read,write}_dispatched
+notebooks/read_lazy
 ```
2 changes: 1 addition & 1 deletion docs/tutorials/notebooks
Submodule notebooks updated 2 files
+8 −2 .readthedocs.yml
+3,280 −0 read_lazy.ipynb
5 changes: 4 additions & 1 deletion pyproject.toml
@@ -81,9 +81,11 @@ doc = [
     "sphinx_design>=0.5.0",
     "readthedocs-sphinx-search",
     # for unreleased changes
-    "anndata[dev-doc]",
+    "anndata[dev-doc,dask]",
+    "awkward>=2.3"
 ]
 dev-doc = ["towncrier>=24.8.0"] # release notes tool
+test-full = ["anndata[test,lazy]"]
Review comment by ilan-gold (Contributor Author), Feb 25, 2025:
Maybe a different name? Not sure about `test-full`.

 test = [
     "loompy>=3.0.5",
     "pytest>=8.2,<8.3.4",
@@ -108,6 +110,7 @@ gpu = ["cupy"]
 cu12 = ["cupy-cuda12x"]
 cu11 = ["cupy-cuda11x"]
 # https://github.com/dask/dask/issues/11290
+lazy = ["xarray>=2024.06.0", "aiohttp", "requests", "zarr<3.0.0a0", "anndata[dask]"]
Review comment (Contributor Author):
Not sure what to do about optional deps here... I think `fsspec` should be added as well to make it feature-complete, or we should remove all of them except `xarray`.

Review comment by flying-sheep (Member), Feb 17, 2025:
Why `requests`, though?

Review comment (Contributor Author):
This is needed for remote data with `zarr<3`, so I figured I'd include it. The same goes for `aiohttp`. Both could be removed, but then anyone using zarr v2 with remote data would hit an error immediately.

 dask = ["dask[array]>=2022.09.2,!=2024.8.*,!=2024.9.*"]

 [tool.hatch.version]
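The thread above debates which optional packages belong in the `lazy` extra, and the commit log mentions a `find_spec` pattern, skipping xarray tests when xarray is not installed, and `read_lazy` raising an import error without xarray. Below is a minimal sketch of checking optional dependencies with `importlib.util.find_spec`, which locates a top-level module without importing it. `missing_modules`, `require_lazy_dependencies`, and the error text are hypothetical, and only presence is checked, not version pins such as `zarr<3.0.0a0`.

```python
from __future__ import annotations

from importlib.util import find_spec

# Hypothetical list mirroring the top-level import names behind the proposed
# `lazy` extra (xarray, aiohttp, requests, zarr, dask).
LAZY_MODULES: tuple[str, ...] = ("xarray", "aiohttp", "requests", "zarr", "dask")


def missing_modules(names: tuple[str, ...]) -> list[str]:
    """Return the top-level modules that are not installed, without importing them."""
    return [name for name in names if find_spec(name) is None]


def require_lazy_dependencies() -> None:
    """Fail early with one message listing everything that is missing."""
    missing = missing_modules(LAZY_MODULES)
    if missing:
        msg = (
            f"Lazy reading needs {', '.join(missing)}; "
            "try `pip install 'anndata[lazy]'`."
        )
        raise ImportError(msg)
```

Reporting every missing package at once avoids the one-error-at-a-time experience the thread worries about for remote zarr v2 reads.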
19 changes: 17 additions & 2 deletions src/anndata/_core/aligned_df.py
@@ -1,6 +1,7 @@
 from __future__ import annotations

 import warnings
+from collections.abc import Mapping
 from functools import singledispatch
 from typing import TYPE_CHECKING

@@ -10,13 +11,27 @@
 from .._warnings import ImplicitModificationWarning

 if TYPE_CHECKING:
-    from collections.abc import Iterable, Mapping
+    from collections.abc import Iterable
     from typing import Any, Literal


 @singledispatch
 def _gen_dataframe(
-    anno: Mapping[str, Any],
+    anno: Any,
+    index_names: Iterable[str],
+    *,
+    source: Literal["X", "shape"],
+    attr: Literal["obs", "var"],
+    length: int | None = None,
+) -> pd.DataFrame:  # pragma: no cover
+    msg = f"Cannot convert {type(anno)} to {attr} DataFrame"
+    raise ValueError(msg)
+
+
+@_gen_dataframe.register(Mapping)
+@_gen_dataframe.register(type(None))
+def _gen_dataframe_mapping(
+    anno: Mapping[str, Any] | None,
     index_names: Iterable[str],
     *,
     source: Literal["X", "shape"],
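The `_gen_dataframe` change turns the function into a `functools.singledispatch` generic: the base implementation now raises for unsupported types, and the previous mapping/`None` behavior is registered explicitly as `_gen_dataframe_mapping`, which presumably lets other annotation types be registered elsewhere. A stdlib-only sketch of that dispatch pattern follows; `gen_frame` is a hypothetical name, and it returns a plain dict where the real function builds a `pandas.DataFrame`.

```python
from __future__ import annotations

from collections.abc import Mapping
from functools import singledispatch
from typing import Any


@singledispatch
def gen_frame(anno: Any, *, attr: str) -> dict[str, list[Any]]:
    # Fallback for any type without a registered implementation.
    msg = f"Cannot convert {type(anno)} to {attr} DataFrame"
    raise ValueError(msg)


# Stacked registrations: dicts (via the Mapping ABC) and None share one path.
@gen_frame.register(Mapping)
@gen_frame.register(type(None))
def _gen_frame_mapping(
    anno: Mapping[str, Any] | None, *, attr: str
) -> dict[str, list[Any]]:
    # Stand-in for building a pandas.DataFrame: copy each column into a list.
    return {key: list(values) for key, values in (anno or {}).items()}
```

Dispatch happens on the type of the first positional argument only, which is why the remaining parameters can stay keyword-only as in the diff.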
15 changes: 8 additions & 7 deletions src/anndata/_core/anndata.py
@@ -8,7 +8,7 @@
 from collections import OrderedDict
 from collections.abc import Mapping, MutableMapping, Sequence
 from copy import copy, deepcopy
-from functools import partial
+from functools import partial, singledispatch
 from pathlib import Path
 from textwrap import dedent
 from typing import TYPE_CHECKING, cast
@@ -43,7 +43,6 @@
 from .sparse_dataset import BaseCompressedSparseDataset, sparse_dataset
 from .storage import coerce_array
 from .views import (
-    DataFrameView,
     DictView,
     _resolve_idxs,
     as_view,
@@ -55,7 +54,7 @@
     from typing import Any, Literal

     from ..compat import Index1D
-    from ..typing import ArrayDataStructureType
+    from ..typing import XDataType
     from .aligned_mapping import AxisArraysView, LayersView, PairwiseArraysView
     from .index import Index

@@ -291,8 +290,8 @@ def _init_as_view(self, adata_ref: AnnData, oidx: Index, vidx: Index):
         self._remove_unused_categories(adata_ref.obs, obs_sub, uns)
         self._remove_unused_categories(adata_ref.var, var_sub, uns)
         # set attributes
-        self._obs = DataFrameView(obs_sub, view_args=(self, "obs"))
-        self._var = DataFrameView(var_sub, view_args=(self, "var"))
+        self._obs = as_view(obs_sub, view_args=(self, "obs"))
+        self._var = as_view(var_sub, view_args=(self, "var"))
         self._uns = uns

         # set data
@@ -540,7 +539,7 @@ def shape(self) -> tuple[int, int]:
         return self.n_obs, self.n_vars

     @property
-    def X(self) -> ArrayDataStructureType | None:
+    def X(self) -> XDataType | None:
         """Data matrix of shape :attr:`n_obs` × :attr:`n_vars`."""
         if self.isbacked:
             if not self.file.is_open:
@@ -1023,8 +1022,10 @@ def __getitem__(self, index: Index) -> AnnData:
         oidx, vidx = self._normalize_indices(index)
         return AnnData(self, oidx=oidx, vidx=vidx, asview=True)

+    @staticmethod
+    @singledispatch
     def _remove_unused_categories(
-        self, df_full: pd.DataFrame, df_sub: pd.DataFrame, uns: dict[str, Any]
+        df_full: pd.DataFrame, df_sub: pd.DataFrame, uns: dict[str, Any]
     ):
         for k in df_full:
             if not isinstance(df_full[k].dtype, pd.CategoricalDtype):
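Moving `_remove_unused_categories` to a `@staticmethod` that wraps a `functools.singledispatch` function keeps the existing `self._remove_unused_categories(...)` call sites while letting the implementation depend on the type of `df_full`. The sketch below assumes a hypothetical `LazyFrame` type for which pruning is simply skipped, and uses plain dicts instead of DataFrames; which types the PR actually registers, and what they do, is not shown in this hunk.

```python
from __future__ import annotations

from functools import singledispatch
from typing import Any


class LazyFrame:
    """Hypothetical stand-in for a lazily loaded table (nothing in memory)."""


class Container:
    @staticmethod
    @singledispatch
    def _remove_unused_categories(
        full: Any, sub: dict[str, list[str]], uns: dict
    ) -> None:
        # Eager default: record only the categories still used by `sub`.
        for key in full:
            uns[f"{key}_categories"] = sorted(set(sub[key]))

    def subset(self, full: Any, sub: dict[str, list[str]], uns: dict) -> None:
        # Called like a method, but dispatch happens on type(full): the
        # staticmethod means no `self` is passed as the first argument.
        self._remove_unused_categories(full, sub, uns)


# Class-level access returns the singledispatch function, so `.register` works.
@Container._remove_unused_categories.register(LazyFrame)
def _(full: LazyFrame, sub: dict[str, list[str]], uns: dict) -> None:
    # Lazy input: skip pruning so that nothing has to be loaded.
    return None


container = Container()
uns: dict = {}
container.subset({"cell": ["B", "T", "NK"]}, {"cell": ["T", "B", "T"]}, uns)
lazy_uns: dict = {}
container.subset(LazyFrame(), {"cell": ["T"]}, lazy_uns)
```

Dropping `self` from the signature is what makes this work: `singledispatch` always dispatches on the first positional argument, so it has to be the frame rather than the instance.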
17 changes: 12 additions & 5 deletions src/anndata/_core/index.py
@@ -44,10 +44,13 @@
     | pd.Index,
     index: pd.Index,
 ) -> slice | int | np.ndarray:  # ndarray of int or bool
-    if not isinstance(index, pd.RangeIndex):
-        msg = "Don’t call _normalize_index with non-categorical/string names"
-        assert index.dtype != float, msg
-        assert index.dtype != int, msg
+    from ..experimental.backed._compat import DataArray
+
+    # TODO: why is this here? All tests pass without it and it seems at the minimum not strict enough.
Review comment (Contributor Author):
Note this comment. If I remember correctly, this line was causing problems in the `load_annotation_index=True` case, but all tests pass without it.

Review comment by flying-sheep (Member), Oct 14, 2024:
Asserts don’t necessarily exist in runtime code (they can be optimized out). So if whoever wrote them knew what they were doing, asserts in runtime code are purely there to make debugging easier in case they fail.

Ruff has a check to disallow asserts; we should probably activate it for non-test code and replace them with `raise AssertionError(msg)` (which isn’t optimized out).

Review comment (Contributor Author):
But it would be good to understand what is even going on here. @ivirshup?

Review comment (Contributor Author):
I mean, if the tests pass, we're good? And if we're not good, we should add a test?

Review comment (Member):
Replying to "asserts in runtime code are purely there to make debugging easier in case the asserts fail.":

Hmm, so it’s not possible that `_normalize_index` ever gets called with the wrong type of index as a result of user action?

If that’s impossible, feel free to remove it. Otherwise we should probably update this check to raise a `TypeError` or similar.

Review comment (Contributor Author):
Updated!
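The review point above is that `assert` statements are removed when Python runs with `-O`, while an explicit `raise` is not. The sketch below checks that in subprocesses and shows the suggested alternative as a guard that raises `TypeError`. `exit_status`, `check_label_index`, and the use of NumPy dtype-kind codes are illustrative assumptions, not code from the PR.

```python
from __future__ import annotations

import subprocess
import sys


def exit_status(*flags: str, code: str) -> int:
    """Run `code` in a fresh interpreter with the given flags; return its exit status."""
    result = subprocess.run(
        [sys.executable, *flags, "-c", code], capture_output=True, check=False
    )
    return result.returncode


def check_label_index(dtype_kind: str) -> None:
    """Hypothetical guard that survives `python -O` because it uses `raise`."""
    # NumPy dtype kinds: "i" signed int, "u" unsigned int, "f" float.
    if dtype_kind in {"i", "u", "f"}:
        msg = "Don't call _normalize_index with a numeric, non-range index"
        raise TypeError(msg)


# A failing assert exits with status 1 normally, but is stripped under -O;
# an explicit raise still fails under -O.
plain_assert = exit_status(code="assert False")
optimized_assert = exit_status("-O", code="assert False")
optimized_raise = exit_status("-O", code="raise AssertionError('still raised')")
```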

+    # if not isinstance(index, pd.RangeIndex):
+    # msg = "Don’t call _normalize_index with non-categorical/string names and non-range index"
+    # assert index.dtype != float, msg
+    # assert index.dtype != int, msg

     # the following is insanely slow for sequences,
     # we replaced it using pandas below
@@ -110,8 +113,12 @@
                 )
                 raise KeyError(msg)
             return positions  # np.ndarray[int]
+    elif isinstance(indexer, DataArray):
+        if isinstance(indexer.data, DaskArray):
+            return indexer.data.compute()
+        return indexer.data
     msg = f"Unknown indexer {indexer!r} of type {type(indexer)}"
-    raise IndexError(msg)
+    raise IndexError()

Codecov (codecov/patch) warning: added line #L121 in src/anndata/_core/index.py was not covered by tests.


 def _fix_slice_bounds(s: slice, length: int) -> slice:
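The new `DataArray` branch in `_normalize_index` unwraps an xarray `DataArray` used as an indexer: a Dask-backed payload is computed, and any other payload is returned as-is, presumably so that the downstream subsetting code receives concrete positions or a boolean mask. Below is a dependency-free sketch of the same control flow; `LabeledArray`, `LazyArray`, and `unwrap_indexer` are hypothetical stand-ins for `xarray.DataArray`, `dask.array.Array`, and the relevant part of `_normalize_index`.

```python
from __future__ import annotations

from typing import Any


class LazyArray:
    """Hypothetical stand-in for dask.array.Array: values appear on .compute()."""

    def __init__(self, values: list[Any]) -> None:
        self._values = values

    def compute(self) -> list[Any]:
        return list(self._values)


class LabeledArray:
    """Hypothetical stand-in for xarray.DataArray: the payload lives in .data."""

    def __init__(self, data: Any) -> None:
        self.data = data


def unwrap_indexer(indexer: Any) -> Any:
    # Same shape as the new branch: unwrap, and materialize lazy payloads.
    if isinstance(indexer, LabeledArray):
        if isinstance(indexer.data, LazyArray):
            return indexer.data.compute()
        return indexer.data
    msg = f"Unknown indexer {indexer!r} of type {type(indexer)}"
    raise IndexError(msg)
```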