(feat): read_lazy for whole AnnData lazy-loading + xarray reading + read_elem_as_dask -> read_elem_lazy #1247

Open: ilan-gold wants to merge 462 commits into main

ilan-gold
Contributor

@ilan-gold ilan-gold commented Nov 30, 2023

This PR is a lighter-weight version of #947: it uses the original AnnData object as the class that holds obs and var as xr.Dataset objects.

codecov bot commented Dec 7, 2023

Codecov Report

Attention: Patch coverage is 91.95876% with 39 lines in your changes missing coverage. Please review.

Project coverage is 84.17%. Comparing base (935cdf2) to head (b46ae71).
Report is 1 commit behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/anndata/experimental/backed/_lazy_arrays.py | 94.17% | 6 Missing ⚠️ |
| src/anndata/tests/helpers.py | 73.91% | 6 Missing ⚠️ |
| src/anndata/_core/storage.py | 50.00% | 5 Missing ⚠️ |
| src/anndata/experimental/backed/_io.py | 90.00% | 5 Missing ⚠️ |
| src/anndata/experimental/backed/_xarray.py | 92.53% | 5 Missing ⚠️ |
| src/anndata/_io/specs/lazy_methods.py | 93.93% | 4 Missing ⚠️ |
| src/anndata/_core/index.py | 66.66% | 3 Missing ⚠️ |
| src/anndata/experimental/backed/_compat.py | 84.21% | 3 Missing ⚠️ |
| src/anndata/_io/specs/registry.py | 88.23% | 2 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1247      +/-   ##
==========================================
- Coverage   86.11%   84.17%   -1.94%     
==========================================
  Files          40       45       +5     
  Lines        6242     6671     +429     
==========================================
+ Hits         5375     5615     +240     
- Misses        867     1056     +189     
| Files with missing lines | Coverage Δ |
| --- | --- |
| src/anndata/_core/aligned_df.py | 95.83% <100.00%> (+0.18%) ⬆️ |
| src/anndata/_core/anndata.py | 82.65% <100.00%> (+0.04%) ⬆️ |
| src/anndata/_core/merge.py | 85.14% <100.00%> (-8.75%) ⬇️ |
| src/anndata/_core/views.py | 85.40% <100.00%> (-5.35%) ⬇️ |
| src/anndata/_io/specs/__init__.py | 100.00% <ø> (ø) |
| src/anndata/_io/specs/methods.py | 88.36% <100.00%> (-0.41%) ⬇️ |
| src/anndata/_io/zarr.py | 83.75% <100.00%> (+0.20%) ⬆️ |
| src/anndata/_types.py | 86.11% <100.00%> (+0.81%) ⬆️ |
| src/anndata/experimental/__init__.py | 100.00% <100.00%> (ø) |
| src/anndata/experimental/backed/__init__.py | 100.00% <100.00%> (ø) |

... and 10 more

... and 3 files with indirect coverage changes

@ilan-gold ilan-gold added this to the 0.11.0 milestone Jul 2, 2024
@ilan-gold ilan-gold self-assigned this Jul 2, 2024
@ilan-gold ilan-gold changed the base branch from main to ig/read_dask_elem July 9, 2024 15:44
@ilan-gold ilan-gold mentioned this pull request Jul 10, 2024
3 tasks
Base automatically changed from ig/read_dask_elem to main July 23, 2024 08:39
@ilan-gold
Contributor Author

ilan-gold commented Jul 23, 2024

@ivirshup @flying-sheep I'm not really looking for a thorough code review at the moment, more of a look at the structure of what we are exporting. The big changes are:

  1. read_elem_as_dask->read_elem_lazy becomes a more general method focused on reading obs and var lazily as xarray objects (although, as a side effect, it can now also read categoricals, nullables, etc.)
  2. read_backed is exported as a one-stop shop for reading everything at once in backed form, as far as possible

Do we want this way of doing things? Or is there some other route?

Separately, are the changes made to the core acceptable? After that, I think we can look into the specifics of the code I added. Or you can review that now, but I'd rather get big changes out of the way first.
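
As an illustration of the "one-stop shop" idea in point 2, here is a toy sketch of lazy reading in plain Python. It is not the code in this PR and does not use anndata, dask, or xarray: `LazyElem`, `read_lazy_sketch`, `make_loader`, and the dict-based `store` are all hypothetical names invented for this example. The only point it makes is that a single read call can wrap every element without loading any of them, and that each element is read only when asked for.

```python
from typing import Any, Callable, Dict, List


class LazyElem:
    """Hypothetical stand-in for a lazily read element (not anndata's class)."""

    def __init__(self, name: str, loader: Callable[[], Any]) -> None:
        self.name = name
        self._loader = loader
        self._value: Any = None
        self._loaded = False

    def compute(self) -> Any:
        # Run the loader on first access only, then reuse the cached value.
        if not self._loaded:
            self._value = self._loader()
            self._loaded = True
        return self._value


def read_lazy_sketch(store: Dict[str, Callable[[], Any]]) -> Dict[str, LazyElem]:
    """Wrap every element of a toy store without reading any of them."""
    return {name: LazyElem(name, loader) for name, loader in store.items()}


# Record which elements were actually read from the toy "disk".
reads: List[str] = []


def make_loader(name: str, payload: Any) -> Callable[[], Any]:
    """Build a loader that logs the read and returns a fixed payload."""

    def load() -> Any:
        reads.append(name)
        return payload

    return load


# A toy store: element name -> function that would read it from disk.
store = {
    "obs": make_loader("obs", {"cell_type": ["B", "T", "T"]}),
    "var": make_loader("var", {"gene": ["g1", "g2"]}),
    "X": make_loader("X", [[1, 0], [0, 2], [3, 0]]),
}

adata_like = read_lazy_sketch(store)  # one call, nothing is read yet
reads_after_open = list(reads)        # snapshot of reads so far
obs = adata_like["obs"].compute()     # only obs is read at this point
```

In this sketch, `var` and `X` are never touched unless `compute()` is called on them, which is the behaviour the exported reader is meant to offer for a whole object.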

@ilan-gold
Contributor Author

I will continue to make little changes to clean things up (this is still a draft!) but I think this structure is the way I would go. But maybe you have different ideas!

from anndata.experimental import read_lazy
from anndata.tests.helpers import assert_equal, gen_adata

from .conftest import ANNDATA_ELEMS, get_key_trackers_for_columns_on_axis
Member

wait, I thought this doesn’t work. did they change that?

Contributor Author

What didn't work? Importing from `conftest`?
