-
Notifications
You must be signed in to change notification settings - Fork 45
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Compatibility with xarray #48
Comments
Another taker! See #39 (comment) before it became gh-46. cc: @constantinpape |
Thanks for raising this issue @thewtex. I fully agree that we should strive to be compatible with xarray. To this end, if I understand it correctly, your first proposal would be to add This is very similar to the changes proposed in #46, except that we call the coordinate labels
I am not quite sure what you mean by |
Yes. These are approaches that would facilitate compatibility with a single-scale xarray image dataset sampled on a uniform grid, i.e. the scientific images. We could integrate them piece-by-piece or as a whole. Not listed is how to ensure compatibility with a multi-scale xarray dataset. But, that has not been defined per pydata/xarray#4118 . Ideally, the Xarray community would also be open to compatibility with the NGFF spec with regards to multi-scales.
Nice work! I added a few thoughts.
This was referring to xarray.DataArray.attrs which are encoded as zarr attrs. The orientation of an image does not change across scales, so we do not necessarily need an orientation matrix per-scale. But, it is present so each scale can be treated independently. That said, we may want to store it in array for precision reasons as noted in the transformation discussion.
Thanks for the note, I added a few comments there. The xarray |
Multi-scale data is definitely a goal for Xarray (see pydata/xarray#4118). In fact, the latest CZI EOSS proposal we are planning to submit to the current funding opportunity focuses precisely on this feature. Your feedback on that issue would be welcome and helpful. Particularly interested in how you define "compatibility with the NGFF spec". What would that mean specifically for xarray, besides just supporting generic image pyramids? Obviously Xarray will be reluctant to add very domain-specific features to the core package. However, it is easy to extend xarray. |
Awesome! BTW, @danielballan may have a lot to contribute to this based on his experiences.
I made a note on that issue.
Yes, we want to avoid domain-specific features. On that topic, there is a lot of related discussion in pydata/xarray#1092 regarding netCDF organization. I am not familiar with netCDF, but, of course from the imaging side of things we want to avoid support for unrelated domain-specific features there.
Cool! I can see how this could be used to provide a better xarray-based API for imaging. |
0.3 has added |
It turns out that we are currently not compatible with
can thus not be opened by @thewtex could you maybe follow up with the xarray folks on this? If we want to be compatible supporting different shapes is a must. |
cc: @aurghs |
Xarray has no problem with as many different shapes of arrays in a group as you want. The only catch is that shared dimensions must always be identical, i.e. you cannot have two arrays Going forward, the ongoing work on Xarray Datatree should make this more possible. (I still think the arrays would have to live in different groups.) |
I see.
I don't think that this makes sense for our use-case. These axes are identical and arbitrarily renaming them is unnatural. |
Not to be pendantic, but two intervals sampled at different resolution cannot really be considered "identical". i.e. I think what you really want is native support for multi-scale data in Xarray. You're correct that that doesn't exist (yet). |
In our case these axes don't describe the data input space, but describe the (common) physical output space. So yes, they are identical (but of course this distinction needs to be specified.)
Yes, without having this xarray support does not make much sense in ngff. |
The physical space is indeed shared among all resolutions, but we are rarely interested in such abstract concepts. We just need to define a relationship between these shared global axes (such as world space or global time) and one of the coordinate systems and that's it. All the rest is about transformation between coordinate systems. We care a lot about coordinate systems and their axes, because we need those to interpret the coordinate values, i.e., the values that we store. In the NGFF file format discussion there already seems to be a consensus that we need unique names for coordinate systems. As a result, we already have a unique name for each coordinate system axis. Therefore, xarray and NGFF seems to be very nicely compatible and capable of specifying multi-resolution images. All you need to do is to include the unique coordinate system names in the axis names. For example:
Transforms can be defined using the same unique coordinate system names:
|
This is not quite the state of the current proposal for transformations, please see #94 for details. |
In my test xarray ngff implementation, each scale lived in its own nested group. This satisfies xarray's shape/dimension constraints. For ngff, this results in an additional path component:
Note
This seems reasonable to me. There are additional constraints that need to be satisifed for a ngff dataset to be readable in xarray, and there is duplicated state related to the transforms. I like how the transforms in ngff are moving towards limited duplicated state -- a global transform that applies to all scales, for example. We can make it possible to re-use the bulk pixel array data in a dataset zarr store for both a ngff metadata model and a xarray metadata model, but the metadata for these models will be different. |
I've just posted an update to zarr-developers/zarr-specs#125 that may be of interest to everyone following this issue. The prototype mentioned there is in https://github.com/aurghs/ome-datatree (@aurghs can perhaps also link to the related notebook that she demo'd today). If the outlined solution seems to make sense, I'd suggest we make this issue a proposal of the form:
So that v0.1-v0.4 data of the form:
becomes
and the multiscales metadata would contain:
Older data can continue to be opened (or even upgraded) with specialized backends like https://github.com/aurghs/ome-datatree. |
So each |
Exactly. |
I have pushed the notebook in the repository in |
In discussing with the xarray community, the one change to the NGFF specification that needs to occur to prevent errors being raised when opening a multiscale is for each resolution _array_ to live in a separate _group_. This has already been tested by thewtex in https://github.com/spatial-image/spatial-image-multiscale and the current spec is permissive enough to allow it. The proposal here would enforce the subdirectories moving forward. The conflict in xarray stems from the fact that each of our subresolutions have the same dimension names ("x", "y,", etc.) but different sizes. This is not allowed in the xarray (nor NetCDF) model. An added benefit of this change is that other arrays with the same resolution levels and the same dimensions (e.g. labels!) could be stored together: ``` ├── resolution-N/.zgroup │ ├── image/.zarray │ └── labe/.zarray ```
A good first step would be for xarray to provide a way of opening chunked arrays with user-provided coordinates. Then we can turn NGFF metadata into coordinate DataArrays (i.e. https://github.com/JaneliaSciComp/xarray-ome-ngff/blob/ebcce4876dd9c0ecb3b7a635cea781007e9b24ce/src/xarray_ome_ngff/latest/multiscales.py#L242 ) and strap that onto something like the result of EDIT: This is possibly already available? I'm not sure whether a |
It sounds like this could be a good fit for a custom Xarray backend: https://docs.xarray.dev/en/stable/internals/how-to-add-new-backend.html A custom backend just returns and xarray.Dataset, which can be constructed however you like. As a starting point for prototyping, I would just write a function that produces an Xarray dataset with the correct coordinates / metadata. I'd recommend reviewing the docs on creating a DataArray and creating a Dataset. Let us know how we can help! |
With the aspiration for OME-Zarr to be The One Imaging Format to Rule them All 💍 , I would like to propose compatibility with xarray. For the most part, the needs of the:
overlap. A common, well-supported standard will facilitate integration and cross-pollination across communities, and avoid those I/O headaches 🤯 .
In summary, we could extend the current OME-Zarr spec to be compatible with the result of xarray.Dataset.to_zarr, in a way that adds spatial metadata, addressing #28 #12, through the xarray encoded
coords
using scientific imaging dimensions,x
,y
,z
,c
,t
, standard in OME-Zarr, for the xarray array dimensions, making their name and order explicit #35.Resulting consolidated metadata from idr0094
Created with this script.
In this example, the array dimensions are
y
,x
,c
, i.e. not all 5 dimensions in the current standard, and in a different order. But, these differences could be removed.After attempting a few variations on this and putting it into practice, this seems to work well.
Each scale can be used independently. Initially, I tried to avoid the use of
coords
and use the more concise spatial-dimension rank spacing / scale, origin / translation. However, I found that in an array-based computing environment like scientific Python, where slicing is a bread-and-butter operation, the natural validity of 1D coords that can be sliced is helpful. And, in the development of visualization tools, this is quite handy and avoids on-demand generation.The logic for transforming the spatial metadata is here.
Additionally, there is a growing xarray community, and compatibility helps everyone. Added as an
attr
is a direction / orientation matrix, which is important in medical imaging.I am interested in everyone's thoughts. I am grossly behind on GitHub notifications, but I will check in with the discussion on this issue every day or two.
CC @joshmoore @lassoan @rabernat @constantinpape @danielballan @forman
The text was updated successfully, but these errors were encountered: