Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove or replace call to weakref / support serdes #60

Closed
evamaxfield opened this issue Dec 22, 2020 · 14 comments · Fixed by #61
Closed

Remove or replace call to weakref / support serdes #60

evamaxfield opened this issue Dec 22, 2020 · 14 comments · Fixed by #61

Comments

@evamaxfield
Copy link
Contributor

evamaxfield commented Dec 22, 2020

Hey @tlambert03 !

Back again, one of the checks we do on aicsimageio is to ensure that the AICSImage / the base Reader object can be fully serialized. This is largely useful for dask worker transfer (while it's not recommended to transfer a whole image between workers, it is in some cases useful, i.e. it is decently efficient to transfer a not-yet-read image between workers for metadata parsing / some image handling).

Using the produced OME object from from_xml is awesome but unfortunately it can't be serialized due to this weakref call / attr creator.

I have gotten our tests working locally by simply removing this __post_init_post_parse__ function addition (or really just removing the entire OME object CLASS_OVERRIDE), but, it is there so I am here to ask: "Why?"

Would it possible to remove this? I also know that the __getstate__ and __setstate__ can be used to override default pickle calls if that is an option I could potentially work on a patch for adding those functions to allow pickling and unpickling of the OME object.

@tlambert03
Copy link
Owner

Hey @JacksonMaxfield! Yep, I totally agree, it should be easy to pickle or otherwise serialize these things

it is there so I am here to ask: "Why?"

The weakrefs were introduced in #34 as a (rather nifty) way to resolve OME Reference types to the actual object that they reference. The ultimate goal was to enable something like #27 (comment), and here's an example of how it works:

In [80]: from ome_types import to_xml

In [81]: ome = from_xml('testing/data/two-screens-two-plates-four-wells.ome.xml')

# Screens have references to multiple plates
In [82]: ome.screens[0]
Out[82]:
Screen(
   id='Screen:1',
   name='',
   plate_ref=[<2 Plate_Ref>],
   protocol_description='',
   protocol_identifier='',
   reagent_set_description='',
   reagent_set_identifier='',
   reagents=[<1 Reagents>],
   type='',
)

# but the reference itself is boring :/
In [83]: ome.screens[0].plate_ref[0]
Out[83]: PlateRef(id='Plate:1')

# the weakref allowed this:
In [84]: ome.screens[0].plate_ref[0].ref
Out[84]:
Plate(
   id='Plate:1',
   description='Plate 1 description.',
   plate_acquisitions=[<2 Plate_Acquisitions>],
   wells=[<4 Wells>],
)

however, they're not as nifty as serializing! 😄 so let's fix it. We already test round-tripping with the ome_types.to_xml and from_xml:

In [87]: from_xml(to_xml(ome)) == ome
Out[87]: True

so, I see a few options:

  • you can pickle to_xml(ome) instead of ome itself (not pretty... but it's a workaround in the meantime):
    In [5]: from_xml(pickle.loads(pickle.dumps(to_xml(ome)))) == ome
    Out[5]: True
  • we can add a __getstate__ method that just throws away the ref_ weakref attribute from __dict__, but then the deserialized object will not have the weakrefs (and will fail equality checks: ome == pickle.loads(pickle.dumps(ome)))
  • we can dispense with the weakrefs altogether, and just make those ref_ pointers actual pointers... That would retain the referencing behavior even after serdes. I need @jmuhlich to weigh in on that one though, I'm not sure I can think through all the repercussions there

thoughts? @jmuhlich, @JacksonMaxfield

@tlambert03
Copy link
Owner

actually... after looking at how dill serializes weakrefs, I see there's a better variant of option number 2 using __getstate__: we can just dereference the weakref before serializing and re-reference upon deserializing. Just tested this out and it seems to work:

# in ome_types.dataclasses... 

def modify_get_state(_cls: Type[Any]) -> None:
    if hasattr(_cls, "__getstate__"):
        return

    def __getstate__(self: Any) -> Dict[str, Any]:
        if "ref_" in self.__dict__:
            d = self.__dict__.copy()
            d["ref_"] = d.pop("ref_")()
            return d
        return self.__dict__

    def __setstate__(self: Any, state: Dict[str, Any]) -> None:
        self.__dict__.update(state)
        if "ref_" in self.__dict__:
            self.__dict__["ref_"] = weakref.ref(self.__dict__.pop("ref_"))

    _cls.__getstate__ = __getstate__
    _cls.__setstate__ = __setstate__

...
# and then add modify_get_state(cls) down in the `ome_dataclass` `def wraps` function

@evamaxfield
Copy link
Contributor Author

Woah lots of updates and super useful info!

I totally get what the weakrefs are doing now and that is very neat.

however, they're not as nifty as serializing! smile so let's fix it. We already test round-tripping with the ome_types.to_xml and from_xml

This is great for 98% of cases its the 2% of cases like mine where we attach the OME object to the Reader object of aicsimageio and test if the whole reader can be serialized or not, in doing so we don't really have control over how the attached attrs serialize, hence this issue haha. But glad that you agree that 2% case is worth it!

I knew there had to be an implementation of __getstate__ that works. When I was tinkering with it I was pretty confused as to how to handle it all.

@tlambert03
Copy link
Owner

let me know if you have a moment to try out #61 on your files/models. would be curious to see if the tests errors are real, or just an issue with the tests

@tlambert03
Copy link
Owner

(oh, and since I'm not sure it's documented anywhere, if you do check it out, make sure to run python -m src.ome_autogen to update the model code)

@evamaxfield
Copy link
Contributor Author

It is documented! As a warning or something. Found it when I was working on the __getstate__ stuff myself.

@evamaxfield
Copy link
Contributor Author

Trying out your branch now

@evamaxfield
Copy link
Contributor Author

All tests pass for our Reader with the changes in #61!

Nice work @tlambert03

@tlambert03
Copy link
Owner

Thanks for the quick test!

@jmuhlich
Copy link
Collaborator

jmuhlich commented Dec 23, 2020 via email

@evamaxfield
Copy link
Contributor Author

Thanks for the fix @tlambert03!

Any chance for a patch release prior to the XMLSchema bug fix next just so that I can pin something to my deps?

@tlambert03
Copy link
Owner

sure!

@tlambert03
Copy link
Owner

all set: https://pypi.org/project/ome-types/

@evamaxfield
Copy link
Contributor Author

Thank you so much!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants