-
-
Notifications
You must be signed in to change notification settings - Fork 182
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ability to force by-value serialization of classes? #424
Comments
Thanks for the pointer. Squinting at that code, I don't think that's quite what we're after. (Though note I am not an expert on pickling, and am still relatively confused about what Line 1398 in 4bdc5df
to modules beside |
…`__main__` The underlying library `dill` that we use for serializing dynamic providers into Pulumi state for Python dynamic providers serializes classes differently depending on whether they are in `__main__` or in another module. We need the by-value serialization to be applied in all cases. uqfoundation/dill#424 is tracking adding the ability into `dill` to specify this by-value serialization explicitly, but until then, we will temporarily re-write the `__module__` of a provder class prior to serialization, so that `dill` behaves as we need for the dynamic provider use case. Fixes #7453.
…`__main__` (#7755) The underlying library `dill` that we use for serializing dynamic providers into Pulumi state for Python dynamic providers serializes classes differently depending on whether they are in `__main__` or in another module. We need the by-value serialization to be applied in all cases. uqfoundation/dill#424 is tracking adding the ability into `dill` to specify this by-value serialization explicitly, but until then, we will temporarily re-write the `__module__` of a provder class prior to serialization, so that `dill` behaves as we need for the dynamic provider use case. Fixes #7453.
I'm having similar issues. I've left a couple minimal examples to reproduce the issue in #375 which appears to be a duplicate. |
@mmckerns In #463, @leogama made a breaking change to It might be worthwhile moving |
I remember seeing the breaking change in #463, and meant to complain about it but apparently forgot... I disagree that it means that no downstream library depends on |
Leaving the function there but moving the code over would be a way to get this to work while not breaking code that uses def _locate_function(obj, pickler=None):
if pickler and is_dill(pickler, obj):
return pickler._byref_selector(obj)
return Pickler._byref_selector(pickler, obj)
class Pickler(StockPickler):
...
def _byref_selector(pickler, obj):
if obj.__module__ in ['__main__', None] or \
pickler and pickler._session and obj.__module__ == pickler._main.__name__:
return False
found = _import_module(obj.__module__ + '.' + obj.__name__, safe=True)
return found is obj Although a cleaner interface might be desired, or otherwise this will become a common boilerplate pattern: class SelectivePickler(dill.Pickler):
def __init__(self, *args, **kwargs):
super().__init__(self, *args, **kwargs)
self._byref_names = collections.defaultdict(list)
def _byref_selector(self, obj):
for names in self._byref_names[obj.__module__]:
if names is None:
return False
for name in names:
if obj.__qualname__ == name or obj.__qualname__.startswith(name + '.'):
return False
return super()._byref_selector(obj)
def register_byref(self, obj):
if isinstance(obj, types.ModuleType):
self._byref_names[obj.__name__] = None
else:
self._byref_names[obj.__module__].append(obj.__qualname__) |
@anivegesana, in my defense... Just kidding. 😄 It's my first experience on tweaking the internals of a broadly used open source project (and Python's ultra-dynamical nature doesn't help). It never occurred to me that changing such a simple private function could possibly break a downstream library —and I hope it didn't. Is anything really private in |
It isn't fully clear to me either. I just go out of my way to not remove any parameters and make minimal changes to the code, even if there are ways that would be easier to understand, just so that whatever was there originally stays there (hence why I didn't notice that #466 was a bug; I had simply left it like I found it.) This is why I think we should have a clear standard about which functions are public and which ones are private (preferably following PEP 8) so that it would be easier to remove cruft and improve the library. We should expose any functions that the public can use in the main dill package or the other non- |
Well, I think it's more complex than that. When |
EditedIt's like having
By the way, I indeed remember checking if |
I would add that under |
@anivegesana and @mmckerns: I was thinking about how to solve another issue yesterday and realized that I probably could have changed the function Lines 1063 to 1067 in 5bd56a8
to this: class Pickler(StockPickler):
# ...
def _locate_function(self, obj, session=False):
if obj.__module__ in ['__main__', None] or \
self is not None and self._session and obj.__module__ == self._main.__name__:
return False
found = _import_module(obj.__module__ + '.' + obj.__name__, safe=True)
return found is obj
_locate_function = functools.partial(Pickler._locate_function, None) and then simply prepending self._locate_function(obj) I leave the suggestion to implement new private functions used only for pickling xor unpickling, that at first don't depend on external state, as Edit:Actually the above suggestion is a bad idea, as it would expose such private functions to subclasses (check the updated API hierarchy in my previous comment, thanks to @anivegesana's comment). Edit 2:@anivegesana, just now I read your comment above with a similar solution for not breaking |
@mmckerns, would you mind to add this, with any modifications, to the |
I found two ways to achieve this:
Both approaches effectively trigger dill to serialize the object by value, as if it were defined in Word of caution: While I have not found any downsides to resetting Tested versions:
Tested objects:
Python versions:
[1] https://oegedijk.github.io/blog/pickle/dill/python/2020/11/10/serializing-dill-references.html Line 1066 in 5bd56a8
|
Rereading... I noticed that it was not mentioned that any class can be serialized by value or sorts with To answer the original question, yes, it's feasible to have a setting that essentially ignores the result of |
@mmckerns, would you be up for adding a feature to dill that allows forcing a particular class to be serialized by value rather than by reference? Specifying
byref=False
only applies to a limited set of cases at the moment.The specific motivation is pulumi/pulumi#7453, in which it is desired to have the pickled class always serialized by value, regardless of whether it is in
__main__
or a different module.cloudpickle recently added a feature that allows this sort of control: cloudpipe/cloudpickle#417. (But there are other reasons that cloudpickle doesn't work for Pulumi.)
The text was updated successfully, but these errors were encountered: