-
Notifications
You must be signed in to change notification settings - Fork 28
PB-269: enable reading participants weights from S3 #254
PB-269: enable reading participants weights from S3 #254
Conversation
5590d32
to
e5cd6d5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good so far, but I'll mention some thoughts now. Perhaps the naming of the new abstract classes ("aggregated weights" vs "participants weights") could be better. First of all, more generic names might be better, say "global" vs "local" weights. Secondly, Store
doesn't say whether we're writing or reading the weights. Presumably we would also have the dual situation in xain-sdk
. Hence I would suggest GlobalWeightsWriter
and LocalWeightsReader
.
another comment is just food for thought, and I haven't thought about it that much (and may not be very "pythonic" either): since the 2 interfaces are single-method, why not scrap the interfaces altogether and just pass functions around? For example, the coordinator would take args:
global_weights_writer: Callable[[int, ndarray], None]
local_weights_reader: Callable[[str, int], ndarray]
Thanks a lot for the suggestions for naming. The Reader/Writer terminology seems a much better fit indeed. Regarding your API suggestion: that's something I'd definitely go for in some other languages but I think in Python it may be a bit limiting. For instance to implement a reader and writer for a storage service in the cloud, we need to store some state: credentials, bucket name etc. There is two possibilities:
Here is a very silly illustration of how such an API would be used: from typing import Callable
# Our API:
def take_callable(f: Callable[[int, int], int]):
return f(1,2)
# Solution 1: implement a class with __callable__
def Adder():
def __init__(self):
# some dummy state for illustration purpose
self.credentials = "login:password"
def __call__(self, x, y):
print(self.credentials) # use our dummy state
return x + y
take_callable(Adder())
# Solution 2: a closure
credentials_for_addition= "login:password"
def add(x, y):
# Silly example to show how to embed information into a function.
# In real life we'd use a function factory or decorators to do that.
print(credentials_for_addition)
return x+y
take_callable(add) In the end, I feel like 99% of the users would go for the class solution. But then it would make more sense to use an abstract class. Another advantage of an abstract class is that we can add new methods later. |
Summary: Add code for reading participants weights from S3. Implementation details: We split the AbstractStore class in to separate abstract classes: - `AbstractAggregatedWeightsStore` is used to upload the aggregated weights to S3 - `AbstractParticipantsWeightsStore` is used to download participants weights from S3 The separation also allows us to use a dummy `NullObjectStore` class for retrieving the participants weights, while using the real `AggregatedWeightsS3Store` class for uploading aggregated weights. In the future, it may be useful to provide different configurations for the storage service used for aggregated weights and the storage service used for participants weights: they may use different credentials, permissions, endpoints, etc. References: https://xainag.atlassian.net/browse/PB-269
Thanks for entertaining the idea, and writing your thoughts down! I agree it makes more sense to stick with classes in this situation. Since I'm more confident this is the way to go now, will review the rest soon (when the linter stops complaining). |
f1509be
to
c84e394
Compare
@finiteprods I did the renaming you suggested. I also renamed the various attribute to be consistent with the class names. |
c84e394
to
c00ce94
Compare
@Robert-Steiner is f1509be ok for you? I think that's what we agreed on in your inital PR for storing the metrics, but there were some remaining places where we were still returning a boolean. I can pull that out in a separate PR if you prefer. |
That's totally fine for me. I'm surprised that that the CI hasn't complained about the test. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just minor suggestions - mostly to do with being consistent with the new naming e.g. in the config files, tests, etc.
@@ -91,8 +82,6 @@ def write_metrics(self, participant_id: str, metrics: Dict[str, ndarray]) -> boo | |||
except Exception as err: # pylint: disable=broad-except |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we be more specific about the possible exceptions here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I guess the influx db client raises custom exceptions. cc @Robert-Steiner
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right the InfluxDBClient
raises custom exceptions like InfluxDBClientError
and InfluxDBServerError
. The issue is, that the InfluxDB client doesn't catch exceptions and wrap them into a InfluxDBClientError
internally. So that means, that the call self.influx_client.write_points(influx_data_points)
will raise a HTTPError
from the urllib
package instead of raising a InfluxDBClientError
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then maybe we could catch the three of them to be sure not to miss one: InfluxDBClientError
, InfluxDBServerError
, and HTTPError
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The HTTPError was just an example, the problem is that it isn't documented which exception could be raised.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah ok. That's not pretty... I guess we could catch everything but KeyboardInterrupt
then. Would you be happier with that @finiteprods ? 😄
Co-Authored-By: kwok <[email protected]>
For the case we only start minio via docker ( |
Summary:
Add code for reading participants weights from S3.
Implementation details:
We split the AbstractStore class in to separate abstract classes:
AbstractAggregatedWeightsStore
is used to upload the aggregatedweights to S3
AbstractParticipantsWeightsStore
is used to download participantsweights from S3
The separation also allows us to use a dummy
NullObjectStore
classfor retrieving the participants weights, while using the real
AggregatedWeightsS3Store
class for uploading aggregated weights.In the future, it may be useful to provide different configurations
for the storage service used for aggregated weights and the storage
service used for participants weights: they may use different
credentials, permissions, endpoints, etc.
References:
https://xainag.atlassian.net/browse/PB-269
Reviewer checklist
Reviewer agreement:
Merge request checklist
XP-XXX <a description in imperative form>
.XP-XXX <a description in imperative form>
.Code checklist
XP-XXX-<a_small_stub>
.