Skip to content
This repository has been archived by the owner on Aug 30, 2022. It is now read-only.

PB-269: enable reading participants weights from S3 #254

Conversation

little-dude
Copy link
Contributor

@little-dude little-dude commented Feb 3, 2020

Summary:

Add code for reading participants weights from S3.

Implementation details:

We split the AbstractStore class in to separate abstract classes:

  • AbstractAggregatedWeightsStore is used to upload the aggregated
    weights to S3
  • AbstractParticipantsWeightsStore is used to download participants
    weights from S3

The separation also allows us to use a dummy NullObjectStore class
for retrieving the participants weights, while using the real
AggregatedWeightsS3Store class for uploading aggregated weights.

In the future, it may be useful to provide different configurations
for the storage service used for aggregated weights and the storage
service used for participants weights: they may use different
credentials, permissions, endpoints, etc.

References:

https://xainag.atlassian.net/browse/PB-269


Reviewer checklist

Reviewer agreement:

  • Reviewers assign themselves at the start of the review.
  • Reviewers do not commit or merge the merge request.
  • Reviewers have to check and mark items in the checklist.

Merge request checklist

  • Conforms to the merge request title naming XP-XXX <a description in imperative form>.
  • Each commit conforms to the naming convention XP-XXX <a description in imperative form>.
  • Linked the ticket in the merge request title or the references section.
  • Added an informative merge request summary.

Code checklist

  • Conforms to the branch naming XP-XXX-<a_small_stub>.
  • Passed scope checks.
  • Added or updated tests if needed.
  • Added or updated code documentation if needed.
  • Conforms to Google docstring style.
  • Conforms to XAIN structlog style.

janpetschexain
janpetschexain previously approved these changes Feb 4, 2020
@little-dude little-dude force-pushed the PB-269-read-participants-weights branch 4 times, most recently from 5590d32 to e5cd6d5 Compare February 4, 2020 12:03
Copy link
Contributor

@finiteprods finiteprods left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good so far, but I'll mention some thoughts now. Perhaps the naming of the new abstract classes ("aggregated weights" vs "participants weights") could be better. First of all, more generic names might be better, say "global" vs "local" weights. Secondly, Store doesn't say whether we're writing or reading the weights. Presumably we would also have the dual situation in xain-sdk. Hence I would suggest GlobalWeightsWriter and LocalWeightsReader.

another comment is just food for thought, and I haven't thought about it that much (and may not be very "pythonic" either): since the 2 interfaces are single-method, why not scrap the interfaces altogether and just pass functions around? For example, the coordinator would take args:

global_weights_writer: Callable[[int, ndarray], None]
local_weights_reader: Callable[[str, int], ndarray]

@little-dude
Copy link
Contributor Author

little-dude commented Feb 4, 2020

Thanks a lot for the suggestions for naming. The Reader/Writer terminology seems a much better fit indeed.

Regarding your API suggestion: that's something I'd definitely go for in some other languages but I think in Python it may be a bit limiting.

For instance to implement a reader and writer for a storage service in the cloud, we need to store some state: credentials, bucket name etc. There is two possibilities:

  • In Python, it is natural to store that information in an object, so most users will implement the reader and writer as classes. But because the API takes a callable, they would need to implement the __call__ method on their classes, which doesn't feel too nice.
  • Another solution for storing their credentials in a Callable would be to use closures, but python closures are notoriously awkward.

Here is a very silly illustration of how such an API would be used:

from typing import Callable

# Our API:

def take_callable(f: Callable[[int, int], int]):
    return f(1,2)




# Solution 1: implement a class with __callable__

def Adder():
    def __init__(self):
        # some dummy state for illustration purpose
        self.credentials = "login:password"

    def __call__(self, x, y):
        print(self.credentials)  # use our dummy state
        return x + y

take_callable(Adder())


# Solution 2: a closure

credentials_for_addition= "login:password"
def add(x, y):
    # Silly example to show how to embed information into a function.
    # In real life we'd use a function factory or decorators to do that.
    print(credentials_for_addition)
    return x+y

take_callable(add)

In the end, I feel like 99% of the users would go for the class solution. But then it would make more sense to use an abstract class. Another advantage of an abstract class is that we can add new methods later.

Summary:

Add code for reading participants weights from S3.

Implementation details:

We split the AbstractStore class in to separate abstract classes:

- `AbstractAggregatedWeightsStore` is used to upload the aggregated
  weights to S3
- `AbstractParticipantsWeightsStore` is used to download participants
  weights from S3

The separation also allows us to use a dummy `NullObjectStore` class
for retrieving the participants weights, while using the real
`AggregatedWeightsS3Store` class for uploading aggregated weights.

In the future, it may be useful to provide different configurations
for the storage service used for aggregated weights and the storage
service used for participants weights: they may use different
credentials, permissions, endpoints, etc.

References:

https://xainag.atlassian.net/browse/PB-269
@finiteprods
Copy link
Contributor

Thanks for entertaining the idea, and writing your thoughts down! I agree it makes more sense to stick with classes in this situation. Since I'm more confident this is the way to go now, will review the rest soon (when the linter stops complaining).

@little-dude little-dude force-pushed the PB-269-read-participants-weights branch 2 times, most recently from f1509be to c84e394 Compare February 5, 2020 09:21
@little-dude
Copy link
Contributor Author

@finiteprods I did the renaming you suggested. I also renamed the various attribute to be consistent with the class names.

@little-dude little-dude force-pushed the PB-269-read-participants-weights branch from c84e394 to c00ce94 Compare February 5, 2020 09:24
@little-dude
Copy link
Contributor Author

@Robert-Steiner is f1509be ok for you? I think that's what we agreed on in your inital PR for storing the metrics, but there were some remaining places where we were still returning a boolean. I can pull that out in a separate PR if you prefer.

@Robert-Steiner
Copy link
Contributor

@Robert-Steiner is f1509be ok for you? I think that's what we agreed on in your inital PR for storing the metrics, but there were some remaining places where we were still returning a boolean. I can pull that out in a separate PR if you prefer.

That's totally fine for me. I'm surprised that that the CI hasn't complained about the test.

Copy link
Contributor

@finiteprods finiteprods left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just minor suggestions - mostly to do with being consistent with the new naming e.g. in the config files, tests, etc.

@@ -91,8 +82,6 @@ def write_metrics(self, participant_id: str, metrics: Dict[str, ndarray]) -> boo
except Exception as err: # pylint: disable=broad-except
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we be more specific about the possible exceptions here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I guess the influx db client raises custom exceptions. cc @Robert-Steiner

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right the InfluxDBClient raises custom exceptions like InfluxDBClientError and InfluxDBServerError. The issue is, that the InfluxDB client doesn't catch exceptions and wrap them into a InfluxDBClientError internally. So that means, that the call self.influx_client.write_points(influx_data_points) will raise a HTTPError from the urllib package instead of raising a InfluxDBClientError.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then maybe we could catch the three of them to be sure not to miss one: InfluxDBClientError, InfluxDBServerError, and HTTPError.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The HTTPError was just an example, the problem is that it isn't documented which exception could be raised.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ok. That's not pretty... I guess we could catch everything but KeyboardInterrupt then. Would you be happier with that @finiteprods ? 😄

@Robert-Steiner
Copy link
Contributor

For the case we only start minio via docker (docker-compose -f docker-compose-dev.yml run -d initial-buckets) and start the coordinator via the cli (coordinator --config configs/xain-fl.toml) we need to adjust the storage endpoint in configs/xain-fl.toml to endpoint = "http://localhost:9000"

@little-dude little-dude merged commit d1d97b8 into xaynetwork:development Feb 5, 2020
@little-dude little-dude deleted the PB-269-read-participants-weights branch February 5, 2020 20:18
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants