Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: Simplify actor request handlers #2600

Merged

Conversation

didier-wenzek
Copy link
Contributor

@didier-wenzek didier-wenzek commented Jan 18, 2024

Proposed changes

In this PR, @Bravo555 rightly argued that using async/await would be a more maintainable way to handle concurrent download and upload tasks in the c8y mapper, compared to the current design where an actor has to explicit maintain the intermediate states of these concurrent tasks (i.e. sending a request to a server actor, persisting in the actor state the fact that a response is expected from that actor along all the data required to process the response, proceeding with other messages, finally receiving and processing the response with the restored state).

This PR proposes a way to better combine async/await tasks and actors.

  • The first two commits, e3034a6 and 83ee198, are preparation steps. The aim is to have tedge_actors::Sender which are not necessarily Clone.
  • A subsequent commit, 5c86395, cleans up the tedge_actors traits. This is currently the 4th commit, before merging I will have to move this commit in 3rd position.
  • The next step, 6b9e5d2e, is to wrap a oneshot::Sender into a tedge_actors::Sender,
    which can be then used in RequestEnvelope<Request, Response>: messages wrapping a request with a tedge_actors::Sender to be used for the response.
  • The key step, 40538f5, is to deprecate the KeyedMessageBoxes and to replace them with boxes expecting RequestEnvelopes. A server has no more a fixed number of pre-registered clients nor a specific sender for its responses. Instead, the clients send a reply_to response sender along their requests.
  • The following commits dff9778 and 01ae9d6 (these will be squashed with the previous one on merge), applies the change to all the actors. The behavior is exactly the same has before. Only the way server responses are sent back has been changed under the hood.
  • One test is still failing and will have to be fixed: the timer actor behaves as expected on shutdown but the test fails to notice that the actor has been dropped as expected. Fixed by 5126f58

The POC is applied to the c8y_firmware_manager. The point is to remove the need for reqs_pending_download: HashMap<String, SmartRestFirmwareRequest> where are stored the data to resume the request handling once the firmware has been successfully downloaded. We want to replace this explicit state management with regular async/await code.

  • A first commit, aaff523, remove the cache but, doing so, introduce an issue.
    As the firmware actor now awaits for the firmware to be fully downloaded before sending a message to the child device, the firmware actor ignores any other requests till the firmware is fully downloaded. This issue is captured by the handle_child_response_while_busy_downloading test, which is now failing. The goal of the following steps is to make this test pass again.
  • With ac9b7f7, the firmware actor struct is changed and splitted in two parts. The actor itself keeps only stateful things that cannot be shared (the message receiver and the cache of pending operations). All the channel senders are grouped in a struct Worker which is Clone and can be
    spawned in background tasks.
  • After the previous stage, the FirmwareManagerWorker has no method and only holds data used by the FirmwareManagerActor. The following commit 1cd4f03 fixes that: are moved to the worker all the methods that doesn't access the actor state.
  • Finally, with 2f15275, the download task is launched in the background, fixing the expected behavior on concurrent requests and the handle_child_response_while_busy_downloading test.
  • In order to minimize the diff size the implementation of the FirmwareManagerWorker has been interleaved with the FirmwareManagerActor implementation. This will have to be fixed before merge.

Types of changes

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Improvement (general improvements like code refactoring that doesn't explicitly fix a bug or add any new functionality)
  • Documentation Update (if none of the other choices apply)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Paste Link to the issue


Checklist

  • I have read the CONTRIBUTING doc
  • I have signed the CLA (in all commits with git commit -s)
  • I ran cargo fmt as mentioned in CODING_GUIDELINES
  • I used cargo clippy as mentioned in CODING_GUIDELINES
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)

Further comments

Copy link
Contributor

github-actions bot commented Jan 23, 2024

Robot Results

✅ Passed ❌ Failed ⏭️ Skipped Total Pass % ⏱️ Duration
389 0 3 389 100 54m34.259999999s

@didier-wenzek didier-wenzek force-pushed the refactor/actor-request-handler branch from bf0eeeb to b1d30f9 Compare February 15, 2024 09:42
@didier-wenzek didier-wenzek force-pushed the refactor/actor-request-handler branch from 0b8c25e to dff9778 Compare February 19, 2024 17:27
@didier-wenzek didier-wenzek marked this pull request as ready for review February 21, 2024 15:41
@@ -154,3 +155,14 @@ impl<Request: Message, Response: Message> ServiceProvider<Request, Response, NoC
})
}
}

/// A connector to a [Server] expecting Request and returning Response.
pub trait Service<Request: Message, Response: Message>:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This trait was added just for aesthetics, to look better when used in actor builders along with ServiceProviders, right?

Copy link
Contributor Author

@didier-wenzek didier-wenzek Mar 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If by aesthetics you mean a nicer type name, then yes. Also, the clients of a server actor doesn't need to know that their requests are wrapped into envelopes with a reply-to address.

Once this PR merged, we can go further. Indeed, the current ServiceProvider / ServiceConsumer abstraction can now be deprecated in favor of MessageSource / MessageSink.

ReplyToRequester,
>,
jwt: &mut impl ServiceProvider<RequestEnvelope<(), JwtResult>, NoMessage, ReplyToRequester>,
http: &mut impl Service<HttpRequest, HttpResult>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seeing that makes me appreciate that dummy Service abstraction a lot more, hiding all the unnecessary boilerplate like NoMessage, NoConfig etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's the point of this abstraction. Not so dummy ;-).

mut self,
operation_id: String,
smartrest_request: SmartRestFirmwareRequest,
mut input_receiver: mpsc::Receiver<FirmwareOperationResponse>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
mut input_receiver: mpsc::Receiver<FirmwareOperationResponse>,
mut response_receiver: mpsc::Receiver<FirmwareOperationResponse>,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will do that in a final commit for this PR.

Copy link
Contributor Author

@didier-wenzek didier-wenzek Mar 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done with a final commit: 0bd66fa

Ok(())
}
}

pub struct FirmwareManagerMessageBox {
input_receiver: LoggingReceiver<FirmwareInput>,
struct FirmwareManagerWorker {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
struct FirmwareManagerWorker {
/// A worker, spawned per request, that can handle a single firmware update request for a child device end-to-end
struct FirmwareManagerWorker {

Just trying to highlight that the scope of this worker is per request.

Copy link
Contributor Author

@didier-wenzek didier-wenzek Mar 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done with a final commit: 0bd66fa

Copy link
Member

@rina23q rina23q left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I re-reviewed only c8y_firmware_manager. I have some trivial comments, so I would approve it.

// 2. Operation timeouts from the TimerActor for requests for which the child devices don't respond within the timeout window
// 3. Download results from the DownloaderActor for firmware download requests

// 3. RequestOutcome sent back by the background workers once the firmware request has been fully processed or failed
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trivial ;)

Suggested change
// 3. RequestOutcome sent back by the background workers once the firmware request has been fully processed or failed
// 2. RequestOutcome sent back by the background workers once the firmware request has been fully processed or failed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed: e37bc2e

}
FirmwareInput::IdDownloadResult((id, result)) => {
self.process_downloaded_firmware(&id, result).await?
FirmwareInput::OperationOutcome(outcome) => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah nice, timeout moved to the worker!

},

#[error("Child device {child_id} did not respond within the timeout interval of {time_limit_sec}sec. Operation ID={operation_id}")]
FromTimeout {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(trivial) I think we use the From prefix for the errors inheriting from somewhere. Hence, here you don't need From.

Suggested change
FromTimeout {
ExceedTimelimit {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed: e37bc2e

@@ -272,6 +251,7 @@ async fn handle_request_child_device_with_failed_download() -> Result<(), DynErr
// Assert EXECUTING SmartREST MQTT message and FAILED SmartREST MQTT message due to missing 'cache' directory.
mqtt_message_box
.assert_received([
// FIXME: executing status is no more sent when download fails
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You fixed it right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this has been fixed by 2f10f93

@didier-wenzek didier-wenzek force-pushed the refactor/actor-request-handler branch from e37bc2e to ece233d Compare March 7, 2024 13:26
This method was used only in 3 tests, while adding extra complexity.

Signed-off-by: Didier Wenzek <[email protected]>
- The `Clone` trait is no more implemented by `DynSender`
  This was the main reason preventing the implementation
  of useful `Into` convertions between misc `DynSender`.
- The `adapt()` function as been deprecated
  as this was a workaround the lack of `Into` convertions between misc `DynSender`.
- The `sender_clone()` method is to be used both to `clone` and adapt a
  sender.

Signed-off-by: Didier Wenzek <[email protected]>
Instead of using an array of clients identified by their indexes,
the client address is attached to the requests.

Signed-off-by: Didier Wenzek <[email protected]>
This is an intermediate step. The next one will be
to spawn a task to handle the download and
subsequent steps so a the manager is not blocked during that time.
The failing test (handle_child_response_while_busy_downloading) is a
witness of this missing requirement.

Signed-off-by: Didier Wenzek <[email protected]>
The idea is to have a worker which is Clone with methods that can be
spawned in background tasks. The actor itself keeps only stateful things that
cannot be shared (the message receiver and the cache of pending
operations).

This is an intermediate step. All the methods are for now still attached
to the actor. The next step will be to move out from the actor to the
worker all the methods that doesn't touch the actor state.

Signed-off-by: Didier Wenzek <[email protected]>
Execute the whole firmware operation in a single async method.
 i.e. with no state cached in the actor.

Signed-off-by: Didier Wenzek <[email protected]>
@didier-wenzek didier-wenzek force-pushed the refactor/actor-request-handler branch from ece233d to 917e3e1 Compare March 7, 2024 13:32
@didier-wenzek didier-wenzek temporarily deployed to Test Pull Request March 7, 2024 13:32 — with GitHub Actions Inactive
@didier-wenzek
Copy link
Contributor Author

Also noticed that the first commit message is wrong. That commit deprecated close_sender and not sender_clone as indicated by the commit message. sender_clone was deprecated in the second commit.

The commit message has been fixed.

@didier-wenzek didier-wenzek force-pushed the refactor/actor-request-handler branch from 21864b4 to 0bd66fa Compare March 7, 2024 14:20
@didier-wenzek didier-wenzek temporarily deployed to Test Pull Request March 7, 2024 14:21 — with GitHub Actions Inactive
@didier-wenzek didier-wenzek changed the title Refactor the actor request handlers refactor: Simplify actor request handlers Mar 7, 2024
@didier-wenzek didier-wenzek added this pull request to the merge queue Mar 7, 2024
Merged via the queue into thin-edge:main with commit b2f0d0d Mar 7, 2024
26 checks passed
@didier-wenzek didier-wenzek deleted the refactor/actor-request-handler branch March 7, 2024 16:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants