♻️ Polished and modernised agent service ⚠️ #6452

Merged

GitHK
Contributor

@GitHK GitHK commented Sep 26, 2024

Dev-ops ⚠️

  • ensure the agent runs on all nodes where new-style dynamic services run, including autoscaled nodes

What do these changes do?

This service is badly out of date and has not received attention in over two years. It currently causes a lot of pain for the platform, and with the latest tools we've added it can be rewritten in a saner and more efficient manner.

This rewrite also keeps an eye on cost reduction and tries to avoid useless data uploads (unlike the previous version).

Highlights:

  • agent nodes are targeted via an RPCNamespace composed of docker_node_id + swarm_stack_name
  • only one volume at a time can be removed per node (avoids the race conditions present in the previous version)
  • agent still backs up data left on the node by sidecars that were not terminated by director-v2
  • agent exposes Prometheus metrics for the number of volumes removed and the number of volumes backed up to S3
  • dropped the dy-volrm docker service, which was created for each service for the purpose of removing its volumes
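
As a rough illustration of the first two highlights, here is a minimal sketch of how a per-node namespace could be derived and how removals could be serialised so that only one runs at a time. The helper `make_rpc_namespace`, the `agent-` prefix, and the `VolumeRemover` class are hypothetical names invented for this example; they are not taken from the PR, and the actual agent code may build its RPCNamespace and guard removals differently.

```python
import asyncio
from dataclasses import dataclass, field


def make_rpc_namespace(docker_node_id: str, swarm_stack_name: str) -> str:
    # Hypothetical format: the real RPCNamespace may be composed differently.
    return f"agent-{docker_node_id}-{swarm_stack_name}"


@dataclass
class VolumeRemover:
    # Hypothetical per-node helper: a single lock allows one removal at a time.
    removed: list[str] = field(default_factory=list)
    max_concurrent_seen: int = 0
    _active: int = 0
    _lock: asyncio.Lock = field(default_factory=asyncio.Lock)

    async def remove_volume(self, volume_name: str) -> None:
        async with self._lock:
            self._active += 1
            self.max_concurrent_seen = max(self.max_concurrent_seen, self._active)
            await asyncio.sleep(0)  # placeholder for the real Docker volume removal
            self.removed.append(volume_name)
            self._active -= 1


async def demo() -> VolumeRemover:
    # Create the remover inside the running loop, then issue concurrent requests.
    remover = VolumeRemover()
    await asyncio.gather(*(remover.remove_volume(f"vol_{i}") for i in range(5)))
    return remover


if __name__ == "__main__":
    result = asyncio.run(demo())
    print(make_rpc_namespace("node-1", "my-stack"), result.removed)
```

Even though five removal requests are issued concurrently, the lock means at most one is in progress at any moment, which is the property the second highlight describes.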

Followup issues:

Related issue/s

How to test

Dev-ops checklist ⚠️

@GitHK GitHK self-assigned this Sep 26, 2024
@GitHK GitHK added the a:agent, t:maintenance, and bug labels Sep 26, 2024

codecov bot commented Sep 26, 2024

Codecov Report

Attention: Patch coverage is 91.22807% with 35 lines in your changes missing coverage. Please review.

Project coverage is 88.1%. Comparing base (cafbf96) to head (20e2f8e).
Report is 617 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...ervicelib/rabbitmq/rpc_interfaces/agent/volumes.py | 0.0% | 18 Missing ⚠️ |
| ...servicelib/rabbitmq/rpc_interfaces/agent/errors.py | 0.0% | 5 Missing ⚠️ |
| ...agent/src/simcore_service_agent/services/backup.py | 89.6% | 3 Missing ⚠️ |
| ...rc/simcore_service_agent/api/rest/_dependencies.py | 83.3% | 2 Missing ⚠️ |
| .../simcore_service_agent/services/volumes_manager.py | 97.8% | 2 Missing ⚠️ |
| ...s/dynamic_sidecar/scheduler/_core/_events_utils.py | 77.7% | 2 Missing ⚠️ |
| ...gent/src/simcore_service_agent/api/rest/_health.py | 92.8% | 1 Missing ⚠️ |
| ...src/simcore_service_agent/services/docker_utils.py | 98.3% | 1 Missing ⚠️ |
| ...e_service_agent/services/instrumentation/_setup.py | 93.7% | 1 Missing ⚠️ |

Additional details and impacted files

@@            Coverage Diff            @@
##           master   #6452      +/-   ##
=========================================
+ Coverage    84.5%   88.1%    +3.5%     
=========================================
  Files          10    1540    +1530     
  Lines         214   63086   +62872     
  Branches       25    2058    +2033     
=========================================
+ Hits          181   55600   +55419     
- Misses         23    7169    +7146     
- Partials       10     317     +307     

| Flag | Coverage Δ |
|---|---|
| integrationtests | 64.7% <86.6%> (?) |
| unittests | 86.1% <90.2%> (+1.5%) ⬆️ |

| Files with missing lines | Coverage Δ |
|---|---|
| ...y/src/models_library/api_schemas__common/health.py | 100.0% <ø> (ø) |
| .../models_library/api_schemas_directorv2/services.py | 92.8% <100.0%> (ø) |
| ...ervice-library/src/servicelib/fastapi/app_state.py | 100.0% <100.0%> (ø) |
| ...agent/src/simcore_service_agent/api/rest/routes.py | 100.0% <100.0%> (ø) |
| ...gent/src/simcore_service_agent/api/rpc/_volumes.py | 100.0% <100.0%> (ø) |
| .../agent/src/simcore_service_agent/api/rpc/routes.py | 100.0% <100.0%> (ø) |
| ...gent/src/simcore_service_agent/core/application.py | 100.0% <100.0%> (ø) |
| ...s/agent/src/simcore_service_agent/core/settings.py | 100.0% <100.0%> (ø) |
| .../agent/src/simcore_service_agent/models/volumes.py | 100.0% <100.0%> (ø) |
| ...service_agent/services/instrumentation/__init__.py | 100.0% <100.0%> (ø) |

... and 19 more

... and 1461 files with indirect coverage changes

@GitHK GitHK added this to the MartinKippenberger milestone Sep 26, 2024
@GitHK GitHK changed the title from "♻️ Rework anget service" to "♻️ Fix anget service" Sep 26, 2024
Contributor

@matusdrobuliak66 matusdrobuliak66 left a comment

Looks good from my side 👍

@GitHK GitHK requested a review from pcrespov October 3, 2024 13:49
Member

@pcrespov pcrespov left a comment

thx

@pcrespov pcrespov requested a review from YuryHrytsuk October 3, 2024 15:02
@GitHK GitHK enabled auto-merge (squash) October 4, 2024 07:37
Andrei Neagu added 4 commits October 4, 2024 10:13

sonarqubecloud bot commented Oct 4, 2024

@GitHK GitHK merged commit 4d0fa91 into ITISFoundation:master Oct 4, 2024
57 checks passed
@GitHK GitHK deleted the pr-osparc-rework-agent-to-handle-volumes branch October 4, 2024 09:15
@GitHK
Contributor Author

GitHK commented Oct 4, 2024

Volume removal is working as expected on both autoscaled and classic nodes.
director-v2 can remove volumes from autoscaled nodes because the agent service starts as soon as the node becomes active.

mrnicegyu11 pushed a commit to mrnicegyu11/osparc-simcore that referenced this pull request Oct 14, 2024
Labels
a:agent, bug, t:maintenance
4 participants