Spurious premature shutdown of services #5100
Comments
I am currently not sure how to reproduce it. I have added extra logs and am now trying to reproduce the issue; so far master is set to debug logs, but no luck.
Update: obsolete
dynamic-schdlr
UPDATE: judging from the sidecar logs, in the end this might also be a problem caused by not being able to connect to storage:
@matusdrobuliak66 What you have reported is not related to this issue.
Agreed. I added the UPDATE in the last sentence when I realized this one was also connected to the storage connection issue.
UPDATE: obsolete
@GitHK, @matusdrobuliak66 I am closing the Fogbugz case (it is probably a duplicate of this one), so please use only this one. I don't think there's useful information in the Fogbugz case (@matusdrobuliak66 added logs there for the GC hypothesis, which no longer seems to be the cause).
Garbage collector premature shutdown investigation
deployment:
Webserver logs
TIMELINE: GC (triggers delete) -> dynamic-scheduler (Delete) -> director-v2 (Delete) -> Sidecar (Stop containers)
Garbage collector logs
# THIS LOG IS FROM GARBAGE COLLECTOR (probably should not be logged here!):
log_level=INFO | log_timestamp=2024-02-13 12:54:44,883 | log_source=rpc.access:log_context(279) | log_uid=None | log_msg=Starting rPC call delete_api_keys(<Application 0x7f512a7fed70>, user_id=8, product_name=s4l, name=_auto_384de5cc-ca7e-59b2-b7b1-0371eac73d12) ...
# THIS LOG IS FROM WEBSERVER:
"2024-02-13T12:54:44.884Z","sim4lifeManager1","staging-simcore_staging_wb-garbage-collector.1.mohlo4qgp2bbub6iwmvcfc079","log_level=INFO | log_timestamp=2024-02-13 12:54:44,883 | log_source=rpc.access:log_context(279) | log_uid=None | log_msg=Starting rPC call delete_api_keys(<Application 0x7f512a7fed70>, user_id=8, product_name=s4l, name=_auto_384de5cc-ca7e-59b2-b7b1-0371eac73d12) ..."
"2024-02-13T12:54:44.887Z","sim4lifeManager1","staging-simcore_staging_wb-garbage-collector.1.mohlo4qgp2bbub6iwmvcfc079","log_level=INFO | log_timestamp=2024-02-13 12:54:44,887 | log_source=rpc.access:log_context(286) | log_uid=None | log_msg=Finished rPC call delete_api_keys(<Application 0x7f512a7fed70>, user_id=8, product_name=s4l, name=_auto_384de5cc-ca7e-59b2-b7b1-0371eac73d12) in 0.003637s"
In the next garbage collection cycle, Manuel's dead session was found:
Dynamic-scheduler logs
Director-v2 logs
NOTE: for some reason, the delete request was received from the dynamic-scheduler two more times:
Sidecar logs
Notes:
Closing this issue, as we have already dealt with most of the known issues.
Is there an existing issue for this?
Which deploy/s?
production aws (e.g. osparc.io), staging aws (e.g. staging.osparc.io), production on-premise (dalco), staging on-premise (dalco), development (master)
Current Behavior
Some users experience their service shutting down as soon as it opens.
Neither the director-v2 nor the dynamic-sidecar provides any hint or log message to help with debugging.
Expected Behavior
The service should not shut down automatically without a reason. The reason should at least be logged, at some log level, before a shutdown is triggered.
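The expected behaviour (record why a service is stopped before stopping it) could look roughly like the minimal Python sketch below. It is not taken from director-v2 or the dynamic-sidecar: `shutdown_with_reason`, its parameters and the `service_shutdown` logger name are hypothetical, and `stop` is a placeholder for whatever call actually removes the service.

```python
import logging
from typing import Callable

# Hypothetical logger name; the real services use their own loggers.
logger = logging.getLogger("service_shutdown")


def shutdown_with_reason(
    node_id: str,
    reason: str,
    requested_by: str,
    stop: Callable[[str], None],
) -> None:
    """Hypothetical helper: log who asked for a shutdown and why, then stop.

    `stop` is a placeholder for the call that actually removes the service.
    """
    if not reason:
        # Refuse silent shutdowns: every stop must carry an explanation.
        raise ValueError(f"refusing to stop {node_id} without a reason")
    # Log before stopping, so the cause is recorded even if `stop` fails.
    logger.warning(
        "shutting down node_id=%s requested_by=%s reason=%s",
        node_id,
        requested_by,
        reason,
    )
    stop(node_id)
```

Logging at WARNING rather than DEBUG means the message would still show up on deployments that are not running with debug logs. If each hop in the chain (GC, dynamic-scheduler, director-v2) passed along its own `requested_by`, it would likely also be easier to tell which component sent the repeated delete requests.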
Steps To Reproduce
No response
Anything else?
No response