Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(client.command.cryo_card): Catch EPICS timeout on read and retry. #794

Merged
merged 2 commits into from
Oct 28, 2024

Conversation

tristpinsm
Copy link
Collaborator

Description

Addresses issue #792

Jira Issue

N/A

Tests done on this branch

It's a simple enough fix that that testing didn't seem necessary.

Function interfaces that changed

N/A

What was the interface before the change

N/A

What will be the new interface after the change

N/A

@tristpinsm tristpinsm requested review from jlashner and swh76 May 17, 2024 17:28
@github-actions github-actions bot added the client Changes to the client code label May 17, 2024
@jlashner
Copy link
Collaborator

I think we may want to test this... PVs in the smurf_command module use a 10 second timeout, so a retry happens roughly every 10 sec. We may want to implement the same thing here; as things are now I believe retries will happen immediately which I don't think we want.

@tristpinsm tristpinsm force-pushed the tpm/cryo-card-retry branch from d375289 to c5d50c5 Compare May 17, 2024 19:01
@tristpinsm
Copy link
Collaborator Author

OK, I've added a timeout between restarts and I can try it out on a test system at SLAC this afternoon. Just to confirm that it is retrying as expected when it can't reach the cryo card. However I'm not sure how to determine the appropriate number of retries or time in between. Currently it's set to retry 5 times, at 5s intervals.

@tristpinsm
Copy link
Collaborator Author

Alright so I've tested CryoCard.read_temperature with no cryo card connected and it's retrying as it should:

[ 2024-05-17 19:25:19 ]  CryoCard.do_read waiting 5s before retry 2 / 5.
[ 2024-05-17 19:25:24 ]  CryoCard.do_read waiting 5s before retry 3 / 5.
[ 2024-05-17 19:25:29 ]  CryoCard.do_read waiting 5s before retry 4 / 5.
[ 2024-05-17 19:25:34 ]  CryoCard.do_read waiting 5s before retry 5 / 5.

However, after those attempts it returns a nonsense value, which is just how the code is designed.

@jlashner
Copy link
Collaborator

I'm confused how this is working, shouldn't the continue bypass the sleep and log? Maybe this is an instance where it does not timeout but the returned address does not match.

I think instead of manually sleeping, it makes more sense to set the timeout in the epics get fn, like here:

ret = self._pv_cache[pvname].get(count=count,
use_monitor=use_monitor,
timeout=timeout,
**kwargs)

5 sec timeout with 5 retries sounds good to me as a default!

@tristpinsm
Copy link
Collaborator Author

I'm confused how this is working, shouldn't the continue bypass the sleep and log? Maybe this is an instance where it does not timeout but the returned address does not match.

yeah, that's a mistake...

I think instead of manually sleeping, it makes more sense to set the timeout in the epics get fn, like here:

ret = self._pv_cache[pvname].get(count=count,
use_monitor=use_monitor,
timeout=timeout,
**kwargs)

5 sec timeout with 5 retries sounds good to me as a default!

ok

@tristpinsm tristpinsm force-pushed the tpm/cryo-card-retry branch from c5d50c5 to dde56c5 Compare May 17, 2024 21:49
Also adds a timeout between retries and logging capability.
@tristpinsm tristpinsm force-pushed the tpm/cryo-card-retry branch from dde56c5 to 2697cfd Compare May 17, 2024 22:25
@jlashner
Copy link
Collaborator

This is looking good to me... one more thing though, I think we should raise an exception on failure instead of just returning 0. Can we add that here?

@tristpinsm
Copy link
Collaborator Author

This is looking good to me... one more thing though, I think we should raise an exception on failure instead of just returning 0. Can we add that here?

Yeah, I agree that makes a lot more sense. I wonder if that change may break other code that is relying on the current behaviour though. I guess I can have look and see.

Copy link
Collaborator

@jlashner jlashner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thanks!

@tristpinsm
Copy link
Collaborator Author

@swh76 this one should also be merged for the next release

@swh76 swh76 merged commit 8b4a5ec into main Oct 28, 2024
17 checks passed
@swh76 swh76 deleted the tpm/cryo-card-retry branch October 28, 2024 23:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
client Changes to the client code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants