-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
hwp-pid direction read bugfix #728
Conversation
Yeah, I was not sure if 'R2400000' means miscellaneous or it actually means direction but zero is missing. |
Actually, I got IndexError from pid agent probably for the first time. We had another error which seems to be related with the drop of response string (reported in the same discussion). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How sure are we that there is actually a missing 0 in the return string, and not that this is some other error?
Instead of assuming this is the case, I'd much rather add proper error handling to the string parsing such that unexpected responses do no crash the main process. I think string parsing like this must be wrapped in a try/catch, where if there is an error, we log the unexpected response and return an "error" response so that the main process can handle it properly.
I've added a decorator to the read functions which will cause those functions to try multiple times before failing. This should hopefully fix some of the crashes this agent has been getting. I've also reverted some of the response handling changes that were previously implemented. The manner in which the read statements fail doesn't seem to be as consistent as I had believed, so those changes did not cover all failure cases. The most recent additions have not been tested yet, but I've pushed what they are currently in case anyone has comments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Bryce, thanks for the changes! Looks good to me for the most part but there are a couple of important points in inline comments that I think need to be addressed.
except: | ||
time.sleep(0.2) | ||
print(f'Could not complete {func.__name__} after {loops} attempt(s)') | ||
return None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this return None
changes the failure behavior of the functions. Before, on failure they would raise an error.
I'm afraid this will make it look as though the agent get_state
process is successful, while the values are actually None
. I'd either change this so that it raises an error, or modify the agent to properly handle None return vals.
if decoded_resp.msg_type == 'measure': | ||
return decoded_resp.measure | ||
elif decoded_resp.msg_type == 'error': | ||
raise ValueError(f"Error reading freq: {decoded_resp.msg}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will be swallowed by the try/catch
in the retry_multiple_times
decorator, so it won't be logged to stdout. Make sure to log an error here with the problematic response.
I've added handling of the |
I'm sorry I don't think I agree with this method of handling the readout errors. I think that if any of the parameters fail to read-out, we should throw an error in Agents calling this function expect to get a complete response, and if that is not possible then they need decide how to handle that case. Removing fields like you do here kind of just hides the problem, and moves the responsibility of checking whether fields exist to the caller, which they're not currently doing. We'll also need to add error handling to the functions in this agent that call
|
Okay, I've changed the handling of readout errors to follow Jack's suggestions. Read errors in the |
Ok cool, I think this is starting to look better. One quick thing and then I think this is good to merge... to set the process to be degraded, we want to run:
and then we want to set this back to This will make it so the operation state will appear as "degraded" in grafana, and we can set alarms based on that. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @bbixler500, looks like there is a test failing, but as soon as that's fixed and this is tested I think we can merge this.
Okay, all tests are passing now |
Great, has the most recent version been tested against hardware? |
Yeah, every step of the way I was testing on satp2 before pushing my updates |
Initial attempt to fix occasional issues in reading the hwp-pid direction
Description
Adds additional logic to handle read strings that are formatted in a non standard way.
Motivation and Context
Recently changes were pushed to the hwp-pid agent #686 which changed how response messages were processed. As an unintended consequence, a new bug surfaced when reading the rotation direction #727. From these error logs it appears as though the response message is mostly correct, but it is missing an initial 0 digit. This PR is a simple fix which will remedy this specific issue. Feedback is appreciated if the issue is more complicated than described above.
How Has This Been Tested?
This fix has not been tested and needs to be before it is merged into main
Types of changes
Checklist: