-
Notifications
You must be signed in to change notification settings - Fork 284
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Check if port VID exists in db on flex counter query #464
Conversation
sai_serialize_object_id(vid).c_str()); | ||
|
||
op = DEL_COMMAND; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will there be another case, where flex counter request come first and by that time, the port vid has not been added to the mapping table, which end up with the case, the flex counter is not added at all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, this is possible, but no harm here, since FlexCounter::removeX() functios checks internally wheter vid is in the map, and only if then remove it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the original question remains unanswered: "which end up with the case, the flex counter is not added at all
?"
Or we treat it as a compromise? Maybe it is time to unify flexcounter DB channel with asicDB channel to avoid the race condition?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ran into an issue that caused Orchagent to crash as follows:
- swss added an ACL rule, and an associated new flex counter was passed to syncd to be added.
- Flex counter somehow moved faster than the ACL rule, so sycnd decided that instead of adding this flex counter without finding the ACL rule sitting in the database, syncd did nothing, which is the intended behavior of this code change.
- Orchagent finished adding this ACL rule and thought that everything is fine.
- Later on orchagent tried to look up the flex counter value but syncd reported that there is nothing there.
- Orchagent then did abort().
There might still be issue here: but rif operation goes through asicDB channel, flex counter op uses flexcounterDB, no receiving order at syncd is guaranteed. <2> rif vid/rid mapping is removed in syncd with rif delete request, flex counter removal to be handled if the operation request comes later from flexcounterDB channel. <3> flexCounterThread as a separate thread may keep collecting counter data for that RIF though it has been deleted. Did I miss anything here? |
i think it will be no collecting counters, since if rif ws removed, queued counters are also cleared, look here: |
I see. Having some stale counters in the counterIdsMap temporarily probably is not fatal with this change, but just results in some error log or stale VID counters inside counterDB (due to the asynchronous processing in flexCounterThread). |
not following, what stale counters? they will be removed if vid is not found |
* Access DB directly to get the PSU status information * Add test case for psu CLI Signed-off-by: Kevin Wang <[email protected]>
No description provided.