Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check if port VID exists in db on flex counter query #464

Merged
merged 1 commit into from
Jun 4, 2019

Conversation

kcudnik
Copy link
Collaborator

@kcudnik kcudnik commented Jun 3, 2019

No description provided.

sai_serialize_object_id(vid).c_str());

op = DEL_COMMAND;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will there be another case, where flex counter request come first and by that time, the port vid has not been added to the mapping table, which end up with the case, the flex counter is not added at all?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, this is possible, but no harm here, since FlexCounter::removeX() functios checks internally wheter vid is in the map, and only if then remove it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the original question remains unanswered: "which end up with the case, the flex counter is not added at all?"

Or we treat it as a compromise? Maybe it is time to unify flexcounter DB channel with asicDB channel to avoid the race condition?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran into an issue that caused Orchagent to crash as follows:

  1. swss added an ACL rule, and an associated new flex counter was passed to syncd to be added.
  2. Flex counter somehow moved faster than the ACL rule, so sycnd decided that instead of adding this flex counter without finding the ACL rule sitting in the database, syncd did nothing, which is the intended behavior of this code change.
  3. Orchagent finished adding this ACL rule and thought that everything is fine.
  4. Later on orchagent tried to look up the flex counter value but syncd reported that there is nothing there.
  5. Orchagent then did abort().

@lguohan lguohan merged commit 86e6caf into sonic-net:master Jun 4, 2019
@kcudnik kcudnik deleted the flexfix branch June 4, 2019 15:28
@jipanyang
Copy link
Contributor

There might still be issue here:
<1> at intforch, rif flex counter is removed first, then rif:
https://github.com/Azure/sonic-swss/blob/master/orchagent/intfsorch.cpp#L563
https://github.com/Azure/sonic-swss/blob/master/orchagent/intfsorch.cpp#L565

but rif operation goes through asicDB channel, flex counter op uses flexcounterDB, no receiving order at syncd is guaranteed.

<2> rif vid/rid mapping is removed in syncd with rif delete request, flex counter removal to be handled if the operation request comes later from flexcounterDB channel.

<3> flexCounterThread as a separate thread may keep collecting counter data for that RIF though it has been deleted.

Did I miss anything here?

@kcudnik
Copy link
Collaborator Author

kcudnik commented Jun 13, 2019

i think it will be no collecting counters, since if rif ws removed, queued counters are also cleared, look here:
https://github.com/Azure/sonic-sairedis/pull/464/files#diff-91b3e380b7f6f4a0a048ca8296a12e0dR2908
i dont know exact internals of collecting counters but this seems ok to me

@jipanyang
Copy link
Contributor

I see. Having some stale counters in the counterIdsMap temporarily probably is not fatal with this change, but just results in some error log or stale VID counters inside counterDB (due to the asynchronous processing in flexCounterThread).

@kcudnik
Copy link
Collaborator Author

kcudnik commented Jun 14, 2019

not following, what stale counters? they will be removed if vid is not found

pettershao-ragilenetworks pushed a commit to pettershao-ragilenetworks/sonic-sairedis that referenced this pull request Nov 18, 2022
jianyuewu pushed a commit to jianyuewu/sonic-sairedis that referenced this pull request Feb 7, 2025
* Access DB directly to get the PSU status information
* Add test case for psu CLI

Signed-off-by: Kevin Wang <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants