-
Notifications
You must be signed in to change notification settings - Fork 282
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add thread safe guard to throw exception when share DBConnection cross threads #635
base: master
Are you sure you want to change the base?
Conversation
@@ -33,6 +33,7 @@ inline void guard(FUNC func, const char* command) | |||
|
|||
RedisReply::RedisReply(RedisContext *ctx, const RedisCommand& command) | |||
{ | |||
ThreadSafeGuard safeguard(ctx->getRunningFlag(), string(command.c_str())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not use std::mutex ? and mutex guard ? instead of implementing yourself
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I know the libswsscommon is not thread safe by design, so we add this class to catch the multi-thread scenario and throw exception.
If we add lock here, it will impact the critical performance, like route sync or warm reboot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i dont think std::mutext will impact performace here, mutext is really quick, implementation internal is on atomic exchange, its 1 assembly instruction, so this is not performance critical here, and using already given tools make it simpler and better understanding for someone that is new to the code
if you are throwing here then how this will be handled in production scenario? just crash ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After discussion with QI, I have better undertsand about whu not use mutex, mutext need context switch which take microseconds, and there a some critical scenario can't accept this delay for example update thousands of routes.
Here is my closed PR for add a mutex, which contains more information:
#634
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that depends on the mutex, you could have potentially spin lock since this operation could be very fast, and you dont need to use recursive_mutex as in showed PR, since you will not have recursive call here, still for me using std::mutex here is better than implementing it from scratch, also why crash on multiple thread access? and not wait ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ThreadSafeGuard is not a lock, it's just check race condition and throw exception when it happens.
Also, the code inside RedisContext ctor sometimes very slowly, for example run a LUA script.
Here is an example how crash happen:
caclmgrd currently share same RedisContext with multiple thread:
Thread 1: run 'keys' command
Thread 2: run 'hgetall' command
Then when thread 1 read command result, it read the 'keys' result, however the thread 1 process the result as 'hgetall' result, so the code crash.
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
@liuh-80 Can this PR fix the issue sonic-net/sonic-buildimage#10883? |
Why I did it
For performance erason, swsscommon is not thread safe.
Code using swsscommon may make mistake to share db connection cross thread, and it's difficult to debug.
To slove this issue, add ThreadSafeGuard class to detect and throw exception when db connection shared by multiple thread.
How I did it
Add ThreadSafeCheck class to check and throw exception when DBConnect shared by multiple thread.
How to verify it
Pass all test case.
Add new test case to cover new code.
Which release branch to backport (provide reason below if selected)
Description for the changelog
Add ThreadSafeCheck class to check and throw exception when DBConnect shared by multiple thread.
Link to config_db schema for YANG module changes
A picture of a cute animal (not mandatory but encouraged)