createCluster clients don't handle on('error') correctly #2721
Comments
I'm also observing this same behavior.
I am also seeing this same behavior. It seems to happen for us whenever we perform something that requires a rolling update to redis (kubernetes), for example a version upgrade or a config update. The application ends up spewing errors and never reconnects, even after the cluster has been healthy for a long time. The only solution I could come up with is (like OP) to restart the process (in my case, failing the health check so k8s restarts the process). This is certainly a non-ideal solution. The general steps to repro:
I can also confirm this is an issue. Our setup is almost identical to the original post. After doing any kind of rolling update to the kubernetes redis cluster, once the pods are restarted we get Socket closed unexpectedly on the clients. That by itself is fine, but as explained above, the retry strategy never kicks in and the cluster client ends up in a closed state. While restarting the affected service does reconnect properly, we have a temporary workaround in place that should hopefully result in less downtime. I have pasted a simplified example below.
This is a rather serious issue that should get more attention, as at the moment there is no real alternative to this library for connecting to a redis cluster in a node.js environment (we have had an even worse experience with redis cluster and ioredis when using kubernetes).
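The commenter's simplified example did not survive extraction. A minimal sketch of what such a recreate-on-error workaround could look like (all names here are illustrative assumptions, not the commenter's actual code; `makeClient` stands in for a factory that would call `createCluster` and connect):

```javascript
// Hypothetical recreate-on-error workaround: since the cluster client's
// retry strategy never kicks in after a rolling update, throw the broken
// client away and build a fresh one whenever it reports an error.
// `makeClient` is any function returning an EventEmitter-like client.
function watchAndRecreate(makeClient) {
  const state = { client: null };
  const build = () => {
    const client = makeClient();
    client.on('error', (err) => {
      // Real code should log `err`, debounce, and only recreate on
      // fatal errors; this sketch recreates unconditionally.
      build();
    });
    state.client = client;
  };
  build();
  return state; // state.client always points at the current client
}
```

In a real service this would be driven by something like `watchAndRecreate(() => createCluster(options))`, with the caller always going through `state.client`.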
I'm also observing this same behavior.
I found out that version
I am seeing the same behaviour in my cluster with 4.6.14. Downgrading to 4.5.1 is sadly not an option: it does not seem to support hostname endpoints via testing with
@leibale, can you or one of the active maintainers take a look here? We would like to avoid building a workaround. We use the library's client with near-identical code for clusters and single instances, and for single instances the reconnect works as expected, so any workaround will need some novel code separation. A workaround is also likely to clash with any future fixes in
Same issue. This PR fixes it: #2783
I can confirm #2783 fixes the issue for us.
Thanks for the quick release. Very much appreciated 👍
Broken again; the fix got removed. This PR fixes the issue again: #2870
Description
We use cluster-mode with redis for sharded pub-sub (we have 3 masters and 3 replicas in a kubernetes cluster).
We have the following args for the clients:
and then we create the client(s) like this:
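The OP's option and creation snippets were lost in extraction. A minimal sketch of what a node-redis v4 cluster setup of this shape typically looks like (hostname, port, and backoff numbers below are illustrative assumptions, not the OP's values):

```javascript
const { createCluster } = require('redis');

// Illustrative stand-ins only -- the real options did not survive
// extraction. A rediss:// URL enables TLS; reconnectStrategy returns
// the delay in milliseconds before the next connection attempt.
const options = {
  rootNodes: [
    { url: 'rediss://redis-cluster.example.svc.cluster.local:6379' }, // hypothetical host
  ],
  defaults: {
    socket: {
      reconnectStrategy: (retries) => Math.min(retries * 100, 5000),
    },
  },
};

async function main() {
  const client = createCluster(options);
  client.on('error', (err) => console.error('Redis Cluster Error', err));
  await client.connect();
}
```

The `defaults` block is applied to every per-node client the cluster client creates internally, which is why the reconnect behaviour described in this issue is expected to apply cluster-wide.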
Sometimes our redis pub-sub cluster goes down (e.g. for maintenance or a version upgrade, since we run it in kubernetes), and we'll receive the following error:
We correctly log the error by catching it in the error handler, but we never seem to retry / reconnect -- the only way I can get a reconnect to actually happen is to continually restart the process until the reconnection succeeds.
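For context, a reconnectStrategy in node-redis v4 is just a function from the retry count to the next delay in milliseconds (returning an Error instead aborts retrying). A strategy like the sketch below should therefore, in principle, never stop retrying; the specific numbers are arbitrary assumptions:

```javascript
// Capped exponential backoff: 100 ms, 200 ms, 400 ms, ..., then 5 s forever.
// Because it always returns a number (never an Error), it never gives up.
const reconnectStrategy = (retries) => Math.min(2 ** retries * 100, 5000);
```

It would be passed under `defaults.socket.reconnectStrategy` when creating the cluster client; the behaviour reported in this issue is that after certain socket failures the cluster client apparently never invokes it at all.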
Also, if the process tries to issue a command, we sometimes get an internal error that kills the process with a Node uncaught exception, even though I've added a client.on('error') handler as described above. I followed the findings from #2120 and #2302, but those don't really solve our problems.
What I'd like is the ability to specify a reconnect strategy so that we keep retrying (according to the reconnectStrategy) if we lose our TLS connection or fail to talk to a node in the cluster. I'd also like messages to be queued while we're offline, instead of an error being thrown and taking down the process.
Node.js Version
20.11.1
Redis Server Version
7.0.10
Node Redis Version
4.6.13
Platform
linux
Logs
No response