v4 - requests against connections with unexpectedly closed sockets don't get fulfilled after reconnect #1789

dhensby · 2021-12-17T22:36:26Z

I've recently upgraded from v3 to v4 and noticed that we're seeing client.get() calls that are hanging. I've not managed to pin-point the exact area where the problem is originating (compiled typescript doesn't make this easy) but we are noticing this in conjunction with connections that are being closed (presumably by the sever) after inactivity.

Cursory investigations appear to point to connections where the socket has been closed by the server (after ~10 minutes of inactivity) which are then resulting in a re-connection attempts (which is ultimately successful) but the queued get request is never being fulfilled.

I've remedied this by adding a ping keep-alive (as suggested here, though implemented in a slightly less naive way) and the problem has gone away (we're no longer seeing closed connections and no longer seeing unfulfilled requests).

This was not a problem in v3, the connection may timeout but a reconnect was made and the get call fulfilled successfully.

Possibly related issues:

setPingConnectionInterval for node-redis #1598 (a native ping-based keep-alive could help)
On node versions higher than 10.19.0, a socket disconnect can permanently break the internal command_queue state #1593 (this feels similar, but this issue is raised against v2 & v3, which we didn't see this behaviour on)

I'll spend a bit of time trying to reproduce this in a test, but ultimately I've papered over this issue by implementing a ping-based keep-alive for my client.

Other comments:

In v3 this situation (attempting to make a request when the underlying stream was closed) resulted in a reconnecting event and no error event. In v4 we are seeing no reconnecting event and we are seeing an error event. I'm not sure if this is an intended change, but it makes discerning genuine errors and errors that can be handled by the built in re-connection strategy harder.

Environment:

Node.js Version: v14.18.1
Redis Server Version: 4.0.14 (Azure Cache for Redis)
Node Redis Version: 4.0.1
Platform: Windows (Azure Function App, Node Runtime)

The text was updated successfully, but these errors were encountered:

leibale · 2021-12-19T02:38:09Z

@dhensby which version of npm are you using? because the project uses npm workspaces, which were introduced in npm v7

Regarding the issue itself - "When a socket closed unexpectedly, all the commands that were already sent will reject as they might have been executed on the server. The rest will remain queued in memory until a new socket is established. If the client is closed—either by returning an error from reconnectStrategy or by manually calling .disconnect()—they will be rejected." (https://github.com/redis/node-redis/blob/master/docs/FAQ.md#what-happens-when-the-network-goes-down)

dhensby · 2021-12-19T11:23:48Z

@leibale - thanks for the pointer, I upgraded NPM and it has built the tools now. I thought I had done this (as I noticed the github workflow doing it) but clearly not...

Regarding the issue:

What you cite is about the network going down (this issue isn't about the network going down but the socket closed unexpectedly situation which is the specific handling of network connections that have been closed by the server due to idle timeouts triggering on the server side).
The requests neither resolve nor reject, they do not fulfil at all as per the issue description. The promise "hangs" indefinitely even though reconnection is established successfully by the retry mechanism.

If what you cite is intended behaviour, then it's a pretty big regression from v3 and the logic that the request shouldn't be sent to the server because it may have already been fulfilled doesn't make sense if the socket was already closed before the request was sent and especially makes no sense for read operations (read operations could not have been fulfilled by the server, at leat not as far as the client/application is concerned, if the connection is down).

leibale · 2021-12-20T20:54:47Z

My bad, I've missed the "hanging" part... 🤦‍♂️

This is not the intended behavior, and I was thinking that this test covers this, but I guess I'm missing something.. do you have a reproduction of that bug?

dhensby · 2021-12-20T22:15:25Z

I'm not sure killClient is quite doing what we want to replicate the problem. It's a problem that happens on the underlying socket so if we want to truly replicate it we need the underlying socket to emit an error event (ECONNRESET aka SocketClosedUnexpectedlyError). Then it's down to the pipeline handling in the client (I'm guessing here) to establish a new connection and then send the queued commands though the new socket.

I've been battling a little to get the test suite running locally, but I can run it enough to try to create a test now. I just need a way to stub/force the socket to emit an error event as needed 🤔

dhensby · 2022-01-03T13:01:50Z

Hi @cit68. @leibale has written a subsequent comment indicating that the quote does not apply here.

This is a bug and there are some changes in behaviour since v3, I'm not clear on if they are intentional.

The main changes for when the connection has been reset (disconnected due to server-side timeout) then:

an error event is emitted (this didn't happen in v3 that I ever noticed).
no reconnecting event is emitted on the first reconnection attempt (this used to happen in v3).

This issue isn't pertaining to those changes as they could well be intentional (though my personal preference would be that there was no error event and there was a reconnecting event emitted, because I don't want to have to filter "false" errors from the error listener).

The issue here is that the request that triggers the reconnection logic in these circumstances never resolves.

cit68 · 2022-01-03T13:19:03Z

@leibale Sorry!!!
While replying to the comment I was removing mine

The reason is that I had put a wrong label in the "reconnect" handler so I thought it was not possible to have a successful pong and write / read command without any reconnection and created the previous comment.

But really there is an automatic reconnection before these commands, so my comment made no sense to exist.

Thanks for the reply.

But reading your point n°2
"no reconnecting event is emitted on the first reconnection attempt (this used to happen in v3)."
In my current test (v4.0.1) it's present reconnection attempt!

dhensby · 2022-01-03T13:24:41Z

But reading your point n°2
"no reconnecting event is emitted on the first reconnection attempt (this used to happen in v3)."
In my current test (v4.0.1) it's present reconnection attempt!

Ah, ok - Maybe I'm mis-remembering. I haven't looked at this for a couple of weeks.

dhensby added the Bug label Dec 17, 2021

This comment has been minimized.

Sign in to view

leibale added a commit that referenced this issue Jan 3, 2022

ref #1789 - reject "hanging" promises when closing connection

7e2059b

leibale closed this as completed Mar 27, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

v4 - requests against connections with unexpectedly closed sockets don't get fulfilled after reconnect #1789

v4 - requests against connections with unexpectedly closed sockets don't get fulfilled after reconnect #1789

dhensby commented Dec 17, 2021 •

edited

Loading

This comment has been minimized.

leibale commented Dec 19, 2021 •

edited

Loading

dhensby commented Dec 19, 2021 •

edited

Loading

leibale commented Dec 20, 2021

dhensby commented Dec 20, 2021

dhensby commented Jan 3, 2022

cit68 commented Jan 3, 2022 •

edited

Loading

dhensby commented Jan 3, 2022

v4 - requests against connections with unexpectedly closed sockets don't get fulfilled after reconnect #1789

v4 - requests against connections with unexpectedly closed sockets don't get fulfilled after reconnect #1789

Comments

dhensby commented Dec 17, 2021 • edited Loading

This comment has been minimized.

leibale commented Dec 19, 2021 • edited Loading

dhensby commented Dec 19, 2021 • edited Loading

leibale commented Dec 20, 2021

dhensby commented Dec 20, 2021

dhensby commented Jan 3, 2022

cit68 commented Jan 3, 2022 • edited Loading

dhensby commented Jan 3, 2022

dhensby commented Dec 17, 2021 •

edited

Loading

leibale commented Dec 19, 2021 •

edited

Loading

dhensby commented Dec 19, 2021 •

edited

Loading

cit68 commented Jan 3, 2022 •

edited

Loading