Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Socket closed unexpectedly #2276

Closed
richardsun2021 opened this issue Sep 29, 2022 · 27 comments · Fixed by #2321
Closed

Socket closed unexpectedly #2276

richardsun2021 opened this issue Sep 29, 2022 · 27 comments · Fixed by #2321
Labels

Comments

@richardsun2021
Copy link

richardsun2021 commented Sep 29, 2022

 this.redisClient = redis.createClient({
			url: `redis://${redisHost}:${redisPort}`,
			socket: {
				connectTimeout: 60000,
				keepAlive: 60000,
				reconnectStrategy: (attempts) => {
					logger.log(`Redis reconnecting attempt ${attempts}`);
					if (attempts == 1) {
						console.log(`${this.constructor.name} failed to connect to ${redisHost}:${redisPort}. Reconnecting...`);
					}
					return 500;
				},
			},
			password: ******
		});
		this.redisClient.on('error', err => logger.error(err, "Redis v4 client error"));
		this.redisClient.on('connect', () => logger.log('Redis v4 is connect'));
		this.redisClient.on('reconnecting', () => logger.log('Redis v4 is reconnecting'));
		this.redisClient.on('ready', () => logger.log('Redis v4 is ready'));
		this.redisClient.connect(); 

between my app service and redis service, there is an aws NLB-network load balancer.
I have four redis client created, and every 2 minutes will get this error:
"error: Socket closed unexpectedly"

Environment:

  • Node.js Version: "16.16"
  • Redis Server Version: "6.2"
  • Node Redis Version: "4.3.1"
  • Platform: aws Ecs
@abhi12299
Copy link

Same issue here.. running Azure Redis Cache with TLS results in random "Socket closed unexpectedly" errors every 5-10mins after starting my application (on AKS). This error also occurs when running the app locally on my machine (no proxies/load balancers).

@schmod
Copy link

schmod commented Oct 4, 2022

Try setting connectTimeout: false

I think this library is mis-using Node's Socket.setTimeout() method, and is tearing the connection down after any idle period of 60 seconds or more.

@yaroslavputria
Copy link

Try setting connectTimeout: false

@schmod, do you mean connectTimeout: 0? - socket.connectTimeout is number.

@richardsun2021
Copy link
Author

richardsun2021 commented Oct 10, 2022

it works fine for me locally, pre-production test found this issue.

Same issue here.. running Azure Redis Cache with TLS results in random "Socket closed unexpectedly" errors every 5-10mins after starting my application (on AKS). This error also occurs when running the app locally on my machine (no proxies/load balancers).

@abhi12299
Copy link

Switched to ioredis for now.. this library is not suitable for production use until the issue is fixed.

@richardsun2021
Copy link
Author

Switched to ioredis for now.. this library is not suitable for production use until the issue is fixed.

while 3.* is stable, but no bloom module, for me would like to switch to 3.* and implement bloom by other lib.

@shawnlknight
Copy link

I am also seeing this issue and will not upgrade to v4 until it is resolved. Any word on this?!?

@leibale
Copy link
Contributor

leibale commented Oct 25, 2022

Hi everyone, sorry for the dealy, I was on vacation..

I wasn't able to reproduce the error with "docker run redis/redis-stack-server", can you please help me reproduce the issue so I'll be able to fix it?

@schmod
Copy link

schmod commented Oct 25, 2022

I'll reiterate my previous comment about connectTimeout – the current implementation calls:

if (this.#options.connectTimeout) {
   socket.setTimeout(this.#options.connectTimeout, () => socket.destroy(new ConnectionTimeoutError()));
}

However, socket.setTimeout isn't a connection timeout – it's a timeout for any activity on the socket. Let it sit idle for too long, and it'll self-destruct.

From the Node docs:

Sets the socket to timeout after timeout milliseconds of inactivity on the socket. By default net.Socket do not have a timeout.

When an idle timeout is triggered the socket will receive a 'timeout' event but the connection will not be severed. The user must manually call socket.end() or socket.destroy() to end the connection.

I'm not sure if this is the cause of the issues described in this thread, but it's almost certainly a bug.

@leibale
Copy link
Contributor

leibale commented Oct 25, 2022

@schmod the timeout is reset to 0 after the connect event, see this line

@abhi12299
Copy link

abhi12299 commented Oct 26, 2022

@schmod for some context, I tried to ping the redis server in my health check endpoint, so that there's no inactivity (the health check endpoint is called every 5s). But the connection still terminates after a few minutes (~5-10mins).
This only happened when I connected to Azure Redis Cache and not when connecting to local redis instance.

PS, I also talked to Microsoft support team for the same but there wasn't any issue with the connection options and setup.

@leibale
Copy link
Contributor

leibale commented Oct 27, 2022

see #1598

@abhi12299
Copy link

@leibale I had already tried the ping approach, but this still doesn't work.. I guess there's some issue with the library because I've migrated to ioredis and it works perfectly fine. No drops whatsoever.

@leibale
Copy link
Contributor

leibale commented Nov 1, 2022

I've tried to reproduce the issue with "Azure Cache for Redis", but didn't manage to... if one of you can make a call with me and show me the issue that will be very helpful :)

edit:
I managed to reproduce it, will keep you posted

@leibale
Copy link
Contributor

leibale commented Nov 1, 2022

leibale added a commit to leibale/node-redis that referenced this issue Nov 2, 2022
@josh803316
Copy link

We are running into this too:

  • We have an Elasticache instance in AWS (Engine: 6.2.6, Instance Size: cache.t3.micro)
  • We are connecting to it via a Fargate node 18 NestJS service using "cache-manager": "^5.1.1","cache-manager-redis-store": "^3.0.1", and "redis": "^4.3.1"
  • For the most part the service runs fine, but every couple of days it crashes with the Socket closed unexpectedly error but then the worst part is that it seems to not recover and we have to restart the service to get it working properly again.
  • In our case the crash/socket hangup seems related to accessing a large JSON value (800KB) which could def be a big part of the issue, but would still like to understand how to make this resilient.

@leibale I'm not sure if it's related to the same problem, we have all the default client settings (i.e. None are set outside of the URL). What would you recommend for recovery/reconnect? Does it silently stop and this pingInterval fix will help?

@richardsun2021
Copy link
Author

@leibale could you please release this bug fix ASAP? we are pending for this feature.

@leibale
Copy link
Contributor

leibale commented Nov 8, 2022

@richardsun2021 in the meantime you can use this code to do (almost) the exact thing:

setInterval(client => {
  client.ping((err) => {
    if (err) console.error('Redis keepalive error', err);
  });
}, 9*1000*60);
// azure is closing the socket after 10m of idle time, pinging every 9m seems like a reasonable choice

@josh803316 it should reconnect after this error, do you have a listener to error (see #2302)?
AFAIK AWS honors TCP keep-alive, I don't think pingInteval will help in your case..

@josh803316
Copy link

josh803316 commented Nov 8, 2022

@leibale Yes, I do have an error listener which was how (I presume) I saw the error in the logs, but I've upgraded the instance size of the elasticache node and set listeners as follows and haven't seen the crash/issue since the change. My assumption is that the request for the large (almost 1MB) json key/value pair was taking more than the 5 second default timeout (on occasion) and so it would hit the socket timeout but then not recover... just don't know why it wouldn't recover/reconnect as you mentioned:

    const client = cacheManager.store.getClient();
    // Only had this before
    client.on('error', (error) => {
      this.logger.error(error);
    });
    ////////////////////////////////////////
    client.on('connect', () => {
      this.logger.error('client is connected');
    });
    client.on('reconnecting', () => {
      this.logger.error('client is reconnecting');
    });
    client.on('ready', () => {
      this.logger.error('client is ready');
    });

@leibale leibale mentioned this issue Nov 10, 2022
leibale added a commit that referenced this issue Nov 10, 2022
* fix #1598 fix #2276 - add `pingInterval` to client config

* setPingTimer on ready (instead of on connect)

* use isReady (instead of isOpen) and fix test

* Update client-configuration.md
@thorep
Copy link

thorep commented Jun 18, 2023

I am facing this exact issue now.. any solution?

@abhi12299
Copy link

@thorep if the solutions discussed above don't work, just use ioredis. Setting connection timeout to 0 didn't work for me, so I migrated to ioredis and it just works.

@leibale
Copy link
Contributor

leibale commented Jun 18, 2023

@thorep try to use pingInterval and set it to less than 5*60*1000

@emanuellarini
Copy link

@thorep try to use pingInterval and set it to less than 5*60*1000

I am using redis with elasticache from AWS
Your solution worked for me although I set it to 10 * 1000
When running locally it disconnected me completely after 20s so to be safer I think 10s it's fine, right?
Btw I also tried use ioredis and I didn't need to configure anything, just worked fine
Is there any consequences on adjusting that value? if not, i'll stick with node-redis, otherwise I think its best to move to ioredis

@leibale
Copy link
Contributor

leibale commented Nov 10, 2023

@emanuellarini pingInterval will send a PING command to the server to make sure the server will not close the socket. For example, pingInterval of 10 * 1000 will ping the server every 10s. The only consequence is more pings to the server..

You should check the timeout option (see here) you have set on your instance and then set the pingInterval to a value that is smaller than that by at least a few seconds.

If you want, we can have a quick call.. :)

@abdelhalim97
Copy link

@leibale thank u i can confirm: using redis in azure and setting pingInterval to 9 minutes fixes the problem
but i have a couple of questions for u if u dont mind

why the azure kills the connection if i m already using some redis operations on the database, shouldnt the 10 minutes cooldown get reseted every redis operation?

also whats the difference between pingInterval and keepAlive?
because i spent too much using keepAlive and wondering why it still crushes

@laocoi
Copy link

laocoi commented Jan 4, 2024

I got the same issue (Socket Closed Unexpectedly) but only with the remote Redis servers, so I figured out that maybe there is a problem with the TLS. So I tried to use redis:// instead of rediss://, and yes it was working stable.

@iampato
Copy link

iampato commented Nov 26, 2024

Shorter ping interval did the trick:

 pingInterval: 10 * 1000,

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.