SocketException with HttpClient #29038

Closed
snehashankar opened this issue Mar 21, 2019 · 18 comments
Comments

@snehashankar

I am currently working on a bug fix related to the HttpClient socket exhaustion issue. The SocketException message which I receive is as follows:
System.Net.Sockets.SocketException: Message 'A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond', ErrorCode 10060, HResult -2147467259, NativeErrorCode 10060, SocketErrorCode TimedOut, Source 'System.Private.CoreLib'

This issue occurs in high-load scenarios with .NET Core 2.2. I have been trying to repro it for two weeks now with similar or higher load, but so far without luck.

We are using HttpClientFactory and injecting the HttpClient.

I did see some similar issues reported on this platform, and I am pretty much following what looks correct.

Any pointers on this (and on the repro) would be highly appreciated.

@karelz changed the title from "SocketException with HttpClient in dotnet core 2.2" to "SocketException with HttpClient" on Mar 22, 2019
@karelz
Member

karelz commented Mar 22, 2019

This may be a hard issue to track down (just setting expectations).
We will need some logging (assuming it already exists in .NET).
Would you be able to get rid of the HttpClientFactory? Do you have a way to reproduce it in production? Did you try to minimize and stress test the area which causes the problem?

Did you see a similar issue in .NET Core 2.1? Or did you not use 2.1 at all?

@karelz
Member

karelz commented Mar 22, 2019

Also, please link the similar issues you mentioned above.

@snehashankar
Author

I can get rid of HttpClientFactory, but before that I should be able to repro it with the same setup, right? :)
I am able to see it in production during heavy loads (although sometimes I also see it with normal or lesser loads).

I see it in both dotnet core 2.1 and 2.2

Similar issues are #28205 and #27232.

@rmkerr
Contributor

rmkerr commented Mar 22, 2019

Have you ruled out the possibility that you are running into an actual resource exhaustion issue? The error code you're seeing comes from the underlying socket implementation (WinSock), and indicates that the connection request to the server could not be completed within 21 seconds.

The most common scenario here would be that the server or client is under high load, and cannot complete the connection within that timeframe.

Do you have any logs that indicate the server is actually responding to the connection request when this error occurs?
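
To make the logs easier to correlate with this case, the client could record whether a failed request bottomed out in a socket-level connect timeout (error 10060, `SocketError.TimedOut`) rather than some other failure. The following is only a minimal sketch: the `ConnectFailureClassifier` class and its method names are hypothetical and not part of the original service or of .NET.

```csharp
using System;
using System.Net.Http;
using System.Net.Sockets;

public static class ConnectFailureClassifier
{
    // Walks the InnerException chain and returns the first SocketException, or null.
    public static SocketException FindSocketException(Exception ex)
    {
        for (var e = ex; e != null; e = e.InnerException)
        {
            if (e is SocketException se)
            {
                return se;
            }
        }
        return null;
    }

    // True when the failure was a socket-level timeout (WinSock error 10060).
    public static bool IsConnectTimeout(Exception ex)
    {
        var se = FindSocketException(ex);
        return se != null && se.SocketErrorCode == SocketError.TimedOut;
    }
}
```

Calling `IsConnectTimeout` in the `catch (HttpRequestException ex)` block around the request and logging the result alongside the target host and timestamp would give a data point to match against server-side logs.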

@snehashankar
Author

So this is now seen to occur even under low load scenarios.

@rmkerr
Contributor

rmkerr commented Mar 25, 2019

That's a potentially useful datapoint.

This isn't actionable without more information though. We need either a repro or a thorough set of logs that shows the issue really is in the sockets layer.

@snehashankar
Author

snehashankar commented Mar 26, 2019

I am not able to repro it in a dev/test environment. The issue seems to be visible only in production.

This is what I observe when the exception occurs:

2019-03-26 14:35:25.3017019	Error in RDWeb to RDBroker proxy: System.Net.Http.HttpRequestException: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond ---> System.Net.Sockets.SocketException: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond
   at System.Net.Http.ConnectHelper.ConnectAsync(String host, Int32 port, CancellationToken cancellationToken)
   --- End of inner exception stack trace ---
   at System.Net.Http.ConnectHelper.ConnectAsync(String host, Int32 port, CancellationToken cancellationToken)
   at System.Threading.Tasks.ValueTask`1.get_Result()
   at System.Net.Http.HttpConnectionPool.CreateConnectionAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Threading.Tasks.ValueTask`1.get_Result()
   at System.Net.Http.HttpConnectionPool.WaitForCreatedConnectionAsync(ValueTask`1 creationTask)
   at System.Threading.Tasks.ValueTask`1.get_Result()
   at System.Net.Http.HttpConnectionPool.SendWithRetryAsync(HttpRequestMessage request, Boolean doRequestAuth, CancellationToken cancellationToken)
   at System.Net.Http.RedirectHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at Microsoft.Extensions.Http.Logging.LoggingHttpMessageHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at Microsoft.Extensions.Http.Logging.LoggingScopeHttpMessageHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Net.Http.HttpClient.FinishSendAsyncBuffered(Task`1 sendTask, HttpRequestMessage request, CancellationTokenSource cts, Boolean disposeCts)
   at Microsoft.RDInfra.Diagnostics.HttpClientExtensions.DiagnosticsApiHttpClientExtensions.SendAsync(HttpClient _httpClient, IDiagnosticsApi diagnosticsApi, RDComponent destinationComponent, String destinationApi, HttpRequestMessage message, CancellationToken cancellationToken, String accessToken, String deploymentSlot) in C:\agent\_work\2\s\src\Shared\Diagnostics\src\Microsoft.RDInfra.Diagnostics.HttpClientExtensions\DiagnosticsApiHttpClientExtensions.cs:line 73
   at Microsoft.RDInfra.Diagnostics.HttpClientExtensions.DiagnosticsApiHttpClientExtensions.GetAsync(HttpClient _httpClient, IDiagnosticsApi diagnosticsApi, RDComponent destinationComponent, String destinationApi, String requestUri, String accessToken, String deploymentSlot) in C:\agent\_work\2\s\src\Shared\Diagnostics\src\Microsoft.RDInfra.Diagnostics.HttpClientExtensions\DiagnosticsApiHttpClientExtensions.cs:line 39
   at Microsoft.RDInfra.PublishingService.Proxy.ProxyBase.GetRequest[T](String clientName, String requestPath) in C:\agent\_work\2\s\src\RDBroker\src\PublishingServiceProxy\ProxyBase.cs:line 84
2019-03-26 14:35:25.3025638	37eff752-1cc0-4e7f-b5cd-5db9e6aaee66	R1	RDWebRole	System.Net.Sockets.SocketException: Message 'A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond', ErrorCode 10060, HResult -2147467259, NativeErrorCode 10060, SocketErrorCode TimedOut, Source 'System.Private.CoreLib'
2019-03-26 14:35:25.3254286	37eff752-1cc0-4e7f-b5cd-5db9e6aaee66	R1	RDWebRole	Error in RDWebController action FeedDiscoveryAsync: System.Net.Http.HttpRequestException: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond ---> System.Net.Sockets.SocketException: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond
   at System.Net.Http.ConnectHelper.ConnectAsync(String host, Int32 port, CancellationToken cancellationToken)
   --- End of inner exception stack trace ---
   at System.Net.Http.ConnectHelper.ConnectAsync(String host, Int32 port, CancellationToken cancellationToken)
   at System.Threading.Tasks.ValueTask`1.get_Result()
   at System.Net.Http.HttpConnectionPool.CreateConnectionAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Threading.Tasks.ValueTask`1.get_Result()
   at System.Net.Http.HttpConnectionPool.WaitForCreatedConnectionAsync(ValueTask`1 creationTask)
   at System.Threading.Tasks.ValueTask`1.get_Result()
   at System.Net.Http.HttpConnectionPool.SendWithRetryAsync(HttpRequestMessage request, Boolean doRequestAuth, CancellationToken cancellationToken)
   at System.Net.Http.RedirectHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at Microsoft.Extensions.Http.Logging.LoggingHttpMessageHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at Microsoft.Extensions.Http.Logging.LoggingScopeHttpMessageHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Net.Http.HttpClient.FinishSendAsyncBuffered(Task`1 sendTask, HttpRequestMessage request, CancellationTokenSource cts, Boolean disposeCts)
   at Microsoft.RDInfra.Diagnostics.HttpClientExtensions.DiagnosticsApiHttpClientExtensions.SendAsync(HttpClient _httpClient, IDiagnosticsApi diagnosticsApi, RDComponent destinationComponent, String destinationApi, HttpRequestMessage message, CancellationToken cancellationToken, String accessToken, String deploymentSlot) in C:\agent\_work\2\s\src\Shared\Diagnostics\src\Microsoft.RDInfra.Diagnostics.HttpClientExtensions\DiagnosticsApiHttpClientExtensions.cs:line 73
   at Microsoft.RDInfra.Diagnostics.HttpClientExtensions.DiagnosticsApiHttpClientExtensions.GetAsync(HttpClient _httpClient, IDiagnosticsApi diagnosticsApi, RDComponent destinationComponent, String destinationApi, String requestUri, String accessToken, String deploymentSlot) in C:\agent\_work\2\s\src\Shared\Diagnostics\src\Microsoft.RDInfra.Diagnostics.HttpClientExtensions\DiagnosticsApiHttpClientExtensions.cs:line 39
   at Microsoft.RDInfra.PublishingService.Proxy.ProxyBase.GetRequest[T](String clientName, String requestPath) in C:\agent\_work\2\s\src\RDBroker\src\PublishingServiceProxy\ProxyBase.cs:line 84
   at Microsoft.RDInfra.PublishingService.Proxy.WebFeedProxy.GetTenantsForUserAsync(String userToken) in C:\agent\_work\2\s\src\RDBroker\src\PublishingServiceProxy\WebFeedProxy.cs:line 170
   at Microsoft.RDInfra.RDWeb.Models.CPub.CPubLibImpl.GetDiscoveryFeedAsync(String userToken) in C:\agent\_work\2\s\src\RDWeb\src\RDWeb\Models\CPub.cs:line 70
   at Microsoft.RDInfra.RDWeb.RDWebController.FeedDiscoveryAsync() in C:\agent\_work\2\s\src\RDWeb\src\RDWeb\Controllers\RDWebController.cs:line 142

2019-03-26 14:35:25.3254534	System.Net.Sockets.SocketException: Message 'A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond', ErrorCode 10060, HResult -2147467259, NativeErrorCode 10060, SocketErrorCode TimedOut, Source 'System.Private.CoreLib'

2019-03-26 14:35:25.3271083	An unhandled exception has occurred while executing the request.

@rmkerr
Contributor

rmkerr commented Mar 26, 2019

That call stack doesn't look unusual to me. It is consistent with what you would see if an actual connection establishment timeout were occurring, as mentioned in the first part of the exception message:

A connection attempt failed because the connected party did not properly respond after a period of time

We need to be able to rule out that case before doing a deeper investigation here. There are a few approaches you can take there. The best is obviously to provide a repro.

Alternatively, you can try collecting network traces (e.g. Wireshark or tcpdump) on the client that show the connection succeeding, or not being initiated at all. Be aware that the default configuration captures all network traffic, which can contain sensitive information in a production service.

@wfurt
Member

wfurt commented Jun 16, 2019

Is this still a problem, @snehashankar? You mentioned a "socket exhaustion issue", but do you have any evidence?

#29327 and #28582 fail with different symptoms but they may give you some good background reading.

@sathishTkumar

I am facing the same exception. Please see my details below and let me know if you need any additional info.
Environment: P2V2 with 20 instances
.NET Core version: 2.2
Exception: System.Net.Sockets.SocketException:
A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond

startup.cs
services.AddScoped<IHttpHelper, HttpClientHelper>();

services.AddHttpClient("client")
    .ConfigurePrimaryHttpMessageHandler(() =>
    {
        return new HttpClientHandler { MaxConnectionsPerServer = 1000 };
    });

httpclientHelper.cs
_client = client.CreateClient("client");
_client.DefaultRequestHeaders.ConnectionClose = false;
// Timeout is a TimeSpan; the post originally assigned the bare number 120 (seconds assumed here).
_client.Timeout = TimeSpan.FromSeconds(120);
HttpRequestMessage httpRequestMessage = new HttpRequestMessage(HttpMethod.Post, "https://api/abc");
httpRequestMessage.Content = httpContent;
HttpResponseMessage response = await _client.SendAsync(httpRequestMessage);
var responseContent = await response.Content.ReadAsStringAsync();

@davidsh
Contributor

davidsh commented Jun 17, 2019

@snehashankar

_client.DefaultRequestHeaders.ConnectionClose = false;

Why are you adding this code to your project? This will cause the server to close the connection after a single request/response cycle. That will contribute to sockets getting closed all the time and thus running out of available sockets more quickly (since it takes longer for a socket to be ready to be re-opened after being closed).

Also, is this running on Windows or Linux on Azure?

@wfurt
Member

wfurt commented Jun 17, 2019

You should also collect a Wireshark packet dump to see whether or not the server responded. If it didn't, this exception is correct and there is nothing wrong with HttpClient.

@sathishTkumar

@snehashankar

_client.DefaultRequestHeaders.ConnectionClose = false;

Why are you adding this code to your project? This will cause the server to close the connection after a single request/response cycle. That will contribute to sockets getting closed all the time and thus running out of available sockets more quickly (since it takes longer for a socket to be ready to be re-opened after being closed).

Also, is this running on Windows or Linux on Azure?

This is running on an Azure App Service P2V2 plan with 20 instances.
As per this Stack Overflow comment (https://stackoverflow.com/questions/47411524/how-to-prevent-httpclient-from-sending-the-connection-header), I added that line to enable Keep-Alive, which reduced the error rate from 25% to 5%.
HttpRequestHeaders.ConnectionClose = true => Connection: close
HttpRequestHeaders.ConnectionClose = false => Connection: Keep-Alive
HttpRequestHeaders.ConnectionClose = null => Connection: Keep-Alive
HttpWebRequest.KeepAlive = true => Connection: Keep-Alive
HttpWebRequest.KeepAlive = false => Connection: close

@davidsh
Contributor

davidsh commented Jun 18, 2019

HttpRequestHeaders.ConnectionClose = false => Connection: Keep-Alive

You're right, sorry. I mis-read that line thinking it was turning off 'Keep-Alive'.

@davidsh
Contributor

davidsh commented Jun 18, 2019

@snehashankar

I added that line to enable Keep-Alive, which reduced the error rate from 25% to 5%.

Using 'Keep-Alive' semantics is on by default for HTTP/1.1. It isn't strictly necessary to include any 'Connection: Keep-Alive' request header when using HTTP/1.1. It is assumed to be 'Keep-Alive' by default.

So, I'm curious why adding this request header changed anything for you. Is it because your HTTP requests are not HTTP/1.1 by default? Are you going through an HTTP/1.0 proxy perhaps?
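
One way to take the guesswork out of this is to inspect what the request object actually carries for each `ConnectionClose` setting. The sketch below assumes a hypothetical `ConnectionHeaderProbe` helper and a placeholder URL, and nothing is sent over the network. It only reflects headers set on the `HttpRequestMessage`; a handler or proxy may still change what goes on the wire, so a packet capture remains the authoritative check.

```csharp
using System;
using System.Net;
using System.Net.Http;

public static class ConnectionHeaderProbe
{
    // Builds a request pinned to HTTP/1.1 with the given ConnectionClose setting.
    // The URL is a placeholder; the request is never sent.
    public static HttpRequestMessage Build(bool? connectionClose)
    {
        var request = new HttpRequestMessage(HttpMethod.Get, "http://example.invalid/")
        {
            Version = HttpVersion.Version11,
        };
        request.Headers.ConnectionClose = connectionClose;
        return request;
    }

    // Returns the Connection header values currently set on the request, comma-joined.
    public static string Describe(HttpRequestMessage request)
    {
        return string.Join(", ", request.Headers.Connection);
    }
}
```

Logging `Describe(Build(null))`, `Describe(Build(false))`, and `Describe(Build(true))` on the affected runtime would show whether the explicit `false` setting changes the request headers at all compared with the default.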

@sathishTkumar

But I can see a huge difference after adding the Keep-Alive header. Earlier I was getting more socket exceptions and the CPU usage was very high; both are reduced now.

I am using HTTP/1.1 only.


@davidsh removed their assignment on Sep 4, 2019
@karelz
Member

karelz commented Oct 8, 2019

Triage: Looks like we do not have actionable info. Closing.
Feel free to reopen if there is something actionable.

@karelz closed this as completed on Oct 8, 2019
@stankovski

We have seen similar behavior in our service. Upon further investigation, we observed that under load HttpClient was using up all available sockets. This is related to a change between .NET Framework and .NET Core: in .NET Framework, HttpClient used ServicePointManager.DefaultConnectionLimit to limit the number of connections to an endpoint (2 or 10 by default). In .NET Core, however, ServicePointManager.DefaultConnectionLimit is only honored by WinHttpHandler. By default, both SocketsHttpHandler and HttpClientHandler instead use the MaxConnectionsPerServer property on the handler itself, which defaults to Int32.MaxValue, to control the maximum number of connections to an endpoint. As a result, unless you either use WinHttpHandler or set MaxConnectionsPerServer, your HttpClient can end up throwing SocketException under heavy usage.
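
As a rough illustration of the second option, a handler with an explicit connection cap could be built like this. The `BoundedHandlerFactory` name, the limit of 50, and the five-minute connection lifetime are illustrative assumptions rather than recommendations; suitable values depend on the service and the endpoint being called.

```csharp
using System;
using System.Net.Http;

public static class BoundedHandlerFactory
{
    // Creates a SocketsHttpHandler whose per-endpoint connection count is capped
    // instead of being left at the Int32.MaxValue default.
    public static SocketsHttpHandler CreateHandler(int maxConnectionsPerServer = 50)
    {
        return new SocketsHttpHandler
        {
            MaxConnectionsPerServer = maxConnectionsPerServer,
            // Recycle pooled connections periodically (assumed value).
            PooledConnectionLifetime = TimeSpan.FromMinutes(5),
        };
    }

    // Creates a client on top of the bounded handler; intended to be reused, not created per request.
    public static HttpClient CreateClient(int maxConnectionsPerServer = 50)
    {
        return new HttpClient(CreateHandler(maxConnectionsPerServer));
    }
}
```

With a cap in place, requests beyond the limit wait for a pooled connection to free up rather than opening new sockets, so request timeouts may need to account for that queueing.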
