Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

avoid allocations in SendTo and Quic #78591

Closed
wants to merge 7 commits into from
Closed

Conversation

wfurt
Copy link
Member

@wfurt wfurt commented Nov 19, 2022

Contributes to #30797

This is somewhat incomplete MVP. I'm looking for feedback before spending too much time.
We generally use public SocketAddress class (and internal variant) when we want to pass it to/from native network stack. For connection protocols like TCP allocation and pinning overhead is acceptable but for something like SendTo/ReceiveFrom we need to do it for every and it becomes significant. (and it also generates pauses when GC kicks in)

To address this PR brings native sockaddr to managed code. We already have that somewhat for Quic so I made some cleanup and changes and I also use it for Sockets.SendTo. For IPv4/6 we allocate struct on stack and for rest of address families we still serialize like we used to. (could be fixed easily)

The SendTo would typically finish synchronously as it only depends on Kernels ability to transmit (unlike TCP when we may need to wait for ACK) but we also have cases when the operation lands on async queue. For now, this change rents buffer from array pull and it would copy the data over. We could also make the new struct part of BufferPtrSendOperation (or variant) but I did not want to change this at this moment.
For reference the sizes of sock address on Linux are:
ipv4: 16, ipv6: 28 UnixDomainSocket = 110, Storage(maximum supported address) = 128

This part is easy for Quic as we only support operation on IP. For Quic I was also thinking about replacing the quicAddress but I like it as different type so we option for modifications if we need to.

@wfurt wfurt requested review from stephentoub, jkotas and a team November 19, 2022 06:18
@wfurt wfurt self-assigned this Nov 19, 2022
@ghost
Copy link

ghost commented Nov 19, 2022

Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.

Issue Details

Contributes to #30797

This is somewhat incomplete MVP. I'm looking for feedback before spending too much time.
We generally use public SocketAddress class (and internal variant) when we want to pass it to/from native network stack. For connection protocols like TCP allocation and pinning overhead is acceptable but for something like SendTo/ReceiveFrom we need to do it for every and it becomes significant. (and it also generates pauses when GC kicks in)

To address this PR brings native sockaddr to managed code. We already have that somewhat for Quic so I made some cleanup and changes and I also use it for Sockets.SendTo. For IPv4/6 we allocate struct on stack and for rest of address families we still serialize like we used to. (could be fixed easily)

The 'SendTowould typically finish synchronously as it only depends on Kernels ability to transmit (unlike TCP when we may need to wait for ACK) but we also have cases when the operation lands on async queue. For now, this change rents buffer from array pull and it would copy the data over. We could also make the new struct part ofBufferPtrSendOperation` (or variant) but I did not want to change this at this moment.
For reference the sizes of sock address on Linux are:
ip45: 16, ipv6: 28 UniXDomainSocket = 110, Storage(maximum supported address) = 128

This part is easy for Quic as we only support operation on IP. For Quic I was also thinking about replacing the quicAddress but I like it as different type so we option for modifications if we need to.

Author: wfurt
Assignees: wfurt
Labels:

area-System.Net.Sockets

Milestone: -

Copy link
Member

@gfoidl gfoidl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few nits.

Copy link
Member

@ManickaP ManickaP left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this :shipit:

Span<byte> addressBytes = new Span<byte>((byte*)Unsafe.AsPointer(ref quicAddress), Internals.SocketAddress.IPv6AddressSize);
return new Internals.SocketAddress(addressFamilyOverride ?? SocketAddressPal.GetAddressFamily(addressBytes), addressBytes).GetIPEndPoint();
}

internal static unsafe QuicAddr ToQuicAddr(this IPEndPoint iPEndPoint)
Copy link
Member

@ManickaP ManickaP Nov 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should get rid off of both these helpers and replace them everywhere. Or keep them both in the simplified version and all their usages.

Copy link
Member

@rzikm rzikm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good so far

LocalEndPoint = address.ToIPEndPoint(options.ListenEndPoint.AddressFamily);
if (options.ListenEndPoint.Address.Equals(IPAddress.IPv6Any))
{
address.Address.Ipv6.SetAddressFamily((int)QUIC_ADDRESS_FAMILY_INET6);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we have now the platform-dependent AF_INET/6 constants in the Sys class, should we use them here?

Either way, I don't think the cast here is needed.

public uint sin6_scope_id;
}

[StructLayout(LayoutKind.Explicit)]
internal struct QuicAddr
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Readers could (incorrectly) assume that the code was written by the MsQuic team when in fact we did, and since the QuicAddr struct does not seem to add anything useful over SockAddr, I would prefer removing it and using SockAddr directly instead.


public static bool SockaddrHasLength => OperatingSystem.IsFreeBSD() || OperatingSystem.IsIOS() || OperatingSystem.IsMacOS() || OperatingSystem.IsMacCatalyst() || OperatingSystem.IsTvOS() || OperatingSystem.IsWatchOS();
public int Family => Address.Family;
public Interop.Sys.sockaddr_in6 Ipv6 => Address.Ipv6;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file is a verbatim copy of https://github.com/microsoft/msquic/blob/main/src/cs/lib/msquic.cs from msquic repo. I do not think you want to make custom edits to it that make it depend on the dotnet runtime details. It will future make synchronization of this file harder.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The native QUIC_ADDR (e.g. source of truth) use sockaddr

https://github.com/microsoft/msquic/blob/cb9e250e13d9354ac0b325509ea2d7003294f3c6/src/inc/msquic_posix.h#L157-L161

Our sockets needs more than Quic and I did not want to drag the runtime check for SockaddrHasLength. Instead the proposed struct represents the OS layout. Problem with MsQuic is that it does not really have PAL at c# layer -> just split between Windows and POSIX.

We could drag the MsQuic definitions but use serialization from Sockets/sockaddr to produce the bits.

cc: @nibanks for additional thoughts.

Copy link
Member

@antonfirsov antonfirsov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should formulate a design we are happy to carry forward to get rid of Internals.SocketAddress entirely, refactoring ReceiveFrom, Connect and the async path.


if (NetEventSource.Log.IsEnabled()) NetEventSource.DumpBuffer(this, buffer, offset, size);
return bytesTransferred;
return SendTo(buffer.AsSpan(offset, size), socketFlags, remoteEP);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This duplication is here because of #46600. The byte[] buffer case has to land in a BufferMemorySendOperation in the sync-over-async case.

Comment on lines +381 to +386
if (SocketAddress != null)
{
ArrayPool<byte>.Shared.Return(SocketAddress);
SocketAddress = null;
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me it looks like there are some cases which do not end up calling InvokeCallback, leaking the pooled array.

BytesTransferred = bytesSent
};

if (socketAddress.Length > 0)
{
operation.SocketAddress = ArrayPool<byte>.Shared.Rent(socketAddress.Length);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not to return it in a finally block? In my understanding, this is a blocking call that should finish on the calling thread. (See #46600)

Comment on lines +16 to +19
[FieldOffset(0)]
public Interop.Sys.sockaddr_in Ipv4;
[FieldOffset(0)]
public Interop.Sys.sockaddr_in6 Ipv6;
Copy link
Member

@antonfirsov antonfirsov Nov 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We historically deal in our native PAL with these kind of concerns. I don't know enough context to decide if it's safe and good for us to reimplement the mapping in managed code for all the platforms.

Instead of defining structures, we could just hold into a fixed byte data[IPv6AddressSize] and let SocketAddressPal to do the dirty work. I think the cost of the extra shared library function call would be negligible.

@stephentoub @tmds thoughts?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not just the call, it is also pinning

public static unsafe AddressFamily GetAddressFamily(ReadOnlySpan<byte> buffer)
{
AddressFamily family;
Interop.Error err;
fixed (byte* rawAddress = buffer)
{
err = Interop.Sys.GetAddressFamily(rawAddress, buffer.Length, (int*)&family);
}

for something that would be easy to do in managed code. I feel the sockaddr is unlikely to change for platforms and families we care about. If anything, I would keep code to get size of sockaddr_storage in native so we do not need to track addition of new protocols.

And we do it for Quic already anyway. I feel it would be nice if we can share more code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is also pinning

We can avoid it by introducing non-pinning overloads, since SockAddr is created on the stack, or is expected to be pre-pinned in async cases.

it would be nice if we can share more code.

System.Net.Quic can also adapt SocketAddressPal. We would end up having fewer LOC.

Copy link
Member

@tmds tmds Nov 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the pinning cost worth the duplication?

If you're adding the mapping in managed code, then some existing users of {Get/Set}IPv{4,6}Address can probably also use it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pinning with fixed is close to free unless there's a GC that occurs concurrently, and that's not particularly likely with a really fast call like this one. Plus if the thing being pinned isn't actually on the heap, the cost during a GC that does occur is also minimal. I wouldn't duplicate a lot of code to avoid it.

@ghost
Copy link

ghost commented Jan 7, 2023

Draft Pull Request was automatically closed for 30 days of inactivity. Please let us know if you'd like to reopen it.

@ghost ghost locked as resolved and limited conversation to collaborators Feb 7, 2023
@karelz karelz added this to the 8.0.0 milestone Mar 22, 2023
This pull request was closed.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants