Use pipelines? #13
Yes 😄 |
Yes. |
I really hope and wish this can come true 👍🏻 In a production app (closed source for work) I see huge potential to further improve things with such a change*. * I almost started to prototype such a change, but didn't have the time for it. |
I got rid of a lot of primitive boxing in M.D.SqlClient, even for guids. Unless you're passing them through parameters they shouldn't be too bad. If you can provide a test repro that shows where they're boxing, open an issue on M.D.SqlClient and I'll take a look. |
gRPC uses pipelines in the ASP.NET Core server and Stream in the client. |
@JamesNK am curious, was there a particular reason not to use pipelines in the client too? |
HttpClient uses Stream in its APIs. Pipelines would be another layer of abstraction that wouldn’t add any value. One of the nice things about pipelines is it manages buffer pooling for you. gRPC messages are prefixed by length so it isn’t difficult to rent a buffer of the right size yourself. |
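As an aside, a minimal sketch of the pattern described above (renting a buffer of exactly the advertised size for a length-prefixed message read from a Stream). The names here (`LengthPrefixedReader`, `ReadMessageAsync`) are illustrative, not actual gRPC client internals, and the big-endian 4-byte prefix is an assumption:

```csharp
using System;
using System.Buffers;
using System.Buffers.Binary;
using System.IO;
using System.Threading.Tasks;

static class LengthPrefixedReader
{
    // Illustrative only: read one length-prefixed message from a Stream,
    // renting a buffer of exactly the advertised size from the shared pool.
    public static async Task<byte[]> ReadMessageAsync(Stream stream)
    {
        var lengthPrefix = new byte[4];
        await ReadExactAsync(stream, lengthPrefix, 4);
        int length = BinaryPrimitives.ReadInt32BigEndian(lengthPrefix);

        byte[] rented = ArrayPool<byte>.Shared.Rent(length);
        try
        {
            await ReadExactAsync(stream, rented, length);
            // Copy out (or process in place) before returning the buffer to the pool.
            return rented.AsSpan(0, length).ToArray();
        }
        finally
        {
            ArrayPool<byte>.Shared.Return(rented);
        }
    }

    private static async Task ReadExactAsync(Stream stream, byte[] buffer, int count)
    {
        int read = 0;
        while (read < count)
        {
            int n = await stream.ReadAsync(buffer, read, count - read);
            if (n == 0) throw new EndOfStreamException();
            read += n;
        }
    }
}
```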
Do we rent the exact length? That might not always be the best strategy... |
Currently yes. Agree that won't be best for big messages. Might look at using an |
SqlClient uses SslStream and in general uses Stream as an abstraction because the supported transports are TCP and NamedPipes, both of which have stream abstractions but nothing lower. If we decide not to support NamedPipes so that we can go down as far as sockets, we'll need to find a way to support SSL at the socket level (I haven't seen a way to do that). Even having Pipelines wrapping streams will have code-level benefits, even if it doesn't allow perfect array pooling. The TDS protocol has 3 layers to each packet and we might not want to deal with all the complexity at every level. |
Both sockets and named pipes are eventually accessed via Stream implementations (over which SslStream can be layered), so I'm not sure why supporting named pipes or not is important here.
I don't know anything about TDS/SMUX or anything like that, but when looking some time ago, I indeed got the impression that Pipelines are especially useful when the message length isn't known in advance (e.g. when reading HTTP data, which is newline-terminated rather than length-prefixed). I might be totally wrong about that - @davidfowl is the guy who knows. I'm not sure if the multiple layering of protocols within TDS is important here - if at some level there's the notion of a length-prefixed packet or frame, then at that level it can be read and buffered (if needed) even if other protocols are nested within it. I guess I'm asking whether we're sure that the benefits of Pipelines in the particular case of TDS are worth what we'd potentially pay for the additional I/O layer (including locking, IIRC). Again, @davidfowl is the guy :) |
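On the SSL-over-sockets question raised above, a minimal sketch of one possible layering: SslStream sits on a NetworkStream over the socket, and PipeReader.Create then exposes the encrypted stream to higher layers. All names here are illustrative, and this ignores any protocol-specific handshake requirements (such as TDS pre-login):

```csharp
using System.IO.Pipelines;
using System.Net.Security;
using System.Net.Sockets;
using System.Threading.Tasks;

static class TlsPipeExample
{
    // Sketch: layer SslStream over a socket's NetworkStream, then expose it
    // as a PipeReader so higher layers can parse without managing buffers.
    public static async Task<PipeReader> ConnectAsync(string host, int port)
    {
        var socket = new Socket(SocketType.Stream, ProtocolType.Tcp) { NoDelay = true };
        await socket.ConnectAsync(host, port);

        var ssl = new SslStream(new NetworkStream(socket, ownsSocket: true));
        await ssl.AuthenticateAsClientAsync(host);

        // PipeReader.Create wraps any Stream; buffer pooling is handled internally.
        return PipeReader.Create(ssl);
    }
}
```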
There are complications but overall the benefits of pipelines are well worth any effort we have to put into using them. I think they fit well with the way we want to think about and work with the tds data. We don't want to think in packets we want to abstract that away and focus on what we're doing with the contents. I've written some code with pipelines to parse a text based network protocol and it's low level but well worth the perf benefits. In fact that project was what started me working on SqlClient. I wrote a really high performance implementation of the parsing and object layers and then found that the moment I started trying to access the database it absolutely destroyed perf and I thought I might be able to fix that. Still working on it 😀 |
If you're referring to TCP packets, then that's already abstracted away at a lower level by the OS - the Stream abstraction allows you to be oblivious of packets and treat everything as a linear stream of bytes. If you're referring to packets in TDS (or one of its subprotocols or something), then that's something that's going to have to happen in a TDS driver regardless, no? The only question is whether Pipelines is the technology you want to use or not.
Pipelines definitely does seem to shine for parsing text-based protocols; but here we're discussing a length-prefixed one, which is why I'm raising the question. |
Not quite. @Wraith2 hit the nail on the head with the use case for pipelines and this is the ideal solution, length prefixed or not (HTTP/2 isn't text based). See github.com/davidfowl/BedrockFramework/ if you want to see more scenarios. |
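For context, a sketch of the kind of read loop Pipelines enables for a length-prefixed protocol: the pipe owns and pools the buffers, the parser only reports how much it consumed, and partial frames are simply left in the buffer until more data arrives. The 4-byte big-endian frame format here is hypothetical:

```csharp
using System;
using System.Buffers;
using System.Buffers.Binary;
using System.IO.Pipelines;
using System.Threading.Tasks;

static class FrameParser
{
    // Sketch: read length-prefixed frames (4-byte big-endian length + payload)
    // from a PipeReader and hand each complete payload to a callback.
    public static async Task ReadFramesAsync(PipeReader reader, Action<ReadOnlySequence<byte>> onFrame)
    {
        while (true)
        {
            ReadResult result = await reader.ReadAsync();
            ReadOnlySequence<byte> buffer = result.Buffer;

            while (TryReadFrame(ref buffer, out ReadOnlySequence<byte> frame))
            {
                onFrame(frame);
            }

            // Mark what we parsed as consumed and everything as examined,
            // so the next ReadAsync waits for more data.
            reader.AdvanceTo(buffer.Start, buffer.End);

            if (result.IsCompleted) break;
        }

        await reader.CompleteAsync();
    }

    private static bool TryReadFrame(ref ReadOnlySequence<byte> buffer, out ReadOnlySequence<byte> frame)
    {
        frame = default;
        if (buffer.Length < 4) return false;

        Span<byte> lengthBytes = stackalloc byte[4];
        buffer.Slice(0, 4).CopyTo(lengthBytes);
        int length = BinaryPrimitives.ReadInt32BigEndian(lengthBytes);

        if (buffer.Length < 4 + length) return false;

        frame = buffer.Slice(4, length);
        buffer = buffer.Slice(4 + length);
        return true;
    }
}
```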
Nope. [MS-TDS]: Tabular Data Stream Protocol is a packetized binary protocol. It is sometimes contained within [MC-SMP]: Session Multiplex Protocol, which is sometimes called SMUX. At login the client requests a max packet size and the server responds with the max packet size it will allow. That negotiated packet size is then used for the connection. Thereafter data sent from either side is packed together in blocks no larger than that max packet size, and only the final packet of a message is allowed to be smaller than that size. These TDS packets are typically 8K but can go up to almost 16K when SSL is used. Each packet starts with a small number of bytes for the SMUX header if it is in use, then the packet header, then the payload. Any data type larger than a byte can span multiple packet payloads with no additional metadata: you'll simply get 3 bytes at the end of one payload and then the next byte as the first in the next payload. Because of that structure, what you really want to be able to do is request data from the payload(s) and not have to care about the mechanics of parsing out the SMUX or packet headers. I want to ask for 4 bytes and get an OperationStatus enum value back telling me if it succeeded and if not why not. In the existing SqlClient codebase you've got this:

```csharp
internal bool TryReadInt32(out int value)
{
    TdsParser.ReliabilitySection.Assert("unreliable call to ReadInt32"); // you need to setup for a thread abort somewhere before you call this method
    Debug.Assert(_syncOverAsync || !_asyncReadWithoutSnapshot, "This method is not safe to call when doing sync over async");

    if (((_inBytesUsed + 4) > _inBytesRead) || (_inBytesPacket < 4))
    {
        // If the int isn't fully in the buffer, or if it isn't fully in the packet,
        // then use ReadByteArray since the logic is there to take care of that.
        if (!TryReadByteArray(_bTmp, 0, 4))
        {
            value = 0;
            return false;
        }

        AssertValidState();
        value = BitConverter.ToInt32(_bTmp, 0);
        return true;
    }
    else
    {
        // The entire int is in the packet and in the buffer, so just return it
        // and take care of the counters.
        value = BitConverter.ToInt32(_inBuff, _inBytesUsed);
        _inBytesUsed += 4;
        _inBytesPacket -= 4;
        AssertValidState();
        return true;
    }
}
```

Where you can see that it's dealing with both data and packet semantics at the same level. What isn't obvious is that the TryReadByteArray path is where the packet-boundary handling is hidden. With an input pipeline you'd reduce that complexity through proper abstraction and make each layer more usable, readable and testable. We want that. I'm getting into a lot of detail, but I've given this quite a lot of consideration over the last couple of years and I believe that pipelines are something that we want to use. We may not be able to use them at the IO layer because of the SSL requirements, but even if we can't they're usable beyond that and I think we should. |
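A minimal sketch (not SqlClient code) of the payload-level API described above, assuming a pipeline layer has already stripped the SMUX and packet headers and handed the data layer a contiguous-looking sequence of payload bytes. `PayloadReader` is hypothetical, and TDS integers being little-endian is an assumption here:

```csharp
using System.Buffers;

// Sketch of the abstraction described above: the packet/SMUX layer strips headers
// and hands the data layer only payload bytes, so a read helper no longer has to
// track _inBytesUsed / _inBytesPacket style counters.
static class PayloadReader
{
    public static OperationStatus TryReadInt32(ref SequenceReader<byte> reader, out int value)
    {
        if (reader.TryReadLittleEndian(out value))
        {
            return OperationStatus.Done;
        }

        value = 0;
        // Not enough payload buffered yet; the caller asks the pipe for more data and retries.
        return OperationStatus.NeedMoreData;
    }
}
```

The `SequenceReader<byte>` would be constructed over the `ReadOnlySequence<byte>` the pipeline returns, so a value spanning two packet payloads is handled by the sequence itself rather than by hand-written counter bookkeeping.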
@Wraith2 it sounds to me like you want a Stream implementation over your multiple packets, which takes care of that layer and exposes the payload. The question is what happens in the payload, and whether messages in there are length-prefixed or not, etc. But it seems both @Wraith2 and @davidfowl are saying Pipelines are the way to go, so let's move on to something else. |
Once you've used it, it'll feel like a no brainer 😉 |
@Wraith2 Are you planning to try again with in-memory tables? I'd love to see the initial or early focus of this effort be SqlBulkInsert to SQL in-memory tables. That scenario is currently a confusing mess and could use some love from the .NET layer, especially on allocations (DataTable is SOP, or object[] with boxed longs & doubles in them). :) Any pointer to a better place to discuss such scenarios? |
Open an issue with details and repro case on https://github.com/dotnet/SqlClient |
See #22 for some notes on usage of Pipelines in a database client such as Woodstar. |
There has been some discussion about whether use of data pipelines is the way to go for highly async, highly performant binary communication such as Tabular Data Stream (TDS) to and from SQL Server.
@davidfowl Thoughts? (I believe you discussed this with @roji already.)
@JamesNK We were wondering what gRPC uses?