Clarify why we want TLSv1.3 #37

Closed
cpaasch opened this issue Mar 23, 2022 · 12 comments

Comments

@cpaasch
Contributor

cpaasch commented Mar 23, 2022

Feedback from Bjorn Mork (https://lists.bufferbloat.net/pipermail/rpm/2022-March/000165.html)

We should explain why TLSv1.3 is required.

@richb-hanover
Contributor

richb-hanover commented Mar 23, 2022

It would also be good to describe whether the certificates for TLSv1.3 could be self-signed, or must be signed by a CA. I know this was discussed a while back (on the RPM list?) but don't remember the outcome.

If the spec permits an RPM client to accept an unsigned certificate (perhaps noting the server certificate is unsigned), implementing an RPM server becomes a lot easier if it's not necessary to figure out all the Let's Encrypt machinery for my 192.168.1.1 router at home. Thanks.

@hawkinsw
Member

> Feedback from Bjorn Mork (https://lists.bufferbloat.net/pipermail/rpm/2022-March/000165.html)
>
> We should explain why TLSv1.3 is required.

After we talked about this, I 100% agree that a description of why we want TLS 1.3 would be great to have in the draft.

@cpaasch
Contributor Author

cpaasch commented Mar 23, 2022

> It would also be good to describe whether the certificates for TLSv1.3 could be self-signed, or must be signed by a CA. I know this was discussed a while back (on the RPM list?) but don't remember the outcome.
>
> If the spec permits an RPM client to accept an unsigned certificate (perhaps noting the server certificate is unsigned), implementing an RPM server becomes a lot easier if it's not necessary to figure out all the Let's Encrypt machinery for my 192.168.1.1 router at home. Thanks.

This should be left to the implementation's choice. It is not part of the wire-format or the methodology. Ultimately, there is no difference between a self-signed or a CA-signed cert. The only difference is whether the client decides to trust the root or not.
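As an aside, here is a minimal sketch (file name and address are made up) of what "the client decides to trust the root" can look like in practice: a Python client that explicitly trusts the server's self-signed certificate instead of the system CA bundle.

```python
import socket
import ssl

# Hypothetical example: trust the RPM server's self-signed certificate
# directly, rather than relying on a CA chain. The certificate file name
# and the 192.168.1.1 address are placeholders.
ctx = ssl.create_default_context(cafile="rpm-server-selfsigned.pem")

with socket.create_connection(("192.168.1.1", 443)) as sock:
    with ctx.wrap_socket(sock, server_hostname="192.168.1.1") as tls:
        print("negotiated", tls.version())  # e.g. "TLSv1.3"
```

Whether a CA-signed or a self-signed root is loaded, the handshake itself is the same; the only difference is which root the client chose to trust.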

@LPardue
Contributor

LPardue commented Apr 19, 2022

As far as justification goes, at the time of writing this spec it seems fine to rationalize the requirement by noting that the majority of Web traffic uses modern TLS - just like the use of HTTP/2 is rationalized. The spec shouldn't limit itself to TLS 1.3, to avoid precluding future versions, but it doesn't need to worry about older versions or tie itself in knots coming up with a rationale.

For instance, according to figures on radar.cloudflare.com today, globally over the last 7 days 67% of traffic used TLS 1.3 and 25% used QUIC's handshake (which is based on TLS 1.3). The remainder used TLS 1.2.

@cpaasch
Contributor Author

cpaasch commented Apr 19, 2022

Good point @LPardue - we should say that TLS 1.3 and later should be used.

The reason we ended up fixing on TLS 1.3 is that it makes it easy to count the round-trips: we know the handshake is always 1 round-trip. With TLS 1.2, it may be 1 or 2 round-trips.

@cpaasch
Contributor Author

cpaasch commented Jun 24, 2022

As I have been iterating over this, I am trending towards actually not caring about TLS handshake latency. Here is my reasoning:

Nowadays, we try to reuse connections as much as possible, so TLS handshake latency is not relevant there. Also, we are focusing on measuring responsiveness on the load-generating connections, where TLS handshake latency is again irrelevant.

Also, by reusing connections for the latency probes it is possible to send the probes continuously from the start of the load generation. I get many more data points that way and thus a more stable result.

So, I am leaning towards removing the notion of TLS probes from the draft. Any strong opinions?

@richb-hanover
Contributor

richb-hanover commented Jun 25, 2022

A bunch of semi-related thoughts come to mind:

  1. I don't know whether measuring TLS handshake latency is important

  2. To the extent that the algorithm creates no/relatively few new TLS connections, their latency seems unimportant. To the extent that multiple connections simulate real-world use, it could matter.

  3. I seem to remember that TLS handshakes require two round-trips, which would make them more sensitive to latency than other metrics.

  4. Measurement is hard. We're aiming for an algorithm that's sensitive (that is, it correctly detects the presence or absence of latency) and specific (the most likely explanation for the result is overbuffering in the path to the server). To the extent that measuring TLS latency aids either of those goals, we should think about keeping it.

  5. Laying out a stable spec is hard, too. Changing the spec might invalidate earlier results. It's not much fun to be reviewing results, and have to ask whether both the client and server ends are newer than commit #1234567.

  6. But it may not matter much in practice. Current RPM measurements are more descriptive than numerically accurate (low, medium, high vs a number with two decimal places). I regularly see a factor of 2-3 between the worst and best RPM values.

  7. We shouldn't be afraid to change the spec if we can devise a better algorithm. There are millions and millions of people who'll use the RPM Test in the future, and we should get it right as early as possible. (I am reminded of the likely apocryphal 'roff' story, where they created this terrific text formatting package in the early days of Unix. After using it for a while, they came up with a new (better) command syntax but decided not to implement it because they "already had a dozen users...")

  8. I don't know (See Item 1 above)

@cpaasch
Contributor Author

cpaasch commented Jun 27, 2022

> I don't know whether measuring TLS handshake latency is important

Depends on the goal, I guess :) I think for "responsiveness under working conditions" it is less important.

> To the extent that the algorithm creates no/relatively few new TLS connections, their latency seems unimportant. To the extent that multiple connections simulate real-world use, it could matter.

The question becomes whether we expect latencies to be different for TLS vs H2 req/resp. And that depends entirely on the network (e.g., a transparent TCP proxy inspecting the TLS client-hello's SNI could end up delaying TLS quite a bit).

> I seem to remember that TLS handshakes require two round-trips, which would make them more sensitive to latency than other metrics.

It depends on the TLS version, which is why we mandate(d) 1.3 (see my comment above).

> Measurement is hard. We're aiming for an algorithm that's sensitive (that is, it correctly detects the presence or absence of latency) and specific (the most likely explanation for the result is overbuffering in the path to the server). To the extent that measuring TLS latency aids either of those goals, we should think about keeping it.

Yes, looking at it from the perspective of transparent TCP proxies, which are very popular in cellular networks, it makes sense to measure TLS.

> Laying out a stable spec is hard, too. Changing the spec might invalidate earlier results. It's not much fun to be reviewing results, and have to ask whether both the client and server ends are newer than commit #1234567.

Agreed. But the method was never said to be "the universal and only way to measure latency". I was/still am hoping that we are going to converge.

> But it may not matter much in practice. Current RPM measurements are more descriptive than numerically accurate (low, medium, high vs a number with two decimal places). I regularly see a factor of 2-3 between the worst and best RPM values.

Variance has always been a problem. I am trying to increase the sample size without making the test run much longer. Right now I seem to achieve that goal by starting the measurement not at the moment we reach saturation, but by sending a probe every 100 ms from the beginning. I then take the 90th percentile and the average of the latencies on the load-generating and the separate connections. Numbers are fairly stable now.
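For illustration only, roughly what that aggregation looks like (the probe samples below are made up; one sample per 100 ms probe from the start of load generation):

```python
import statistics

def p90(samples):
    # 90th percentile of a list of probe latencies (milliseconds).
    return statistics.quantiles(samples, n=10)[-1]

# Made-up probe samples, one every 100 ms from the start of load generation.
load_generating_ms = [48, 52, 61, 75, 90, 102, 95, 88, 110, 97]
separate_ms = [45, 47, 55, 70, 85, 99, 93, 86, 105, 92]

print("load-generating p90:", p90(load_generating_ms))
print("separate p90:", p90(separate_ms))
print("average of the two:", (p90(load_generating_ms) + p90(separate_ms)) / 2)
```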

If we want to bring TLS into that, it would mean that instead of reusing connections for my probes on the separate connections, I would create new ones. I will experiment with that.

> We shouldn't be afraid to change the spec if we can devise a better algorithm. There are millions and millions of people who'll use the RPM Test in the future, and we should get it right as early as possible. (I am reminded of the likely apocryphal 'roff' story, where they created this terrific text formatting package in the early days of Unix. After using it for a while, they came up with a new (better) command syntax but decided not to implement it because they "already had a dozen users...")

Agreed, we need to converge as early as possible. macOS Ventura will have the latest and greatest.

> I don't know (See Item 1 above)

@cpaasch
Contributor Author

cpaasch commented Jun 27, 2022

As I read my reply to you, @richb-hanover, I realize that at the beginning I was mostly convinced that dropping TLS is good, and then moved more and more towards keeping TLS ;-)

As you can see, I'm quite split on this 😅

@richb-hanover
Contributor

> As you can see, I'm quite split on this

And I'm always happy to help muddy the waters here :-) Thanks.

@cpaasch
Contributor Author

cpaasch commented Jul 6, 2022

After more discussions and experimentation, it is best to keep the full handshake. The weighting of the values is still an open question. With the current approach we get 4 sets of data from the 2 types of probes: the probes on separate connections (for these we get TCP, TLS, and H2 data) and the probes on the load-generating connections (for these we only get H2 data). The data sets are separate_tcp, separate_tls, separate_h2, and load_generating_h2.

From each of these 4 data sets we take the 90th percentile, which gives us 4 values: separate_tcp_p90, separate_tls_p90, separate_h2_p90, and load_generating_h2_p90.

The suggestion would be to average these in the following way:

Responsiveness = (1/3 * separate_tcp_p90 + 1/3 * separate_tls_p90 + 1/3 * separate_h2_p90 + load_generating_h2_p90) / 2

We can also increase the weight towards H2:

Responsiveness = (1/6 * separate_tcp_p90 + 1/6 * separate_tls_p90 + 2/3 * separate_h2_p90 + load_generating_h2_p90) / 2
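For what it's worth, a small sketch of both weightings (the p90 inputs are placeholder numbers):

```python
def responsiveness(separate_tcp_p90, separate_tls_p90, separate_h2_p90,
                   load_generating_h2_p90, h2_weight=1/3):
    # h2_weight is the weight given to separate_h2_p90; the rest is split
    # evenly between the TCP and TLS values. h2_weight=1/3 is the
    # equal-weight variant, h2_weight=2/3 the H2-heavy variant.
    other_weight = (1 - h2_weight) / 2
    separate = (other_weight * separate_tcp_p90 +
                other_weight * separate_tls_p90 +
                h2_weight * separate_h2_p90)
    return (separate + load_generating_h2_p90) / 2

# Placeholder p90 values in milliseconds.
print(responsiveness(40, 55, 70, 120, h2_weight=1/3))  # equal weighting
print(responsiveness(40, 55, 70, 120, h2_weight=2/3))  # weighted towards H2
```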

@cpaasch
Contributor Author

cpaasch commented Jul 11, 2022

Addressed in af69ae2 - please reopen if that's not sufficient. I removed the reference to TLS v1.3 and instead explain what the TLS-handshake latency is: a calculation of latency per round-trip during the TLS handshake phase. If the TLS version being used requires 2 round-trips before the client can transmit data, then the latency needs to be divided by 2.
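For illustration, the per-round-trip normalization could look like this (assuming 1 round-trip for TLS 1.3 and 2 for a full TLS 1.2 handshake):

```python
def tls_latency_per_round_trip(handshake_duration_ms, tls_version):
    # Divide the measured handshake duration by the number of round-trips
    # the handshake needed before the client could transmit data.
    round_trips = 1 if tls_version == "TLSv1.3" else 2
    return handshake_duration_ms / round_trips

print(tls_latency_per_round_trip(80, "TLSv1.3"))   # 80.0
print(tls_latency_per_round_trip(160, "TLSv1.2"))  # 80.0
```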

cpaasch closed this as completed Jul 11, 2022