Prose of "generating a random UUID" could be simplified #29

marcoscaceres · 2021-08-20T04:59:35Z

For "Generating a random UUID", I wonder if we can rewrite the section to either:

drop the section entirely, just hook directly into RFC4122 (ideally). If things need to be fixed or better worded, we should update RFC4122 instead. This would be my preference, as I'm concerned about duplication with RFC4122.
Or, rewrite the algorithm such that it's more like this:

Let |time-low:string| be ...
Let |time-mid:string| be...
Let |time-high-and-version:string| be..
... other parts....
Let |parts| be [=list=] « time-low, time-mid... ».
Let |uuid:string| be [=string/Concatenate=] |parts| using separator "-".
Return |uuid|.

domenic · 2021-08-20T15:43:46Z

I find the section as it is much clearer than the RFC or your proposed revision. Could you outline what's wrong with it (besides "clunky") before suggesting changes?

marcoscaceres · 2021-08-21T00:44:22Z

By clunky I mean the amount of redundancy here:

Do we really need to say "hexadecimal representation" ~17 times like the above? It's not clear which part is which (why I proposed naming each path clearly) - and it's super hard to read.

It's also clunky to sprinkle "-" throughout the [=list=], when [=string/concatenate=] supports joining with a separator "-".

marcoscaceres · 2021-08-21T05:11:05Z

@domenic, how about we turn "hexadecimal representation" into an algorithm either the array, a start index and an end index? That way we end up with must more readable:

Let |whatever:string| be [=hexadecimal representation=] of |array| from 0 to 3.
Let |other-part:string| be [=hexadecimal representation=] of |array| from 4 to 5.
...
Let |other-part:string| be [=hexadecimal representation=] of |array| from 10 to 15.
Let |parts| be [=list=] « whatever, other-part... ».
Let |uuid:string| be [=string/Concatenate=] |parts| using separator "-".
Return |uuid|.

domenic · 2021-08-23T17:50:00Z

That seems pretty weird to me. The hexadecimal representation of a number is a sensible thing to talk about; the hexadecimal representation of an array doesn't make any sense to me.

I think being explicit is better than being concise here.

bcoe · 2021-08-23T20:56:30Z

Perhaps:

Let |node:string| be [=hexadecimal representation=] of each |array| element from 10 to 15.

domenic · 2021-08-23T21:08:22Z

Hmm. Maybe something more like a map operation? "Let endHexes be the list formed by taking the hexadecimal representation of items 10 through 15, inclusive, in array"? Then "Let end be the result of concatenating endHexes"?

I still think that's worse than the current spec, because again, I think being explicit is really nice and simple and matches the code. It makes it clear that the structure is 4 hex representations, a dash, 2 hex representations, a dash, ...

But if the goal is to minimize the number of times you say "hexadecimal representation", then maybe something like the above would be the way to go.

We could also wait to see if anyone implementing the spec is confused by the current wording before changing anything.

marcoscaceres · 2021-08-23T23:02:15Z

I still think that's worse than the current spec, because again, I think being explicit is really nice and simple and matches the code.

By code I guess you mean WTF/UUID.cpp's createCanonicalUUIDString().

But if the goal is to minimize the number of times you say "hexadecimal representation", then maybe something like the above would be the way to go.

The goal is to be explicit, identify the parts, and to increase legibility by reducing redundancy in the text.

We could also wait to see if anyone implementing the spec is confused by the current wording before changing anything.

Because UUID generation has been in browsers for a really long time, and is fairly well-understood, it's extremely unlikely anyone is going to implement directly from what is in the spec text (it's the other way around, I think: the spec matches some implementation, which seems to be WTF/UUID). Thus, the spec text is mostly for us spec folks (and for developers) to have some clarity to what the parts are - which is the semantic aspects of UUIDs. This is also why I suggested initially we just defer to RFC4122.

Consider, aside from the WebIDL, the entire WebKit and Chromium implementations are 4 lines:

#include <wtf/UUID.h>

String Crypto::randomUUID() const
{
 return createCanonicalUUIDString();
};

With createCanonicalUUIDString() predating this effort. What's nice about createCanonicalUUIDString() is how it uses hex() as a function (which matches what is being suggested above):

    // Format as Version 4 UUID.
    return makeString(
        hex(randomData[0], 8, Lowercase),
        '-',
        hex(randomData[1] >> 16, 4, Lowercase),
        "-4",
        hex(randomData[1] & 0x00000fff, 3, Lowercase),
        '-',
        hex((randomData[2] >> 30) | 0x8, 1, Lowercase),
        hex((randomData[2] >> 16) & 0x00000fff, 3, Lowercase),
        '-',
        hex(randomData[2] & 0x0000ffff, 4, Lowercase),
        hex(randomData[3], 8, Lowercase)
    );

In the case of Mozilla, the new UUID Generator they are building looks like the following. The effort is being done independently of exposing this method:

NS_IMETHODIMP
nsUUIDGenerator::GenerateUUID(nsID** aRet) {
  nsID* id = static_cast<nsID*>(moz_xmalloc(sizeof(nsID)));

  nsresult rv = GenerateUUIDInPlace(id);
  if (NS_FAILED(rv)) {
    free(id);
    return rv;
  }

  *aRet = id;
  return rv;
}

nsUUIDGenerator::GenerateUUIDInPlace(nsID* aId) {
  // Fill the nsID with 16 random bytes.
  uint64_t randomNumbers[2]{RandomUint64OrDie(), RandomUint64OrDie()};
  static_assert(sizeof(nsID) == sizeof(randomNumbers),
                "nsID must be exactly 16 bytes");
  memcpy(aId, &randomNumbers, sizeof(nsID));

  // Put in the version
  aId->m2 &= 0x0fff;
  aId->m2 |= 0x4000;

  // Put in the variant
  aId->m3[0] &= 0x3f;
  aId->m3[0] |= 0x80;

  return NS_OK;
}

Note the " // Put in the version" and "// Put in the variant", which again goes to my point about labelling the parts as variables.
Note also that nsID is a four part struct specifically to match the UUID format.

My point being, there are multiple ways of doing this that end up with something pretty clear.

ctavan · 2021-08-24T10:27:30Z

I raised my concerns with using UUID "semantics" (i.e. time-{low,mid,high} etc.) in this comment and I still think it doesn't make sense to keep carrying around this historic notation for anything describing non-v1-UUIDs. In my opinion it's an unfortunate historic artifact that RFC4122 continued to use these semantics when introducing random (v4) and name-based (v3/v5) UUIDs.

marcoscaceres · 2021-08-24T14:11:23Z

See the current pull request. It’s a good improvement IMO.

bcoe · 2021-08-24T15:41:36Z

@marcoscaceres @ctavan @domenic I'm okay of trying to trim down the verbosity of the algorithm in spec text, but I don't think we've quite landed on format that pleases everyone. I'd like to close #33 for now, to unblock #34, and revisit this issue later.

Sound reasonable?

marcoscaceres · 2021-08-25T01:03:00Z

Sounds ok to me... but yeah, let's please come back to it. The WebCrypto spec has a bad rep for being a bit developer unfriendly and unapproachable (it's a difficult topic! not just the spec's fault) - so, speaking as a developer, someone on a devrel team, and an implementer, I think little editorial things like this really do help.

bcoe · 2021-08-26T14:58:04Z

@ctavan any interest on giving this rewrite a go, you're the subject matter expert at real world implementations of the algorithm 😄

bcoe · 2021-11-26T15:02:24Z

@twiss are you happy with where we landed on, with the algorithm spec text? I'd like to close this issue if we're in a good place.

twiss · 2021-11-26T16:52:47Z

I'm personally not opposed to the current text. It would be nice to normatively refer to an RFC, but RFC4122 is quite difficult to follow since it also defines other versions, not just v4. If it makes people happier, I think one alternative would be to move the reference to RFC4122 out of the note, and move the current algorithm into a note (i.e. swap which one is normative). That being said, I'm not sure if that would make it easier to read, so I'm also fine with leaving it as is.

For what it's worth, the Web Crypto spec says that

Conformance requirements phrased as algorithms or specific steps may be implemented in any manner, so long as the end result is equivalent. (In particular, the algorithms defined in this specification are intended to be easy to follow, and not intended to be performant.)

so once it's merged there, an implementation would also be allowed to generate a v4 UUID in any other way, without necessarily having to follow the exact steps written here.

bcoe · 2021-11-28T19:30:43Z

I'm personally not opposed to the current text.

👍 I'm going to go ahead and close this ticket then, we can iterate in the upstream spec in the future if we like.

Conformance requirements phrased as algorithms or specific steps may be implemented in any manner

Thanks for pointing out this caveat. In practice it seems like most folks implement UUID v4 with various optimizations that would look pretty ugly in spec text (buffers of random bytes, bit shifting tricks).

marcoscaceres changed the title ~~Generating a random UUID is a little bit clunky~~ Generating a random UUID section is a little bit clunky Aug 20, 2021

bcoe mentioned this issue Aug 21, 2021

tweaks to UUID algorithm #33

Closed

bcoe mentioned this issue Aug 23, 2021

Provide snapshot of generated spec on gh-pages #32

Closed

marcoscaceres changed the title ~~Generating a random UUID section is a little bit clunky~~ Prose of "generating a random UUID" could be simplified Aug 25, 2021

bcoe closed this as completed Nov 28, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Prose of "generating a random UUID" could be simplified #29

Prose of "generating a random UUID" could be simplified #29

marcoscaceres commented Aug 20, 2021 •

edited

Loading

domenic commented Aug 20, 2021

marcoscaceres commented Aug 21, 2021

marcoscaceres commented Aug 21, 2021

domenic commented Aug 23, 2021

bcoe commented Aug 23, 2021

domenic commented Aug 23, 2021

marcoscaceres commented Aug 23, 2021

ctavan commented Aug 24, 2021

marcoscaceres commented Aug 24, 2021

bcoe commented Aug 24, 2021

marcoscaceres commented Aug 25, 2021

bcoe commented Aug 26, 2021

bcoe commented Nov 26, 2021

twiss commented Nov 26, 2021 •

edited

Loading

bcoe commented Nov 28, 2021 •

edited

Loading

Prose of "generating a random UUID" could be simplified #29

Prose of "generating a random UUID" could be simplified #29

Comments

marcoscaceres commented Aug 20, 2021 • edited Loading

domenic commented Aug 20, 2021

marcoscaceres commented Aug 21, 2021

marcoscaceres commented Aug 21, 2021

domenic commented Aug 23, 2021

bcoe commented Aug 23, 2021

domenic commented Aug 23, 2021

marcoscaceres commented Aug 23, 2021

ctavan commented Aug 24, 2021

marcoscaceres commented Aug 24, 2021

bcoe commented Aug 24, 2021

marcoscaceres commented Aug 25, 2021

bcoe commented Aug 26, 2021

bcoe commented Nov 26, 2021

twiss commented Nov 26, 2021 • edited Loading

bcoe commented Nov 28, 2021 • edited Loading

marcoscaceres commented Aug 20, 2021 •

edited

Loading

twiss commented Nov 26, 2021 •

edited

Loading

bcoe commented Nov 28, 2021 •

edited

Loading