-
Notifications
You must be signed in to change notification settings - Fork 10
Prose of "generating a random UUID" could be simplified #29
Comments
I find the section as it is much clearer than the RFC or your proposed revision. Could you outline what's wrong with it (besides "clunky") before suggesting changes? |
By clunky I mean the amount of redundancy here: Do we really need to say "hexadecimal representation" ~17 times like the above? It's not clear which part is which (why I proposed naming each path clearly) - and it's super hard to read. It's also clunky to sprinkle "-" throughout the [=list=], when [=string/concatenate=] supports joining with a separator "-". |
@domenic, how about we turn "hexadecimal representation" into an algorithm either the array, a start index and an end index? That way we end up with must more readable:
|
That seems pretty weird to me. The hexadecimal representation of a number is a sensible thing to talk about; the hexadecimal representation of an array doesn't make any sense to me. I think being explicit is better than being concise here. |
Perhaps:
|
Hmm. Maybe something more like a map operation? "Let endHexes be the list formed by taking the hexadecimal representation of items 10 through 15, inclusive, in array"? Then "Let end be the result of concatenating endHexes"? I still think that's worse than the current spec, because again, I think being explicit is really nice and simple and matches the code. It makes it clear that the structure is 4 hex representations, a dash, 2 hex representations, a dash, ... But if the goal is to minimize the number of times you say "hexadecimal representation", then maybe something like the above would be the way to go. We could also wait to see if anyone implementing the spec is confused by the current wording before changing anything. |
By code I guess you mean WTF/UUID.cpp's
The goal is to be explicit, identify the parts, and to increase legibility by reducing redundancy in the text.
Because UUID generation has been in browsers for a really long time, and is fairly well-understood, it's extremely unlikely anyone is going to implement directly from what is in the spec text (it's the other way around, I think: the spec matches some implementation, which seems to be WTF/UUID). Thus, the spec text is mostly for us spec folks (and for developers) to have some clarity to what the parts are - which is the semantic aspects of UUIDs. This is also why I suggested initially we just defer to RFC4122. Consider, aside from the WebIDL, the entire WebKit and Chromium implementations are 4 lines: #include <wtf/UUID.h>
String Crypto::randomUUID() const
{
return createCanonicalUUIDString();
}; With // Format as Version 4 UUID.
return makeString(
hex(randomData[0], 8, Lowercase),
'-',
hex(randomData[1] >> 16, 4, Lowercase),
"-4",
hex(randomData[1] & 0x00000fff, 3, Lowercase),
'-',
hex((randomData[2] >> 30) | 0x8, 1, Lowercase),
hex((randomData[2] >> 16) & 0x00000fff, 3, Lowercase),
'-',
hex(randomData[2] & 0x0000ffff, 4, Lowercase),
hex(randomData[3], 8, Lowercase)
); In the case of Mozilla, the new UUID Generator they are building looks like the following. The effort is being done independently of exposing this method: NS_IMETHODIMP
nsUUIDGenerator::GenerateUUID(nsID** aRet) {
nsID* id = static_cast<nsID*>(moz_xmalloc(sizeof(nsID)));
nsresult rv = GenerateUUIDInPlace(id);
if (NS_FAILED(rv)) {
free(id);
return rv;
}
*aRet = id;
return rv;
}
nsUUIDGenerator::GenerateUUIDInPlace(nsID* aId) {
// Fill the nsID with 16 random bytes.
uint64_t randomNumbers[2]{RandomUint64OrDie(), RandomUint64OrDie()};
static_assert(sizeof(nsID) == sizeof(randomNumbers),
"nsID must be exactly 16 bytes");
memcpy(aId, &randomNumbers, sizeof(nsID));
// Put in the version
aId->m2 &= 0x0fff;
aId->m2 |= 0x4000;
// Put in the variant
aId->m3[0] &= 0x3f;
aId->m3[0] |= 0x80;
return NS_OK;
} Note the " // Put in the version" and "// Put in the variant", which again goes to my point about labelling the parts as variables. My point being, there are multiple ways of doing this that end up with something pretty clear. |
I raised my concerns with using UUID "semantics" (i.e. time-{low,mid,high} etc.) in this comment and I still think it doesn't make sense to keep carrying around this historic notation for anything describing non-v1-UUIDs. In my opinion it's an unfortunate historic artifact that RFC4122 continued to use these semantics when introducing random (v4) and name-based (v3/v5) UUIDs. |
I think it’s fine if we drop the actual semantic, but having a cleaner “Let |part1| … Let |part2|…, Let |part3|, let |part4|“ (or similar would) would really help with legibility. See the current pull request. It’s a good improvement IMO. |
@marcoscaceres @ctavan @domenic I'm okay of trying to trim down the verbosity of the algorithm in spec text, but I don't think we've quite landed on format that pleases everyone. I'd like to close #33 for now, to unblock #34, and revisit this issue later. Sound reasonable? |
Sounds ok to me... but yeah, let's please come back to it. The WebCrypto spec has a bad rep for being a bit developer unfriendly and unapproachable (it's a difficult topic! not just the spec's fault) - so, speaking as a developer, someone on a devrel team, and an implementer, I think little editorial things like this really do help. |
@ctavan any interest on giving this rewrite a go, you're the subject matter expert at real world implementations of the algorithm 😄 |
@twiss are you happy with where we landed on, with the algorithm spec text? I'd like to close this issue if we're in a good place. |
I'm personally not opposed to the current text. It would be nice to normatively refer to an RFC, but RFC4122 is quite difficult to follow since it also defines other versions, not just v4. If it makes people happier, I think one alternative would be to move the reference to RFC4122 out of the note, and move the current algorithm into a note (i.e. swap which one is normative). That being said, I'm not sure if that would make it easier to read, so I'm also fine with leaving it as is. For what it's worth, the Web Crypto spec says that
so once it's merged there, an implementation would also be allowed to generate a v4 UUID in any other way, without necessarily having to follow the exact steps written here. |
👍 I'm going to go ahead and close this ticket then, we can iterate in the upstream spec in the future if we like.
Thanks for pointing out this caveat. In practice it seems like most folks implement UUID v4 with various optimizations that would look pretty ugly in spec text (buffers of random bytes, bit shifting tricks). |
For "Generating a random UUID", I wonder if we can rewrite the section to either:
drop the section entirely, just hook directly into RFC4122 (ideally). If things need to be fixed or better worded, we should update RFC4122 instead. This would be my preference, as I'm concerned about duplication with RFC4122.
Or, rewrite the algorithm such that it's more like this:
The text was updated successfully, but these errors were encountered: