Calculate sizes, alignment and offsets for both wasm32 and wasm64 #1711
Conversation
This is a generalization of #1446
Thanks for working on this! I've gotten a chance to review this now (thanks for the patience). I don't really understand how this works at a low level, and this form of generically handling size/align for 32/64 seems somewhat error-prone. It does seem like a nice property to have, though, since it would make code generation easier in wit-bindgen and in the source languages.

To help evaluate this I whipped up a small differential fuzzer which generates arbitrary WITs and then asserts that the sizes reported by this branch are the same as the sizes reported by the existing implementation. Once that's sorted out I'd also like to extend the fuzzer a bit to cover the offset calculations as well. For running the fuzzer I was locally doing:
Thank you for taking the time to fuzz it. I wouldn't fully trust code of this complexity either. I found the difference: resources are considered very large, but the new code considers them (very large) * 2. Honestly, I had no idea about the correct way to handle this type, but perhaps it should use `add_for_64bit: 0`. The code giving the problem:
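The snippet referenced here is not reproduced in this excerpt; the following is only a hedged illustration of the mismatch being described, with a placeholder sentinel value and struct, not the actual `sizealign.rs` code.

```rust
// Placeholder types and values, not the actual wit-parser code.
struct ArchitectureSize {
    bytes: usize,         // size with 32-bit pointers
    add_for_64bit: usize, // extra bytes on 64-bit targets
}

fn main() {
    let very_large = 1 << 30; // stand-in for the "very large" resource size
    // Reporting the sentinel in both fields doubles the 64-bit size.
    let doubled = ArchitectureSize { bytes: very_large, add_for_64bit: very_large };
    // The suggestion in this thread: keep the sentinel, add nothing for 64-bit.
    let suggested = ArchitectureSize { bytes: very_large, add_for_64bit: 0 };
    assert_eq!(doubled.bytes + doubled.add_for_64bit, 2 * very_large);
    assert_eq!(suggested.bytes + suggested.add_for_64bit, very_large);
}
```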
I think resources should be unconditionally 32-bit values (always 4 bytes) regardless of wasm32 vs wasm64?
I agree that resource handles should be 4+0, but this code calculates the size of a resource itself, which doesn't make much sense; see https://github.com/bytecodealliance/wasm-tools/blob/main/crates/wit-parser/src/sizealign.rs#L61. I was simply at a loss about which number to return in this unrealistic case.
Aha, I see! That sounds like it's just something to update in the fuzzer then, skipping that case there.
I planned to use intmax+0, which would make the old and new values identical again, and then add the field and payload offsets; hiding the size calculation of resources would be unnecessary with the first change. PS: I will for sure take inspiration from your fuzzer when testing other projects. Now I know the real purpose of wit-smith ;-)
Ah, it's ok to do whatever for that case.
Oh, after some time the fuzzer found another one:
See https://github.com/cpetig/wasm-tools/tree/fuzz64 for an updated variant of the fuzzer that also checks field offsets and the payload offset. I assume that the types contained within other types are already part of the all-types list, so there is no need to check the individual fields. Now I will debug the failed case; it found another one.
I found the problem (in my 64-bit combination reasoning), pushed, and the fuzzer is running again.
I stopped the fuzzer after several hours (3.4M tests, if I read correctly) without any difference detected.
Ok thanks for fuzzing! That makes me more confident in this approach so I've done a bit of a deeper dive into this.
Do you think it would be possible to preserve the fuzzer? Basically having a separate "relatively simple" algorithm for calculating sizes with no frills and ensuring that this result continues to be the same?
crates/wit-parser/src/sizealign.rs (outdated)
```rust
/// Architecture specific measurement of position,
/// the combined amount in bytes is
/// `bytes + if 4 < core::mem::size_of::<*const u8>() { add_for_64bit } else { 0 }`
#[derive(Default, Clone, Copy, Eq, PartialEq, Debug)]
pub struct ArchitectureSize {
    /// exact value for 32-bit pointers
    pub bytes: usize,
    /// amount of bytes to add for 64-bit architecture
    pub add_for_64bit: usize,
}
```
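As a quick illustration of the documented formula, here is a minimal sketch of how the two fields combine into a concrete size on the host target; `combined_size` is a hypothetical helper for this example, not part of the crate's API.

```rust
// Hypothetical helper, not a wit-parser API; it just restates the doc comment.
fn combined_size(bytes: usize, add_for_64bit: usize) -> usize {
    // The extra bytes only apply when pointers are wider than 4 bytes.
    bytes + if 4 < core::mem::size_of::<*const u8>() { add_for_64bit } else { 0 }
}

fn main() {
    // A (pointer, length) pair: 8 bytes with 32-bit pointers, 16 with 64-bit ones.
    println!("{}", combined_size(8, 8));
}
```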
Would it be possible to represent this as something like `bytes` and `pointers`? That way the size could be `bytes + pointers * sizeof(pointer)`. I think that may also make methods like `pointers_to_add` clearer to implement, and/or unnecessary, since it would just be a field then.
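A rough sketch of the suggested representation, with field and method names chosen here for illustration only (not the final API):

```rust
// Illustrative only; names are assumptions, not the final wit-parser API.
#[derive(Default, Clone, Copy, Eq, PartialEq, Debug)]
pub struct ArchitectureSize {
    /// Fixed part of the size, independent of the pointer width.
    pub bytes: usize,
    /// Number of pointer-sized parts contributing to the size.
    pub pointers: usize,
}

impl ArchitectureSize {
    /// Concrete size for a given pointer width (4 on wasm32, 8 on wasm64).
    pub fn size_for(&self, ptr_size: usize) -> usize {
        self.bytes + self.pointers * ptr_size
    }
}

fn main() {
    let s = ArchitectureSize { bytes: 4, pointers: 2 };
    assert_eq!(s.size_for(4), 12); // wasm32
    assert_eq!(s.size_for(8), 20); // wasm64
}
```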
I tried this once, encountered some difficulties and changed it to the current form. I will try again … I guess the current representation avoids multiplications and divisions and makes alignment more straightforward.
In theory it should be possible to skip mul/div by only having shifts left-and-right by storing the log2 of the pointer size (or something like that), but I'm mostly curious about this from an understandability point of view.
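A tiny sketch of that idea, with placeholder names: storing the log2 of the pointer size lets the total be computed with a shift instead of a multiplication.

```rust
// Placeholder names; only meant to illustrate the shift-based computation.
fn total_size(bytes: usize, pointers: usize, ptr_size_log2: u32) -> usize {
    bytes + (pointers << ptr_size_log2)
}

fn main() {
    assert_eq!(total_size(4, 2, 2), 12); // wasm32: 4-byte pointers, log2 = 2
    assert_eq!(total_size(4, 2, 3), 20); // wasm64: 8-byte pointers, log2 = 3
}
```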
Oh, while implementing a more obscure test case I realized that I made a conceptual oversight:
If you take the max of 10 bytes and 2 pointers you get 4 bytes + 1.5 pointers, and I didn't take fractional pointer amounts into account yet. It can still work if I multiply the number of pointers by 4 and handle fractional amounts properly. This wasn't visible in the fuzzing because of the additional alignment applied to the result before outputting the size.
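Working through the numbers in that example (a sketch of the arithmetic, not library code):

```rust
// max(10 bytes, 2 pointers), evaluated per target and re-expressed as
// bytes + pointers * pointer_size.
fn main() {
    let (ptr32, ptr64) = (4usize, 8usize);
    let max32 = 10usize.max(2 * ptr32); // 10 on wasm32
    let max64 = 10usize.max(2 * ptr64); // 16 on wasm64
    assert_eq!((max32, max64), (10, 16));
    // Solving 10 = bytes + 4 * pointers and 16 = bytes + 8 * pointers gives
    // pointers = 1.5 and bytes = 4, i.e. a fractional pointer count.
    assert_eq!(4.0 + 1.5 * ptr32 as f64, max32 as f64);
    assert_eq!(4.0 + 1.5 * ptr64 as f64, max64 as f64);
}
```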
Previously this example resulted in 10 bytes + 6 bytes_on_64_bit. If you start with 9 bytes you even get a quarter-pointer amount: (9 + 7), which becomes (12 + 4) after alignment to the pointer size.
Probably you don't want to express/represent partial pointer sizes for objects containing pointers, so you would want to align in the max function and return (8+1).
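Checking those numbers (illustrative arithmetic only): aligning inside the max removes the fractional pointer and yields the (8 + 1) form.

```rust
fn align_up(value: usize, align: usize) -> usize {
    (value + align - 1) / align * align
}

fn main() {
    let (ptr32, ptr64) = (4usize, 8usize);
    // max(9 bytes, 2 pointers) without alignment: 9 and 16, i.e. (9 + 7),
    // which corresponds to 1.75 pointers.
    assert_eq!((9usize.max(2 * ptr32), 9usize.max(2 * ptr64)), (9, 16));
    // Aligning each result to the pointer size gives 12 and 16, i.e. (12 + 4),
    // which is exactly 8 bytes + 1 pointer on both targets.
    let a32 = align_up(9usize.max(2 * ptr32), ptr32);
    let a64 = align_up(9usize.max(2 * ptr64), ptr64);
    assert_eq!((a32, a64), (12, 16));
    assert_eq!((8 + ptr32, 8 + ptr64), (a32, a64));
}
```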
I was able to make it work, but `align` is a bit more complex than I would like it to be, and `max` now takes the easy way out (calculate the 32-bit and 64-bit sizes, take the max of each, and then separate them again). `max` can probably be made more efficient by expanding the 32-bit/64-bit calculation and then simplifying terms, but as it is mostly used for variants it should be less performance sensitive.

The fuzzer ran again without finding issues.
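A sketch of that "easy way out" for `max`, under the bytes + pointers representation discussed above; the names and the alignment choice are assumptions, not the actual implementation.

```rust
// Sketch only: evaluate both targets, take the max of each, align to the
// pointer size, and split the result back into bytes + pointers.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct ArchitectureSize {
    bytes: usize,
    pointers: usize,
}

impl ArchitectureSize {
    fn size(&self, ptr: usize) -> usize {
        self.bytes + self.pointers * ptr
    }

    fn max(self, other: Self) -> Self {
        let align = |v: usize, a: usize| (v + a - 1) / a * a;
        let s32 = align(self.size(4).max(other.size(4)), 4);
        let s64 = align(self.size(8).max(other.size(8)), 8);
        // The 32/64-bit difference recovers the pointer count; the rest is bytes.
        let pointers = (s64 - s32) / 4;
        ArchitectureSize { bytes: s32 - pointers * 4, pointers }
    }
}

fn main() {
    let a = ArchitectureSize { bytes: 10, pointers: 0 };
    let b = ArchitectureSize { bytes: 0, pointers: 2 };
    // max(10 bytes, 2 pointers) aligned: 12 on wasm32, 16 on wasm64 => (8 + 1).
    assert_eq!(a.max(b), ArchitectureSize { bytes: 8, pointers: 1 });
}
```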
Thanks for pushing on this! I think at least from a code generator perspective it'll be easier to reason about size + pointers so I'm glad this was able to work out.
I think the fuzzer could simply live in the new code and compare the new version with an older released version (flipping the crate alias roles).
Oh excellent idea on fuzzing, I like that!
I feel we might want to replace … Update: this is already included in the latest patchset.
@alexcrichton Do you want me to mark these threads as resolved, or do you want to check first and mark them then?
Looks great to me, thanks again for working on this! I'll admit I still don't really understand how this works but I'm willing to trust the combo of you and the fuzzer, so let's merge and see how it goes.
Oh and either way works for me on threads, thanks!
* Update wasm-tools dependencies

  This notably updates to handle bytecodealliance/wasm-tools#1711 but this is done in a way that is not intended to replace #1035 but instead makes the update "as easy as possible" by just adding calls to `.{size,align}_wasm32()` where needed. Full support is expected to land in #1035.

* More sizing updates
This enables wit-bindgen to emit code like `let l2 = *ptr0.add(core::mem::size_of::<*const u8>()).cast::<usize>();`, which automatically calculates the right offset of 4 for wasm32 and 8 for wasm64, e.g. for string or list lowering. The more complex cases combine fixed byte offsets and pointer-sized offsets.
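As an illustration of such a combined offset (the field layout here is made up for the example): a field that follows one 4-byte field and one pointer-sized field starts at 8 on wasm32 and 12 on wasm64.

```rust
fn main() {
    // Fixed 4-byte part plus one pointer-sized part.
    let offset = 4 + core::mem::size_of::<*const u8>();
    println!("offset of the following field: {offset}");
}
```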
This also adds a compatibility layer which falls back to 32-bit for the classical `SizeAlign` usage.