Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Calculate sizes, alignment and offsets for both wasm32 and wasm64 #1711

Merged
merged 22 commits into from
Aug 19, 2024

Conversation

cpetig
Copy link
Contributor

@cpetig cpetig commented Aug 3, 2024

This enables wit-bindgen to emit code like let l2 = *ptr0.add(core::mem::size_of::<*const u8>()).cast::<usize>(); which automatically calculates the right offset of 4 for wasm32 and 8 for wasm64 e.g. for string or list lowering.

The more complex cases combine fixed byte offsets and pointer sized offsets.

This also adds a compatibility layer which falls back to 32bit for the classical SizeAlign usage.

@cpetig
Copy link
Contributor Author

cpetig commented Aug 3, 2024

This is a generalization of #1446

@alexcrichton
Copy link
Member

Thanks for working on this! I've gotten a chance to review this now (thanks for the patience). I don't really understand how this works at a low-level and it feels like this form of generically handling size/align for 32/64 seems somewhat error-prone. It does seem like a nice property though to have where it would make code generation easier in the wit-bindgen and source languages.

To help evaluate this I whipped up a small differential fuzzer which generates arbitrary WITs and then asserts that the sizes reported by this branch are the same as the sizes reported on the main branch. That currently fails relatively quickly where ArchitectureSize looks like it's holding negative values as unsigned numbers which means that overflow is panicking on addition. Is that intended? If not that might be a bug?

Once that's sorted out I'd also like to extend the fuzzer a bit to call field_offsets and payload_offset for types (if it matches) to assert that those return values are the same as well.

For running the fuzzer I was locally doing:

FUZZER=wit64 cargo +nightly fuzz run -s none run

@cpetig
Copy link
Contributor Author

cpetig commented Aug 13, 2024

Thank you for taking the time to fuzz it. I wouldn't fully trust code of this complexity either.

I found the difference: Resources are considered very large, but the new code considers them (very large)*2. Honestly, I had no idea about the correct way to handle this type, but perhaps it should use add_for_64bit: 0.

The code giving the problem:

TypeDefKind::Resource => ElementInfo::new(
                ArchitectureSize::new(usize::MAX, usize::MAX),
                Alignment::Bytes(NonZeroUsize::new(usize::MAX).unwrap()),
            ),

@alexcrichton
Copy link
Member

I think resources should be unconditionally 32-bit values (always 4 bytes) regardless of wasm32 vs wasm64?

@cpetig
Copy link
Contributor Author

cpetig commented Aug 13, 2024

I agree that resource handles should be 4+0, but this code calculates the size of a resource itself, which doesn't make much sense, see here https://github.com/bytecodealliance/wasm-tools/blob/main/crates/wit-parser/src/sizealign.rs#L61

I was simply at loss on which number to return in this unrealistic case.

@alexcrichton
Copy link
Member

Aha I see! That looks like it's just something to update in the fuzzer then and skip TypeDef::Resource declarations within interfaces (makes sense in retrospect). Would you be up for developing the fuzzer a bit further to see if this PR's approach is viable? If the differential fuzzer is clean that would make me more confident in this approach personally and I can do a deeper review.

@cpetig
Copy link
Contributor Author

cpetig commented Aug 13, 2024

I planned to use intmax+0 which would make the old and new values identical again, then add the field and payload offsets, hiding size calculation of resources would be unnecessary with the first change.

PS: I will for sure take inspiration from your fuzzer in testing other projects. Now I know the real purpose of wit-smith ;-)

@alexcrichton
Copy link
Member

Ah it's ok to do whatever for TypeDefKind::Resource, it's an edge case needed for completeness but shouldn't show up practically in theory. This is just a fuzzer issue really and I think the way you map it right now is fine.

@cpetig
Copy link
Contributor Author

cpetig commented Aug 13, 2024

Oh, after some time the fuzzer found another one: fuzz/artifacts/run/crash-84dc68e18872aad7a98cfcf676848973ef8aa1af

name: func(name: result<list<u16>, result<result<bool, result<result, result<s16>>>>>) -> result;

fuzz/src/wit64.rs:45:9:
assertion `left == right` failed
  left: 24
 right: 30

@cpetig
Copy link
Contributor Author

cpetig commented Aug 13, 2024

See https://github.com/cpetig/wasm-tools/tree/fuzz64 for an updated variant of the fuzzer checking field offsets and payload offset. I assume that the types contained within other types are already part of the all types list, so no need to check the individual fields.

Now I will debug the failed case, it found another one fuzz/artifacts/run/crash-e4149b42d86d41bf7a4230735e54a801f5be70ef

xx: func(name: option<option<option<option<option<option<result<result<result<list<s32>, bool>, result<result<result<result<list<s32>, option<option<option<option<option<option<option<option<option<bool>>>>>>>>>>, bool>, bool>, bool>>, bool>>>>>>>, name1: bool, name2: bool, name3: bool, name4: bool);

@cpetig
Copy link
Contributor Author

cpetig commented Aug 13, 2024

I found the problem (in my 64bit combination reasoning), pushed and fuzzer running again

@cpetig
Copy link
Contributor Author

cpetig commented Aug 14, 2024

I stopped the fuzzer after several hours (3.4M tests, if I read correctly) without any difference detected.

Copy link
Member

@alexcrichton alexcrichton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok thanks for fuzzing! That makes me more confident in this approach so I've done a bit of a deeper dive into this.

Do you think it would be possible to preserve the fuzzer? Basically having a separate "relatively simple" algorithm for calculating sizes with no frills and ensuring that this result continues to be the same?

Comment on lines 70 to 79
/// Architecture specific measurement of position,
/// the combined amount in bytes is
/// `bytes + if 4 < core::mem::size_of::<*const u8>() { add_for_64bit } else { 0 }`
#[derive(Default, Clone, Copy, Eq, PartialEq, Debug)]
pub struct ArchitectureSize {
/// exact value for 32-bit pointers
pub bytes: usize,
/// amount of bytes to add for 64-bit architecture
pub add_for_64bit: usize,
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to represent this as something like bytes and pointers? That way the size could be bytes + pointers * sizeof(pointer). That I think may also make methods like pointers_to_add a bit more clear how to be implemented and/or unnecessary since it's just a field then.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried this once, encountered some difficulties and changed it to the current form. I will try again … I guess the current representation avoids multiplications and divisions and makes alignment more straightforward.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In theory it should be possible to skip mul/div by only having shifts left-and-right by storing the log2 of the pointer size (or something like that), but I'm mostly curious about this from an understandability point of view.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, while implementing a more obscure test case I realized that I made a conceptual oversight:

I you take the max of 10 bytes and 2 pointers you get 4 bytes + 1.5 pointers - and I didn't take fractional pointer amounts into account, yet. It still can work if I multiply the number of pointers by 4 and handle fractional amounts properly. This wasn't visible in the fuzzing because of the additional alignment applied to the result before outputting the size.

Copy link
Contributor Author

@cpetig cpetig Aug 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously this example resulted in 10 bytes + 6 bytes_on_64_bit - if you start with 9 bytes you even get a quarter pointer amount (9 + 7) and (12+4) after alignment to pointer size.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably you don't want to express/represent partial pointer sizes for objects containing pointers, so you would want to align in the max function and return (8+1).

Copy link
Contributor Author

@cpetig cpetig Aug 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was able to make it work, but align is a bit more complex than I would like it to be and max now uses the easy way out (calculate 32bit and 64bit then max and then separate again). Max can probably be made more efficient by expanding the 32bit/64bit calculation and then simplifying terms, but as it is mostly used for variants it should be less performance sensitive.

Fuzzer ran again without finding issues.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pushing on this! I think at least from a code generator perspective it'll be easier to reason about size + pointers so I'm glad this was able to work out.

@cpetig
Copy link
Contributor Author

cpetig commented Aug 15, 2024

Do you think it would be possible to preserve the fuzzer? Basically having a separate "relatively simple" algorithm for calculating sizes with no frills and ensuring that this result continues to be the same?

I think the fuzzer could simply live in the new code and compare the new version with an older released version (flipping the crate alias roles).

@alexcrichton
Copy link
Member

Oh excellent idea on fuzzing, I like that!

@cpetig
Copy link
Contributor Author

cpetig commented Aug 17, 2024

Oh excellent idea on fuzzing, I like that!

I feel we might want to replace wit-smith in the fuzzer with the old version soon ... agreed?

Update: Already included in the latest patchset.

@cpetig
Copy link
Contributor Author

cpetig commented Aug 18, 2024

@alexcrichton Do you want me to mark these threads as resolved or do you want to check first and mark then?

Copy link
Member

@alexcrichton alexcrichton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great to me, thanks again for working on this! I'll admit I still don't really understand how this works but I'm willing to trust the combo of you and the fuzzer, so let's merge and see how it goes.

@alexcrichton alexcrichton added this pull request to the merge queue Aug 19, 2024
@alexcrichton
Copy link
Member

Oh and either way works for me on threads, thanks!

Merged via the queue into bytecodealliance:main with commit 5068b3a Aug 19, 2024
28 checks passed
alexcrichton added a commit to alexcrichton/witx-bindgen that referenced this pull request Aug 22, 2024
This notably updates to handle bytecodealliance/wasm-tools#1711 but this
is done in a way that is not intended to replace bytecodealliance#1035 but instead makes
the update "as easy as possible" by just adding calls to
`.{size,align}_wasm32()` where needed. Full support is expected to land
in bytecodealliance#1035.
alexcrichton added a commit to alexcrichton/witx-bindgen that referenced this pull request Aug 22, 2024
This notably updates to handle bytecodealliance/wasm-tools#1711 but this
is done in a way that is not intended to replace bytecodealliance#1035 but instead makes
the update "as easy as possible" by just adding calls to
`.{size,align}_wasm32()` where needed. Full support is expected to land
in bytecodealliance#1035.
alexcrichton added a commit to alexcrichton/witx-bindgen that referenced this pull request Aug 22, 2024
This notably updates to handle bytecodealliance/wasm-tools#1711 but this
is done in a way that is not intended to replace bytecodealliance#1035 but instead makes
the update "as easy as possible" by just adding calls to
`.{size,align}_wasm32()` where needed. Full support is expected to land
in bytecodealliance#1035.
github-merge-queue bot pushed a commit to bytecodealliance/wit-bindgen that referenced this pull request Aug 23, 2024
* Update wasm-tools dependencies

This notably updates to handle bytecodealliance/wasm-tools#1711 but this
is done in a way that is not intended to replace #1035 but instead makes
the update "as easy as possible" by just adding calls to
`.{size,align}_wasm32()` where needed. Full support is expected to land
in #1035.

* More sizing updates
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants