Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The memchr implementation has UB #141

Closed
carbotaniuman opened this issue Oct 7, 2022 · 3 comments · Fixed by #144
Closed

The memchr implementation has UB #141

carbotaniuman opened this issue Oct 7, 2022 · 3 comments · Fixed by #144

Comments

@carbotaniuman
Copy link
Contributor

carbotaniuman commented Oct 7, 2022

https://stackoverflow.com/questions/47315902/is-it-legal-to-call-memchr-with-a-too-long-length-if-you-know-the-character-wil

Basically a memchr call with too big a size is valid if you can guarantee that it will not be reached. Given mustang is not yet intended to be super performant, we can probably just replace this with a simple loop

Opening for storage after a discussion on the Rust community discord.

@sunfishcode
Copy link
Owner

Good spot! It's unfortunate, because the memchr crate has some nice optimizations, but it appears we'll have to stop using it.

@bjorn3
Copy link
Contributor

bjorn3 commented Oct 7, 2022

Does the memchr crate speculatively read across pages? If not, ensuring that the memchr C function can't ever be inlined should be enough to prevent the compiler from exploiting UB even if it still technically exists.

@sunfishcode
Copy link
Owner

At a brief glance, I don't see any obvious unaligned loads in memchr's optimized routines, so it would probably work in practice.

On the other hand, the public documentation of the memchr crate doesn't guarantee that it doesn't do these things. Also, inline(never) is not a complete optimization barrier. Also, tools like miri can see these kinds of problems even when optimizers can't.

I don't yet know who will want to use mustang or c-scape, or what they'll want to use it for. In the fullness of time, we could implement the optimizations that the memchr crate does in a way that respect the C memchr semantics. But until then, will optimization be more important to them than theoretical correctness? I really don't know. But my guess is theoretical correctness.

That said, I am aware of an outstanding theoretical correctness bug in c-scape: #142. I submitted #143 which fixes the easy cases, and am thinking about having c-scape use a stack-allocated buffer in read so that it can write into the user's buffer with copy_nonoverlapping in order to avoid the UB. That's a little heavy-handed; bytecodealliance/rustix#81 tracks fixing this properly, which might depend on how rust-lang/rust#78485 turns out.

sunfishcode added a commit that referenced this issue Oct 7, 2022
sunfishcode added a commit that referenced this issue Nov 8, 2022
* Don't use the memchr crate to implement `memchr`.

Fixes #141.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants