Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bool == _Bool ? #53

Closed
gnzlbg opened this issue Dec 6, 2018 · 26 comments
Closed

bool == _Bool ? #53

gnzlbg opened this issue Dec 6, 2018 · 26 comments

Comments

@gnzlbg
Copy link
Contributor

gnzlbg commented Dec 6, 2018

The T-compiler and T-lang teams signed off here, that

bool has the same representation as _Bool

where on every platform that Rust currently supports this implies that:

  • bool has a size and an alignment of 1,
  • true as i32 == 1 and false as i32 == 0, EDIT: that is always true, the valid bit patterns of bool is what matters here, e.g., on a platform where bool has the same size as i8, whether transmute::<_, i8>(true) == 1 and transmute::<_, i8>(false) == 0

These two properties are not guaranteed by Rust, and unsafe code cannot rely on these. In the last UCG WG meeting it was unclear whether we want to guarantee these two properties or not. As @rkruppe pointed out, this would be guaranteeing something different and incompatible with what T-lang and T-compiler guaranteed.

Note: any change that the UCG WG proposes will have to go through the RFC process anyways, were it might be rejected. This issue is being raised with stakeholders to evaluate whether there is something that needs changing or not, and if so, whether the change is possible, has chances to achieve consensus, etc.

The following arguments have been raised (hope I did not miss any):

  • T-lang and T-compiler did not specify which version of the C standard _Bool conforms to. In C++20 and C20, P0907r4 (C++) and N2218 (C) specify that:

    • bool and _Bool contain no padding bits (only value bits),
    • 1 == (int)true and 0 == (int)false.

    In some of the merged PRs of the UCG we have already specified that the platform's C implementation needs to comply with some, e.g., C17 or "latest" C standard properties (e.g. for repr(C) struct layout). If we end up requiring C20 for representation / validity of repr(C), we end up guaranteeing these properties. AFAICT the only property about bool that would remain as implementation-defined is its size and alignment.

  • In Representation of bool, integers and floating points #9 / added floating point/int summary #49 , we ended up requiring that C's CHAR_BITS == 8. This implies that if CHAR_BITS != 8 then bool cannot have the same representation as _Bool. Some stakeholders still wanted to be able to do C FFI with these platforms, e.g. @briansmith suggested that Rust should diagnose, e.g., using bool on FFI on these platforms (Representation of bool, integers and floating points #9 (comment)), but that interfacing with those platforms via C FFI (e.g. against assembly) should still be possible (e.g. in a DSP where CHAR_BITS == 16 passing a u16 to C / assembly / ... expecting a char or 16 bit integer should be doable).

  • What exactly T-lang and T-compiler actually ended up guaranteeing isn't 100% clear. In Lint for the reserved ABI of bool rust#46176 the decision seems to be that bool == _Bool, but the PR that was actually merged only mentions that bool has a size of 1: Document the size of bool rust#46156. This might be an oversight in the docs, and some have mention that the reference is "non-normative". @briansmith pointed out (here and here) that bool ABI (e.g. integer class or something else?), alignment, bit patterns denoting true and false, etc. don't appear to be properly documented. @gankro summarized the status quo in Rust Layout and ABIs document and mentioned that projects like Firefox rely on these extra guarantees for correctness (e.g. for bindgen, etc. to work properly, see here.

There are a couple of comments by @withoutboats that I think show both T-lang and T-compiler's rationale and the spirit behind their decision, here:

  • I worry that if we don't specify bool as equivalent to C and C++ booleans, people will need to use c_bool in FFI to be cross platform compatible.

  • I worry that if we don't specify bool as byte sized, people will create a struct Bool(u8) to get that guarantee & keep their structs small.

and here:

People could come to the conclusion that they need a c_bool type for their FFI to be forward compatible with platforms we don't yet support. I think defining it as the same representation as _Bool / C++ bool makes it the least likely someone does something painful to avoid entirely hypothetical problems.

So even if the docs say that bool has a size of 1, and that's it, I believe that this last comment shows that the spirit of T-lang and T-compiler decision was to spare people from creating a c_bool type to be forward compatible on C FFI with platforms that we might never properly support.


I think that the open questions that have to be clarified, are:

  • Should we require C20 compatibility for _Bool, or do we want to stay backwards compatible with C99/11/17 ? (in this case, people can only rely on, e.g., true == 1 on targets where the platform implementation is C20 "conforming" at least w.r.t. _Bool)
  • Do we want to require that bool has a size and an alignment of 1 ? (in the hypothetical case that we ever support a platform where this is not the case, we could raise an improper_ctype warning / error on the platform, or some other form of diagnostic, as @briansmith suggested). This would be a change incompatible withbool == _Bool, might lead people to create and use a c_bool type, etc.

cc @rkruppe @withoutboats @briansmith @gankro @joshtriplett @cuviper @whitequark @est31 @SimonSapin

@briansmith
Copy link

  • n C++20 and C20, P0907r4 (C++) and N2218 (C) specify that:

Those are stated to be "proposals". I believe the C++ proposal might have been accepted by the C++ committee, but I don't know if the C committee accepted the proposal yet.

@strega-nil
Copy link

as will always make true as INTEGRAL_TYPE == 1, to be clear, no matter what the representation of true is (and the same for false)

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Dec 7, 2018

@ubsan yes, I should have used transmute, updated.

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Dec 7, 2018

Those are stated to be "proposals". I believe the C++ proposal might have been accepted by the C++ committee, but I don't know if the C committee accepted the proposal yet.

The author of those proposals tweeted (really, this is the only source I have) about this proposal: https://twitter.com/jfbastien/status/989242576598327296?lang=en

Not only is the C++ standards committee happy to make signed integers two's complement in C++20 with http://wg21.link/p0907 , the C standards committee voted to do the same thing for the next version of C! 🤯 I just have to write the wording. 🎉

I don't know which parts of p0907 and N2218 will make it into C and C++, but at least some of it is on track for C++20 and the next revision of the C standard.

@RalfJung
Copy link
Member

RalfJung commented Dec 7, 2018

What is the problem with just saying "bool has the same size as C's _Bool, here is what this means for popular platforms"? "bool == _Bool" and "bool has size 1" are only contradicting if they are both interpreted as normative for all platforms; I do not think the latter is normative.

TBH I am a bit surprised to see so much fuzz about this, I feel there is an aspect to this discussion that I am missing.

The one possible problem with this definition is that it means we can only ever port Rust to platforms that support C; however, I think that is an entirely theoretical concern at this point.

@Gankra
Copy link

Gankra commented Dec 7, 2018

Defining bool == _Bool has no implications on c-less platforms, as it is implicitly "if we have a C platform to interoperate with".

I don't have much respect for the boogeyman of weird bools, since aiui that's mostly CHAR_BIT != 8 or wildly ancient platforms (win 95). We should blindly assert bool has the proposed c/cpp-20 definition, and also that we only interoperate with c platforms that support it.

I might be willing to accept a weaker statement of "we don't interoperate with c's bool on such a weird platform" but that pushes people back to spreading the folklore that c_bool is useful and good.

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Dec 7, 2018

What is the problem with just saying "bool has the same size as C's _Bool, here is what this means for popular platforms"?

All options have problems. The problem with that approach is that it doesn't guarantee that Rust bool is 1 byte, so those who might care about that might be left wondering: "Should I use u8 to store my booleans instead of bool just in case someone uses my code in some hypothetical platform in which size_of::<bool>() != 1 ? " Note that this something that affects all Rust code, instead of just Rust's C FFI.

If we would say that size_of::<bool>() == 1 then some people might decide to use c_bool in C FFI to be portable with hypothetical C platforms in which that does not hold. That looks like a lesser evil to me than leaving all Rust users, including those writing Rust code that does not interface with C, wondering about the size of bool.

@SimonSapin
Copy link

just in case someone uses my code in some hypothetical platform in which size_of::<bool>() != 1

fn _static_assert_size_of() {
    let _ = std::mem::transmute<bool, u8>;
}

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Dec 7, 2018

@SimonSapin preventing code from compiling is not the same producing portable code that compiles and works as expected.

An equivalent solution (that prevents some code from compiling) would be to not allow bool in C FFI in those platforms in which bool != _Bool. That would only affect Rust code doing C FFI, instead of affecting all Rust code, and bool would still continue to work properly on C FFI as long as the code only targets platforms in which bool == _Bool.

@RalfJung
Copy link
Member

RalfJung commented Dec 7, 2018

The problem with that approach is that it doesn't guarantee that Rust bool is 1 byte,

The lang and compiler teams decided that this is not the case (i.e., they decided that bool might have size different from 1)! It is not our place to change that decision. We are mostly documenting behavior here.

This was actually called out in rust-lang/rust#46176 (comment):

We document this, and also document that on every platform we currently support, this means that the size of bool is 1.

(emphasis mine)

I see no reason for us to overthrow their decision as part of this documentation process. If you want to change that decision, feel free to write an RFC, but that IMO has no place in the current UCG discussion phase.

@RalfJung
Copy link
Member

RalfJung commented Dec 7, 2018

@SimonSapin Here is a version of that code that actually compiles: (from your very brief message, I had no idea whether you wanted to express that this is accepted or rejected by the compiler^^)

fn static_assert_size_of() { unsafe {
    let _ = std::mem::transmute::<bool, u8>(true);
} }

Looks like a bug to me. We also don't allow transmuting between u64 and usize on 64bit platforms. So we should reject this transmute for the same reason, or else decide that we will never support a platform where _Bool has a size different from 1, or else backtrack on rust-lang/rust#46176 (comment). "We", however, is the community as part of an RFC, or maybe T-lang + T-compiler, but not the UCG group.

@hanna-kruppe
Copy link

We also don't allow transmuting between u64 and usize on 64bit platforms.

Huh? Of course we do. It compiles, and it's perfectly legitimate if you only target 64 bit platforms or properly cfg-gate the transmute.

@RalfJung
Copy link
Member

RalfJung commented Dec 7, 2018

Huh? Of course we do. It compiles, and it's perfectly legitimate if you only target 64 bit platforms or properly cfg-gate the transmute.

You are right. Seems I misremembered. Even better, things are consistent then.

@RalfJung
Copy link
Member

RalfJung commented Dec 7, 2018

A long discussion with @gnzlbg on Zulip uncovered why I am so confused here (in particular, I maintain that it makes no logical sense to bring up platforms with CHAR_BITS != 8 in this discussion if we declared such platforms unsupported): @gnzlbg thinks that even if we say "for FFI we assume that CHAR_SIZE == 8", we still want to say some things about FFI on platforms where CHAR_SIZE != 8.

I think we should close this issue, and just document the T-lang + T-compiler decision that bool shares size and alignment with _Bool. And then some people that care should have a discussion about which platforms we support, where "no support" means "no support". And then, if there is agreement that it is worth supporting CHAR_BITS != 8 at least a bit, we can see how that affects bool.

@SimonSapin
Copy link

(Code in my previous comment did not compile because I forgot :: in the turbofish syntax. But you don’t need to call transmute to get the magic size equality check, only to "instantiate" it with concrete input and output types.)

@Gankra
Copy link

Gankra commented Dec 7, 2018

I am still interested in knowing what concrete platforms people want to actually support with more flexible bools.

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Dec 7, 2018

I am still interested in concrete platforms that people want to actually support with more flexible bools.

Nobody came up with any relevant ones in the last discussions - I mentioned that MSVC <= 4.2 had sizeof(bool) == 4 but that is not a platform that I'd want to support.

@cuviper
Copy link
Member

cuviper commented Dec 7, 2018

Historically, 32-bit powerpc-darwin was another one with sizeof(bool) == 4.

@joshtriplett
Copy link
Member

Cray didn't have byte-level addressing, but I don't think anyone needs to target the Cray architecture anymore.

I think we should mandate that bool is always the platform's _Bool. And to the extent we need to, let's always assume 8-bit bytes, rather than spending time and resources future-proofing against a platform that doesn't yet exist.

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Dec 8, 2018

I think we should mandate that bool is always the platform's _Bool.

What does that buy us?

Pros of bool == _Bool:

  • People don't have to use c_bool on C FFI out of compatibility concerns with weird targets

Cons of bool == _Bool:

  • unsafe code cannot rely on size_of::<bool>() == 1 if it wants to be "truly portable"
  • unsafe code might not be able to rely on transmute::<_, i32>(true) == 1 and transmute::<_, i32>(false) == 0 if it wants to be truly portable (portable to C99 platforms for example)
  • People might write struct Bool(u8) for use in Rust out of portability paranoia

On one hand, I think that we should not make it impossible to write Rust that targets weird C platforms - there should not be room or excuses for people to use C. On the other hand, targeting weird C platforms is going to be hard anyways, most of the ecosystem might not be reused at least "as is" on those, etc. Is it worth it to leave most Rust users wondering about bool to make targeting these weird C platforms a slightly bit easier? I don't know. Can, e.g., -sys Rust C FFI wrappers be reused at all on those weird targets ? I don't know either.

As @rkruppe argued, just because we leave those things as "implementation defined" does not mean that people won't assume that they hold everywhere if that holds on all platforms they care about. I don't care about platforms where _Bool doesn't have the properties above, so I just can assume that they hold. The bool == _Bool definition allows that, so as long as we word everything correctly, it's a good choice IMO.

@Ixrec
Copy link
Contributor

Ixrec commented Dec 9, 2018

To clarify, c_bool would become a legitimately useful type if we ever did add support for a "weird C platform"/platform with large bools, right?

So as long as there's a fundamental conflict between guaranteeing the size on weird future platforms and guaranteeing C interop on weird future platforms, and "targeting weird C platforms is going to be hard anyways", it seems we gain a lot more from making bool predictably small for all the ordinary pure-Rust use cases and then keeping in mind that we'll have to introduce an official c_bool type if we ever add a platform where it would be useful (and we probably shouldn't do that before the "portability lint" becomes a real thing).

@strega-nil
Copy link

@Ixrec the team has decided that it's better to be compatible with C, than predictably 1 byte, with bit patterns exactly 0b0 and 0b1 (although there's a close to zero percent chance that this decision ever gets tested)

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Dec 9, 2018 via email

@Gankra
Copy link

Gankra commented Dec 9, 2018

regardless of our decision, bool's sizing is no different from usize vs u32/u64, which I don't believe is considered an "interesting" problem in miri

@RalfJung
Copy link
Member

RalfJung commented Dec 9, 2018

What should miri do here? It currently implements an abstract machine where bool is one Byte wide

No, it implements an abstract machine where bool is as large as it is on the target platform. Sizes and layout of types in miri is entirely determined by rustc, using the same code as what is used for codegen.

and 0x1 and 0x0 are used for true and false

That's correct. But it seems that'll soon be fixed in C/C++ as well so it seems fine.

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Dec 13, 2018

We discussed this briefly in today's meeting, so I think we can close this. We agreed on writing down something along these lines (the exact wording will be in a PR, we can discuss the nits there - cc @avadacatavra ):


Representation invariant of bool

Rust's bool has the same layout as C17's _Bool, that is, its size and alignment are implementation-defined.

note: on all platforms that Rust's currently supports, the size and alignment of bool are 1, and its ABI class is INTEGER.


Validity invariant of bool

Rust's bool has the same valid representations as C17's _Bool, that is, two valid implementation-defined bit-patterns corresponding to true and false- all other bit-patterns are invalid.

note: on all platforms that Rust's currently supports, 0x0 is the bit-pattern of false, and 0x1 is the bit-pattern of true - all other bit-patterns are invalid.

note: there are two proposals, N2218 for C and P0907 for C++, which propose defining0x0 as the bit-pattern for false and 0x1 as the bit-pattern for true in the C and C++ standards.


We didn't say anything about the safety invariant of bool, but I don't think there is much to say about that yet. It was also suggested that we should warn on unsafe code that makes assumptions about, at least, the size and alignment of bool. We might need to add a note about this, or maybe open a rust-lang/rust issue about doing that ?


There was interest in documenting the "controversy" here but there was some agreement that there isn't really much controversy about this, just different goals that are at tension.

EDIT: briefly about the goals at tension: the unsafe code guidelines answer the question "What is unsafe code allowed to assume about bool? ". From the point-of-view of answering this question, the goal would be a simple and clear answer, like: bool has size 1, 0x1 is true and 0x0 is false. Otherwise, unsafe code that assumes any of this is "wrong" in some sense, and people might end up writing more complicated unsafe code to avoid relying on these assumptions.

The T-lang and T-compiler team considered many goals, like "people should use bool in C FFI instead of c_bool", "people should use bool in Rust instead of u8", etc. The bool == _Bool decision balances all these goals. In practice, for users that write code (safe or unsafe) that aims to target "mainstream" platforms, the distinction does not really matter.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

10 participants