[RFC 2195] Document new type representations #246

@@ -149,7 +149,8 @@ layout such as reinterpreting values as a different type.
Because of this dual purpose, it is possible to create types that are not useful
for interfacing with the C programming language.

This representation can be applied to structs, unions, and enums. The exception
is [zero-variant enumerations], for which the `C` representation is an error.

#### \#[repr(C)] Structs

@@ -222,48 +223,178 @@ assert_eq!(std::mem::size_of::<SizeRoundedUp>(), 8); // Size of 6 from b,
assert_eq!(std::mem::align_of::<SizeRoundedUp>(), 4); // From a
```

#### \#[repr(C)] Field-less Enums

For [field-less enums], the `C` representation has the size and alignment of
the default `enum` size and alignment for the target platform's C ABI.

> Note: The enum representation in C is implementation-defined, so this is
> really a "best guess". In particular, this may be incorrect when the C code
> of interest is compiled with certain flags.
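As a quick check of the size rule above, the sketch below compares a `#[repr(C)]` field-less enum with the platform's C `int`. It assumes a mainstream target such as x86_64 Linux, where the C compiler's default `enum` is an `int`; the enum `Mode` and the helper `mode_matches_c_int` are illustrative names, and, as the note says, the match is not guaranteed for every target or compiler flag.

```rust
use std::mem::{align_of, size_of};
use std::os::raw::c_int;

// Illustrative field-less enum with the `C` representation.
#[repr(C)]
#[allow(dead_code)]
enum Mode {
    Read,
    Write,
    Append,
}

/// Returns `true` when `Mode` has the same size and alignment as C `int`,
/// which is what we expect on mainstream targets (an assumption, not a rule).
fn mode_matches_c_int() -> bool {
    size_of::<Mode>() == size_of::<c_int>() && align_of::<Mode>() == align_of::<c_int>()
}
```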

> Warning: There are crucial differences between an `enum` in the C language and
> Rust's field-less enums with this representation. An `enum` in C is
> mostly a `typedef` plus some named constants; in other words, an object of an
> `enum` type can hold any integer value. For example, this is often used for
> bitflags in C. In contrast, Rust's field-less enums can only legally hold
> the discriminant values; everything else is [undefined behavior]. Therefore,
> using a field-less enum in FFI to model a C `enum` is often wrong.
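One way to follow this warning, sketched here for a hypothetical C `enum color`, is to receive the value as a plain integer (a `#[repr(transparent)]` wrapper around `c_int`) and convert it to a Rust enum only after checking it. `RawColor` and `Color` are illustrative names, not an existing API; unknown values are returned as an error instead of becoming an invalid enum value.

```rust
use std::convert::TryFrom;
use std::os::raw::c_int;

// Hypothetical C declaration being modeled:
//     enum color { RED = 0, GREEN = 1, BLUE = 2 };
// The raw value stays a plain integer, so any value the C side produces
// (including bitflag combinations) is a valid `RawColor`.
#[repr(transparent)]
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct RawColor(pub c_int);

impl RawColor {
    pub const RED: RawColor = RawColor(0);
    pub const GREEN: RawColor = RawColor(1);
    pub const BLUE: RawColor = RawColor(2);
}

// Rust-side field-less enum, only produced after the value has been checked.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum Color {
    Red,
    Green,
    Blue,
}

impl TryFrom<RawColor> for Color {
    // Unknown values are handed back to the caller.
    type Error = c_int;

    fn try_from(raw: RawColor) -> Result<Self, Self::Error> {
        match raw {
            RawColor::RED => Ok(Color::Red),
            RawColor::GREEN => Ok(Color::Green),
            RawColor::BLUE => Ok(Color::Blue),
            RawColor(other) => Err(other),
        }
    }
}
```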

#### \#[repr(C)] Enums With Fields

For enums with fields, the `C` representation is defined to be the same as the
following types. These types don't actually exist; the names are only here to
help describe relationships. All of these types have the `C` representation.

Review comment: I don't understand what "follow types" means.
Reply: following*

An enum with fields that has the `C` representation (the *represented enum*)
has the same representation as a struct with two fields (the *tagged union*).
The first field of the tagged union is a field-less enum (the *discriminant
enum*). The second field of the tagged union is a union (the *fields union*).

Review comment: This paragraph is really hard to understand, even as someone who knows exactly what it's describing. I think we need to establish some more standard terms/short-hands for talking about type layouts. For example, "type with the It's also important here that it is not just the layout as the term is defined by this document, but also the ABI that matches. This is important for e.g. passing this type by-value in a C FFI function. I think we should either define layout to include ABI, or create a term for "layout and ABI". Here I'm going to use "representation" to refer to layout+ABI, and also use it to refer the "desugared" type to provide a potential rewrite (with an incredibly long pedantic aside):

The discriminant enum has one variant for each variant in the represented enum,
in the same order as in the represented enum.

Review comment: (this could probably be omitted with my suggested rewrite)

The fields union has one field for each variant in the represented enum. Each
field is a struct containing the fields of the corresponding variant, in the
order they are defined in the variant. The valid field of the union is the one
corresponding to the variant that the discriminant enum's value identifies.

Review comment: This could probably also be omitted with my rewrite? The idea of a "valid" field is also confusing. This seems like it's trying to use the C++ concept of "active members" of a union, but Rust doesn't have that notion. It's perfectly valid to read/write from any of the fields as long as the actual written value is compatible with the read type (e.g. you don't read 3u8 as a bool). Minor variant punning is already being used in the wild: https://twitter.com/Gankro/status/964196064332079104 Note that if the enum needs_drop, a drop is a read. (possibly a thing for the nomicon)

```rust
// This enum has the same representation as...
#[repr(C)]
enum RepresentedEnum {
    A(u32),
    B(f32, u64),
    C { x: u32, y: u8 },
    D,
}

// ...this struct.
#[repr(C)]
struct TaggedUnion {
    tag: DiscriminantEnum,
    payload: FieldsUnion,
}

// This is the discriminant enum.
#[repr(C)]
enum DiscriminantEnum { A, B, C, D }

// This is the fields union.
#[repr(C)]
union FieldsUnion {
    A: FieldsA,
    B: FieldsB,
    C: FieldsC,
    D: FieldsD,
}

// Union fields must be `Copy` (or wrapped in `ManuallyDrop`),
// so each fields struct derives `Copy`.
#[repr(C)]
#[derive(Clone, Copy)]
struct FieldsA(u32);

#[repr(C)]
#[derive(Clone, Copy)]
struct FieldsB(f32, u64);

#[repr(C)]
#[derive(Clone, Copy)]
struct FieldsC { x: u32, y: u8 }

#[repr(C)]
#[derive(Clone, Copy)]
struct FieldsD;
```

Review comment: I think this example would benefit from some tweaked type names to better emphasize the connection

Review comment: Note: you need to

Review comment: Also note: FieldsD could be omitted, and it must be in C(++) headers, so I am slightly inclined to simply omit it. Worth discussing this footgun? Or is that more of a nomicon thing.
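To check the equivalence on a smaller case, the sketch below writes out the desugared types by hand for an illustrative `Shape` enum, compares sizes and alignments, and reads a variant's fields through the desugared view with a pointer cast. `Shape`, `ShapeRepr`, and the helper functions are hypothetical names; the pointer cast is sound only because of the layout guarantee described above, and the 12-byte size checked in the test assumes a target where C `int` is 4 bytes.

```rust
use std::mem::{align_of, size_of};

// Illustrative enum with fields and the `C` representation.
#[repr(C)]
#[allow(dead_code)]
enum Shape {
    Circle(f32),
    Rect(u32, u32),
}

// Hand-written desugared form: discriminant enum, fields union, tagged union.
#[repr(C)]
#[allow(dead_code)]
enum ShapeTag {
    Circle,
    Rect,
}

#[repr(C)]
#[allow(dead_code)]
#[derive(Clone, Copy)]
struct CircleFields(f32);

#[repr(C)]
#[derive(Clone, Copy)]
struct RectFields(u32, u32);

#[repr(C)]
#[allow(dead_code)]
union ShapeFields {
    circle: CircleFields,
    rect: RectFields,
}

#[repr(C)]
struct ShapeRepr {
    tag: ShapeTag,
    payload: ShapeFields,
}

fn layouts_match() -> bool {
    size_of::<Shape>() == size_of::<ShapeRepr>()
        && align_of::<Shape>() == align_of::<ShapeRepr>()
}

fn rect_dims(shape: &Shape) -> Option<(u32, u32)> {
    // SAFETY: `Shape` and `ShapeRepr` share a layout, so the cast pointer is
    // valid and suitably aligned for `ShapeRepr`.
    let repr = unsafe { &*(shape as *const Shape as *const ShapeRepr) };
    match repr.tag {
        ShapeTag::Rect => {
            // SAFETY: the tag says the `rect` field holds the data.
            let fields = unsafe { repr.payload.rect };
            Some((fields.0, fields.1))
        }
        ShapeTag::Circle => None,
    }
}
```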

<span id="c-primitive-representation">Combining the `C` representation and a
primitive representation is only defined for enums with fields. The primitive
representation modifies the `C` representation by giving the discriminant enum
the chosen primitive representation. So, if you choose the `u8` representation,
then the discriminant enum has a size and alignment of 1 byte.</span>
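The effect on size can be seen by comparing the same illustrative enum with and without the `u8` hint. The sizes checked in the test assume a target where the C `enum` is 4 bytes, such as x86_64 Linux: with plain `C`, the 4-byte discriminant enum pushes the one-byte payload to offset 4 and the total is padded to 8 bytes; with `C, u8`, the discriminant and the payload take one byte each. `WideTag`, `ByteTag`, and `sizes` are hypothetical names.

```rust
use std::mem::size_of;

// `C` representation only: the discriminant enum uses the C `enum` size.
#[repr(C)]
#[allow(dead_code)]
enum WideTag {
    A(u8),
    B(u8),
}

// `C` combined with `u8`: the discriminant enum shrinks to one byte.
#[repr(C, u8)]
#[allow(dead_code)]
enum ByteTag {
    A(u8),
    B(u8),
}

/// Returns the sizes of the two versions as `(WideTag, ByteTag)`.
fn sizes() -> (usize, usize) {
    (size_of::<WideTag>(), size_of::<ByteTag>())
}
```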

> Note: This representation was designed primarily for interfacing with
> existing C code that already lays out Rust-style enums in this common way. If
> you have control over both the Rust and C code, such as using C as FFI glue
> between Rust and some third language, then you should use a
> [primitive representation](#primitive-representation-of-enums-with-fields)
> instead.

Review comment: First sentence is a bit weird. Suggested rewrite:
(not sure if the reference uses my "C(++)" shorthand yet)

### Primitive Representations

The *primitive representations* are the representations with the same names as
the primitive integer types. That is: `u8`, `u16`, `u32`, `u64`, `usize`, `i8`,
`i16`, `i32`, `i64`, and `isize`.

Primitive representations can only be applied to enumerations, and have
different behavior depending on whether the enum has fields or no fields. It is
an error for [zero-variant enumerations] to have a primitive representation.

Review comment: Can/should this "whether" be an "if"? genuinely doesn't know
Reply: From a logical standpoint, no. Could be "depending on".

Combining two primitive representations together is unspecified.

Combining the `C` representation and a primitive representation is described
[above](#c-primitive-representation).

#### Primitive Representation of Field-less Enums

For [field-less enums], primitive representations set the size and alignment to
be the same as the primitive type of the same name. For example, a field-less
enum with a `u8` representation can only have discriminants between 0 and 255
inclusive.
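A sketch of what this means in practice, using a hypothetical `Opcode` enum: the `u8` representation makes the enum exactly one byte, casting with `as u8` yields the discriminant, and a discriminant that does not fit in `u8` (such as `256`) is rejected by the compiler. Converting a raw byte back goes through a checked `match` here, because bytes that are not declared discriminants are not valid `Opcode` values; the `TryFrom` impl is an illustrative choice, not something the representation provides.

```rust
use std::convert::TryFrom;

// Illustrative field-less enum with the `u8` representation.
#[repr(u8)]
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Opcode {
    Nop = 0x00,
    Load = 0x10,
    Halt = 0xFF,
}

impl TryFrom<u8> for Opcode {
    type Error = u8;

    // Validates a raw byte instead of transmuting it.
    fn try_from(byte: u8) -> Result<Self, u8> {
        match byte {
            0x00 => Ok(Opcode::Nop),
            0x10 => Ok(Opcode::Load),
            0xFF => Ok(Opcode::Halt),
            other => Err(other),
        }
    }
}
```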

Review comment: I believe it also gives the type the same ABI as a primitive int (e.g. it would be passed in a register instead of on the stack on some x86 ABIs)

#### Primitive Representation of Enums With Fields

For enums with fields, the enum has the same type layout as a union with the
`C` representation whose fields are structs with the `C` representation, one
for each variant in the enum. The first field of each struct is the field-less
version of the enum (the enum with all fields removed from its variants), which
has the same primitive representation. The remaining fields of each struct are
the fields of the corresponding variant, in the order they are defined in the
original enumeration.

Review comment: This sentence is not as bad, but it might be possible to clean it up somewhat as well.
Reply: This one is definitely better than the

Because union fields must be [`Copy`] (or wrapped in `ManuallyDrop`), this
equivalent union can only be written directly in Rust with plain fields when
every field of the enum is also `Copy`. The enum itself has no such restriction.

Review comment: "can only be used" -> "can only be expressed in Rust", maybe? (C(++) can use it fine, and you could do some really hacky crap to use it in Rust too)

> Note: This commonly differs from what is done in C and C++. Projects in
> those languages often use a tuple of `(enum, payload)`. To represent your enum
> like that, use the `C` representation.

Review comment: I'm not sure this is necessary. Or at least I would eliminate the reference to "what's commonly done in C(++)" which is like, never a true statement.
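The difference between the two layouts shows up when a variant's first field is smaller than the payload's alignment. In the sketch below (enum names are illustrative), the primitive representation lets the `u8` field sit right after the one-byte tag inside each variant struct, while `C, u8` places the whole payload union after the tag, aligned to the 4-byte alignment of `u32`. Assuming a target where `u32` is 4-byte aligned, this gives 8 and 12 bytes respectively.

```rust
use std::mem::size_of;

// Primitive representation: a union of `(tag, u8, u32)` and `(tag, u8)` structs.
#[repr(u8)]
#[allow(dead_code)]
enum UnionOfStructs {
    A(u8, u32),
    B(u8),
}

// `C` plus `u8`: a one-byte tag followed by a union of `(u8, u32)` and `(u8)`.
#[repr(C, u8)]
#[allow(dead_code)]
enum TagThenUnion {
    A(u8, u32),
    B(u8),
}

/// Returns the sizes as `(UnionOfStructs, TagThenUnion)`.
fn sizes() -> (usize, usize) {
    (size_of::<UnionOfStructs>(), size_of::<TagThenUnion>())
}
```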

```rust
// This custom enum...
#[repr(u8)]
enum MyEnum {
    A(u32),
    B(f32, u64),
    C { x: u32, y: u8 },
    D,
}

// ...has the same type layout as this union.
#[repr(C)]
#[derive(Clone, Copy)]
union MyEnumRepr {
    A: MyEnumVariantA,
    B: MyEnumVariantB,
    C: MyEnumVariantC,
    D: MyEnumVariantD,
}

#[repr(u8)]
#[derive(Clone, Copy)]
enum MyEnumDiscriminant { A, B, C, D }

#[repr(C)]
#[derive(Clone, Copy)]
struct MyEnumVariantA(MyEnumDiscriminant, u32);

#[repr(C)]
#[derive(Clone, Copy)]
struct MyEnumVariantB(MyEnumDiscriminant, f32, u64);

#[repr(C)]
#[derive(Clone, Copy)]
struct MyEnumVariantC { tag: MyEnumDiscriminant, x: u32, y: u8 }

#[repr(C)]
#[derive(Clone, Copy)]
struct MyEnumVariantD(MyEnumDiscriminant);
```

Review comment: Similar notes on this example as the previous (although you used my preferred naming scheme on this one..?)
Reply: The last commit was a WIP where I was transitioning from this style (that you like better) to the style I used in

Review comment: I'd omit the
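Because every variant struct begins with the field-less discriminant enum, code can read the tag of such an enum through a pointer cast, and then read a variant's fields through the matching struct. The sketch below shows this for a hypothetical `Msg` enum; `MsgTag`, `DataFields`, `tag_of`, and `data_fields` are illustrative names, and the unsafe casts are sound only because of the layout guarantee described above.

```rust
// Illustrative enum with fields and the `u8` representation.
#[repr(u8)]
#[allow(dead_code)]
enum Msg {
    Ping(u32),
    Data(u16, u8),
}

// Field-less counterpart with the same primitive representation.
#[repr(u8)]
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum MsgTag {
    Ping,
    Data,
}

// Variant struct for `Msg::Data`: the tag comes first, then the fields.
#[repr(C)]
#[allow(dead_code)]
#[derive(Clone, Copy)]
struct DataFields {
    tag: MsgTag,
    a: u16,
    b: u8,
}

fn tag_of(msg: &Msg) -> MsgTag {
    // SAFETY: with `#[repr(u8)]`, the first byte of every `Msg` is a valid `MsgTag`.
    unsafe { *(msg as *const Msg as *const MsgTag) }
}

fn data_fields(msg: &Msg) -> Option<(u16, u8)> {
    if tag_of(msg) != MsgTag::Data {
        return None;
    }
    // SAFETY: the tag says this is `Msg::Data`, whose layout is `DataFields`.
    let fields = unsafe { *(msg as *const Msg as *const DataFields) };
    Some((fields.a, fields.b))
}
```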

### The `align` Representation

@@ -288,7 +419,7 @@ padding bytes and forcing the alignment of the type to `1`.
The `align` and `packed` representations cannot be applied to the same type and
a `packed` type cannot transitively contain another `align`ed type.

> Warning: Dereferencing an unaligned pointer is [undefined behavior] and it is
> possible to [safely create unaligned pointers to `packed` fields][27060].
> Like all ways to create undefined behavior in safe Rust, this is a bug.
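Reading a `packed` field without ever creating a reference to it avoids the problem above. The sketch below, for a hypothetical `Header` struct, shows two such ways: copying the field by value, or taking a raw pointer with `std::ptr::addr_of!` (available since Rust 1.51) and reading through it with `read_unaligned`. Neither creates an intermediate `&u32`.

```rust
use std::ptr;

// Illustrative packed struct: `value` sits at offset 1, so it is unaligned.
#[repr(C, packed)]
#[allow(dead_code)]
struct Header {
    kind: u8,
    value: u32,
}

fn value_by_copy(h: &Header) -> u32 {
    // Reading the field by value copies it; no reference is created.
    h.value
}

fn value_by_raw_pointer(h: &Header) -> u32 {
    // `addr_of!` yields a raw pointer without an intermediate `&h.value`,
    // and `read_unaligned` does not require the pointer to be aligned.
    let p: *const u32 = ptr::addr_of!(h.value);
    unsafe { p.read_unaligned() }
}
```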

@@ -298,7 +429,9 @@ a `packed` type cannot transitively contain another `align`ed type.
[`size_of`]: ../std/mem/fn.size_of.html
[`Sized`]: ../std/marker/trait.Sized.html
[dynamically sized types]: dynamically-sized-types.html
[field-less enums]: items/enumerations.html#custom-discriminant-values-for-field-less-enumerations
[zero-variant enumerations]: items/enumerations.html#zero-variant-enums
[undefined behavior]: behavior-considered-undefined.html
[27060]: https://github.com/rust-lang/rust/issues/27060
[primitive representation]: #primitive-representations
[`Copy`]: special-types-and-traits.html#copy

Review comment: You seem to use "enums" in subsequent sections, but "enumerations" here? (I see this reference name is pre-existing but metadata shouldn't affect the actual text)