diff --git a/text/1892-uninitialized-uninhabited.md b/text/1892-uninitialized-uninhabited.md
new file mode 100644
index 00000000000..134108191f4
--- /dev/null
+++ b/text/1892-uninitialized-uninhabited.md
@@ -0,0 +1,294 @@
+- Feature Name: `uninitialized_uninhabited`
+- Start Date: 2017-02-09
+- RFC PR: [rust-lang/rfcs#1892](https://github.com/rust-lang/rfcs/pull/1892)
+- Rust Issue: [rust-lang/rust#53491](https://github.com/rust-lang/rust/issues/53491)
+
+# Summary
+[summary]: #summary
+
+Deprecate `mem::uninitialized::<T>` and `mem::zeroed::<T>` and replace them with
+a `MaybeUninit<T>` type for safer and more principled handling of uninitialized
+data.
+
+# Motivation
+[motivation]: #motivation
+
+The problems with `uninitialized` centre around its usage with uninhabited
+types, and its interaction with Rust's type layout invariants. The concept of
+"uninitialized data" is extremely problematic when it comes into contact with
+types like `!` or `Void`.
+
+For any given type, there may be valid and invalid bit-representations. For
+example, the type `u8` consists of a single byte and all possible bytes can be
+sensibly interpreted as a value of type `u8`. By contrast, a `bool` also
+consists of a single byte but not all bytes represent a `bool`: the
+bit vectors `[00000000]` (`false`) and `[00000001]` (`true`) are valid `bool`s
+whereas `[00101010]` is not. By further contrast, the type `!` has no valid
+bit-representations at all. Even though it's treated as a zero-sized type, the
+empty bit vector `[]` is not a valid representation and has no interpretation
+as a `!`.
+
+As `bool` has both valid and invalid bit-representations, an uninitialized
+`bool` cannot be known to be invalid until it is inspected. At this point, if
+it is invalid, the compiler is free to invoke undefined behaviour. By contrast,
+an uninitialized `!` can only possibly be invalid. Without even inspecting such
+a value the compiler can assume that it's working in an impossible
+state-of-affairs whenever such a value is in scope. This is the logical basis
+for using a return type of `!` to represent diverging functions. If we call a
+function which returns `bool`, we can't assume that the returned value is
+invalid and we have to handle the possibility that the function returns.
+However, if a function call returns `!`, we know that the function cannot
+sensibly return. Therefore we can treat everything after the call as dead code
+and we can write off the scenario where the function *does* return as being
+undefined behaviour.
+
+The issue then is what to do about `uninitialized::<T>()` where `T = !`?
+`uninitialized::<T>` is meaningless for uninhabited `T` and is currently
+instant undefined behaviour when `T = !` - even if the "value of type `!`" is
+never read. The type signature of `uninitialized::<!>` is, after all, that of a
+diverging function:
+
+```rust
+fn mem::uninitialized::<!>() -> !
+```
+
+Yet calling this function does not diverge! It just breaks everything then eats
+your laundry instead.
+
+This problem is most prominent with `!` but also applies to other types that
+have restrictions on the values they can carry. For example,
+`Some(mem::uninitialized::<bool>()).is_none()` could actually return `true`
+because uninitialized memory could violate the invariant that a `bool` is always
+`[00000000]` or `[00000001]` -- and Rust relies on this invariant when doing
+enum layout. So, `mem::uninitialized::<bool>()` is instantaneous undefined
+behavior just like `mem::uninitialized::<!>()`. This also affects `mem::zeroed`
+when considering types where the all-`0` bit pattern is not valid, like
+references: `mem::zeroed::<&'static i32>()` is instantaneous undefined behavior.
+
+## Tracking uninitializedness in the type
+
+An alternative way of representing uninitialized data is through a union type:
+
+```rust
+union MaybeUninit<T> {
+    uninit: (),
+    value: T,
+}
+```
+
+Instead of creating an "uninitialized value", we can create a `MaybeUninit<T>`
+initialized with `uninit: ()`. Then, once we know that the value in the union
+is valid, we can extract it with `my_uninit.value`. This is a better way of
+handling uninitialized data because it doesn't involve lying to the type system
+and pretending that we have a value when we don't. It also better represents
+what's actually going on: we never *really* have a value of type `T` when we're
+using `uninitialized::<T>`, what we have is some memory that contains either a
+value (`value: T`) or nothing (`uninit: ()`), with it being the programmer's
+responsibility to keep track of which state we're in. Notice that creating a
+`MaybeUninit<T>` is safe for any `T`! Only when accessing `my_uninit.value`
+do we have to be careful to ensure it has been properly initialized.
+
+To see how this can replace `uninitialized` and fix bugs in the process,
+consider the following code:
+
+```rust
+fn catch_an_unwind<T, F: FnOnce() -> T>(f: F) -> Option<T> {
+    let mut foo = unsafe {
+        mem::uninitialized::<T>()
+    };
+    let mut foo_ref = &mut foo as *mut T;
+
+    match std::panic::catch_unwind(|| {
+        let val = f();
+        unsafe {
+            ptr::write(foo_ref, val);
+        }
+    }) {
+        Ok(()) => Some(foo),
+        Err(_) => None
+    }
+}
+```
+
+Naively, this code might look safe. The problem though is that by the time we
+get to `let mut foo_ref` we're already saying we have a value of type `T`. But
+we don't, and for `T = !` this is impossible. And so if this function is called
+with a diverging callback it will invoke undefined behaviour before it even
+gets to `catch_unwind`.
+
+We can fix this by using `MaybeUninit` instead:
+
+```rust
+fn catch_an_unwind<T, F: FnOnce() -> T>(f: F) -> Option<T> {
+    let mut foo: MaybeUninit<T> = MaybeUninit {
+        uninit: (),
+    };
+    let mut foo_ref = &mut foo as *mut MaybeUninit<T>;
+
+    match std::panic::catch_unwind(|| {
+        let val = f();
+        unsafe {
+            ptr::write(&mut (*foo_ref).value, val);
+        }
+    }) {
+        Ok(()) => {
+            unsafe {
+                Some(foo.value)
+            }
+        },
+        Err(_) => None
+    }
+}
+```
+
+Note the difference: we've moved the unsafe block to the part of the code which is
+actually unsafe - where we have to assert to the compiler that we have a valid
+value. And we only ever tell the compiler we have a value of type `T` where we
+know we actually do have a value of type `T`. As such, this is fine to use with
+any `T`, including `!`. If the callback diverges then it's not possible to get
+to the `unsafe` block and try to read the non-existent value.
+
+Given that it's so easy for code using `uninitialized` to hide bugs like this,
+and given that there's a better alternative, this RFC proposes deprecating
+`uninitialized` and introducing the `MaybeUninit` type into the standard
+library as a replacement.
+
+# Detailed design
+[design]: #detailed-design
+
+Add the aforementioned `MaybeUninit` type to the standard library:
+
+```rust
+pub union MaybeUninit<T> {
+    uninit: (),
+    value: ManuallyDrop<T>,
+}
+```
+
+The type should have at least the following interface
+([Playground link](https://play.rust-lang.org/?gist=81f5ab9a7e7107c9583de21382ef4333&version=nightly&mode=debug&edition=2015)):
+
+```rust
+impl<T> MaybeUninit<T> {
+    /// Create a new `MaybeUninit` in an uninitialized state.
+    ///
+    /// Note that dropping a `MaybeUninit` will never call `T`'s drop code.
+    /// It is your responsibility to make sure `T` gets dropped if it got initialized.
+    pub fn uninitialized() -> MaybeUninit<T> {
+        MaybeUninit {
+            uninit: (),
+        }
+    }
+
+    /// Create a new `MaybeUninit` in an uninitialized state, with the memory being
+    /// filled with `0` bytes. It depends on `T` whether that already makes for
+    /// proper initialization. For example, `MaybeUninit<usize>::zeroed()` is initialized,
+    /// but `MaybeUninit<&'static i32>::zeroed()` is not because references must not
+    /// be null.
+    ///
+    /// Note that dropping a `MaybeUninit` will never call `T`'s drop code.
+    /// It is your responsibility to make sure `T` gets dropped if it got initialized.
+    pub fn zeroed() -> MaybeUninit<T> {
+        let mut u = MaybeUninit::<T>::uninitialized();
+        unsafe { u.as_mut_ptr().write_bytes(0u8, 1); }
+        u
+    }
+
+    /// Set the value of the `MaybeUninit`. This overwrites any previous value without dropping it.
+    pub fn set(&mut self, val: T) {
+        unsafe {
+            self.value = ManuallyDrop::new(val);
+        }
+    }
+
+    /// Extract the value from the `MaybeUninit` container. This is a great way
+    /// to ensure that the data will get dropped, because the resulting `T` is
+    /// subject to the usual drop handling.
+    ///
+    /// # Unsafety
+    ///
+    /// It is up to the caller to guarantee that the `MaybeUninit` really is in an initialized
+    /// state, otherwise this will immediately cause undefined behavior.
+    pub unsafe fn into_inner(self) -> T {
+        std::ptr::read(&*self.value)
+    }
+
+    /// Get a reference to the contained value.
+    ///
+    /// # Unsafety
+    ///
+    /// It is up to the caller to guarantee that the `MaybeUninit` really is in an initialized
+    /// state, otherwise this will immediately cause undefined behavior.
+    pub unsafe fn get_ref(&self) -> &T {
+        &*self.value
+    }
+
+    /// Get a mutable reference to the contained value.
+    ///
+    /// # Unsafety
+    ///
+    /// It is up to the caller to guarantee that the `MaybeUninit` really is in an initialized
+    /// state, otherwise this will immediately cause undefined behavior.
+    pub unsafe fn get_mut(&mut self) -> &mut T {
+        &mut *self.value
+    }
+
+    /// Get a pointer to the contained value. Reading from this pointer will be undefined
+    /// behavior unless the `MaybeUninit` is initialized.
+    pub fn as_ptr(&self) -> *const T {
+        unsafe { &*self.value as *const T }
+    }
+
+    /// Get a mutable pointer to the contained value. Reading from this pointer will be undefined
+    /// behavior unless the `MaybeUninit` is initialized.
+    pub fn as_mut_ptr(&mut self) -> *mut T {
+        unsafe { &mut *self.value as *mut T }
+    }
+}
+```
+
+Deprecate `uninitialized` with a deprecation message that points people to the
+`MaybeUninit` type. Make calling `uninitialized` on an empty type trigger a
+runtime panic which also prints the deprecation message.
+
+# How We Teach This
+[how-we-teach-this]: #how-we-teach-this
+
+Correct handling of uninitialized data is an advanced topic and should probably
+be left to The Rustonomicon. There should be a paragraph somewhere therein
+introducing the `MaybeUninit` type.
+
+The documentation for `uninitialized` should explain the motivation for these
+changes and direct people to the `MaybeUninit` type.
+
+# Drawbacks
+[drawbacks]: #drawbacks
+
+This will be a rather large breaking change as a lot of people are using
+`uninitialized`. However, much of this code likely already contains subtle
+bugs.
+
+# Alternatives
+[alternatives]: #alternatives
+
+* Not do this.
+* Just make `uninitialized::<!>` panic instead (making `!`'s behaviour
+  surprisingly inconsistent with all the other types).
+* Introduce an `Inhabited` auto-trait for inhabited types and add it as a bound
+  to the type argument of `uninitialized`.
+* Disallow using uninhabited types with `uninitialized` by making it behave
+  like `transmute` does today - by having restrictions on its type arguments
+  which are enforced outside the trait system.
+
+# Unresolved questions
+[unresolved]: #unresolved-questions
+
+None known.
+
+# Future directions
+
+Ideally, Rust's type system should have a way of talking about initializedness
+statically. In the past there have been proposals for new pointer types which
+could safely handle uninitialized data. We should seriously consider pursuing
+one of these proposals.
+
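
Reviewer note, not part of the patch: the `catch_an_unwind` example from the Motivation section can be tried on a current toolchain with the sketch below. It rests on two assumptions that are not in the RFC text. It uses `std::mem::MaybeUninit` as eventually stabilized, whose method names (`uninit`, `write`, `assume_init`) differ from the interface proposed here (`uninitialized`, `set`, `into_inner`). It also wraps the closure in `AssertUnwindSafe`, only to satisfy the `UnwindSafe` bound on `catch_unwind`.

```rust
use std::mem::MaybeUninit;
use std::panic::{self, AssertUnwindSafe};

/// Runs `f`, returning `Some(value)` on success and `None` if `f` panics.
fn catch_an_unwind<T, F: FnOnce() -> T>(f: F) -> Option<T> {
    // No value of type `T` is claimed to exist yet; this is safe for any `T`.
    let mut slot = MaybeUninit::<T>::uninit();

    // Assumption: `AssertUnwindSafe` is added only to satisfy `catch_unwind`'s
    // bound. If `f` panics, `slot` is simply left uninitialized and never read.
    let result = panic::catch_unwind(AssertUnwindSafe(|| {
        slot.write(f());
    }));

    match result {
        // SAFETY: the closure returned normally, so `slot.write` has run
        // and `slot` holds a valid `T`.
        Ok(()) => Some(unsafe { slot.assume_init() }),
        Err(_) => None,
    }
}
```

With this version, `catch_an_unwind(|| 42)` yields `Some(42)`, while a callback that panics yields `None`. The only `unsafe` block sits at the point where the code asserts that a valid `T` exists, which is the property the RFC argues for.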