diff --git a/guide/src/SUMMARY.md b/guide/src/SUMMARY.md index 383f3310b95..995a168067c 100644 --- a/guide/src/SUMMARY.md +++ b/guide/src/SUMMARY.md @@ -17,6 +17,7 @@ - [Parallelism](parallelism.md) - [Debugging](debugging.md) - [Features Reference](features.md) +- [Memory Management](memory.md) - [Advanced Topics](advanced.md) - [Building and Distribution](building_and_distribution.md) - [Supporting multiple Python versions](building_and_distribution/multiple_python_versions.md) diff --git a/guide/src/advanced.md b/guide/src/advanced.md index d680477627c..d98ab62f41a 100644 --- a/guide/src/advanced.md +++ b/guide/src/advanced.md @@ -8,10 +8,7 @@ The C API is naturally unsafe and requires you to manage reference counts, error ## Memory Management -PyO3's "owned references" (`&PyAny` etc.) make PyO3 more ergonomic to use by ensuring that their lifetime can never be longer than the duration the Python GIL is held. This means that most of PyO3's API can assume the GIL is held. (If PyO3 could not assume this, every PyO3 API would need to take a `Python` GIL token to prove that the GIL is held.) - -The caveat to these "owned references" is that Rust references do not normally convey ownership (they are always `Copy`, and cannot implement `Drop`). Whenever a PyO3 API returns an owned reference, PyO3 stores it internally, so that PyO3 can decrease the reference count just before PyO3 releases the GIL. - -For most use cases this behaviour is invisible. Occasionally, however, users may need to clear memory usage sooner than PyO3 usually does. PyO3 exposes this functionality with the the `GILPool` struct. When a `GILPool` is dropped, ***all*** owned references created after the `GILPool` was created will be cleared. - -The unsafe function `Python::new_pool` allows you to create a new `GILPool`. When doing this, you must be very careful to ensure that once the `GILPool` is dropped you do not retain access any owned references created after the `GILPool` was created. +PyO3's `&PyAny` "owned references" and `Py` smart pointers are used to +access memory stored in Python's heap. This memory sometimes lives for longer +than expected because of differences in Rust and Python's memory models. See +the chapter on [memory management](./memory,md) for more information. diff --git a/guide/src/memory.md b/guide/src/memory.md new file mode 100644 index 00000000000..fa05c5d7c47 --- /dev/null +++ b/guide/src/memory.md @@ -0,0 +1,185 @@ +# Memory Management + +Rust and Python have very different notions of memory management. Rust has +a strict memory model with concepts of ownership, borrowing, and lifetimes, +where memory is freed at predictable points in program execution. Python has +a looser memory model in which variables are reference-counted with shared, +mutable state by default, a global interpreter lock (GIL) is needed to prevent +race conditions, and a garbage collector is needed to break reference cycles. +Memory in Python is freed eventually by the garbage collector, but not usually +in a predictable way. + +PyO3 bridges the Rust and Python memory models with two different strategies for +accessing memory allocated on Python's heap from inside Rust. These are +GIL-bound, or "owned" references, and GIL-independent `Py` smart pointers. + +## GIL-bound Memory + +PyO3's GIL-bound, "owned references" (`&PyAny` etc.) make PyO3 more ergonomic to +use by ensuring that their lifetime can never be longer than the duration the +Python GIL is held. This means that most of PyO3's API can assume the GIL is +held. (If PyO3 could not assume this, every PyO3 API would need to take a +`Python` GIL token to prove that the GIL is held.) This allows us to write +very simple and easy-to-understand programs like this: + +```rust +Python::with_gil(|py| -> PyResult<()> { + let hello: &PyString = py.eval("\"Hello World!\"", None, None)?.extract()?; + println!("Python says: {}", hello); + Ok(()) +})?; +``` + +Internally, calling `Python::with_gil()` or `Python::acquire_gil()` creates a +`GILPool` which owns the memory pointed to by the reference. In the example +above, the lifetime of the reference `hello` is bound to the `GILPool`. When +the `with_gil()` closure ends or the `GILGuard` from `acquire_gil()` is dropped, +the `GILPool` is also dropped and the Python reference counts of the variables +it owns are decreased, releasing them to the Python garbage collector. Most +of the time we don't have to think about this, but consider the following: + +```rust +Python::with_gil(|py| -> PyResult<()> { + for _ in 0..10 { + let hello: &PyString = py.eval("\"Hello World!\"", None, None)?.extract()?; + println!("Python says: {}", hello); + } + // There are 10 copies of `hello` on Python's heap here. + Ok(()) +})?; +``` + +We might assume that the `hello` variable's memory is freed at the end of each +loop iteration, but in fact we create 10 copies of `hello` on Python's heap. +This may seem surprising at first, but it is completely consistent with Rust's +memory model. The `hello` variable is dropped at the end of each loop, but it +is only a reference to the memory owned by the `GILPool`, and its lifetime is +bound to the `GILPool`, not the for loop. The `GILPool` isn't dropped until +the end of the `with_gil()` closure, at which point the 10 copies of `hello` +are finally released to the Python garbage collector. + +In general we don't want unbounded memory growth during loops! One workaround +is to acquire and release the GIL with each iteration of the loop. + +```rust +for _ in 0..10 { + Python::with_gil(|py| -> PyResult<()> { + let hello: &PyString = py.eval("\"Hello World!\"", None, None)?.extract()?; + println!("Python says: {}", hello); + Ok(()) + })?; // only one copy of `hello` at a time +} +``` + +It might not be practical or performant to acquire and release the GIL so many +times. Another workaround is to work with the `GILPool` object directly, but +this is unsafe. + +```rust +Python::with_gil(|py| -> PyResult<()> { + for _ in 0..10 { + let pool = unsafe { py.new_pool() }; + let py = pool.python(); + let hello: &PyString = py.eval("\"Hello World!\"", None, None)?.extract()?; + println!("Python says: {}", hello); + } + Ok(()) +})?; +``` + +The unsafe method `Python::new_pool` allows you to create a nested `GILPool` +from which you can retrieve a new `py: Python` GIL token. Variables created +with this new GIL token are bound to the nested `GILPool` and will be released +when the nested `GILPool` is dropped. Here, the nested `GILPool` is dropped +at the end of each loop iteration, before the `with_gil()` closure ends. + +When doing this, you must be very careful to ensure that once the `GILPool` is +dropped you do not retain access any owned references created after the +`GILPool` was created. Read the documentation for `Python::new_pool()` for more +information on safety. + +## GIL-independent Memory + +Sometimes we need a reference to memory on Python's heap that can outlive the +GIL. Python's `Py` is analogous to `Rc`, but for variables whose +memory is allocated on Python's heap. Cloning a `Py` increases its +internal reference count just like cloning `Rc`. The smart pointer can +outlive the GIL from which it was created. It isn't magic, though. We need to +reacquire the GIL to access the memory pointed to by the `Py`. + +What happens to the memory when the last `Py` is dropped and its +reference count reaches zero? It depends whether or not we are holding the GIL. + +```rust +Python::with_gil(|py| -> PyResult<()> { + let hello: Py = py.eval("\"Hello World!\"", None, None)?.extract())?; + println!("Python says: {}", hello.as_ref(py)); + Ok(()) +}); +``` + +At the end of the `Python::with_gil()` closure `hello` is dropped, and then the +GIL is dropped. Since `hello` is dropped while the GIL is still held by the +current thread, its memory is released to the Python garbage collector +immediately. + +This example wasn't very interesting. We could have just used a GIL-bound +`&PyString` reference. What happens when the last `Py` is dropped while +we are *not* holding the GIL? + +```rust +let hello: Py = Python::with_gil(|py| { + Py = py.eval("\"Hello World!\"", None, None)?.extract()) +})?; +// Do some stuff... +// Now sometime later in the program we want to access `hello`. +Python::with_gil(|py| { + println!("Python says: {}", hello.as_ref(py)); +}); +// Now we're done with `hello`. +drop(hello); // Memory *not* released here. +// Sometime later we need the GIL again for something... +Python::with_gil(|py| + // Memory for `hello` is released here. +); +``` + +When `hello` is dropped *nothing* happens to the pointed-to memory on Python's +heap because nothing _can_ happen if we're not holding the GIL. Fortunately, +the memory isn't leaked. PyO3 keeps track of the memory internally and will +release it the next time we acquire the GIL. + +We can avoid the delay in releasing memory if we are careful to drop the +`Py` while the GIL is held. + +```rust +let hello: Py = Python::with_gil(|py| { + Py = py.eval("\"Hello World!\"", None, None)?.extract()) +})?; +// Do some stuff... +// Now sometime later in the program: +Python::with_gil(|py| { + println!("Python says: {}", hello.as_ref(py)); + drop(hello); // Memory released here. +}); +``` + +We could also have used `Py::into_ref()`, which consumes `self`, instead of +`Py::as_ref()`. But note that in addition to being slower than `as_ref()`, +`into_ref()` binds the memory to the lifetime of the `GILPool`, which means +that rather than being released immediately, the memory will not be released +until the GIL is dropped. + +```rust +let hello: Py = Python::with_gil(|py| { + Py = py.eval("\"Hello World!\"", None, None)?.extract()) +})?; +// Do some stuff... +// Now sometime later in the program: +Python::with_gil(|py| { + println!("Python says: {}", hello.into_ref(py)); + // Memory not released yet. + // Do more stuff... + // Memory released here at end of `with_gil()` closure. +}); +```