Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Use a const initializer for GIL_COUNT if possible #3177

Merged
merged 2 commits into from
May 24, 2023

Conversation

lifthrasiir
Copy link
Contributor

Normally LocalKey::try_with needs to check for the initialization flag to lazily initialize the TLS. This behaves badly with a compiler optimization and it seems that multiple calls to gil_is_acquired() cannot be correctly eliminated. Rust 1.59 added a const { ... } initializer (a special form only allowed here for now) in thread_local! which removes the initialization flag, allowing those kind of optimizations.

I should note that the performance impact is probably minimal, because the check branch is extremely well predicted and the optimization is only possible when every PyO3 code that leads to gil_is_acquired() is inlined, a pretty uncommon situation. I couldn't demonstrate any consistent improvement or regression from my machine, which performance seems to be flaky as well. But at least we would have one less branch there. I'll leave this as an RFC so that someone else can prove or disprove that this is indeed beneficial.

@adamreichold adamreichold added the CI-skip-changelog Skip checking changelog entry label May 23, 2023
@adamreichold
Copy link
Member

I should note that the performance impact is probably minimal, because the check branch is extremely well predicted and the optimization is only possible when every PyO3 code that leads to gil_is_acquired() is inlined, a pretty uncommon situation. I couldn't demonstrate any consistent improvement or regression from my machine, which performance seems to be flaky as well. But at least we would have one less branch there. I'll leave this as an RFC so that someone else can prove or disprove that this is indeed beneficial.

I think we should apply this change even if we cannot measure an improvement, as long as we do not measure a regression. The reason is that the effects will most likely be hard to measure as you point, mostly due to branch prediction hiding the costs of the branch. But it still one branch less and hence less pressure on the global resources backing branch prediction in the first place. Meaning that effects would most likely only be measurable when these global resources are oversubscribed.

I also think the change is nicely localized and carries a reasonable maintenance cost which we will loose eventually as our MSRV passes 1.59 (which are targetting 1.56 for PyO3 0.20 at the moment, but nothing is set in stone yet).

@davidhewitt
Copy link
Member

Agreed, I think this is a worthwhile improvement - even if immeasurable now and requires a conditional flag, with a future Rust version this will have no maintainance cost and be strictly better.

Since `Vec::with_capacity` is not a const function, it now does not
allocate any initial memory.
@adamreichold
Copy link
Member

LGTM

bors r+

@bors
Copy link
Contributor

bors bot commented May 24, 2023

Build succeeded!

The publicly hosted instance of bors-ng is deprecated and will go away soon.

If you want to self-host your own instance, instructions are here.
For more help, visit the forum.

If you want to switch to GitHub's built-in merge queue, visit their help page.

@bors bors bot merged commit dfc667f into PyO3:main May 24, 2023
@lifthrasiir lifthrasiir deleted the gil-count-const-init branch May 24, 2023 09:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI-skip-changelog Skip checking changelog entry
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants