rustup is still slow under WSL #1866
So one option is we could include the uname crate if we don't already, and then use uname::uname()?.release.find("Microsoft") to detect WSL. Alternatively we just enable threading for closes always -- I would like us to have a more robust limit on open handles at that point (perhaps a limited queue length would be sufficient?)
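A minimal sketch of that detection idea, assuming the uname crate (whose uname() returns an Info struct with a release string); the function name is illustrative:

```rust
// Sketch only: detect WSL1 by looking for "Microsoft" in the kernel
// release string, per the suggestion above. Assumes the `uname` crate.
fn is_wsl() -> bool {
    uname::uname()
        .map(|info| info.release.contains("Microsoft"))
        .unwrap_or(false)
}
```
|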
We can do some experiments... Windows doesn't have anywhere near the same file descriptor limits Unix does, so sure, but let's make the throttling limits platform specific, or based on actual OS resource limits (getrlimit) or some such.
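For the getrlimit-based idea, a hedged sketch (assuming the libc crate; the function name and headroom numbers are illustrative) could look like:

```rust
// Sketch: derive a close-thread/handle cap from the process fd limit on
// Unix. Windows would need a fixed cap instead, since it has no getrlimit.
#[cfg(unix)]
fn max_open_handles() -> usize {
    let mut rl = libc::rlimit { rlim_cur: 0, rlim_max: 0 };
    // SAFETY: rl is a valid, writable rlimit struct.
    let rc = unsafe { libc::getrlimit(libc::RLIMIT_NOFILE, &mut rl) };
    if rc == 0 {
        // Leave some headroom for the rest of the process.
        (rl.rlim_cur as usize).saturating_sub(64).max(16)
    } else {
        256 // conservative fallback if getrlimit fails
    }
}
```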
|
Sounds good. I don't have a WSL setup (yet) but I can try and get one on my gaming platform (rapidly I'm going to have to think of… |
First experiment: enabling threads under WSL was entertaining. Made no impact. Something in LXCORE.SYS appears to be serialising all IO. Hah. Hah. Hah. Probably this is one of the drivers for WSL2... Running https://openbenchmarking.org/innhold/802aab108da0cc656d7560a98447d3556af818da under WSL 1 shows all 64 cores used, so it's not some sort of green threading madness. microsoft/WSL#873 is likely the upstream bug report to track. I'll do a little more measurement (because puzzles!), and #1867 will hopefully benefit too, though I haven't read the new strace yet. |
I've modified that to println!("{} is prime", number) for numbers within the last 1000, and run it with large amounts under strace to make sure that strace doesn't impose single-threaded execution or anything -- it doesn't... moving back onto rustup.
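A hedged reconstruction of that check (the benchmark source isn't reproduced here, so the range, thread count, and names are illustrative):

```rust
// Sketch: a threaded prime search that prints primes in the last 1000
// numbers of the range, used only to confirm that strace does not
// serialise multi-threaded execution.
use std::thread;

fn is_prime(n: u64) -> bool {
    n >= 2 && (2u64..).take_while(|d| d * d <= n).all(|d| n % d != 0)
}

fn main() {
    const LIMIT: u64 = 1_000_000;
    const THREADS: u64 = 8;
    let chunk = LIMIT / THREADS;
    let handles: Vec<_> = (0..THREADS)
        .map(|t| {
            thread::spawn(move || {
                for number in t * chunk..(t + 1) * chunk {
                    if is_prime(number) && number >= LIMIT - 1000 {
                        println!("{} is prime", number);
                    }
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
}
```
|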
And here is a strace. tl;dr: openat is extraordinarily slow in WSL1, with typical times around 17ms during docs unpack. close is similarly slow. I can't say at this stage whether we're facing actual serialisation in the Linux emulator or not.
strace -f -T rustup under WSL with threads
If we pull out a larger region and strip the futexes entirely we get: trimmed trace (blank lines separate activity from different threads).
which comes after writing
Though those times are already substantially slower than the first creation openat:
With openat being slow, we can't avoid that latency; so unless we can find a way to convince WSL's openat to be fast, or do the work to let us do all the IO in threads in tar-rs, I think we can't do much more latency hiding in WSL1.
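For reference, the close-latency hiding in question has roughly this shape (a sketch with illustrative names, not rustup's actual code); as noted above, it cannot help with openat, whose latency sits on the path that produces the fd in the first place:

```rust
// Sketch: hand each File off to a worker thread so the unpack loop never
// blocks on a slow close(2). This hides close latency only; openat must
// still complete inline before the file can be written at all.
use std::fs::File;
use std::sync::mpsc;
use std::thread;

fn spawn_closer() -> (mpsc::Sender<File>, thread::JoinHandle<()>) {
    let (tx, rx) = mpsc::channel::<File>();
    let worker = thread::spawn(move || {
        for file in rx {
            drop(file); // close(2) happens here, off the hot path
        }
    });
    (tx, worker)
}
```
|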
I ran a procmon log at the same time as strace. So this 10ms open:
can be tracked to this region: procmon trace
The key bits we need are these:
The last Windows syscall starts at 8:11:46.5024861, or 0.0005421s after the first one starts, which implies some but not a lot of inter-call processing. So we have a 0.0001770s gap, then syscalls taking 0.0005421+0.0000370s -- that explains an openat call taking 0.0007561s -- but Linux saw 0.0107820s, a difference of 0.0100259s, or 10ms. The only explanation that makes sense to me is a coarse lock around the fd table, with close() holding that lock for the duration of CloseHandle, rather than invalidating the fd (but not releasing it), closing the handle, and only then releasing it. Or some similar thing. Does this stack up? Let's have a look at the surrounding strace:
Here three threads are involved. Sure looks like a coarse lock on the fd<->Handle table to me, but of course, I don't have the WSL source code. Let's check what that close(8) was for and look at its procmon close data.
and in procmon
At .4994827 the close is started; it takes .0122879s, finishing at .5117706. At .5005271 Defender starts operating on the file. So this lines up very well: Defender mmaps the file when close is called, blocks the close, and WSL has a coarse lock on the FD table.
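To make the hypothesised locking difference concrete, an illustrative sketch (Rust, not WSL source) of the two disciplines:

```rust
// Sketch: coarse vs. fine locking of an fd -> Handle table. In the model
// above, WSL1 behaves like close_coarse, so a Defender-delayed CloseHandle
// stalls every other fd operation, openat included.
use std::collections::HashMap;
use std::sync::Mutex;

type Handle = usize;

fn close_handle_slowly(_h: Handle) {
    // Stand-in for CloseHandle; this is where the ~10ms scan-on-close goes.
}

// Hypothesised WSL1 behaviour: lock held across the whole close.
fn close_coarse(table: &Mutex<HashMap<i32, Handle>>, fd: i32) {
    let mut t = table.lock().unwrap();
    if let Some(h) = t.remove(&fd) {
        close_handle_slowly(h); // lock still held during the slow part
    }
}

// The cheaper alternative described above: invalidate the fd under the
// lock, then close the handle with the lock released.
fn close_fine(table: &Mutex<HashMap<i32, Handle>>, fd: i32) {
    let h = table.lock().unwrap().remove(&fd);
    if let Some(h) = h {
        close_handle_slowly(h); // other threads can use the table meanwhile
    }
}
```
|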
So the model I now have is:
Things we haven't modelled:
Possible workarounds to try:
|
Set RUSTUP_CLOSE_THREADS=disabled to force single-threaded IO, or to a specific number if desired for testing/tuning. This may improve rust-lang#1867, but has no impact on rust-lang#1866 due to the coarse lock around the fd-handle table inside WSL.
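A hedged sketch of how such a setting might be consumed (the enum and function are illustrative, not rustup's actual internals):

```rust
// Sketch: interpret a RUSTUP_CLOSE_THREADS-style variable as "disabled",
// an explicit worker count, or an implementation-chosen default.
use std::env;

enum CloseThreads {
    Disabled,     // force single-threaded IO
    Count(usize), // explicit worker count for testing/tuning
    Default,      // let the implementation pick
}

fn close_threads_from_env() -> CloseThreads {
    match env::var("RUSTUP_CLOSE_THREADS") {
        Ok(v) if v == "disabled" => CloseThreads::Disabled,
        Ok(v) => v
            .parse()
            .map(CloseThreads::Count)
            .unwrap_or(CloseThreads::Default),
        Err(_) => CloseThreads::Default,
    }
}
```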
@rustbot label: +O-windows +O-linux |
Does this remain an issue with WSL2? If not, I suggest we close this. |
I haven't tested yet; WSL1 is still the only WSL some users can use due to enterprise rollout channels. If WSL2 is slow it will be for different reasons. |
Just for kicks I ran under WSL2 with .rustup symlinked onto
Quite slow. |
rustc 1.60.0 (7737e0b5c 2022-04-04) is very very very slow building under WSL2 on Win11 |
rustc is not rustup; please file your ticket on the Rust repository. |
Problem
We forgot about WSL when making the threading code OS specific.
Steps
Possible Solution(s)
Perhaps thread always? Or perhaps try to detect WSL?
Notes
Output of rustup --version:
Output of rustup show: