One bad task can halt all executor progress forever #4730
Yes, it's possible we should change the logic for when other sleeping threads are woken up. I worry whether it's possible to fix this without increasing the number of wakeups significantly.
Alternatively, it might be nice to detect these cases and report them w/ logs / metrics. Even better would be some compiler lints.
Here's my theory for how the deadlock occurred in this example. The first thing to note is that even if multiple worker threads are idle, only one of them is waiting for IO/timer events. The remaining threads are just parked, sleeping until explicitly woken. Now, given the above, here's what I think happened:
As far as I can tell, there is no bug in tokio. The example code could have hung before too; it was just far less likely, due to erroneous thread wake-ups that happened and were removed in 1.16. The issue here is, fundamentally, that when one accidentally blocks the runtime, it would be nice to know that something went wrong and to gracefully tolerate it. Currently, Tokio does not offer a way to do this reliably; it just "happened" to do so in some cases earlier. The only reliable way to tolerate accidental blocking that I can think of is to have a dedicated thread monitoring the state of each worker thread. If a thread gets stuck for too long, this monitor thread could warn and move all tasks currently on the stuck thread to a new thread.
This almost feels like a limitation of the polling model. I don't see a good way to do this without something like a future uring-based driver-per-worker runtime flavor.
Well, one challenge with per-worker solutions is that if we tie each resource to a specific core, then that resource will not receive wakeups if that core is blocked. One thought: we don't necessarily need a separate thread for monitoring this. We could make sure that whenever at least one thread is idle, one of the idle threads without the IO driver sleeps with a monitor timeout. Also, for anyone who wants a solution now, you can spawn your own monitor thread. See here for an example.
Even if you don't block the runtime forever, wouldn't this possibly affect latency when it happens? Any task that becomes ready after the thread that previously held the IO driver starts running a task needs to wait until it finishes before the IO driver can be polled again, even if there are available cores that could be handling that task. I think ideally something should always be parked waiting on the IO driver whenever there are idle threads.
Along the lines of @Darksonn's suggestion here... if a worker is currently holding the IO driver and it's transitioning to start polling its own tasks, shouldn't it try to wake a parked worker to steal the IO driver? It seems like, even if none of the tasks on the worker that's holding the IO driver will block, there's still potentially an entire scheduler tick of latency until IO is polled again, and we could avoid that by eagerly giving the IO driver to a parked thread...
Hmm. Good point.
Re-using an existing thread would be the strategy; however, it still needs to be an option, as waking up a thread on an interval just to do a check is fairly expensive for apps that are mostly idle.
The underlying issue seems to be the case where the thread that runs the IO driver is polling its own tasks; while that happens, the IO driver/poller won't run and thus work stealing won't happen, meaning that idle and parked threads will stay parked even if there's pending work they could do. A promising solution for tokio is proposed in its issue tracker [0], but it hasn't been implemented yet. So, as a stop-gap, spawn a separate thread that periodically spawns a no-op ready future in the runtime, which will unpark a worker in the aforementioned case and thus should break the bogus idleness. We chose a 3s period for that without any overly elaborate reasoning; our main goal is to ensure we accept incoming connections, and 3s is well below an HTTP timeout and leaves some room for high network latencies while not causing too many additional wakeups for systems that are really idling.

[0]: tokio-rs/tokio#4730 (comment)
Link: tokio-rs/tokio#4730
Signed-off-by: Thomas Lamprecht <[email protected]>
Any updates on this? I might be running into this issue currently.
There haven't been any updates.
I think it would be a good idea to try either of the solutions from @hawkw and @Darksonn and hide them behind flags on the runtime builder.
I think I ran into the same issue. I have a piece of code that does an async call (uploading a binary blob to S3). This code is tested and works correctly in a simple test-program. The explanation by @Darksonn on June 2nd (no thread is watching the IO/timer events) could also explain the halted state of my program. I would welcome an option/feature to detect/analyse these kinds of mysterious halting issues in the Tokio runtime (possibly as a feature that can be turned on/off, as suggested by @Noah-Kennedy).
Generally, the cause of the halting is a task that blocks the thread. You should be able to detect those using tokio-console. Tasks that block the thread show up as "busy duration" increasing while the number of polls remains constant.
I did all the steps proposed by @Darksonn and analyzed using tokio-console combined with gdb to take snapshots in different parts of the program's execution. This analysis shows:

Observations via tokio-console: on the resources overview I could not find an IO-driver resource, and thus could not locate the owner of that resource. Only gdb revealed the http-timeout error when using breakpoints.

The same S3-upload code works well in a test-program. However, the test-program only has four tasks, while the program above has 12 tasks (due to Parquet dependencies used to generate/compute the output file). So I did not find a way to reproduce the issue in a small program.

I did make some changes to the stack and heap by adding some variables, which adds a kind of memory-layout randomization, but this had no effect, which makes option 2 seem unlikely. So that leaves me with option 1, which I do not know how to analyze, or an option 3 that I still do not know.
Further analysis revealed my issue is option 3. I am using a mix of sync and async code. The three patterns described in https://tokio.rs/tokio/topics/bridging were difficult to apply in my context, so I tried to do it with block_on. However, calling tokio::runtime::Runtime::block_on there is not allowed, so I assumed that futures::executor::block_on would call into the installed async runtime (Tokio). However, it seems this runs its own executor and bypasses Tokio, which probably causes the issues. My alternative is using a tokio::runtime::Handle to spawn a task and a oneshot channel to wait for the result. However, I still have some work to do to satisfy the Rust borrow checker with this solution. This feels like a complex solution to a simple problem, but I currently see no other options.
My issues mentioned in January/February in this thread (related to mixing synchronous and asynchronous code) were resolved by https://github.com/cvkem/async_bridge (a generic solution I developed to make mixing the two easier).
It is possible for tokio to get into a state where no progress is made: tokio-rs/tokio#4730. This PR adds monitoring that tries to detect such cases. We don't have access to information about which threads are waiting on the IO driver, so this code can produce false warnings.
I wonder why we don't create a dedicated thread to run the IO driver?
Our current approach results in very little synchronization overhead and good tail latency under load. The background thread approach is much worse on both metrics, and is ultimately far slower and less effective for production workloads.
Spent some time looking into this; sharing some findings/thoughts just in case.

Benign case MRE: created a minimal reproducible example for the "benign case", i.e. where we see an impact on latency without a deadlock.

Output:

Reproduced the issue on an arm server and the rust-playground.

Possible solution 1: handing over the driver before running the task seems to work. Not particularly familiar with the codebase, but after reading the comments above a couple of times and going through the code, I made the change below to worker.rs:573.

Possible solution 2: as mentioned in other comments, creating a background thread that periodically spawns tasks does mitigate the issue (a workaround).

Random thoughts (not particularly well informed): from the comments above, I think everyone agrees that "this is not a good thing"; perhaps it is a sacrifice made to get some other (good) properties in return. Without knowing how much this affects real-world workloads it's hard to say how big of an issue this is. Overall I think this behavior is undesirable and counter-intuitive for a work-stealing scheduler. Guess we could have a separate issue called "one benign task can halt all executor progress for a while". I wonder if the tracing feature could help here. I tried the multi-threaded-alt scheduler and the issue is also present there, but I wonder if the alt scheduler would be the right place to ship a fix for this issue; it is only available on unstable, which means it's already behind an option (just an idea; I don't have much context here).
I'll +1 this. This is a pretty ugly performance wart. Our team has independently hit this issue twice, where some singular task takes longer than the developer expects and this results in 100ms+ of latency for API requests that we otherwise expect to complete very quickly. The fact that one pathological task is able to harm a bunch of otherwise good API requests is alarming behavior.
Test code pasted by @santossumii:

output

The heavy compute task won't halt other tasks. I don't know the implementation of tokio, but it seems this is a big problem, as users are very likely to hit this issue.
Except for using `block_in_place`:

```rust
use std::time::Duration;
use std::time::Instant;

async fn handle_request(start_time: Instant) {
    tokio::time::sleep(Duration::from_millis(500)).await;
    println!("request took {}ms", start_time.elapsed().as_millis());
}

async fn background_job() {
    loop {
        tokio::time::sleep(Duration::from_secs(3)).await;
        println!("background job start");
        // Adjust as needed
        tokio::task::block_in_place(|| {
            for _ in 0..1_000 {
                vec![1; 1_000_000].sort()
            }
        });
        println!("background job finished");
    }
}

#[tokio::main]
async fn main() {
    tokio::spawn(async { background_job().await });
    loop {
        let start = Instant::now();
        tokio::spawn(async move { handle_request(start).await })
            .await
            .unwrap();
    }
}
```

Output:
The documentation of `block_in_place` says:

> Calling this function informs the executor that the currently executing task is about to block the thread, so the executor is able to hand off any other tasks it has to a new worker thread before that happens.

I don't know how async-std deals with blocking tasks. For Tokio, the responsibility of dealing with blocking tasks is on the programmer.
Good workaround. But we cannot control whether our third-party libs have bad tasks.
Look, it isn't so simple. All potential solutions have disadvantages to them. They hurt the performance of well-behaved programs, and also raise questions about forwards-compatibility. If you want to help, you could investigate how other runtimes handle this problem. For example, what does (did?) Go do if the thread with the epoll instance gets blocked by a tight loop without yield points, and all other threads are asleep? (Runtimes that poll IO from a dedicated thread are not relevant. We are not going to do that.)
This feels somewhat similar to Spectre and Meltdown, where Intel was "caught speeding": their optimizations were too aggressive for the happy path (i.e. a CPU loaded only with trusted, well-behaved programs), and that opened the door to other issues. Similarly, everything is great with tokio, until it's not. As is common in software engineering, it's not so much the performance or operation of a well-behaved program that matters, but dealing with the exceptional cases or nefarious actors that requires so much thought. This might be where we're at here. It would be interesting to know how much of a performance hit the potential solutions have and whether it would be possible to feature-flag them. We could benchmark our own app, though it's probably not representative of other workloads, and we wouldn't be able to take this work item until the new year.
This problem is something you get "by default" if you do it the obvious way. It isn't because we have an extra optimization that is invalid.
You should not expect a fix for this over the holidays anyway. It's nowhere near that urgent. And there is even the workaround of spawning your own monitoring thread, as I mentioned in a previous comment.
As @Darksonn mentioned, there is no obvious solution to this that comes without additional overhead. Fundamentally, Tokio is a cooperative multi-tasking system, which puts some responsibility on the user. As such, I will close this issue, as there is no "bug" in Tokio. That said, there would be value in detecting and possibly mitigating poorly behaved tasks on an opt-in basis. I opened #6315 as a tracking issue and referenced this issue. This will help reframe the conversation.
Version
Tested 1.16, 1.17, 1.18.2
Platform
Confirmed on Linux amd64 and M1 Mac
Description
If one task gets stuck in a busy loop, or doing blocking IO, it can prevent all other tasks from being polled forever, even with the multi_thread executor. All that seems to be required is for the bad task to jump threads once, which seems to happen fairly randomly. This is a serious issue because:
I tried this code:
I expected to see this happen:
With 32 threads available, you'd expect one to be blocked, but both tasks should print messages and proceed.
Instead, this happened:
good still alive
will never print again