GH-41862: [C++][S3] Fix potential deadlock when closing output stream #41876
@github-actions crossbow submit -g cpp
Revision: 51d0738 Submitted crossbow builds: ursacomputing/crossbow @ actions-b40dadb2f5
cpp/src/arrow/filesystem/s3fs.cc (outdated)
lock.unlock();
state->pending_parts_completed.MarkFinished(state->status);
Now it's possible for this object to go from "Finished" to "not-finished". There might be logic relying on the state machine converging to the finished state and staying there.
I might do a full review of this class at some point.
> Now it's possible for this object to go from "Finished" to "not-finished".

Where is it possible?
auto fut = state->pending_parts_completed;
lock.unlock();
fut.MarkFinished(state->status);
Maybe something could be done if we are afraid that pending_parts_completed might change, but I've checked that it wouldn't happen.
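The pattern in this suggestion (copy the future handle while holding the lock, release the lock, then complete the copy) can be sketched in isolation. This is a minimal sketch, not Arrow code: `ToyFuture`, `ToyUploadState` and `OnPartDone` are hypothetical stand-ins, and it assumes, as the PR rationale describes, that marking a future finished runs its callbacks synchronously on the calling thread. The callback re-acquires the same mutex and then destroys the state, standing in for a user callback that destroys the stream; with the lock still held, the re-lock would deadlock.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical future stand-in (NOT arrow::Future): copies share one state,
// and MarkFinished runs the registered callbacks synchronously.
class ToyFuture {
 public:
  ToyFuture() : impl_(std::make_shared<Impl>()) {}

  void AddCallback(std::function<void()> cb) {
    impl_->callbacks.push_back(std::move(cb));
  }

  void MarkFinished() {
    impl_->finished = true;
    // Move callbacks out so they may safely destroy the object owning *this.
    std::vector<std::function<void()>> callbacks = std::move(impl_->callbacks);
    impl_->callbacks.clear();
    for (auto& cb : callbacks) cb();
  }

  bool is_finished() const { return impl_->finished; }

 private:
  struct Impl {
    bool finished = false;
    std::vector<std::function<void()>> callbacks;
  };
  std::shared_ptr<Impl> impl_;
};

// Hypothetical upload state: a mutex guarding the counter and the future.
struct ToyUploadState {
  std::mutex mutex;
  int parts_in_progress = 0;
  ToyFuture pending_parts_completed;
};

// Completion handler: copy the handle under the lock, unlock, then finish.
void OnPartDone(const std::shared_ptr<ToyUploadState>& state) {
  std::unique_lock<std::mutex> lock(state->mutex);
  if (--state->parts_in_progress > 0) return;  // other parts still in flight
  ToyFuture fut = state->pending_parts_completed;  // copy taken under the lock
  lock.unlock();
  fut.MarkFinished();  // callbacks run with the mutex released
  // Do not touch `state` here: a callback may have destroyed it.
}
```

Destroying the state inside the callback is safe in this sketch only because the mutex is already released and the local `fut` copy keeps the future's shared state alive until `MarkFinished` returns.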
state->parts_in_progress reached 0, you unlock, another thread could increment parts_in_progress, and now you call MarkFinished. Is parts_in_progress > 0 considered "finished"?
Ahh, yes, it can. This is possible because Write and HandleUploadOutcome can run concurrently. We'd better do as in https://github.com/apache/arrow/pull/41876/files#r1631434845 to avoid marking the future finished before a task has really finished.
Um, you're right. @mapleFU's suggestion looks ok to me.
Note that the pending_parts_completed future can only be waited on in two situations:
- the user called blocking Close or Flush, and the future is waited upon before returning from the API call;
- the user called non-blocking CloseAsync, which returns a cascaded future obtained by chaining pending_parts_completed.Then with a continuation.
For Close and CloseAsync, it's certainly not ok to call Write from another thread concurrently. For Flush, it should be ok to call Write concurrently, but Flush does not have to wait for the completion of the concurrent Write call.

More generally, it doesn't seem sound to write to an output stream (rather than a random-access file) from several threads concurrently.
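The waiting rules in this comment can be modeled with standard-library types. In this hedged sketch, `ToyOutputStream`, `StartPart` and `FinishPart` are hypothetical; `std::shared_future<void>` stands in for `arrow::Future<>`, and a `std::async` task that waits on it stands in for chaining with `Then`. `Flush` and `CloseAsync` snapshot the pending-parts future under the lock and wait outside it, so a part started by a concurrent write after the snapshot is not waited for. The sketch does not enforce the rule that `Close`/`CloseAsync` must not race with writes.

```cpp
#include <cassert>
#include <future>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical model, not Arrow's API.
class ToyOutputStream {
 public:
  ToyOutputStream() {
    promise_.set_value();  // nothing in flight yet: start with a finished future
    pending_ = promise_.get_future().share();
  }

  // Hypothetical hook: a write starts uploading a part.
  void StartPart() {
    std::lock_guard<std::mutex> lock(mutex_);
    if (parts_in_progress_++ == 0) {
      promise_ = std::promise<void>();  // fresh future for the new batch of parts
      pending_ = promise_.get_future().share();
    }
  }

  // Hypothetical hook: a part upload completes.
  void FinishPart() {
    std::unique_lock<std::mutex> lock(mutex_);
    if (--parts_in_progress_ > 0) return;
    std::promise<void> done = std::move(promise_);  // taken under the lock
    lock.unlock();
    done.set_value();  // waiters wake up with the mutex released
  }

  // Blocking Flush: waits only for the parts in flight when it was called.
  void Flush() { Snapshot().wait(); }

  // Non-blocking close: a continuation chained on the pending-parts future.
  std::future<bool> CloseAsync() {
    std::shared_future<void> pending = Snapshot();
    return std::async(std::launch::async, [pending] {
      pending.wait();
      return true;  // placeholder for completing the multipart upload
    });
  }

  // Blocking Close: waits on the chained future before returning.
  bool Close() { return CloseAsync().get(); }

 private:
  std::shared_future<void> Snapshot() {
    std::lock_guard<std::mutex> lock(mutex_);
    return pending_;
  }

  std::mutex mutex_;
  int parts_in_progress_ = 0;
  std::promise<void> promise_;
  std::shared_future<void> pending_;
};
```

Here `Close()` is simply `CloseAsync().get()`, so both close paths wait on the same snapshot; the real stream's completion step is reduced to `return true`.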
> the user called blocking Close or Flush and the future is waited upon before returning from the API call;
Maybe there could be a sequence like this:
- The last in-flight request finishes: it acquires the lock and decrements the count to 0.
- A new request is sent, and pending_parts_completed is set to a new future.
- The thread from the first step calls pending_parts_completed.MarkFinished, which may now be called on the new future.
So a subsequent blocking wait would be wrong?
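The interleaving above can be replayed deterministically on one thread. This sketch assumes, from the description in this thread, that the write path installs a fresh `pending_parts_completed` future when it starts a part while the counter is 0; `ToyFuture`, `StartPart` and `ReplayInterleaving` are hypothetical. With `capture_under_lock == false` the handler re-reads the member after unlocking and completes the new future while a part is still in flight; with `true` it completes the handle it copied under the lock, which is the approach suggested in the review.

```cpp
#include <cassert>
#include <memory>
#include <mutex>

// Hypothetical one-shot completion flag standing in for a future handle:
// copies share the same state, so completing one copy completes them all.
struct ToyFuture {
  std::shared_ptr<bool> finished = std::make_shared<bool>(false);
  void MarkFinished() { *finished = true; }
  bool is_finished() const { return *finished; }
};

struct ToyUploadState {
  std::mutex mutex;
  int parts_in_progress = 0;
  ToyFuture pending_parts_completed;
};

// Write path (assumed): the first part of a new batch installs a fresh future.
void StartPart(ToyUploadState& state) {
  std::lock_guard<std::mutex> lock(state.mutex);
  if (state.parts_in_progress++ == 0) {
    state.pending_parts_completed = ToyFuture{};
  }
}

// Replays the three steps from the comment. Expects exactly one part in
// flight on entry. Returns the handle that was current when it completed.
ToyFuture ReplayInterleaving(ToyUploadState& state, bool capture_under_lock) {
  std::unique_lock<std::mutex> lock(state.mutex);
  --state.parts_in_progress;
  assert(state.parts_in_progress == 0);  // step 1: the count drops to 0
  ToyFuture captured = state.pending_parts_completed;
  lock.unlock();

  StartPart(state);  // step 2: a new part starts before MarkFinished runs

  // Step 3: the handler from step 1 completes "the" future.
  if (capture_under_lock) {
    captured.MarkFinished();  // the future observed under the lock
  } else {
    state.pending_parts_completed.MarkFinished();  // the new future, too early
  }
  return captured;
}
```

In the flawed order, a later blocking `Close` or `Flush` would see a finished future even though one part is still uploading.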
@github-actions crossbow submit -g cpp
Revision: c48d59a Submitted crossbow builds: ursacomputing/crossbow @ actions-4b9dce20f9
Ok, since there are two +1s, I will merge if CI is green (or the failures are unrelated).
After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit 036fca0. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 20 possible false positives for unstable benchmarks that are known to sometimes produce them.
Rationale for this change
When the Future returned by OutputStream::CloseAsync finishes, it can invoke a user-supplied callback. That callback may well destroy the stream as a side effect. If the stream is an S3 output stream, this might lead to a deadlock involving the mutex in the output stream's UploadState structure, since the callback is called with that mutex locked.
What changes are included in this PR?
Unlock the UploadState mutex before marking the Future finished, to avoid deadlocking.
Are these changes tested?
No. Unfortunately, I wasn't able to write a test that would trigger the original condition. Additional preconditions seem to be required to reproduce the deadlock. For example, it might require a mutex implementation that hangs if destroyed while locked.
Are there any user-facing changes?
No.