Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update copy.DirCopy to leave sockets in the file system #2117

Merged

Conversation

anderbubble
Copy link
Contributor

@anderbubble anderbubble commented Sep 27, 2024

copy.DirCopy contains code to create named sockets matching those on the source; however, this operation closes the socket, removing it implicitly from the destination and causing later chown and chmod operations to fail.

This commit creates matching file system nodes directly without opening a socket or invoking the auto-removal semantics of a listener.

@anderbubble anderbubble force-pushed the dircopy-leave-socket branch 2 times, most recently from 0a7988c to 4c029de Compare September 27, 2024 20:15
@anderbubble
Copy link
Contributor Author

If possible, I'd like to see this backported to 1.55 (before the shift to go 1.22) so that we could use it in Warewulf on EL 8 and 9.

@rhatdan
Copy link
Member

rhatdan commented Sep 27, 2024

@mtrmac @giuseppe @nalind @Luap99 PTAL

Copy link
Collaborator

@mtrmac mtrmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At a first glance this makes no sense to me as proposed: if we don’t close that socket, what is going to close it?

This code needs to work in long-running processes like CRI-O, so leaking file descriptors is not acceptable.

(And, au contraire, if Go’s garbage collector somehow caused the socket to be closed eventually, then that means that the problematic chown/chmod operations could also fail, depending on GC timing.)


It might well be the case that we need to keep the socket open longer, and close it elsewhere; I’m not saying that the current code is correct, I didn’t look into that at the detail, at this point I just took the 1 minute to notice that the PR as currently proposed looks incorrect.

@anderbubble
Copy link
Contributor Author

anderbubble commented Sep 28, 2024

I thought that this was just a difference in semantics between a TCP socket and a Unix domain socket; but turns out Go just mixes close with unlink. What I really want is SetUnlinkOnClose(false).

I'll fix it. Thanks for the review.

@anderbubble anderbubble changed the title Update copy.DirCopy to leave sockets open Update copy.DirCopy to leave sockets in the file system Sep 28, 2024
@anderbubble anderbubble force-pushed the dircopy-leave-socket branch 6 times, most recently from 370a3f9 to 78c2a81 Compare September 28, 2024 03:49
@anderbubble anderbubble marked this pull request as draft September 28, 2024 03:52
@rhatdan
Copy link
Member

rhatdan commented Sep 28, 2024

/approve
LGTM

Copy link
Member

@Luap99 Luap99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not familiar with this code but why are we copying socket files in the first place? I don't see a how any user could benefit from a not connected socket file in the target.
So would it not be simpler to skip socket files all together?

@anderbubble anderbubble marked this pull request as ready for review September 28, 2024 14:03
@anderbubble
Copy link
Contributor Author

why are we copying socket files in the first place

I can’t speak to this: before I started investigating I expected the solution would be to skip sockets as well. But when I found code that was explicitly trying to copy them, but just failing to do so properly, I decided I would start by making the existing code work, rather than remove it, since I didn’t know the original intent.

in theory, I suppose the existence of the socket could be a sentinel to some other application. I also think the existing socket can be re-opened by later applications at the existing path.

@Luap99
Copy link
Member

Luap99 commented Sep 28, 2024

in theory, I suppose the existence of the socket could be a sentinel to some other application.

Maybe but this seems rather unlikely

I also think the existing socket can be re-opened by later applications at the existing path.

I do not see how this could work, binding an existing socket file will fail with EADDRINUSE even when there is no other program actively listening there. As such most programs do an explicit unlink() before calling bind().

Given the current version deletes the path again (and then fails to chown) it should be safe to assume that no user can depend on the behavior and I personally see skipping sockets to be much simpler than doing this here.

@anderbubble
Copy link
Contributor Author

anderbubble commented Sep 28, 2024

The original socket code was added by @giuseppe in 029896b. No objection from me removing it / altering it to skip sockets in stead; but maybe he has insight into why it was added in the first place.

Edit: I guess that's not quite right: he just converted it from mkfifo. I'll investigate further.

@anderbubble
Copy link
Contributor Author

Looks like the first commit at 6a9fa6b includes sockets and was authored by @nalind, so he might have insight into the original intent as well.

@giuseppe
Copy link
Member

I think this was done without thinking too much about its implications, so let's ignore them. Should we do the same with FIFOs?

@mtrmac
Copy link
Collaborator

mtrmac commented Sep 30, 2024

In Docker, this exists all the way back in moby/moby#7619 .


Note the existence of

func (gdw *NaiveDiffDriver) Diff(id string, idMappings *idtools.IDMappings, parent string, parentMappings *idtools.IDMappings, mountLabel string) (arch io.ReadCloser, err error) {
; just the presence of files might actually make a difference to the reported layer contents.

I don’t know that the files would be useful, but if a pull+push changed the filesystem contents, that might be unexpected.

Copy link
Collaborator

@mtrmac mtrmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Let’s decide whether we need these files at all, first.)

@Luap99
Copy link
Member

Luap99 commented Sep 30, 2024

I did take a quick look what gnu cp does. With cp -r it will copy the socket file, however it does not the socket(), bind(), close() thing we are doing here.
Instead it just calls mknodat() looking at the strace output
mknodat(AT_FDCWD, "/tmp/test2/mysock", S_IFSOCK|0700) = 0

relevant source code: https://github.com/coreutils/coreutils/blob/e71b24cedc2ff640f94911298e53b24506f13062/src/copy.c#L3090-L3099

Thus I would say we should prefer a mknod() call here over the current fix.


Though this still doesn't answer if we have to keep socket files during copy.

I don’t know that the files would be useful, but if a pull+push changed the filesystem contents, that might be unexpected.

I guess if this is a practical concern then keeping them makes sense.

@giuseppe
Copy link
Member

I don’t know that the files would be useful, but if a pull+push changed the filesystem contents, that might be unexpected.

I guess if this is a practical concern then keeping them makes sense.

AFAICS a tarball cannot handle unix sockets (https://pkg.go.dev/archive/tar#pkg-constants), so that shouldn't be a concern.

I did take a quick look what gnu cp does. With cp -r it will copy the socket file, however it does not the socket(), bind(), close() thing we are doing here. Instead it just calls mknodat() looking at the strace output mknodat(AT_FDCWD, "/tmp/test2/mysock", S_IFSOCK|0700) = 0

Using mknod seems like a good idea to fix the current code without changing its behavior

@mtrmac
Copy link
Collaborator

mtrmac commented Sep 30, 2024

AFAICS a tarball cannot handle unix sockets (https://pkg.go.dev/archive/tar#pkg-constants)

Thanks, TIL.

@rhatdan
Copy link
Member

rhatdan commented Oct 8, 2024

@anderbubble what is going on with this PR?

@anderbubble
Copy link
Contributor Author

@rhatdan I'll update it based on the conversation here; I just got pulled into some other things for a bit.

@anderbubble anderbubble force-pushed the dircopy-leave-socket branch 2 times, most recently from dc8c8bf to fbf5ab0 Compare October 30, 2024 17:48
- Closes containers#2113

`copy.DirCopy` contains code to create named sockets matching those on
the source; however, this operation closes the socket, removing it
implicitly from the destination and causing later chown and chmod
operations to fail.

This commit creates matching file system nodes directly without opening
a socket or invoking the auto-removal semantics of a listener.

Signed-off-by: Jonathon Anderson <[email protected]>
@anderbubble
Copy link
Contributor Author

@rhatdan I updated the pr to use mknod rather than opening a dangling socket.

I think it's good to go now; but let me know if you have any questions or concerns. Sorry for the delay!

@rhatdan
Copy link
Member

rhatdan commented Oct 30, 2024

@giuseppe @Luap99 @mtrmac PTAL

Copy link
Member

@giuseppe giuseppe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

openshift-ci bot commented Nov 4, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: anderbubble, giuseppe, rhatdan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@rhatdan
Copy link
Member

rhatdan commented Nov 4, 2024

/lgtm
Thanks @anderbubble

@openshift-ci openshift-ci bot added the lgtm label Nov 4, 2024
@openshift-merge-bot openshift-merge-bot bot merged commit 960e8dd into containers:main Nov 4, 2024
20 checks passed
@anderbubble anderbubble deleted the dircopy-leave-socket branch November 4, 2024 16:42
@anderbubble
Copy link
Contributor Author

Will this be automatically backported to 1.55, or do I need to file a separate PR for that?

@giuseppe
Copy link
Member

giuseppe commented Nov 4, 2024

it needs to be backported

anderbubble pushed a commit to anderbubble/storage that referenced this pull request Nov 4, 2024
Update copy.DirCopy to leave sockets in the file system

(cherry picked from commit 960e8dd)
@anderbubble
Copy link
Contributor Author

Backport PR at #2159. Let me know if that needs any adjustment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

copy.DirCopy fails when encountering sockets
5 participants