Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chrootarchive: Pass root via fd #2049

Merged
merged 2 commits into from
Aug 12, 2024

Conversation

cgwalters
Copy link
Contributor

I'd like to support passing a file descriptor root for the container storage, and not an absolute path.

In the bootc codebase (partially a philosophy inherited from ostree) we've heavily invested in fd-relative accesses, primarily because it's common for us to operate in different namespaces/roots, and fd-relative access avoids a lot of possible footguns when dealing with absolute paths. It's also more efficient, avoiding the need for the kernel to traverse full paths a lot.

This is just one of a few preparatory changes necessary in making it work to do:

podman --root=/proc/self/fd/3 --runroot=... pull busybox

Note that as part of doing this, I refactored code here to move the "split root and dest" logic into the caller... there was some logically dead code here because

@cgwalters
Copy link
Contributor Author

See containers/bootc#731 particularly https://github.com/containers/bootc/pull/724/files#diff-30bb05ef4e35724331b84e50edacf12aaf7eb21e3a5576a341d556accb0d1400R75 which is doing a complex dance right now to bridge from our internal fds to the absolute paths this package currently requires.

@cgwalters
Copy link
Contributor Author

Fixed up the cross-compilation issues.

@cgwalters
Copy link
Contributor Author

Actually...is this macos/windows code ever used?

@rhatdan
Copy link
Member

rhatdan commented Jul 30, 2024

Most likely not.

if relDest[0] != '/' {
relDest = "/" + relDest
}
dest = relDest
Copy link
Collaborator

@mtrmac mtrmac Jul 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn’t try this and I might well be missing something, but how does this work on non-Linux where root/rootfd are completely ignored?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should continue the charade of supporting containers/storage on non linux/freeBSD machines.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I vaguely remember the tar/untar functions being used for non-graph-driver content as well. E.g. Podman’s pkg/machine/ocipull/oci.go.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

… and if we were to drop support for other systems, I think the code would have to be modified to fail to compile, rather than being allowed to write to unexpected destinations by neglect.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn’t try this and I might well be missing something, but how does this work on non-Linux where root/rootfd are completely ignored?

It doesn't.

I vaguely remember the tar/untar functions being used for non-graph-driver content as well. E.g. Podman’s pkg/machine/ocipull/oci.go.

Hmm, good point, but in glancing slightly more deeply it seems like that...ah yes, notice the archive vs chrootarchive, so it's not using this code I think.

Of course, crossing with other threads you know of, I don't see why the podman-machine disk images are actually stored as container images (including being wrapped in a tarball) that we manually unpack vs directly as an OCI artifact which we fetch and stream directly to disk without an extra copy, which would be inherently more secure too. (Of course, if the podman machine archive today is malicious, you're kind of boned anyways because that machine ends up with write access to your home directory)

Copy link
Contributor Author

@cgwalters cgwalters Jul 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on non linux/freeBSD

Of course already needing to abstract over those two is pretty nontrivial...Linux has openat2 but FreeBSD ended up doing things differently with O_RESOLVE_BENEATH etc. The current zstd:chunked code is hardcoded to Linux.

Since it's been at least a whole day since I've mentioned Rust here...in bootc/ostree we've been using https://github.com/bytecodealliance/cap-std/ and it's quite nice and elegant and well designed (I've also been contributing), and has backends for both linux and freebsd. In theory we could fork off a helper process using that to handle untar...it even has a whole "userspace fs resolver fallback" path that is what's used today on MacOS.

Copy link
Collaborator

@mtrmac mtrmac Jul 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I vaguely remember the tar/untar functions being used for non-graph-driver content as well. E.g. Podman’s pkg/machine/ocipull/oci.go.

Hmm, good point, but in glancing slightly more deeply it seems like that...ah yes, notice the archive vs chrootarchive, so it's not using this code I think.

Thanks, that’s an important nuance (while we do need to partially continue to support c/storage on the other systems).


A longer comment in #2051 (comment) , but basically I’m leaning towards the thought that a Linux-only code path is not an improvement.

The untar code is supposed to be secure in any environment; AFAICT the chroot is an additional isolation, not something the callers specifically require because they depend on some chroot-only property.

In theory we could fork off a helper process using that to handle untar...

The way pkg/reexec works here, there are no external helper binaries. In the post-RPM world of people just copying binaries around, that makes deployment easier — probably not for Podman and all its dependencies, but for some of the other lighter-weight callers.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see why the podman-machine disk images are actually stored as container images

At least for the pkg/machine/compression code, that’s not necessarily the case — it’s a very generic decompressor handling things like ZIP, not compatible with traditional container images anyway.

@cgwalters
Copy link
Contributor Author

cgwalters commented Jul 30, 2024

This triggered a memory for me that we also ship oc image extract...I went to look at what it does, and it forked from moby.

I am not sure if oc image extract is currently load bearing on windows/macos? Not much happening there but it looks like it compiles...

@cgwalters
Copy link
Contributor Author

Tossed up #2051

Copy link
Collaborator

@mtrmac mtrmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Just to have a tool-visible marker that the dest = relDest move is breaking non-Linux systems; let’s continue the discussion in the original thread.)

@cgwalters cgwalters marked this pull request as ready for review July 31, 2024 17:53
@cgwalters
Copy link
Contributor Author

OK I reworked this in a way which works on Linux (tested) and FreeBSD (not tested, but in theory), and leaves other platforms unchanged (not tested).

Copy link
Collaborator

@mtrmac mtrmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK to the principle of the thing.

Consider structuring this as a per-platform data type:

type unpackDestination struct { /* platform-specific */ }
func newUnpackDestination(root, dest string) unpackDestination { … }
func (dest unpackDestination) invokeUnpack(decompressedArchive …, options …)

That would hide the platform-specific rootFd in the Linux code, entirely avoid the unused parameter on non-Linux, and make it easier to close the file descriptor in the caller.

(Arguably, when I added the _ = warning silencers, I probably should have done that already; I was too focused on the warnings to actually think about the code. The addition of a file descriptor makes the case even stronger.)

@cgwalters
Copy link
Contributor Author

A note: If we take this out farther, what would make sense is for c/storage to hold a fd (not an absolute path) for its storage root, then this code would need more refactoring as dest wouldn't necessarily be prepended with the root by default. That'd probably be a distinct new API calling into the shared bits, we just wouldn't need to do the "strip off the root" on the Unix paths in that case.

Copy link
Collaborator

@mtrmac mtrmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM otherwise.

@cgwalters cgwalters force-pushed the unpack-via-fd branch 2 times, most recently from c28e092 to 692352a Compare July 31, 2024 19:35
@rhatdan
Copy link
Member

rhatdan commented Aug 1, 2024

@giuseppe @nalind PTAL

Copy link
Member

@giuseppe giuseppe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a nit, otherwise LGTM

Copy link
Contributor

openshift-ci bot commented Aug 6, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cgwalters, giuseppe

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved label Aug 6, 2024
I'd like to support passing a file descriptor root for the
container storage, and not an absolute path.

In the bootc codebase (partially a philosophy inherited
from ostree) we've heavily invested in fd-relative accesses,
primarily because it's common for us to operate in different
namespaces/roots, and fd-relative access avoids a lot of
possible footguns when dealing with absolute paths. It's
also more efficient, avoiding the need for the kernel to
traverse full paths a lot.

This is just one of a few preparatory changes necessary
in making it work to do:

`podman --root=/proc/self/fd/3 --runroot=... pull busybox`

That was breaking because the fd was being closed when forking
the child untar process here. Fix this by switching over
to always passing the root via fd on Unix.

Signed-off-by: Colin Walters <[email protected]>
Minor unrelated cleanup.

Signed-off-by: Colin Walters <[email protected]>
@rhatdan
Copy link
Member

rhatdan commented Aug 12, 2024

/lgtm

@openshift-ci openshift-ci bot added the lgtm label Aug 12, 2024
@openshift-merge-bot openshift-merge-bot bot merged commit d1b0a56 into containers:main Aug 12, 2024
18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants