Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] Unable to start or daemonize firecracker vms with jailer #4140

Closed
2 of 3 tasks
fffe opened this issue Sep 28, 2023 · 8 comments
Closed
2 of 3 tasks

[Bug] Unable to start or daemonize firecracker vms with jailer #4140

fffe opened this issue Sep 28, 2023 · 8 comments
Labels
Priority: Low Indicates that an issue or pull request should be resolved behind issues or pull requests labelled ` Status: Awaiting author Indicates that an issue or pull request requires author action Type: Documentation Indicates a need for improvements or additions to documentation

Comments

@fffe
Copy link

fffe commented Sep 28, 2023

Describe the bug

I have been following along with the getting started guide, but I am unable to start firecracker vms using jailer.

When I try to start a vm via jailer in the foreground, I get this error:

2023-09-28T22:16:30.020581822 [firetest:main:WARN:src/vmm/src/builder.rs:172] Cannot set raw mode for the terminal. Error(5)
2023-09-28T22:16:30.020678165 [firetest:main:WARN:src/vmm/src/builder.rs:184] Cannot set canonical mode for the terminal. Error(5)
2023-09-28T22:16:30.020683208 [firetest:main:WARN:src/vmm/src/lib.rs:915] Error thrown by observer object on Vmm teardown: I/O error (os error 5)
2023-09-28T22:16:30.084154267 [firetest:main:ERROR:src/firecracker/src/main.rs:503] Building VMM configured from cmdline json failed: Internal(VcpuStart(VmmObserverInit(Error(5))))

If I try to daemonize jailer, I get this error instead:

thread 'main' panicked at 'Jailer error: Failed to daemonize: setsid: Operation not permitted (os error 1)', src/jailer/src/main.rs:376:27
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Aborted

The same config file boots as expected when firecracker is run directly.

To Reproduce

I followed these commands to get started:

ARCH="$(uname -m)"
TAP_DEV="tap0"
TAP_IP="192.168.0.1"
MASK_SHORT="/30"
NAME="firetest"

# get latest release of firecracker/jailer
release_url="https://github.com/firecracker-microvm/firecracker/releases"
latest=$(basename $(curl -fsSLI -o /dev/null -w  %{url_effective} ${release_url}/latest))
curl -L ${release_url}/download/${latest}/firecracker-${latest}-${ARCH}.tgz | tar -xz
mv release-${latest}-$(uname -m)/firecracker-${latest}-${ARCH} /usr/local/bin/firecracker
mv release-${latest}-$(uname -m)/jailer-${latest}-${ARCH} /usr/local/bin/jailer

# grab official images
curl -OJL https://s3.amazonaws.com/spec.ccfc.min/firecracker-ci/v1.5/${ARCH}/vmlinux-5.10.186
curl -OJL https://s3.amazonaws.com/spec.ccfc.min/firecracker-ci/v1.5/${ARCH}/ubuntu-22.04.ext4
curl -OJL https://s3.amazonaws.com/spec.ccfc.min/firecracker-ci/v1.5/${ARCH}/ubuntu-22.04.id_rsa
chmod 400 ./ubuntu-22.04.id_rsa

# create tap device in a netns
ip netns del "$NAME" 2> /dev/null || true
ip netns add "$NAME"
ip netns exec "$NAME" ip tuntap add dev "$TAP_DEV" mode tap
ip netns exec "$NAME" ip addr add "${TAP_IP}${MASK_SHORT}" dev "$TAP_DEV"
ip netns exec "$NAME" ip link set dev "$TAP_DEV" up

My config.json looks like this:

{
  "boot-source": {
    "kernel_image_path": "vmlinux-5.10.186",
    "boot_args": "console=ttyS0 reboot=k panic=1 pci=off",
    "initrd_path": null
  },
  "drives": [
    {
      "drive_id": "rootfs",
      "path_on_host": "ubuntu-22.04.ext4",
      "is_root_device": true,
      "partuuid": null,
      "is_read_only": true,
      "cache_type": "Unsafe",
      "io_engine": "Sync",
      "rate_limiter": null
    }
  ],
  "machine-config": {
    "vcpu_count": 2,
    "mem_size_mib": 1024,
    "track_dirty_pages": false
  },
  "balloon": null,
  "network-interfaces": [
    {
      "iface_id": "1",
      "host_dev_name": "tap0",
      "guest_mac": "06:00:c0:a8:00:02",
      "rx_rate_limiter": null,
      "tx_rate_limiter": null
    }
  ],
  "vsock": null,
  "logger": null,
  "metrics": null,
  "mmds-config": null,
  "entropy": null
}

I then did a bit of basic setup in the chroot environment -- copying in the config file + kernel + rootfs -- and tried to launch the vm via jailer:

rm -rf /srv/jailer/firecracker
mkdir -p /srv/jailer/firecracker/$NAME/root ; cp ubuntu-22.04.ext4 vmlinux-5.10.186 config.json /srv/jailer/firecracker/$NAME/root
jailer --id $NAME --exec-file /usr/local/bin/firecracker --uid $(id -u firecracker1) --gid $(id -g firecracker1) --netns "/var/run/netns/$NAME" --new-pid-ns -- --config-file "config.json"
2023-09-28T22:16:30.020581822 [firetest:main:WARN:src/vmm/src/builder.rs:172] Cannot set raw mode for the terminal. Error(5)
2023-09-28T22:16:30.020678165 [firetest:main:WARN:src/vmm/src/builder.rs:184] Cannot set canonical mode for the terminal. Error(5)
2023-09-28T22:16:30.020683208 [firetest:main:WARN:src/vmm/src/lib.rs:915] Error thrown by observer object on Vmm teardown: I/O error (os error 5)
2023-09-28T22:16:30.084154267 [firetest:main:ERROR:src/firecracker/src/main.rs:503] Building VMM configured from cmdline json failed: Internal(VcpuStart(VmmObserverInit(Error(5))))

I haven't found anything about this on Google or in the issue tracker, and I have no idea why it might fail.

I tried to daemonize it in the hopes that it wouldn't bother with the terminal at all, but that gave me a different error:

rm -rf /srv/jailer/firecracker
mkdir -p /srv/jailer/firecracker/$NAME/root ; cp ubuntu-22.04.ext4 vmlinux-5.10.186 config.json /srv/jailer/firecracker/$NAME/root
jailer --id $NAME --exec-file /usr/local/bin/firecracker --uid $(id -u firecracker1) --gid $(id -g firecracker1) --netns "/var/run/netns/$NAME" --new-pid-ns --daemonize -- --config-file "config.json"
thread 'main' panicked at 'Jailer error: Failed to daemonize: setsid: Operation not permitted (os error 1)', src/jailer/src/main.rs:376:27
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Aborted

setsid() can fail if the process is already a leader, but otherwise I have no idea why this isn't working. As before, no hits on Google or in the issue tracker that I saw.

If I run the same config file directly via firecracker, it boots successfully:

firecracker --id "$NAME" --config-file "config.json"

It will complain because the tap device isn't setup properly (it doesn't know about the netns), but the vm boots and the cli is usable.

Expected behaviour

I expect firecracker vms started via jailer to boot.

Environment

  • Firecracker version
    Firecracker v1.4.1
  • Host and guest kernel versions
    Host: 5.15.0-84.93 Ubuntu kernel (current for Ubuntu 22.04)
    Guest: vmlinux-5.10.186 (current image provided by getting started guide)
  • Rootfs used
    ubuntu-22.04.ext4 (current image provided by getting started guide)
  • Architecture
    x86-64

Additional context

It appears to be caused by the --new-pid-ns option.

Checks

  • Have you searched the Firecracker Issues database for similar problems?
  • Have you read the existing relevant Firecracker documentation?
  • Are you certain the bug being reported is a Firecracker issue?
@pb8o
Copy link
Contributor

pb8o commented Sep 29, 2023

Hi, for the 1st error, can you remove this bit? console=ttyS0 and try.

@fffe
Copy link
Author

fffe commented Sep 29, 2023

Hi, for the 1st error, can you remove this bit? console=ttyS0 and try.

No change, same error.

@fffe
Copy link
Author

fffe commented Sep 29, 2023

Thinking about this a bit more... that might explain why --daemonize doesn't work?

From the man page: setsid() fails with EPERM if "[t]he process group ID of any process equals the PID of the calling process. Thus, in particular, setsid() fails if the calling process is already a process group leader."

I'm a little fuzzy on pid namespaces, but if the firecracker child is in a new pid namespace (and thus has no reference to its parent) then presumably its process group id matches its pid, which would cause setsid() to fail with EPERM. If that's the case, perhaps that should treated as a non-fatal error?

It still doesn't explain why setting terminal options would fail.

@wearyzen wearyzen added Priority: Low Indicates that an issue or pull request should be resolved behind issues or pull requests labelled ` Type: Documentation Indicates a need for improvements or additions to documentation labels Oct 2, 2023
@wearyzen
Copy link
Contributor

wearyzen commented Oct 2, 2023

Hi @fffe, I was able to get jailer working with you steps and below command (removed --new-pid-ns):

/usr/local/bin/jailer --id $NAME --exec-file /usr/local/bin/firecracker --uid $(id -u firecracker1) --gid $(id -g firecracker1) --netns "/var/run/netns/$NAME" -- --config-file "config.json"

Could you please confirm if this works for you?
Meanwhile we will try to find a good example to start Jailer with daemonize option.

@fffe
Copy link
Author

fffe commented Oct 2, 2023

Sorry, I should have been clearer that I'd tried that (and it worked) when I flagged --new-pid-ns as the problem in the bug report.

I don't plan to daemonize these, I was just trying to dodge the terminal setup issue. I will likely run them in the foreground via systemd. It would be nice if pid namespacing worked there.

With that said, I think it's incorrect to classify this as a documentation bug, unless the bug is that --new-pid-ns is never expected to work?

Reading the jailer docs again, they actually warned me about this (oops):

When running with --daemonize, the jailer will fail to start if it's a process group leader, because setsid() returns an error in this case. Spawning the jailer via clone() and exec() also ensures it cannot be a process group leader.

... but that just seems like a bug in itself. Since setsid() will always fail in that case, and --daemonize does more than just call setsid(), it seems like the better way to handle that is to just not call (or provide an option to not call) setsid() when launched in a pid namespace?

It still makes no sense to me why this appears to be causing the terminal issue as well.

@wearyzen
Copy link
Contributor

Hi @fffe ,

Sorry for the delay in getting back on this issue and also for missing the jailer documentation earlier.
Your observation is right about the setsid error and to improve user experience on this we decided to merge #4259.
Could you please try the latest main and let us know if you still see the setsid issue?

@roypat roypat added the Status: Awaiting author Indicates that an issue or pull request requires author action label Dec 4, 2023
@kalyazin
Copy link
Contributor

Hi @fffe ,

Just to let you know, we believe that we have fixed the issue and there is no pending action on our side. Please let us know if you still observe the failure, otherwise we're planning to close the issue in a couple of weeks.

@wearyzen
Copy link
Contributor

Hi @fffe ,
We believe that the issue is fixed so we are closing this issue. Please feel free to reopen it or reach us out if you have any questions or updates.

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Priority: Low Indicates that an issue or pull request should be resolved behind issues or pull requests labelled ` Status: Awaiting author Indicates that an issue or pull request requires author action Type: Documentation Indicates a need for improvements or additions to documentation
Projects
None yet
Development

No branches or pull requests

5 participants