kernel-install: integration #5135

Merged: 1 commit, Jan 9, 2025

jmarrero
Member

@jmarrero jmarrero commented Oct 22, 2024

incorporates #5097
closes: #4726

This will pair with: https://gitlab.com/fedora/bootc/base-images/-/merge_requests/62 which makes the integration work.

This builds and works on the following Containerfile.

I still need to figure out how to properly deal with the commented-out code.

FROM $image
RUN curl -O -L -k https://kojipkgs.fedoraproject.org//packages/kernel/6.12.0/0.rc1.20241005git27cc6fdf7201.22.fc42/x86_64/kernel-6.12.0-0.rc1.20241005git27cc6fdf7201.22.fc42.x86_64.rpm && \
 curl -O -L -k https://kojipkgs.fedoraproject.org//packages/kernel/6.12.0/0.rc1.20241005git27cc6fdf7201.22.fc42/x86_64/kernel-core-6.12.0-0.rc1.20241005git27cc6fdf7201.22.fc42.x86_64.rpm && \
 curl -O -L -k https://kojipkgs.fedoraproject.org//packages/kernel/6.12.0/0.rc1.20241005git27cc6fdf7201.22.fc42/x86_64/kernel-modules-6.12.0-0.rc1.20241005git27cc6fdf7201.22.fc42.x86_64.rpm && \
 curl -O -L -k https://kojipkgs.fedoraproject.org//packages/kernel/6.12.0/0.rc1.20241005git27cc6fdf7201.22.fc42/x86_64/kernel-modules-core-6.12.0-0.rc1.20241005git27cc6fdf7201.22.fc42.x86_64.rpm && \
 curl -O -L -k https://kojipkgs.fedoraproject.org//packages/kernel/6.12.0/0.rc1.20241005git27cc6fdf7201.22.fc42/x86_64/kernel-modules-extra-6.12.0-0.rc1.20241005git27cc6fdf7201.22.fc42.x86_64.rpm

ADD target/debug/rpm-ostree /usr/bin/rpm-ostree
RUN dnf install -y /kernel*


openshift-ci bot commented Oct 22, 2024

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@jmarrero jmarrero changed the title from "Override replace kernel" to "kernel-install: integration" Oct 22, 2024
Member

@cgwalters cgwalters left a comment

Thanks for working on this!

@jmarrero jmarrero force-pushed the override-replace-kernel branch 2 times, most recently from 74d2f7e to 04dcac2 Compare October 25, 2024 12:57
@jmarrero
Member Author

todo here:

  • make sure fips works
  • make sure that layering on a booted environment still works

@cgwalters
Member

Joseph and I were looking at #4950 - I'd forgotten about that change when looking at this one.

This is all quite messy because we need to handle 2 different cases with the updated rpm-ostree (i.e. after this PR).

  • Existing base images (e.g. fcos, fedora-bootc) without the change to add integration with kernel-install; rpm-ostree override replace kernel* must continue to work
  • Base images that do have it enabled (only in this case will dnf upgrade kernel work, which is what we're trying to fix here)

In the first case, we must just go into our existing kernel handling and not assume the real kernel-install works. In the second, we could do two different things:

  • Detect that the base image has layout=ostree and don't wrap kernel-install
  • Or just use our own integration

I would lean to unconditionally doing the first...we shouldn't try to have it two different ways in the second. Basically let's test stuff well, and then pull the trigger on the base image side and always go through the real kernel-install.
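The two cases above can be sketched as a small decision function. This is a minimal illustration, not the code from this PR: the names `KernelInstallMode`, `has_ostree_layout`, and `kernel_install_mode` are hypothetical, and it assumes the base image signals the integration with a `layout=ostree` line in its kernel-install configuration.

```rust
/// Hypothetical enum: how rpm-ostree treats `kernel-install` for a base image.
#[derive(Debug, PartialEq, Eq)]
pub enum KernelInstallMode {
    /// No `layout=ostree`: keep the existing kernel handling and wrap `kernel-install`.
    WrapWithExistingHandling,
    /// `layout=ostree` is set: leave the real `kernel-install` in place.
    UseRealKernelInstall,
}

/// Returns true if the given config contents contain an uncommented `layout=ostree`.
pub fn has_ostree_layout(install_conf: &str) -> bool {
    install_conf
        .lines()
        .map(str::trim)
        .filter(|line| !line.starts_with('#'))
        .filter_map(|line| line.split_once('='))
        .any(|(key, value)| key.trim() == "layout" && value.trim() == "ostree")
}

/// `None` models a base image that ships no config file at all.
pub fn kernel_install_mode(install_conf: Option<&str>) -> KernelInstallMode {
    match install_conf {
        Some(contents) if has_ostree_layout(contents) => KernelInstallMode::UseRealKernelInstall,
        _ => KernelInstallMode::WrapWithExistingHandling,
    }
}
```

Existing base images (no config, or a different layout) fall through to the wrapping path, so `rpm-ostree override replace kernel*` keeps its current behavior there.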

@jmarrero jmarrero force-pushed the override-replace-kernel branch from 04dcac2 to 3a4f740 Compare December 10, 2024 20:43
@jmarrero jmarrero force-pushed the override-replace-kernel branch 2 times, most recently from 52ec043 to ca3d005 Compare December 20, 2024 17:37
@jmarrero
Member Author

jmarrero commented Dec 20, 2024

+ rpm-ostree override replace 'https://koji.fedoraproject.org/koji/buildinfo?buildID=2571615'
Downloading https://kojipkgs.fedoraproject.org/packages/kernel/6.11.4/301.fc41/x86_64/kernel-6.11.4-301.fc41.x86_64.rpm...done
Downloading https://kojipkgs.fedoraproject.org/packages/kernel/6.11.4/301.fc41/x86_64/kernel-modules-core-6.11.4-301.fc41.x86_64.rpm...done
Downloading https://kojipkgs.fedoraproject.org/packages/kernel/6.11.4/301.fc41/x86_64/kernel-core-6.11.4-301.fc41.x86_64.rpm...done
Downloading https://kojipkgs.fedoraproject.org/packages/kernel/6.11.4/301.fc41/x86_64/kernel-modules-6.11.4-301.fc41.x86_64.rpm...done
Enabled rpm-md repositories: libtest updates fedora-cisco-openh264 fedora updates-archive
Importing rpm-md...done
rpm-md repo 'libtest' (cached); generated: 2024-12-20T17:38:23Z solvables: 32
rpm-md repo 'updates' (cached); generated: 2024-12-20T13:31:04Z solvables: 14309
rpm-md repo 'fedora-cisco-openh264' (cached); generated: 2024-03-11T19:22:31Z solvables: 3
rpm-md repo 'fedora' (cached); generated: 2024-10-24T13:55:59Z solvables: 76624
rpm-md repo 'updates-archive' (cached); generated: 2024-12-20T13:56:56Z solvables: 17881
Resolving dependencies...done
Installing 4 packages:
  kernel-6.11.4-301.fc41.x86_64 (@commandline)
  kernel-core-6.11.4-301.fc41.x86_64 (@commandline)
  kernel-modules-6.11.4-301.fc41.x86_64 (@commandline)
  kernel-modules-core-6.11.4-301.fc41.x86_64 (@commandline)
Downgrading: kernel-modules-core;6.11.4-301.fc41;x86_64;local
Downgrading: kernel-core;6.11.4-301.fc41;x86_64;local
Downgrading: kernel-modules;6.11.4-301.fc41;x86_64;local
Downgrading: kernel;6.11.4-301.fc41;x86_64;local
Cleanup: kernel;6.12.4-200.fc41;x86_64;installed
Cleanup: kernel-modules;6.12.4-200.fc41;x86_64;installed
Cleanup: kernel-modules-core;6.12.4-200.fc41;x86_64;installed
Cleanup: kernel-core;6.12.4-200.fc41;x86_64;installed
grub2-probe: error: failed to get canonical path of `overlay'.
No path or device is specified.
Usage: grub2-probe [OPTION...] [OPTION]... [PATH|DEVICE]
Try 'grub2-probe --help' or 'grub2-probe --usage' for more information.
dracut[E]: No '/dev/log' or 'logger' included for syslog logging
dracut-install: ERROR: installing '/root'
dracut[E]: FAILED: /usr/lib/dracut/dracut-install -D /var/tmp/dracut.PaxRZT/initramfs -f /root
+ test -f /usr/lib/modules/6.11.4-301.fc41.x86_64/initramfs.img
Error: Process completed with exit code 1.

It looks like I need to add a conditional to the scripts module too.

@cgwalters
Member

That output looks like we're missing the hostonly=no for dracut, i.e. part of the generic dracut configs.

@jmarrero jmarrero force-pushed the override-replace-kernel branch 2 times, most recently from 9857c0f to a365259 Compare January 7, 2025 21:45
@jmarrero jmarrero marked this pull request as ready for review January 7, 2025 21:45
@jmarrero jmarrero force-pushed the override-replace-kernel branch from a365259 to 2a30ea2 Compare January 7, 2025 21:46
@jmarrero
Member Author

jmarrero commented Jan 7, 2025

OK, tested with CoreOS, which does not have the layout changes, and with a bootc image built with my MR that adds the layout=ostree changes. This current iteration works on both images: it generates the initramfs and removes the old kernel.

@jmarrero
Member Author

jmarrero commented Jan 7, 2025

That output looks like we're missing the hostonly=no for dracut, i.e. part of the generic dracut configs.

I had messed up the core.rs changes by checking for OSTREE_BOOTED in addition to the presence of layout=ostree. As a result, the changes Jonathan made so that users don't have to enable cliwrap did not work at all.

@cgwalters cgwalters self-assigned this Jan 8, 2025
@jmarrero jmarrero force-pushed the override-replace-kernel branch 3 times, most recently from c232c3b to a560bfa Compare January 8, 2025 15:02
pub fn is_ostree_layout() -> Result<bool> {
let install_conf = Path::new(KERNEL_INSTALL_CONF);
if !install_conf.is_file() {
println!("can not read /usr/lib/kernel/install.conf");
Member

Let's keep this as a tracing::debug
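A rough sketch of how the check could look with the missing-file case handled quietly. This is not the merged implementation: `is_ostree_layout_at` is a hypothetical helper added so the path can be injected in tests, the value of `KERNEL_INSTALL_CONF` is inferred from the message in the snippet above, and `std::io::Result` stands in for whatever `Result` type the real code uses. The `tracing` crate is left out so the sketch builds with the standard library alone.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Assumed value, taken from the message printed in the reviewed snippet.
pub const KERNEL_INSTALL_CONF: &str = "/usr/lib/kernel/install.conf";

/// Hypothetical helper: same check, but against a caller-supplied path.
pub fn is_ostree_layout_at(install_conf: &Path) -> io::Result<bool> {
    if !install_conf.is_file() {
        // Per the review, this would be a tracing::debug! call rather than
        // println!, since a missing file is the normal case on older images.
        return Ok(false);
    }
    let contents = fs::read_to_string(install_conf)?;
    Ok(contents.lines().any(|line| line.trim() == "layout=ostree"))
}

/// Checks the default location.
pub fn is_ostree_layout() -> io::Result<bool> {
    is_ostree_layout_at(Path::new(KERNEL_INSTALL_CONF))
}
```

Treating a missing file as `Ok(false)` rather than an error keeps images without the integration on the existing code path.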

@@ -30,6 +32,9 @@ static WRAPPED_BINARIES: &[&str] = &["usr/bin/rpm", "usr/bin/dracut", "usr/sbin/
/// Binaries we will wrap, or create if they don't exist.
static MUSTWRAP_BINARIES: &[&str] = &["usr/bin/yum", "usr/bin/dnf"];

/// Binaries we will wrap only if ostree layout is not specified.
static NON_OSTREE_LAYOUT_WRAP_BINARIES: &[&str] = &["usr/bin/kernel-install"];
Member

I'm ok with this but it seems unlikely we ever add more to this array too; we could just reference const KERNEL_INSTALL: &str = "usr/bin/kernel-install"
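For illustration, the single-constant variant might look roughly like this. It is a sketch under assumptions: `extra_wrap_targets` is a hypothetical function, and `WRAPPED_BINARIES` is left out because its contents are truncated in the diff above.

```rust
/// Binaries we will wrap, or create if they don't exist (copied from the diff).
static MUSTWRAP_BINARIES: &[&str] = &["usr/bin/yum", "usr/bin/dnf"];

/// Wrapped only if the ostree layout is not specified.
const KERNEL_INSTALL: &str = "usr/bin/kernel-install";

/// Hypothetical helper: the must-wrap list plus `kernel-install` when the
/// base image does not set the ostree layout.
pub fn extra_wrap_targets(ostree_layout: bool) -> Vec<&'static str> {
    let mut targets = MUSTWRAP_BINARIES.to_vec();
    if !ostree_layout {
        targets.push(KERNEL_INSTALL);
    }
    targets
}
```

With a constant there is no loop over a one-element array, and the condition reads directly at the point where the wrapper list is built.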

@cgwalters
Member

We were chatting about testing this with kernel-rt and crafted this hacky invocation:
(echo 'remove kernel kernel-modules-core'; echo 'install kernel-rt-modules-extra kernel-rt-modules-kvm'; echo run) | dnf -y shell

@travier
Member

travier commented Jan 20, 2025

If I understand correctly, I should now close https://gitlab.com/cki-project/kernel-ark/-/merge_requests/2743 ?

@cgwalters
Member

If I understand correctly, I should now close https://gitlab.com/cki-project/kernel-ark/-/merge_requests/2743 ?

Yep

@jmarrero
Member Author

jmarrero commented Jan 23, 2025

This introduced a bug where at compose time we will always wrap kernel-install.
@cgwalters did a fix that looks like the solution here: #5241

Since the bug only shows up at compose time, my testing, which layered the new rpm-ostree before installing a kernel, did not catch it.

Will double check this fixes the issue and do a new release shortly.

@cgwalters
Member

Also fallout in FCOS build which is probably the same but we should investigate
https://jenkins-fedora-coreos-pipeline.apps.ocp.fedoraproject.org/job/build/3314/
ext.config.rpm-ostree.kernel-replace is failing with

16:20:04  Jan 23 16:19:41 qemu0 kola-runext-kernel-replace[1669]: Cleanup: kernel-modules-core;6.13.0-0.rc7.20250114gitc45323b7560e.56.fc42;x86_64;installed
16:20:04  Jan 23 16:19:42 qemu0 kola-runext-kernel-replace[1669]: error: Unknown wrapped binary: kernel-install
16:20:04  Jan 23 16:19:48 qemu0 kola-runext-kernel-replace[1669]: error: Unknown wrapped binary: kernel-install

Same symptoms.

@dustymabe
Member

Right. Just noting here that the rpm-ostree override replace kernel*rpm is happening inside a derived container build:

https://github.com/coreos/fedora-coreos-config/blob/76d7d58ab2aa4b3af3e7acb93d725b9ec4a3eef7/tests/kola/rpm-ostree/kernel-replace#L118-L122

@dustymabe
Member

It's worth noting that the rawhide run Tuesday succeeded and the one today failed, but rpm-ostree wasn't upgraded.

Also worth noting that since Monday COSA has had rpm-ostree-2025.1-1.fc41 and that hasn't changed in the last day.

@cgwalters
Member

Hmm, in that Tuesday run I don't see the kernel-replace test being run for some reason? Looking at https://jenkins-fedora-coreos-pipeline.apps.ocp.fedoraproject.org/blue/organizations/jenkins/build/detail/build/3308/pipeline/292

@dustymabe
Member

dustymabe commented Jan 23, 2025

It's in there in the reprovision tests (we grouped it that way so it runs serially, since that one has a lot of disk I/O and CPU usage):

[2025-01-21T18:26:35.259Z] --- PASS: ext.config.rpm-ostree.kernel-replace (442.28s)

@cgwalters
Member

This has definitely leaked into fcos rawhide prod builds:

$ podman run --rm -ti quay.io/fedora/fedora-coreos:stable bash -c 'ls -al /usr/bin/kernel-install*'
-rwxr-xr-x. 2 root root 61872 Jan  1  1970 /usr/bin/kernel-install
$ podman run --rm -ti quay.io/fedora/fedora-coreos:rawhide bash -c 'ls -al /usr/bin/kernel-install*'
-rwxr-xr-x. 2 root root   500 Jan  1  1970 /usr/bin/kernel-install
-rwxr-xr-x. 2 root root 61728 Jan  1  1970 /usr/bin/kernel-install.rpmostreesave
$

And yeah rpm-ostree-2025.1 is definitely in cosa, so it's going to leak into all builds
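The check done by hand above can be sketched as a small function over a file listing. This is an illustration only: `leaked_wrappers` is a hypothetical name, and it assumes a leaked wrapper can be recognized by a `.rpmostreesave` sibling sitting next to the binary, as in the rawhide listing.

```rust
use std::collections::BTreeSet;

/// Suffix seen on the saved original binary in the rawhide listing.
const SAVE_SUFFIX: &str = ".rpmostreesave";

/// Hypothetical helper: returns every path that has a `.rpmostreesave`
/// sibling, i.e. a binary whose wrapper was left behind in the image.
pub fn leaked_wrappers<'a>(paths: &[&'a str]) -> Vec<&'a str> {
    let present: BTreeSet<&str> = paths.iter().copied().collect();
    let mut leaked: Vec<&'a str> = paths
        .iter()
        .copied()
        .filter_map(|path| path.strip_suffix(SAVE_SUFFIX))
        .filter(|original| present.contains(original))
        .collect();
    leaked.sort_unstable();
    leaked.dedup();
    leaked
}
```

Fed the two listings above, the stable image yields an empty result and the rawhide image yields `/usr/bin/kernel-install`.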

@cgwalters
Member

To explain a bit I think the problem is that the first time we leak the wrapper it's not going to fail. It's only going to fail the next build

@dustymabe
Member

the first time we leak the wrapper it's not going to fail. It's only going to fail the next build

I don't fully grok why that is true, but that would explain why the Tuesday rawhide was good while today's isn't.

@cgwalters
Member

@dustymabe mind sanity checking for both of us that pulling in the latest rpm-ostree (e.g. from copr) fixes this for you i.e. the target system doesn't have wrapped kernel-install?

@dustymabe
Member

I can tomorrow, but I really need to understand the problem better so I can properly test it. i.e. the first build success/second build fails thing I'll have to dig into.

Should I put the new rpm-ostree in COSA? Can you link me to the copr repo to pull from?

@cgwalters
Member

cgwalters commented Jan 23, 2025

Should I put the new rpm-ostree in COSA?

Yeah

Can you link me to the copr repo to pull from?

https://copr.fedorainfracloud.org/coprs/g/CoreOS/continuous/

@jmarrero jmarrero mentioned this pull request Jan 23, 2025
@dustymabe
Member

I tested rpm-ostree-2025.2-1.fc41 in a local COSA and the rpm-ostree override replace of the kernel seems to be working now.

@cgwalters
Member

Thanks! The extra reassurance is helpful because Joseph wasn't seeing the fix work; we were pretty sure we traced that to an incorrect caching, but this one is so subtle that extra sanity checking is helpful.

@dustymabe
Member

dustymabe commented Jan 24, 2025

TBH I'm still not completely sure I understand all the pieces to the puzzle here.

One really confusing part is that f42 bodhi tests have been passing the last day or two and these should be using all the same tools (i.e. latest COSA), but just freezing on:

  1. the most recent successful rawhide build's set of packages
  2. plus the RPMs from the bodhi update in question

I just don't understand how those are passing, so the problem is somehow more nuanced and I think we need to get to the bottom of it so that we can fully ensure that our prod streams (which were built with the "bad rpm-ostree in COSA" on Tuesday) don't see regressions.

@dustymabe
Member

Some more info that might or might not be useful: between a "good" and a "bad" build, here's the diff:

[dustymabe@media fcos]$ ostree --repo=tmp/repo diff 42.20250124.dev.1 42.20250124.dev.2 
M    /usr/bin/kernel-install
M    /usr/etc/pki/ca-trust/extracted/java/cacerts
M    /usr/lib/os-release
M    /usr/lib/modules/6.13.0-0.rc7.20250114gitc45323b7560e.56.fc42.x86_64/initramfs.img
M    /usr/lib/sysimage/rpm-ostree-base-db/rpmdb.sqlite
M    /usr/share/rpm/rpmdb.sqlite
D    /usr/bin/kernel-install.rpmostreesave

@cgwalters
Member

Everything except the kernel-install are files that are unreproducible today. Though after #5244 the rpmdb may drop out...and we could probably fix cacerts with https://github.com/keszybz/add-determinism/ and that would just leave the initramfs which needs some special handling.
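To make that filtering concrete, here is a rough sketch that parses diff output in the format shown above and drops the files called unreproducible in this thread. All names here (`DiffEntry`, `parse_diff`, `is_expected_churn`, `unexpected_changes`) are hypothetical, and the allow-list is an assumption derived from this one diff, not a maintained list.

```rust
/// One line of diff output: a status letter (e.g. M or D) and a path.
#[derive(Debug, PartialEq, Eq)]
pub struct DiffEntry<'a> {
    pub status: char,
    pub path: &'a str,
}

/// Parses lines such as `M    /usr/bin/kernel-install`; other lines are skipped.
pub fn parse_diff(output: &str) -> Vec<DiffEntry<'_>> {
    output
        .lines()
        .filter_map(|line| {
            let mut chars = line.trim().chars();
            let status = chars.next()?;
            let path = chars.as_str().trim_start();
            if path.starts_with('/') {
                Some(DiffEntry { status, path })
            } else {
                None
            }
        })
        .collect()
}

/// Assumed allow-list: the files described as unreproducible today.
pub fn is_expected_churn(path: &str) -> bool {
    path.ends_with("/rpmdb.sqlite")
        || path.ends_with("/initramfs.img")
        || path == "/usr/lib/os-release"
        || path == "/usr/etc/pki/ca-trust/extracted/java/cacerts"
}

/// Entries that are not explained by known unreproducible files.
pub fn unexpected_changes(output: &str) -> Vec<DiffEntry<'_>> {
    parse_diff(output)
        .into_iter()
        .filter(|entry| !is_expected_churn(entry.path))
        .collect()
}
```

Applied to the diff above, only the two `kernel-install` entries remain, which matches the wrapper leak discussed in this thread.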

@dustymabe
Member

Right. I know some files are expected (for now) to change between successive builds. I'm more interested in the ones we aren't expecting to change. Does that info help the investigation?

@jmarrero
Member Author

I tested this again this morning with the bootc rawhide image that has the new rpm-ostree, and we are no longer always wrapping kernel-install. The change this feature added is a new path for when layout=ostree is set: we won't wrap kernel-install and instead rely on kernel-install deferring the logic to rpm-ostree.

@jlebon
Member

jlebon commented Jan 24, 2025

It's worth noting that the rawhide run Tuesday succeeded and the one today failed, but rpm-ostree wasn't upgraded.

Also worth noting that since Monday COSA has had rpm-ostree-2025.1-1.fc41 and that hasn't changed in the last day.

@dustymabe @jmarrero and I got together to discuss this, and the reason the Tuesday run passed but not the Wednesday one is that we inherited https://gitlab.com/fedora/bootc/base-images/-/merge_requests/62/diffs in between. Which means that the Tuesday run was using the old path, and the Wednesday run the new path (with the bug).

We also determined that while we did leak wrappers in the latest production releases, it's harmless without the layout=ostree change which is constrained to Rawhide. We sanity-checked that replacing the kernel both in a container flow and client-side still works fine.

@dustymabe
Member

It's worth noting that the rawhide run Tuesday succeeded and the one today failed, but rpm-ostree wasn't upgraded.

Also worth noting that since Monday COSA has had rpm-ostree-2025.1-1.fc41 and that hasn't changed in the last day.

Circling back to this. The above statements are true, but what did happen between Tuesday and Thursday was that this change merged into FCOS testing-devel (and was synced to rawhide), which affected the behavior.

travier added a commit to travier/fedora-atomic-desktops-devel that referenced this pull request Feb 5, 2025
Development

Successfully merging this pull request may close these issues.

Integrate with kernel-install.d
5 participants