GRUB: Extent not found after running bees #249

Open
hotburger opened this issue Feb 16, 2023 · 15 comments

Comments

@hotburger

GRUB gives me this error after running bees for a few hours. It happens consistently whenever bees has run for long enough. I can fix it temporarily by reinstalling the kernel package from a chroot. I'm assuming bees is deduping the kernel, which GRUB doesn't like? Strangely, this doesn't happen to my Arch install on the same partition.

I assume that Gentoo is storing another copy of the kernel somewhere while Arch doesn't. The only other difference from my Arch install is a separate subvol for /boot.

GRUB error message:

Loading Linux 6.1.11-gentoo-dist ...
error: extent not found.
Loading initial ramdisk ...
error: you need to load the kernel first.

Press any key to continue...
@Zygo
Owner

Zygo commented Feb 18, 2023

Some experiments to try to collect more information:

  1. Run btrfs-search-metadata file /path/to/vmlinuz (from python-btrfs package) before and after the failure (i.e. once after reinstalling, and once again when boot fails).
  2. Does it also fail when making a reflink of the kernel, e.g. cp --reflink=always /path/to/vmlinuz /root/foo and then reboot?

I don't know how grub would distinguish one reflink to a file from another, much less be fatally broken by it, so I expect experiment 2 will not trigger a grub failure, and we'll see some anomalous feature (non-zero extent offsets? unsupported compression type? hole in kernel file?) from experiment 1.

Hopefully we get some information that can be turned into an actionable grub bug report.
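
One of the anomalies suggested above, a hole in the kernel file, can also be checked from userspace without python-btrfs. The following is a minimal sketch, assuming Linux and a filesystem that reports holes through `lseek`; `find_holes` is a hypothetical helper name, not part of bees or python-btrfs, and it only sees holes, not extent offsets or compression types.

```python
import errno
import os


def find_holes(path):
    """Return a list of (start, end) byte ranges that lseek reports as holes.

    Assumption: Linux with SEEK_DATA/SEEK_HOLE support. The implicit hole
    at end-of-file is not counted.
    """
    holes = []
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        offset = 0
        while offset < size:
            try:
                # Next byte at or after `offset` that is backed by data.
                data_start = os.lseek(fd, offset, os.SEEK_DATA)
            except OSError as exc:
                if exc.errno == errno.ENXIO:
                    # No data after `offset`: the rest of the file is a hole.
                    holes.append((offset, size))
                    break
                raise
            if data_start > offset:
                holes.append((offset, data_start))
            # Next hole at or after the data we just found (or end-of-file).
            offset = os.lseek(fd, data_start, os.SEEK_HOLE)
    finally:
        os.close(fd)
    return holes
```

Running `find_holes("/path/to/vmlinuz")` once after reinstalling and once after a failed boot would show whether any holes appeared in between.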

@hotburger
Author

This issue stopped happening for a while, so I couldn't replicate it to gather info. It is happening again though.
Creating a reflink did not cause the boot to fail.
vmlinuz-6.2.7-broken.log
vmlinuz-6.2.7-fixed.log

@Zygo
Owner

Zygo commented Mar 26, 2023

Looks like this is fixed in grub but not released yet:

https://git.savannah.gnu.org/cgit/grub.git/commit/?id=7f4e017a1416bcbdca16de4f923679ec9f003171

@Jorropo

Jorropo commented Aug 14, 2023

I had similar boot issues with versions of grub that supposedly have this fixed (it would panic in various random ways). I switched to a 3-partition layout with:

  • / btrfs
  • /boot ext4
  • /boot/efi vfat

This works around the problems.

Seems like grub's btrfs implementation is not very good yet.
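
To confirm that a boot file really lives outside btrfs after switching to a layout like this, one option is to look up the filesystem type of the mount that contains it. This is a rough sketch, assuming the text comes from `/proc/self/mounts` on Linux; `fs_type_for` is a hypothetical helper, and it does not handle mount points with escaped characters such as `\040`.

```python
import os


def fs_type_for(path, mounts_text):
    """Return the filesystem type of the mount containing `path`, or None.

    `mounts_text` is expected in /proc/self/mounts format:
    device, mount point, filesystem type, options, ...
    """
    path = os.path.normpath(path)
    best = None  # (mount_point, fs_type) of the longest matching mount
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        inside = (
            mount_point == "/"
            or path == mount_point
            or path.startswith(mount_point + "/")
        )
        # ">=" lets a later mount on the same mount point override an earlier one.
        if inside and (best is None or len(mount_point) >= len(best[0])):
            best = (mount_point, fs_type)
    return best[1] if best else None


def boot_fs_type(path="/boot"):
    """Read the live mount table and report the filesystem type for `path`."""
    with open("/proc/self/mounts") as f:
        return fs_type_for(path, f.read())
```

With the layout above, `boot_fs_type("/boot")` should report `ext4`, so bees running on the btrfs root never sees the kernel or initramfs files.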

@Trayshar

I have the same problem on Manjaro, using kernel 6.6.8-2-MANJARO and grub 2.12. Before entering the grub menu I get this error:

error: start_image() returned 0x800000000000000001.

Failed to boot both default and fallback entries.

Press any key to continue...

I can get into the grub menu after that, but trying to boot results in "error: you need to load the kernel first." and the system freezes...

I am now successfully using @Jorropo's workaround.

@PfannenHans

I can confirm this on two separate machines running Arch. Here it is usually amd-ucode.img that gets broken, giving "error: premature end of file." The systems boot if I remove it from the boot entry in GRUB.
Chrooting into the installation and reinstalling the ucode package also fixes it temporarily.

@kakra
Contributor

kakra commented Feb 24, 2024

You can set chattr +C on the boot directory before reinstalling the boot loader and see if that helps. bees won't touch file extents created with this flag set. In other words, setting the flag on already existing files changes nothing; new files inherit the flag from the directory. But this also removes checksum protection from your boot files, so it can only serve as a temporary workaround.
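
Because the flag only affects files created after it is set, it can help to verify it on the directory before reinstalling anything. This is a small sketch that shells out to `lsattr -d` and looks for the `C` (No_COW) attribute; the helper names are hypothetical, and it assumes `lsattr` from e2fsprogs is installed.

```python
import subprocess


def flags_from_lsattr_line(line):
    """Split one line of `lsattr` output into (flags, path)."""
    parts = line.strip().split(None, 1)
    if not parts:
        return "", ""
    return parts[0], parts[1] if len(parts) > 1 else ""


def is_nocow(flags):
    """True if the attribute string contains uppercase 'C' (No_COW).

    Lowercase 'c' is the unrelated "compressed" attribute, so the check
    is deliberately case-sensitive.
    """
    return "C" in flags


def directory_is_nocow(path):
    """Run `lsattr -d` on `path` and report whether No_COW is set."""
    result = subprocess.run(
        ["lsattr", "-d", path], check=True, capture_output=True, text=True
    )
    flags, _ = flags_from_lsattr_line(result.stdout)
    return is_nocow(flags)
```

If `directory_is_nocow("/boot")` returns `False`, files reinstalled into that directory will still be ordinary copy-on-write files that bees may dedupe.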

@Zygo
Owner

Zygo commented Jan 20, 2025

I guess I can file this under "use cases for path-based exclusion rules"... 😝

@felixonmars

I hit this as well. The default initrd touched by bees appears not to be present at boot, although GRUB didn't report an error here. The fallback initrd works, but regenerating the initrds didn't make a difference.

Image

Stopping bees makes the problem go away immediately.

@Zygo
Owner

Zygo commented Feb 9, 2025

grub 2.12-rc1 contains the earlier btrfs dedupe fixes from 2021, but the tag was only released in July 2023, so it hasn't made its way to some distros yet. There is also a missing fix, "3c7e84257 fs/btrfs: Zero file data not backed by extents", which could be triggered by dedupe.

This is likely to be a problem for a few more years while grub fixes make their way through distro release cycles. Please indicate the grub version and whether it works or not.

@felixonmars

I'm using Arch's grub package at grub-2.12. Sadly, I was still able to reproduce it.

I also found the related logs from bees; hopefully they're useful in some way.

Feb 09 10:29:47 <hostname> beesd[521]: crawl_5_593084346[558]: zero bbd BeesBlockData { 4K 0x1b2000 fd = 2288 '/run/bees/mnt/<uuid>/boot/initramfs-linux-lts.img', address = 0x2db947000, hash = 0x0, data[4096] }
Feb 09 10:29:47 <hostname> beesd[521]: crawl_5_593084346[558]:         in extent Extent { begin = 0x0, end = 0x937792, physical = 0x2db795000, flags = FIEMAP_EXTENT_LAST, physical_len = 0x938000, logical_len = 0x938000 }
Feb 09 10:29:47 <hostname> beesd[521]: crawl_5_593084346[558]: copy: 9.217M (..0x937792) fd = 2288 '/run/bees/mnt/<uuid>/boot/initramfs-linux-lts.img'
...
Feb 09 10:29:48 <hostname> beesd[521]: crawl_5_593084346[558]: dedup: src 9.217M [0x1000..0x938792] {HOLE} /run/bees/mnt/<uuid>/#514487809 (deleted)
Feb 09 10:29:48 <hostname> beesd[521]: crawl_5_593084346[558]:        dst 9.217M [0x0..0x937792] {0x2db795000} /run/bees/mnt/<uuid>/boot/initramfs-linux-lts.img

@Zygo
Owner

Zygo commented Feb 9, 2025

dedup: src 9.217M [0x1000..0x938792] {HOLE}

That looks like exactly the setup for the fix in "3c7e84257 fs/btrfs: Zero file data not backed by extents"...the first page is a hole.
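
For anyone collecting similar evidence, lines like the one quoted above can be picked out of the bees journal mechanically. This is a rough sketch that only understands the `dedup: src` line shape shown in this thread; the regular expression and `parse_dedup_src` are hypothetical and not derived from the bees source, so other log formats may not match.

```python
import re

# Matches e.g.: dedup: src 9.217M [0x1000..0x938792] {HOLE} /path (deleted)
DEDUP_SRC = re.compile(
    r"dedup: src\s+(?P<size>\S+)\s+"
    r"\[(?P<begin>0x[0-9a-fA-F]+)\.\.(?P<end>0x[0-9a-fA-F]+)\]\s+"
    r"\{(?P<physical>[^}]+)\}\s+(?P<path>.*)$"
)


def parse_dedup_src(line):
    """Parse a bees 'dedup: src' log line, or return None if it doesn't match."""
    match = DEDUP_SRC.search(line)
    if match is None:
        return None
    return {
        "begin": int(match["begin"], 16),
        "end": int(match["end"], 16),
        # In the excerpt above the braces hold either an address or HOLE.
        "is_hole": match["physical"] == "HOLE",
        "path": match["path"].strip(),
    }


def hole_sources(lines):
    """Yield parsed entries for every dedup source that is a hole."""
    for line in lines:
        entry = parse_dedup_src(line)
        if entry is not None and entry["is_hole"]:
            yield entry
```

Feeding the journal output for the bees unit through `hole_sources` lists the dedup operations whose source was a hole, which is the pattern pointed out here.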

@felixonmars

I just double-checked that the upstream tag 2.12 already contains this commit. Weird :/

@Zygo
Owner

Zygo commented Feb 10, 2025

Ah, OK, I see my mistake...I checked 2.12-rc1, not 2.12. There's nothing in upstream grub git master that mentions btrfs and is not already in 2.12.

So...there are still some bugs left in grub's btrfs support.

@XZVB12

XZVB12 commented Feb 10, 2025

Considering that the grub2 project has not developed much in the last few years, these fixes, like many others, will probably not be accepted in the near future. Probably the only way to solve this problem is to completely exclude /boot from dedup.
