Recovering from /boot weirdness

If you suddenly find your system booting an older kernel, even though you know a new one’s available, and you’ve even used clr-boot-manager to select it explicitly, there’s a chance you’ve run into a very specific failure scenario.

Background

  • Clear Linux uses UEFI to boot.
  • UEFI looks for an EFI System Partition (ESP) on your hard drive to load the bootloader and kernel from.
  • The ESP is usually VFAT-formatted.
  • We keep the partition unmounted after boot because it’s easy to corrupt.
  • Our tools (e.g. clr-boot-manager) that need to modify the contents of the ESP mount it on demand.
  • They use the boot.mount unit to mount the ESP.
  • boot.mount is provided by systemd-gpt-auto-generator.

Where things go wrong

Normally, systemd generates boot.mount without issue. Sometimes it doesn’t, though. The most common failure case is when something has placed files in the /boot directory on your root device.

…where the /boot/ (or /efi/) mount point is non-empty, no mount units are generated…

As a result, boot.mount does not exist, so clr-boot-manager cannot mount the ESP to write the new kernel and/or bootloader configuration. It still writes the changes, but they land in the /boot directory on your root device.

When this happens, you’ll see one set of contents (that looks correct) in /boot, but it’s actually just the directory on your root device. Meanwhile, at boot, UEFI still uses the (untouched) ESP.

Diagnosing the issue

First, check whether anything is actually mounted on /boot:

$ mountpoint /boot
/boot is not a mountpoint

If /boot is not a mountpoint and you have files there, then boot.mount will not be auto-generated, and kernel updates will be written to the directory on your root device instead of the ESP.
If /boot is mounted, try unmounting it and check again:

$ sudo umount /boot
$ mountpoint /boot
/boot is not a mountpoint
$ ls -la /boot

If you still have contents in /boot, then you will run into this issue.
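
The two checks above can be rolled into one small helper. This is only a sketch: the function name boot_dir_status and its three status words are made up for illustration and are not part of clr-boot-manager or systemd.

```shell
#!/bin/sh
# Hypothetical helper (not a Clear Linux tool): classify a directory as
#   mounted      something is currently mounted on it
#   stale-files  nothing is mounted, but the directory has contents
#   empty        nothing is mounted and the directory is empty
boot_dir_status() {
    dir="${1:-/boot}"
    if mountpoint -q "$dir"; then
        echo "mounted"
    elif [ -n "$(ls -A "$dir" 2>/dev/null)" ]; then
        echo "stale-files"
    else
        echo "empty"
    fi
}

boot_dir_status /boot
```

A result of stale-files is the problem case described here. If it reports mounted, unmount /boot and run it again, since a mount hides whatever is in the underlying directory.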

Fixing it

Copy the contents of /boot to another directory just in case. You shouldn’t need these later, but at least you can restore with a boot disk.
Find the device corresponding to your ESP and mount it manually:

$ lsblk -o NAME,PARTLABEL
NAME   PARTLABEL
sda
├─sda1 EFI
├─sda2 linux-swap
└─sda3 /

In this example, sda1 is the ESP (partlabel EFI), so:

$ sudo mount /dev/sda1 /boot

Now you should see the expected files. clr-boot-manager should be able to update from here, so run:

$ sudo clr-boot-manager update

Check that /boot looks like you expect, then reboot.
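
If you would rather script those steps, here is a minimal sketch. The function names backup_dir and repair_esp and the backup location /root/boot-backup are assumptions chosen for illustration; only mount and clr-boot-manager update come from the steps above. It assumes GNU cp and has to run as root.

```shell
#!/bin/sh
# Sketch of the manual fix. Function names and the backup path are
# illustrative, not part of any Clear Linux tool.

# Copy everything in a directory (including dotfiles) to a backup location.
backup_dir() {
    src="$1"
    dst="$2"
    mkdir -p "$dst" && cp -a "$src/." "$dst/"
}

# Back up the current /boot directory, mount the ESP over it, and let
# clr-boot-manager rewrite the kernel and bootloader entries.
# Must be run as root.
repair_esp() {
    esp_dev="$1"    # e.g. /dev/sda1, as identified with lsblk
    backup_dir /boot /root/boot-backup || return 1
    mount "$esp_dev" /boot || return 1
    clr-boot-manager update
}

# Example invocation (as root): repair_esp /dev/sda1
```

Nothing runs until you call repair_esp yourself, so you can check the device name against the lsblk output first.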

Should the repair guidelines also include mounting /boot in step 2 of the OS recovery directions? https://docs.01.org/clearlinux/latest/guides/maintenance/fix-broken-install.html

I believe only mounting the / filesystem and not /boot when the swupd repair is run can create the issue that is solved by this post. Thoughts?

Could you just move the files in /boot that are already there to empty it out and then run sudo clr-boot-manager update to fix this issue? If so that seems simpler.

Hmmm. It depends whether swupd repair calls clr-boot-manager. If so, then yes, the instructions should include mounting /boot under /mnt.

Not in this case, because having files in /boot at boot means that systemd won’t generate boot.mount, so clr-boot-manager won’t automatically mount the ESP at /boot.

Given that, would moving the files out of /boot, rebooting, and then running sudo clr-boot-manager update work? When I ran the lsblk -o NAME,PARTLABEL command I didn’t see an EFI partlabel; all three partitions just showed primary. That makes it a little confusing to know which one is the ESP. I assume it is the first one.

Yes, that should work.

We can look at a few more fields with lsblk to help track it down:

$ lsblk -o NAME,LABEL,PARTTYPE,PARTLABEL
NAME   LABEL PARTTYPE                             PARTLABEL
sda
├─sda1 boot  c12a7328-f81f-11d2-ba4b-00a0c93ec93b EFI
├─sda2 swap  0657fd6d-a4ab-43c4-84e5-0933c84b4f4f linux-swap
└─sda3 root  4f68bce3-e8cd-4db1-96e7-fbcaf984b709 /

The standard GUID for ESP is c12a7328-f81f-11d2-ba4b-00a0c93ec93b (see https://gist.github.com/Alex131089/b3a23c9461e95433387f285f6e0860ca#file-guids-py-L25)
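
When the labels are not helpful, you can match on that GUID directly. A rough sketch, assuming an lsblk that supports the -p (full device paths), -r (raw output) and -n (no header) options; the helper name filter_esp is made up for this example.

```shell
#!/bin/sh
# Partition type GUID that marks an EFI System Partition.
ESP_GUID="c12a7328-f81f-11d2-ba4b-00a0c93ec93b"

# Hypothetical helper: read "DEVICE PARTTYPE" lines on stdin and print
# the devices whose partition type is the ESP GUID (case-insensitive).
filter_esp() {
    awk -v guid="$ESP_GUID" 'tolower($2) == guid { print $1 }'
}

# List every block device with its partition type and keep only ESPs.
lsblk -prno NAME,PARTTYPE | filter_esp
```

On the example layout above this would print /dev/sda1, which is the device to pass to mount.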

Brett, thanks for this very clear explanation of what happened to me, and for your help then.