Kernel 5.18.2-1151 and mdraid volumes

Hi there,

TL;DR

After upgrading to 5.18.2-1151 my system starts booting but gets stuck partway: I get no tty to log in on and no network either, though I can Ctrl-Alt-Del to reboot cleanly.

Looking at the journalctl logs, it seems that my mdraid-based configuration (for /home and swap) may be hit by the same kernel issue that Arch has already fixed here.

I can confirm that the flag in question, CONFIG_BLOCK_LEGACY_AUTOLOAD, is not set in the clr native kernels:

# grep CONFIG_BLOCK /usr/lib/kernel/config*
/usr/lib/kernel/config-5.17.11-1148.native:CONFIG_BLOCK=y
/usr/lib/kernel/config-5.17.11-1148.native:CONFIG_BLOCK_COMPAT=y
/usr/lib/kernel/config-5.17.11-1148.native:CONFIG_BLOCK_HOLDER_DEPRECATED=y
/usr/lib/kernel/config-5.18.2-1151.native:CONFIG_BLOCK=y
/usr/lib/kernel/config-5.18.2-1151.native:# CONFIG_BLOCK_LEGACY_AUTOLOAD is not set
/usr/lib/kernel/config-5.18.2-1151.native:CONFIG_BLOCK_COMPAT=y
/usr/lib/kernel/config-5.18.2-1151.native:CONFIG_BLOCK_HOLDER_DEPRECATED=y
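For comparison, the Arch fix, as I read it, is simply to flip that flag back on in their kernel config:

CONFIG_BLOCK_LEGACY_AUTOLOAD=y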

More details

I have two NVMe drives: one holds boot and root, while /home is an mdraid RAID1 and swap is an mdraid RAID0, both spanning the two drives:

# cat /proc/mdstat
Personalities : [raid0] [raid1] [raid10]
md126 : active raid1 nvme1n1p1[1] nvme0n1p3[0]
      905819136 blocks super 1.2 [2/2] [UU]
      bitmap: 0/7 pages [0KB], 65536KB chunk

md127 : active raid0 nvme0n1p4[0] nvme1n1p2[1]
      10227712 blocks super 1.2 256k chunks

unused devices: <none>

and

# blkid
/dev/nvme0n1p3: UUID="0d54001e-585a-dc45-0888-77afcc788121" UUID_SUB="f23aed60-d216-9ac6-5e56-b7c6d12995cf" LABEL="hgcsynth1:home" TYPE="linux_raid_member" PARTUUID="b4c48e2c-5575-4826-85c7-f18a8e904e77"
/dev/nvme0n1p1: LABEL_FATBOOT="BOOT" LABEL="BOOT" UUID="5F40-EB7A" BLOCK_SIZE="512" TYPE="vfat" PARTLABEL="CLR_BOOT" PARTUUID="8e283b86-a608-45db-b263-99f30c6be2ca"
/dev/nvme0n1p4: UUID="11f14255-40aa-505c-925b-255c8c2b428e" UUID_SUB="34744ca4-113c-b4d8-1f19-6105f39dd31a" LABEL="hgcsynth1:home" TYPE="linux_raid_member" PARTUUID="7b36c967-8c4b-4d0e-863b-0554ababca7f"
/dev/nvme0n1p2: LABEL="root" UUID="979014be-15cd-46f9-a4da-40a7d5bcbdc7" BLOCK_SIZE="4096" TYPE="f2fs" PARTLABEL="CLR_ROOT" PARTUUID="e86f4d85-db2c-4219-a772-625e5b231115"
/dev/md127p1: LABEL="swap" UUID="1acbab90-a4cb-4efd-85a0-54133e92b7ec" TYPE="swap" PARTLABEL="CLR_SWAP" PARTUUID="25212bd4-1737-45fe-9726-97b556551e33"
/dev/md126p1: LABEL="home" UUID="c7596ae8-9370-405d-9ca0-f2ed7192e530" BLOCK_SIZE="4096" TYPE="f2fs" PARTLABEL="CLR_MNT_/home" PARTUUID="20722973-6e1d-41de-b0e0-3634991b2cf8"
/dev/nvme1n1p2: UUID="11f14255-40aa-505c-925b-255c8c2b428e" UUID_SUB="e35918e4-7888-28ca-536d-b08f938efdba" LABEL="hgcsynth1:home" TYPE="linux_raid_member" PARTUUID="81ff8cd1-1702-4602-a404-0e7320a2c104"
/dev/nvme1n1p1: UUID="0d54001e-585a-dc45-0888-77afcc788121" UUID_SUB="bf904adb-048b-5edd-2c56-b9c481fa4e94" LABEL="hgcsynth1:home" TYPE="linux_raid_member" PARTUUID="39abb04e-4f35-46fc-9f1c-01ae2c8bc688"

Even though I tried to use partition labels like CLR_MNT_/home throughout, that never worked out of the box, and from the beginning I had to add an fstab to get the mdraid partitions mounted:

# cat /etc/fstab
#[Device]      [Mount Point] [File System Type] [Options] [Dump] [Pass]
/dev/md126p1   /home         f2fs              defaults   0       2
/dev/md127p1   swap          swap              defaults   0       0

When I comment out all the entries in the fstab, the 5.18.2-1151 system boots and brings up the network.
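In principle the arrays can then still be assembled and mounted by hand with standard mdadm commands (a sketch; I have not exercised this on 5.18.2 myself):

# mdadm --assemble --scan
# mount /dev/md126p1 /home
# swapon /dev/md127p1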

Suspecting some md numbering shenanigans, I moved to using UUIDs, and that works fine with 5.17.11-1148:

# cat /etc/fstab
#[Device]      [Mount Point] [File System Type] [Options] [Dump] [Pass]
UUID=c7596ae8-9370-405d-9ca0-f2ed7192e530   /home         f2fs              defaults   0       2
UUID=1acbab90-a4cb-4efd-85a0-54133e92b7ec   swap          swap              defaults   0       0
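The UUIDs here are the filesystem UUIDs of the md partitions, taken straight from blkid, e.g.:

# blkid -s UUID -o value /dev/md126p1
c7596ae8-9370-405d-9ca0-f2ed7192e530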

Unfortunately this still does not make 5.18.2-1151 happy.

What does it look like when 5.18.2 boots?

Under 5.18.2-1151, dmesg says:

[    0.623752] md: Waiting for all devices to be available before autodetect
[    0.623753] md: If you don't use raid, use raid=noautodetect
[    0.623754] md: Autodetecting RAID arrays.
[    0.623997] md: invalid raid superblock magic on nvme1n1p1
[    0.623998] md: nvme1n1p1 does not have a valid v0.90 superblock, not importing!
[    0.624015] md: invalid raid superblock magic on nvme1n1p2
[    0.624016] md: nvme1n1p2 does not have a valid v0.90 superblock, not importing!
[    0.624254] md: invalid raid superblock magic on nvme0n1p3
[    0.624255] md: nvme0n1p3 does not have a valid v0.90 superblock, not importing!
[    0.624271] md: invalid raid superblock magic on nvme0n1p4
[    0.624271] md: nvme0n1p4 does not have a valid v0.90 superblock, not importing!
[    0.624272] md: autorun ...
[    0.624273] md: ... autorun DONE.

and that is technically correct: in-kernel autodetection only looks for v0.90 superblocks, and my arrays have been on metadata version 1.2 for quite some time:

# mdadm --examine --scan
ARRAY /dev/md/home  metadata=1.2 UUID=0d54001e:585adc45:088877af:cc788121 name=hgcsynth1:home
ARRAY /dev/md/home  metadata=1.2 UUID=11f14255:40aa505c:925b255c:8c2b428e name=hgcsynth1:home
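The same shows up per member device, for example:

# mdadm --examine /dev/nvme0n1p3 | grep Version
          Version : 1.2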

At this point I should note that I do not have an mdadm.conf, and I am not keen on rebuilding the mdraid arrays just for the kernel to tell me again that I do “not have a valid v0.90 superblock”.

Any ideas and suggestions welcome!

Same issue for me; it is definitely related to the new 5.18 kernel and md raid. To recover, I had to boot from USB and install the kernel-lts bundle.
Everything works well with lts.
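For anyone else in the same spot, the install itself is just the usual swupd/clr-boot-manager pair (run from the recovered system, or adapted for a chroot):

$ sudo swupd bundle-add kernel-lts
$ sudo clr-boot-manager update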

That sounds pretty radical. I managed to just “roll back” the kernel with clr-boot-manager set-kernel org.clearlinux.native.5.17.11-1148.
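For reference, clr-boot-manager can also list the kernels it still has installed, which gives the exact name to pass to set-kernel:

# clr-boot-manager list-kernels
# clr-boot-manager set-kernel org.clearlinux.native.5.17.11-1148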

I could not boot to the OS; it was stuck after the “loading kernel” message. I also do not want the next native update to have the same issue. LTS kernels should not have surprises like this.

I agree. Kernel and module updates were a lottery back in the Linux 2.x days, but these days that is not acceptable, even for client machines.

Adding an /etc/mdadm.conf did it for me:

$ cat /etc/mdadm.conf
DEVICE partitions
ARRAY /dev/md126 level=raid1 num-devices=2 metadata=1.2 name=hgcsynth1:home UUID=0d54001e:585adc45:088877af:cc788121
   devices=/dev/nvme0n1p3,/dev/nvme1n1p1
ARRAY /dev/md127 level=raid0 num-devices=2 metadata=1.2 name=hgcsynth1:swap UUID=11f14255:40aa505c:925b255c:8c2b428e
   devices=/dev/nvme0n1p4,/dev/nvme1n1p2
$ cat /etc/fstab
#[Device]      [Mount Point] [File System Type] [Options] [Dump] [Pass]
UUID=c7596ae8-9370-405d-9ca0-f2ed7192e530   /home         f2fs              defaults   0       2
UUID=1acbab90-a4cb-4efd-85a0-54133e92b7ec   swap          swap              defaults   0       0

It’s a bit too much hardcoding for my taste, given all the metadata embedded in the partitions, but hey, now it’s working.

FWIW, this is the metadata I mean:

$ lsblk -o NAME,FSTYPE,FSVER,LABEL,PARTLABEL,UUID,FSAVAIL,FSUSE%,MOUNTPOINTS
NAME          FSTYPE            FSVER LABEL          PARTLABEL     UUID                                 FSAVAIL FSUSE% MOUNTPOINTS
nvme1n1
├─nvme1n1p1   linux_raid_member 1.2   hgcsynth1:home               0d54001e-585a-dc45-0888-77afcc788121
│ └─md126
│   └─md126p1 f2fs              1.13  home           CLR_MNT_/home c7596ae8-9370-405d-9ca0-f2ed7192e530  338.1G    61% /home
└─nvme1n1p2   linux_raid_member 1.2   hgcsynth1:swap               11f14255-40aa-505c-925b-255c8c2b428e
  └─md127
    └─md127p1 swap              1     swap           CLR_SWAP      1acbab90-a4cb-4efd-85a0-54133e92b7ec                [SWAP]
nvme0n1
├─nvme0n1p1   vfat              FAT32 BOOT           CLR_BOOT      5F40-EB7A
├─nvme0n1p2   f2fs              1.13  root           CLR_ROOT      979014be-15cd-46f9-a4da-40a7d5bcbdc7   41.5G    34% /
├─nvme0n1p3   linux_raid_member 1.2   hgcsynth1:home               0d54001e-585a-dc45-0888-77afcc788121
│ └─md126
│   └─md126p1 f2fs              1.13  home           CLR_MNT_/home c7596ae8-9370-405d-9ca0-f2ed7192e530  338.1G    61% /home
└─nvme0n1p4   linux_raid_member 1.2   hgcsynth1:swap               11f14255-40aa-505c-925b-255c8c2b428e
  └─md127
    └─md127p1 swap              1     swap           CLR_SWAP      1acbab90-a4cb-4efd-85a0-54133e92b7ec                [SWAP]
$ sudo blkid
/dev/nvme0n1p3: UUID="0d54001e-585a-dc45-0888-77afcc788121" UUID_SUB="f23aed60-d216-9ac6-5e56-b7c6d12995cf" LABEL="hgcsynth1:home" TYPE="linux_raid_member" PARTUUID="b4c48e2c-5575-4826-85c7-f18a8e904e77"
/dev/nvme0n1p1: LABEL_FATBOOT="BOOT" LABEL="BOOT" UUID="5F40-EB7A" BLOCK_SIZE="512" TYPE="vfat" PARTLABEL="CLR_BOOT" PARTUUID="8e283b86-a608-45db-b263-99f30c6be2ca"
/dev/nvme0n1p4: UUID="11f14255-40aa-505c-925b-255c8c2b428e" UUID_SUB="34744ca4-113c-b4d8-1f19-6105f39dd31a" LABEL="hgcsynth1:swap" TYPE="linux_raid_member" PARTUUID="7b36c967-8c4b-4d0e-863b-0554ababca7f"
/dev/nvme0n1p2: LABEL="root" UUID="979014be-15cd-46f9-a4da-40a7d5bcbdc7" BLOCK_SIZE="4096" TYPE="f2fs" PARTLABEL="CLR_ROOT" PARTUUID="e86f4d85-db2c-4219-a772-625e5b231115"
/dev/md127p1: LABEL="swap" UUID="1acbab90-a4cb-4efd-85a0-54133e92b7ec" TYPE="swap" PARTLABEL="CLR_SWAP" PARTUUID="25212bd4-1737-45fe-9726-97b556551e33"
/dev/md126p1: LABEL="home" UUID="c7596ae8-9370-405d-9ca0-f2ed7192e530" BLOCK_SIZE="4096" TYPE="f2fs" PARTLABEL="CLR_MNT_/home" PARTUUID="20722973-6e1d-41de-b0e0-3634991b2cf8"
/dev/nvme1n1p2: UUID="11f14255-40aa-505c-925b-255c8c2b428e" UUID_SUB="e35918e4-7888-28ca-536d-b08f938efdba" LABEL="hgcsynth1:swap" TYPE="linux_raid_member" PARTUUID="81ff8cd1-1702-4602-a404-0e7320a2c104"
/dev/nvme1n1p1: UUID="0d54001e-585a-dc45-0888-77afcc788121" UUID_SUB="bf904adb-048b-5edd-2c56-b9c481fa4e94" LABEL="hgcsynth1:home" TYPE="linux_raid_member" PARTUUID="39abb04e-4f35-46fc-9f1c-01ae2c8bc688"
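
P.S. If the hardcoding bothers anyone else: mdadm can emit those ARRAY lines itself, so something like the following should produce an equivalent (and leaner, devices=-free) mdadm.conf. Untested here, but standard mdadm usage:

$ echo 'DEVICE partitions' | sudo tee /etc/mdadm.conf
$ sudo mdadm --detail --scan | sudo tee -a /etc/mdadm.conf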