The recent upgrade to build #43500 caused serious issues for its users. These issues were once unheard of on CL.
What can we do to avoid such issues instead of feeling helpless?
Do we stop updating our systems, or only update after reading a notice from the devs that it’s safe to do so?
Do the CL devs issue such notices at all?
I agree.
Rolling release is a model that delivers the latest versions fastest, but in practice it feels more like a continuous stream of ad-hoc canary builds.
Since Clear Linux is primarily focused on container and cloud usage, the rolling release may be premised on replacing old servers with new ones rather than updating already deployed servers. When running it as a desktop, though, you have to update the system in place, which makes this problem difficult to ignore.
While I don’t oppose rolling release as it’s one of the characteristics of this distribution, I believe some kind of fail-safe measure should be provided.
For example, make updating to the latest version that is effectively considered stable the default, and require an explicit option for updating to the true latest version.
swupd update # Effectively stable version
swupd update -current # Latest version
Or vice versa:
swupd update -stable # Effectively stable version
swupd update # Latest version
I think it would also work to just announce the version number that is considered effectively stable, without even needing to add command options.
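Even without new command options, something close to this could be approximated today by pausing automatic updates and only moving once a build has been announced as good. This is just a sketch, not an official workflow: swupd autoupdate and swupd check-update are standard commands, while the --version option shown for swupd update is something to confirm against swupd update --help on your system, and the build number is only a placeholder.

swupd autoupdate --disable      # pause automatic background updates
swupd check-update              # show the currently published build
swupd update --version 43540    # move only to the announced build (placeholder number; assumes --version is supported)
swupd autoupdate --enable       # resume automatic updates later if desired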
This.
Instead of simply assuming everything works after an upgrade, swupd could run a few small tests to verify that everything functions as expected, right after upgrading and again after the first reboot.
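Until something like that exists in swupd itself, a rough manual equivalent is possible. This is only a sketch of the idea, assuming swupd diagnose is available and the system runs systemd; the specific checks are arbitrary examples:

swupd diagnose        # check installed bundles for missing or corrupt files
systemctl --failed    # list systemd units that failed to start after the reboot
journalctl -p err -b  # review errors logged during the current boot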
I acknowledge your frustration. As background, I tried to update the clr-init initrd to resolve an issue with booting LVM2 volumes. Unfortunately, I discovered after it had already landed in another build that it was missing a critical additional file from the new version of systemd. I raced to push an update that added the file, but meanwhile 43500 had already gone out. We revoked that release, which meant that nobody else would be upgraded to it. During that time, I also worked on a recovery process to help everybody who was affected.
We have an automated test suite, including VMs and physical machines, that tests every release. Unfortunately, initrds are a minefield, and instead of reporting a failure, this just took out our hardware test fleet, and under the pressure of other high priorities, we didn’t connect that with this change. After deploying a fixed release (43520), we found out it still wouldn’t support LUKS filesystems (for which we don’t have an automated test), so we reverted to the old initrd and quickly pushed another release (43540) to restore the prior function. As it turns out, the LVM2 issue was a different problem.
There are several factors here. First, please understand, this is an extremely small team. All of us post on this forum. There is a lot of work shared among very few people. We’ve had to scale back our support of desktop use in the last few years in order to maintain our focus on base performance, for example. Second, initrd is a fragile point within the Linux ecosystem. There is no good implementation of redundancy or recovery for a bad initrd – you just find out at next boot and have to rescue your system. We’d rather not use it at all, and if we reach a point that certain use cases no longer need it, we’d happily eliminate it. Third, it’s been a few years since a Clear Linux release has suddenly prevented boot, but it does happen occasionally. While this case affected you prominently, it’s really a rare occurrence. I’m personally sorry I broke boot for you and all others affected. But do please keep it in perspective.
This is the default. The published release is the stable release. Only if you use -F staging would you get the “true latest version”. But understand that we do only basic tests of the packaged software itself (does the binary run without crashing, for example) – we don’t have the resources to guarantee that a particular software package will be bug-free, let alone test more extensive use cases, and because we try to follow upstream releases closely, you’ll both get the bug and its subsequent fix rapidly. That’s just innate with rolling releases. For mission-critical systems, ideally you would deploy an update on a test system, verify your specific use case, and only then deploy it to production systems. That said, many of us (including me) are running on a single system and have to fix it the hard way when something breaks.
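As a concrete illustration of that workflow (a sketch assuming production machines have automatic updates paused with swupd autoupdate --disable, not an official procedure):

# On a test machine
swupd check-update    # see the newest published build
swupd update          # update and verify your specific use case

# On production machines, once the test machine checks out
swupd update          # run manually; note a newer build may have been published in the meantime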
I was on 43470 for a while. 43500 had the audio fix for wireplumber, but then I saw the warning not to update, so I waited. When 43520 came out I upgraded to it, but then I got stuck as well LMAO. 43540 fixed everything in the end, so all’s well that ends well.