Problems with KVM

Hi all,

About a month ago, I rebooted my Clear Linux host (a hardware server) and noticed that my only KVM guest had not come up completely. That was a fairly busy time, and I haven’t had a chance to revisit this until the past few days.

I hate to post such an unspecific request, but this issue seems rather opaque to me. Here’s what I know:

The guest is running Ubuntu 20.04 LTS and had been running for 6-12 months without any issues. Now it boots, then freezes after a console message that is apparently part of loading: Linux agpgart interface v0.103. Since getting back to this, I have searched for that message and found references to it alongside boot problems, but all of the results seem to be pretty old. The most recent one I found indicated that a reboot of the host system fixed the issue, and I’ve already rebooted several times.

I have also tried booting a couple of ISO images, one for RHEL and another for Knoppix. Neither of these gets past an initial message or two, and neither provides any feedback or messages that might hint at the issue. I did notice complaints from virt-manager about a problem starting dconf.service, but I think that’s just a virt-manager dependency that doesn’t affect the guest.

The host is running Clear Linux version 36580. The only other unusual piece of the environment is that I have openvswitch installed to manage the guest’s networking, but I don’t see any indication that’s involved (and Knoppix doesn’t automatically try to access a network interface as far as I can recall - one of the reasons I tried it).

I know this is extremely vague, but even ideas on how to get at least some level of useful data on what is happening would help. Nothing that looks like an error or anything unexpected is showing up in the journal (that’s where I saw the reference to dconf), and I’m not sure where to look next. I don’t know how many people are running KVM, or even whether this is something only I am seeing, but any thoughts are welcome. Thanks.

To close the loop here…

I believe that there may be an issue with kernel 5.18 and KVM support for a particular CPU feature. Should anyone else stumble over this: I had chosen host-model for the guest CPU configuration. After many hours of searching, a discussion about how a particular feature called Intel CET had been implemented led me to wonder whether changing the guest CPU might at least give me some debugging information. I changed this option (I’m using virt-manager) to hypervisor-default, which turns out to be “qemu64”.

Once I did this, the guest booted and ran. I strongly suspect that qemu64 is not as efficient as relying on the native CPU instructions, but it got me up and running again after 47 days. Most of my stuff is containerized, and only one management system was running in KVM, but this hasn’t been a pleasant time.
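
In case it helps anyone check which CPU mode a guest is using without opening virt-manager, here is a minimal Python sketch. It assumes you already have the libvirt domain XML as text (for example, saved from `virsh dumpxml`). The sample XML, the guest name, and both function names are made up for illustration. I am also assuming that virt-manager's hypervisor default corresponds to a domain definition with no `<cpu>` element at all, which matches what I understand but is worth verifying on your own system.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed domain XML; a real definition has many more elements.
SAMPLE_DOMAIN_XML = """
<domain type='kvm'>
  <name>example-guest</name>
  <cpu mode='host-model' check='partial'/>
</domain>
"""


def guest_cpu_mode(domain_xml: str) -> str:
    """Return the CPU mode named in a libvirt domain XML string."""
    root = ET.fromstring(domain_xml.strip())
    cpu = root.find("cpu")
    if cpu is None:
        # Assumption: no <cpu> element means the hypervisor picks its default model.
        return "hypervisor-default"
    # A <cpu> element without a mode attribute is treated here as a custom model.
    return cpu.get("mode", "custom")


def without_cpu_element(domain_xml: str) -> str:
    """Return a copy of the domain XML with any top-level <cpu> element removed."""
    root = ET.fromstring(domain_xml.strip())
    for cpu in root.findall("cpu"):
        root.remove(cpu)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(guest_cpu_mode(SAMPLE_DOMAIN_XML))
    print(guest_cpu_mode(without_cpu_element(SAMPLE_DOMAIN_XML)))
```

This only inspects and rewrites text. It does not talk to libvirt or change a running guest, so any edited XML would still have to be applied through virt-manager or `virsh edit`.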
