Dual GPU/VT-d issues

All,

Running into a couple of strange issues.

First, I'm running a system with the intent of using VT-d. I have two GPUs: integrated Intel graphics and an Nvidia card. The intention is to use the integrated graphics for day-to-day usage and the Nvidia card for a KVM-hosted Win10 box, solely for a couple of Windows games that currently will not run under Wine.

So here is the first issue I have run into: when Clear boots, it boots under the integrated graphics, switches to the Nvidia card for login, then switches back to the integrated graphics for the desktop. It works just fine, it's just weird. When I tried to disable the device by ID:

 # register the device ID with pci-stub so it can claim the card
 echo "8086 10b9" > /sys/bus/pci/drivers/pci-stub/new_id
 # unbind the card from its current driver, then hand it to pci-stub
 echo "0000:01:00.0" > /sys/bus/pci/devices/0000:01:00.0/driver/unbind
 echo "0000:01:00.0" > /sys/bus/pci/drivers/pci-stub/bind

the graphics system just locked up on the second line (I used the device ID of the Nvidia card, not what I pasted above).

So I figured I could live with the goofy login.

On to the next issue: when configuring a KVM-hosted machine, everything works great. I can host machines with no issues until I configure a pass-through PCI device for VT-d and pick the Nvidia devices; just like before, the graphics subsystem on the host freezes.

In both freeze cases, it is just the graphics system that freezes; I can still SSH into the box and reboot it.

So looking for suggestions.

(The BIOS is configured with both GPUs active, full VT-d support, and so forth. The machine was running Windows 10 with the Nvidia card before I switched it to Clear.)

Thanks in advance.

NVIDIA says it supports rendering graphics on one GPU and displaying the image on another GPU, by modifying certain options in your Xorg.conf.

Also, you may try to use nvidia-xrun, which basically does the same thing.
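If it helps, the runtime half of that Xorg.conf setup is normally just two xrandr calls once X is up; a rough sketch (the provider names "modesetting" and "NVIDIA-0" are assumptions, check xrandr --listproviders on your own machine):

 # scan out the NVIDIA-rendered screen through the other GPU's outputs
 xrandr --setprovideroutputsource modesetting NVIDIA-0
 xrandr --auto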

Thanks for replying, but I'm not sure that is helpful, or at least it's not clear to me.

I have no desire to run the Nvidia card as my primary GDM display, so as I understand it, nvidia-xrun will not do much. My goal is to render the Nvidia card inert during normal Linux operation and only have it available as a pass-through device for the hosted KVM Windows instance.
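(For reference, a quick way to see which driver is actually claiming the card at any given moment, using the 01:00.0 address from the commands above:)

 # show which kernel driver is currently bound to the Nvidia card
 lspci -k -s 01:00.0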

Have you had this working well on another Linux distro or is CL your first try doing this setup on Linux?
Can you post all the steps you’ve followed so far?

I experienced this too before installing the proprietary NVIDIA driver.

You can try blacklisting the nouveau driver to stop it from loading, since you're planning to use the card for passthrough anyway. That would work around the login issue and make sure the vfio-pci module isn't clashing with the graphics driver.
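Something along these lines should do it (a minimal sketch; the file name is arbitrary, and I'm assuming the usual /etc/modprobe.d/ location is honored, followed by a reboot):

 # /etc/modprobe.d/disable-nouveau.conf -- keep nouveau off the Nvidia card
 blacklist nouveau
 options nouveau modeset=0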

You might find hints in dmesg and /var/log/Xorg* logs.

Ah, I will try blacklisting; I didn't catch that idea. Will let you know how that goes.

Great suggestion! I'm much further ahead now. No lockups anymore; blacklisting stopped the freezing. I am now able to run the VM under VMM and pass in the PCI ID for the video card. Side note: it did require passing in the sound device as well.
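For anyone following along, the audio device is just another function on the same PCI slot as the GPU; something like this lists everything that has to be passed through together (assuming the card sits at 01:00 as in my earlier commands):

 # list every function of the card in that slot, with vendor:device IDs
 lspci -nn -s 01:00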

So I got to the point where I can boot up the VM and install the Nvidia drivers, and they are happy. No joy on the direct video out yet, but at least it's progress.


So upon further research I am back to being blocked on this. It is probably just me. Most of the information I can find out in the wild on doing this relies on the host OS being either Ubuntu or a Red Hat derivative. Near as I can tell, they assume some changes to how the bootloader is initialized, along with disabling modules (which blacklisting the nouveau drivers effectively did). It seems that Clear, with some of its optimizations, already does some of the bootloader items, which is what allows the KVM system to even recognize the devices. I have hacked through most of the articles I have found on KVM passthrough with moderate success so far.
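To make that concrete, here is the sort of thing I mean (a sketch only; I'm assuming intel_iommu=on, plus optionally iommu=pt, is the piece those guides add to GRUB, and that a cmdline.d fragment plus clr-boot-manager is the Clear equivalent):

 $ cat /etc/kernel/cmdline.d/iommu.conf
 intel_iommu=on iommu=pt
 $ sudo clr-boot-manager update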

The Windows VM will recognize the PCI device (in this case, the Nvidia card) and install the drivers, but then Windows disables them, indicating there is a problem with the device. I know the device is just fine because I was running a fully Windows-based system before I made the switch over to Clear on this machine.

One of the more valuable pages of information:

This article goes over creating an updated initramfs for boot, which I do not feel comfortable replacing on the Clear boot. Is that really necessary to make this work?
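Before replacing anything, a quick sanity check along these lines should show whether the stock Clear kernel already provides what is needed (just a check, not a fix):

 # confirm the IOMMU actually came up at boot
 dmesg | grep -i -e DMAR -e IOMMU
 # confirm the vfio driver is available without touching the initramfs
 sudo modprobe vfio-pci && ls /sys/bus/pci/drivers/vfio-pci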

Anyone out there who has gotten this to work?

Thanks in advance


install the drivers, but then Windows disables them, indicating there is a problem with the device

Doesn't NVIDIA block using their GPUs in a VM?

Not that I know of; I see many people doing this, again with other distros, for gaming reasons.

My need is for some video encoding that uses the CUDA drivers under Windows; otherwise I could go pure Linux (which is the long-term goal here). If the setup works for gaming, it will work for the CUDA drivers.

Almost all of the docs/videos out there cover precisely how to do this with Ubuntu/Fedora and Nvidia. I have really come to like the security and speed of Clear, hence the stubbornness in trying to get it working.

Thanks

So Nvidia does make it a bit of a pain, yes, but people are able to do it. I think where I am at is understanding how to completely blacklist the devices properly, the way the other distros do it.
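From what I have read, the trick those working setups rely on for the driver disabling itself (the Windows code 43 error) is hiding the hypervisor from the guest. A rough sketch of the QEMU side, assuming the card's functions are already bound to vfio-pci at 01:00 (the exact spelling of hv_vendor_id varies a little between QEMU versions):

 # kvm=off hides the KVM CPUID signature from the guest; hv_vendor_id masks
 # the Hyper-V vendor string (any 12-character value) if hv_* enlightenments
 # are enabled
 qemu-system-x86_64 \
   -machine type=q35,accel=kvm \
   -cpu host,kvm=off,hv_vendor_id=0123456789ab \
   -m 8G \
   -device vfio-pci,host=01:00.0,multifunction=on \
   -device vfio-pci,host=01:00.1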


Could you share some more information about the VM, like the QEMU parameters? I am having trouble with vfio.

Here is what I did:

$ cat /etc/kernel/cmdline.d/guest.conf 
pci-stub.ids=10de:1f08,10de:10f9,10de:1ada,10de:1adb
$ cat guest.sh
#!/bin/bash

# rebind each function of the card at 01:00 from pci-stub over to vfio-pci
echo 0000:01:00.0 > /sys/bus/pci/drivers/pci-stub/unbind
echo 0000:01:00.0 > /sys/bus/pci/drivers/vfio-pci/bind
echo 0000:01:00.1 > /sys/bus/pci/drivers/pci-stub/unbind
echo 0000:01:00.1 > /sys/bus/pci/drivers/vfio-pci/bind
echo 0000:01:00.2 > /sys/bus/pci/drivers/pci-stub/unbind
echo 0000:01:00.2 > /sys/bus/pci/drivers/vfio-pci/bind
echo 0000:01:00.3 > /sys/bus/pci/drivers/pci-stub/unbind
echo 0000:01:00.3 > /sys/bus/pci/drivers/vfio-pci/bind

# load the TPM modules so the host TPM can be handed to the guest below
modprobe tpm
modprobe tpm_crb

# boot from OVMF with no emulated VGA: the passed-through GPU provides the
# display, and kvm=off hides the hypervisor signature from the guest
qemu-system-x86_64 \
-machine type=q35,accel=kvm \
-cpu host,kvm=off \
-smp sockets=1,cores=4 \
-m 8G \
-vga none \
-nographic \
-serial none \
-parallel none \
\
-drive if=pflash,format=raw,readonly,file=/usr/share/qemu/OVMF.fd \
\
-device vfio-pci,host=01:00.0,multifunction=on \
-device vfio-pci,host=01:00.1 \
-device vfio-pci,host=01:00.2 \
-device vfio-pci,host=01:00.3 \
\
-drive if=virtio,format=raw,file=/dev/nvme0n1 \
-drive format=raw,file=fat:rw:guestkey \
-tpmdev passthrough,id=tpm0,path=/dev/tpm0 \
-device tpm-crb,tpmdev=tpm0

Everything works without the -device vfio-pci lines. However, it gets locked up if those lines are there, and the guest does not come up.
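For reference, my understanding is that vfio needs every device in the card's IOMMU group bound to it, so a check like this before launching QEMU shows whether anything else shares the group:

 # list every device per IOMMU group; the GPU's group should only contain its own functions
 find /sys/kernel/iommu_groups/ -type l | sort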

There was a bug in vfio since kernel version 5.1. I am currently on 5.3.7-853; however, you report that your configuration works. So would you please share some information with me?

BTW, it is an RTX 2060 on a 9600K.