NVIDIA and XanMod CL updates

NVIDIA driver installation on Clear Linux

  1. Bump the Display Driver and CUDA Toolkit.
  2. Remove old issues from the README that are no longer applicable from 2024 onwards.
  3. Emphasize installing the 545 display driver. Older drivers may not work 100% with recent Clear Linux releases.
  4. Add a step to ensure dkms and the systemd post-trigger actions have completed before rebooting.
  5. Improve the CUDA installation instructions. Installing an older CUDA release than the display driver is supported by NVIDIA.

Run XanMod kernels on Clear Linux

  1. Bump the Edge, LTS, and RT kernels.
  2. Use latest Clear LTS patches for the realtime kernel. Note: The RT spec file enables relevant realtime knobs.
  3. Apply the Clear 0112 patch to the Edge variants. Previously, the patch was partially applied.
# Summary              count primes                  cyclictest latency
                       algorithm3(s)  hackbench(s)    max(us)  avg(us)

- /proc/sys/kernel/sched_autogroup_enabled 0

  Clear  6.1.65 LTS        23.564        25.094        2422     1510
  XanMod 6.1.65 Preempt    23.256        24.658         981      470
  XanMod 6.1.64 Realtime   24.439        46.921         160       50

  Clear  6.6.4  Native     22.979        23.124        4602     2260
  XanMod 6.6.4  Preempt    22.862        23.288        3151     1206

- /proc/sys/kernel/sched_autogroup_enabled 1

  Clear  6.1.65 LTS        19.643        25.918        1060      458
  XanMod 6.1.65 Preempt    20.364        26.328         869      349
  XanMod 6.1.64 Realtime   17.204        58.456         168       59

  Clear  6.6.4  Native     20.945        28.634        1588      621
  XanMod 6.6.4  Preempt    21.013        28.850        2276      646
  Nobara 6.6.3  Fsync      21.167        27.434        1523      601

In a nutshell…

  1. For recent CL releases, if eglinfo crashes with NVIDIA graphics, try the 545 display driver.
  2. If you experience high latency running the 6.6 kernel, try the CL LTS kernel. Another option is to enable sched_autogroup by adding an entry to /etc/clr-power-tweaks.conf (create the file if missing).
/proc/sys/kernel/sched_autogroup_enabled 1

Update:

The eglinfo segfaulting is resolved for 525, 535, and Vulkan drivers by backporting libnvidia-egl-gbm.so from 545.29.06.

Many thanks, @marioroy, for your effort.

I tried CachyOS (https://cachyos.org), and in my opinion it has the smoothest UI responsiveness, thanks to its choice of ‘v3 kernels’. The UI responsiveness while compiling a kernel using all cores was unbelievable.

I wonder whether the BORE scheduler could be integrated into the Clear Linux kernel, as in CachyOS? https://github.com/firelzrd/bore-scheduler

Have you tried the kernel knob sched_autogroup_enabled set to 1? With this enabled, I have not experienced UI slowness building kernels. This knob is enabled by default in CachyOS.

sudo tee -a "/etc/clr-power-tweaks.conf" >/dev/null <<'EOF'
/proc/sys/kernel/sched_autogroup_enabled 1
EOF
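
One caveat with `tee -a`: running it twice appends the entry twice. Here is a minimal sketch of an idempotent variant. The function name `ensure_tweak` is made up for illustration, and it does not handle an existing entry with a different value.

```shell
# Append "KEY VALUE" to a tweaks file only if that exact line is absent.
# The default path is the one used in this thread; ensure_tweak is a
# hypothetical helper, not part of Clear Linux.
ensure_tweak() {
    key=$1 value=$2 file=${3:-/etc/clr-power-tweaks.conf}
    line="$key $value"
    # -x: match the whole line, -F: fixed string, -q: no output.
    if [ -f "$file" ] && grep -qxF -- "$line" "$file"; then
        return 0
    fi
    printf '%s\n' "$line" >> "$file"
}

# Usage (run as root for the real file):
#   ensure_tweak /proc/sys/kernel/sched_autogroup_enabled 1
```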

Many thanks, I will try.

NVIDIA users…

The NVIDIA proprietary driver may not build successfully for Linux kernels 6.8.0-rc2, 6.7.3, 6.6.15, and 6.1.76. Edit: See the next post. I refactored the driver installation script to apply the patch automatically.

ERROR: modpost: GPL-incompatible module nvidia.ko uses GPL-only symbol '__rcu_read_lock'
ERROR: modpost: GPL-incompatible module nvidia.ko uses GPL-only symbol '__rcu_read_unlock'

A workaround is patching the NVIDIA kernel source. Download the patch mentioned in the article. Here, I applied the patch to my NVIDIA installation. I’m running 535.154.05.

cd /usr/src/nvidia-535.154.05
sudo patch -p2 < ~/Downloads/nvidia-drivers-470.223.02-gpl-pfn_valid.patch
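
If you are unsure whether the patch fits your driver version, a dry run first avoids leaving the sources half-patched. A rough sketch, assuming GNU patch; the function name is made up, and the patch file must be given as an absolute path because the function changes directory.

```shell
# Apply a patch only if a dry run succeeds. apply_patch_checked is a
# hypothetical helper. Pass the patch file as an absolute path.
apply_patch_checked() {
    dir=$1 patchfile=$2 strip=${3:-2}
    # --dry-run reports what would happen without modifying any file.
    # -N skips patches that look already applied instead of prompting.
    if ( cd "$dir" && patch -N --dry-run -p"$strip" -i "$patchfile" ) >/dev/null 2>&1; then
        ( cd "$dir" && patch -N -p"$strip" -i "$patchfile" )
    else
        echo "patch does not apply cleanly in $dir; nothing changed" >&2
        return 1
    fi
}

# Usage with the paths from this post (run as root):
#   apply_patch_checked /usr/src/nvidia-535.154.05 \
#       "$HOME/Downloads/nvidia-drivers-470.223.02-gpl-pfn_valid.patch" 2
```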

I refactored the driver installation script so it can patch the NVIDIA kernel sources to fix bugs. The patching must happen before DKMS is called, and the script now handles that automatically. Here is a test run on my machine. DKMS succeeded for the recent XanMod kernels 6.1.76 and 6.7.3.

$ ./install-driver 535
Installing the NVIDIA proprietary driver...
Verifying archive integrity... OK
Uncompressing NVIDIA Accelerated Graphics Driver for Linux-x86_64 535.154.05...
...............................................................................
...............................................................................
...............................................................................
....................................
The NVIDIA driver installation succeeded.

Backporting libnvidia-egl-gbm.so* from 545.29.06.
Applying nvidia-gpl-pfn_valid.patch to /usr/src/nvidia-535.154.05
patching file common/inc/nv-linux.h
Hunk #1 succeeded at 2069 (offset 79 lines).
patching file nvidia/nv-mmap.c
Hunk #1 succeeded at 584 with fuzz 1 (offset 8 lines).
patching file nvidia/os-mlock.c
Hunk #1 succeeded at 115 (offset 13 lines).
Hunk #2 succeeded at 189 (offset 13 lines).

Registering the NVIDIA kernel module sources with DKMS.
Creating symlink /var/lib/dkms/nvidia/535.154.05/source -> /usr/src/nvidia-535.154.05
Building the NVIDIA kernel modules.
Checking dkms modules in 6.1.69-1330.ltsprev.
Checking dkms modules in 6.1.76-126.xmlts-preempt.
Checking dkms modules in 6.6.14-121.xmrt-preempt.
Checking dkms modules in 6.7.3-128.xmedge-preempt.
Updating the X11 output class configuration file.
Running the fix-nvidia-libGL-trigger service.
...

In the output above, backporting libnvidia-egl-gbm.so* from 545.29.06 resolves the eglinfo segfault under X11.
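
After the DKMS step, it can be handy to confirm that every installed kernel actually received the module (`dkms status` gives DKMS's own view). The sketch below simply searches each kernel directory for the module file. The function name is made up, and it searches the whole kernel directory because the DKMS install location is not something I want to assume.

```shell
# List kernel directories under a modules root that contain no file named
# MODULE.ko (optionally compressed, e.g. .ko.xz or .ko.zst).
# kernels_missing_module is a hypothetical helper for illustration.
kernels_missing_module() {
    module=$1 root=${2:-/lib/modules}
    for kdir in "$root"/*/; do
        [ -d "$kdir" ] || continue
        # find prints matching paths; empty output means the module is absent.
        if [ -z "$(find "$kdir" -type f -name "$module.ko*" 2>/dev/null)" ]; then
            basename "$kdir"
        fi
    done
}

# Usage: print kernels that have no nvidia module installed.
#   kernels_missing_module nvidia
```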

The XanMod kernels on Clear Linux (clearmod) repository will be updated separately, including what to do if folks have not applied the patch to the NVIDIA sources.

Amazing! Thanks. There is a massive difference in the snappiness of the UI.

I added the BORE scheduler patch to XanMod Edge variants in my XanMod on Clear Linux repository. The Burst-Oriented Response Enhancer CPU scheduler works great. For testing, be sure the sched_autogroup_enabled kernel knob is enabled.

I played a 1440p60 HD video in Google Chrome while computing prime numbers in 3 separate terminal windows. Video playback is on the CPU for me (using NVIDIA graphics), which makes it possible to test the BORE CPU scheduler.

Big Buck Bunny 60fps 4K - Official Blender Foundation Short Film
https://www.youtube.com/watch?v=aqz-KE-bpKQ (select 1440p60 HD)
https://github.com/marioroy/mce-sandbox (algorithm3.pl is found here)

XanMod Edge 6.7 with kernel.sched_bore   enabled   disabled   cl-native
$ ./algorithm3.pl 2e12 in terminal #1     97.310     95.921     97.434
$ ./algorithm3.pl 2e12 in terminal #2     98.077     97.184     98.355
$ ./algorithm3.pl 2e12 in terminal #3     97.522     96.609     97.939
                                         -------    -------    -------
                                         292.909    289.714    293.728

Next, I played the same video but increased the quality to 2160p60 4K.

Big Buck Bunny 60fps 4K - Official Blender Foundation Short Film
https://www.youtube.com/watch?v=aqz-KE-bpKQ (select 2160p60 4K)
https://github.com/marioroy/mce-sandbox (algorithm3.pl is found here)

XanMod Edge 6.7 with kernel.sched_bore   enabled   disabled   cl-native
$ ./algorithm3.pl 2e12 in terminal #1     98.952     99.400     98.621
$ ./algorithm3.pl 2e12 in terminal #2     99.778    100.274     99.673
$ ./algorithm3.pl 2e12 in terminal #3     99.031     99.830     99.244
                                         -------    -------    -------
                                         297.761    299.504    297.538

Summary:

  1. The Clear Linux 6.7.3-native kernel drops over 1,000 frames (>25%) in the time taken to compute primes 2e12. Interestingly, a preempt-enabled kernel is needed to complement the sched_autogroup_enabled knob.

  2. The XanMod 6.7.4 preempt-enabled kernel with the BORE-scheduler patch took less time to compute primes 2e12, overall. Video playback is smooth, with no frame drops, whether kernel.sched_bore is enabled (default) or disabled.

Running 3x the number of logical cores is not typical. This was done to stress test the kernels. The Clear native kernel does okay running one worker per CPU thread, but not at 3x, where video playback loses many frames.
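
For anyone who wants to repeat the load test without juggling three terminal windows, here is a minimal sketch that starts several copies of a command at once and waits for all of them. The `run_copies` name is made up for illustration, and the results above were not gathered with it.

```shell
# Run COUNT copies of a command concurrently and wait for all of them.
# Returns non-zero if any copy failed. run_copies is a hypothetical helper.
run_copies() {
    count=$1; shift
    pids=""
    i=0
    while [ "$i" -lt "$count" ]; do
        "$@" &
        pids="$pids $!"
        i=$((i + 1))
    done
    status=0
    for pid in $pids; do
        wait "$pid" || status=1
    done
    return "$status"
}

# Usage in bash, mirroring the three-terminal test (output interleaves):
#   time run_copies 3 ./algorithm3.pl 2e12
```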

The BORE CPU scheduler works so well that I bumped the LTS kernel to 6.6.x.

1. Bump to latest XanMod 6.6.x stable and RT kernels.
2. Rebase the LTS variants from 6.1.x to 6.6.x Clear patch set.
3. Apply the BORE CPU scheduler patch to 6.6.x, similar to 6.7.x.
4. Revert the Multi-LLC select_idle_sibling patch in Edge variants.
   It was later requested by kernel developers to drop this patch.
5. Update the readme file.

With the system under load (3x the number of logical cores), being able to open a new terminal window without affecting video playback (no stutter) is remarkable.

What about the RT kernel without BORE? Video playback is smooth under 3x load but requires running Chrome with realtime priority. However, launching a new terminal window causes a brief stutter.

Again, running 3x the number of threads is not typical. This was done to stress test the kernels. The BORE CPU scheduler is amazing. You have the option to disable it, as described in the README.

🙏 🙏 🙏 Thank You! 🙏 🙏 🙏

The BORE CPU scheduler is enabled by default.

$ cat /proc/sys/kernel/sched_bore 
1

# Read the current setting using sysctl.
$ sysctl kernel.sched_bore
kernel.sched_bore = 1

# Disable BORE.
$ sudo sysctl -w kernel.sched_bore=0
kernel.sched_bore = 0

# Re-enable BORE.
$ sudo sysctl -w kernel.sched_bore=1
kernel.sched_bore = 1
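
On a kernel without the BORE patch, the knob file will not exist, so scripts should not assume it is there. A small sketch that reports the state either way; `bore_state` is a made-up name, and treating any value other than 0 or 1 as "unknown" is my assumption.

```shell
# Report the BORE knob state: enabled, disabled, unavailable, or unknown.
# bore_state is a hypothetical helper; the default path is the one shown above.
bore_state() {
    knob=${1:-/proc/sys/kernel/sched_bore}
    if [ ! -r "$knob" ]; then
        echo "unavailable"
        return 0
    fi
    case "$(cat "$knob")" in
        1) echo "enabled" ;;
        0) echo "disabled" ;;
        *) echo "unknown" ;;
    esac
}

# Usage:
#   bore_state
```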

With the Feb-2024 update 2, one can quickly build the XanMod + BORE CPU Scheduler kernel. The default is to build the generic kernel. Setting LOCALMODCONFIG=1 enables trimming, which builds only the modules you have running.

./fetch-src edge
LOCALMODCONFIG=1 ./xm-build edge-preempt
./xm-install edge-preempt
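
Since a trimmed build keeps only the modules you have running, it is worth looking at what is loaded before building. Anything not loaded at that moment (a USB device that is unplugged, for example) may end up without a driver. A minimal sketch; `loaded_modules` is a made-up name, and I am assuming the trimming is based on the currently loaded module list.

```shell
# Print the names of currently loaded modules, one per line, sorted.
# loaded_modules is a hypothetical helper for illustration.
loaded_modules() {
    src=${1:-/proc/modules}
    # The first field of each /proc/modules line is the module name.
    awk '{ print $1 }' "$src" | sort
}

# Usage:
#   loaded_modules | wc -l          # number of loaded modules
#   loaded_modules | grep -x nvidia # check for one module by exact name
```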

I captured the time to build the generic and trimmed kernels using 3, 7, 15, and 31 CPU cores. Previously, the generic build took about 43 minutes using 3 CPU cores. The Feb-2024 update 2 decreased the time. A trimmed build saves a lot of time and storage.

The /lib/modules/[kernel] size includes the NVIDIA driver on my machine.

Generic kernel (all modules configured in Clear config - default):

$ ./fetch-src main

$ time ./xm-build main-preempt
         3 CPUs        7 CPUs       15 CPUs       31 CPUs
real   41m35.725s    19m 5.350s    10m 5.097s     6m10.755s
user  112m 7.095s   114m12.451s   117m40.371s   127m55.995s
sys     9m38.373s     9m49.154s    10m39.254s    12m 2.763s

$ ls -lh rpmbuild.main/RPMS/x86_64/
total 106M
  ...  71M Feb 15 15:31 linux-xmmain-preempt-6.6.16-133.x86_64.rpm
  ... 109K Feb 15 15:31 linux-xmmain-preempt-cpio-6.6.16-133.x86_64.rpm
  ...  16M Feb 15 15:31 linux-xmmain-preempt-dev-6.6.16-133.x86_64.rpm
  ...  20M Feb 15 15:31 linux-xmmain-preempt-extra-6.6.16-133.x86_64.rpm
  ...  55K Feb 15 15:31 linux-xmmain-preempt-license-6.6.16-133.x86_64.rpm

$ ./xm-install main-preempt
$ du -sh /lib/modules/6.6.16-133.xmmain-preempt/
456M  /lib/modules/6.6.16-133.xmmain-preempt/

Trimmed kernel (only the modules you have running; LOCALMODCONFIG=1):

$ time LOCALMODCONFIG=1 ./xm-build main-preempt
         3 CPUs        7 CPUs       15 CPUs       31 CPUs
real   10m 0.565s     4m54.692s     2m52.431s     1m59.312s
user   26m25.156s    26m49.352s    27m31.223s    29m44.517s
sys     2m 3.640s     2m 6.321s     2m13.948s     2m28.507s

$ ls -lh rpmbuild.main/RPMS/x86_64/
total 51M
  ... 16M Feb 15 15:42 linux-xmmain-preempt-6.6.16-133.x86_64.rpm
  ... 89K Feb 15 15:42 linux-xmmain-preempt-cpio-6.6.16-133.x86_64.rpm
  ... 16M Feb 15 15:42 linux-xmmain-preempt-dev-6.6.16-133.x86_64.rpm
  ... 19M Feb 15 15:42 linux-xmmain-preempt-extra-6.6.16-133.x86_64.rpm
  ... 55K Feb 15 15:42 linux-xmmain-preempt-license-6.6.16-133.x86_64.rpm

$ ./xm-install main-preempt
$ du -sh /lib/modules/6.6.16-133.xmmain-preempt/
190M  /lib/modules/6.6.16-133.xmmain-preempt/

Indeed, what a time saver! 🙂 Some parts of the build process use only one CPU core, but the build is mostly parallel. Clock speed decreases as more CPU cores are used.
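
To put a number on the saving, the "real" times from the two tables can be converted to seconds and divided. A small sketch; the helper names `to_seconds` and `speedup` are made up for illustration.

```shell
# Convert a "time" value such as "41m35.725s" or "19m 5.350s" to seconds.
to_seconds() {
    printf '%s\n' "$1" | tr -d ' ' |
        awk -F'm' '{ sub(/s$/, "", $2); printf "%.3f\n", $1 * 60 + $2 }'
}

# Ratio of two "time" values, e.g. generic build time over trimmed build time.
speedup() {
    awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" \
        'BEGIN { printf "%.2f\n", a / b }'
}

speedup "41m35.725s" "10m 0.565s"   # 3 CPUs:  prints 4.16
speedup "6m10.755s" "1m59.312s"     # 31 CPUs: prints 3.11
```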

Wow, the building process is super quick now. Thank you.

Today is my last commit to the ClearMod and Nvidia repos. Happy Clear Linux 🙂 I decided to forgo updating for a while, preferring a stable environment.

I fixed a bug. While doing so, I added support for the latest 535 and 550 driver releases.

1. Add check for 535.161.x (tested 535.161.07).
2. Add check for 550.54.x (tested 550.54.14).
3. Backport NVIDIA egl-gbm library from 550.54.14.
4. Fix if statement applying the DRM hotplug patch.

The new drivers can be found at nvidia.com. Select Tesla and CUDA 12.2 for the 535 update. Pass the path to the installer file as an argument to the install-driver script.
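
Before passing a downloaded file to the script, you can double-check which release it is. A rough sketch, assuming the installer keeps NVIDIA's usual `NVIDIA-Linux-x86_64-<version>.run` file name. Both function names are made up, and `is_listed_series` only covers the two series added in this update, not everything the script accepts.

```shell
# Extract the version from an installer path. The file-name pattern is an
# assumption about the downloaded file; the helper is hypothetical.
driver_version_from_path() {
    name=$(basename "$1")
    case "$name" in
        NVIDIA-Linux-x86_64-*.run)
            version=${name#NVIDIA-Linux-x86_64-}
            printf '%s\n' "${version%.run}"
            ;;
        *)
            echo "unrecognized installer name: $name" >&2
            return 1
            ;;
    esac
}

# Succeed if the version belongs to a series named in the list above.
is_listed_series() {
    case "$1" in
        535.161.*|550.54.*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage:
#   v=$(driver_version_from_path ~/Downloads/NVIDIA-Linux-x86_64-535.161.07.run)
#   is_listed_series "$v" && echo "$v is one of the newly added series"
```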

My Clear Linux repositories, ClearMod and the NVIDIA driver, are complete. I never intended to keep updating them forever; the reason is time constraints.

I wish you all the best, and blessings and grace.

I wish you all the best, @marioroy. I am moving away from CL and giving up cutting-edge stuff for stability and package availability.

Best wishes to the CL community. Adios!

I captured results for HZ_1000, HZ_800, HZ_750, HZ_600, and HZ_500. Going forward, the ClearMod project defaults to HZ_800 for the Edge, Main, and LTS variants. Overriding the default is possible with HZ=value.
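
To see which tick rate a kernel was built with, you can read it from the kernel config. A minimal sketch, assuming the HZ=value override ends up as the kernel's CONFIG_HZ option. The function name is made up, and where the config file lives depends on the distribution.

```shell
# Print the CONFIG_HZ value from a kernel config. kernel_hz is a hypothetical
# helper. With a file argument it reads that file, otherwise standard input.
kernel_hz() {
    sed -n 's/^CONFIG_HZ=\([0-9][0-9]*\)$/\1/p' ${1:+"$1"}
}

# Usage:
#   kernel_hz /path/to/kernel/config
#   zcat /proc/config.gz | kernel_hz   # only if the kernel exposes its config
```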