Research XanMod kernels on Clear Linux

When on Fedora Linux, I run the XanMod kernel. So naturally, it has been on my to-do list to try XanMod on Clear Linux. One may not notice the difference between the two unless the system is under load. I tuned the XanMod kernel to behave more like an RT kernel with minimal impact on performance. This is what I prefer when running a Linux desktop environment.

What follows are low-latency RT-Tests comparing the two kernels. Terminals 1, 2, and 3 run simultaneously. For the second run, terminal 4 is added to the mix: I hit the up arrow and press return repeatedly until terminals 1 through 3 have completed.

The machine is a Threadripper 3970X box - 32 physical cores, 64 logical cores. Edit: I just learned that the dev-utils bundle provides hackbench and cyclictest.

Hackbench1, Hackbench2, and Cyclictest (three terminal windows). The fourth window adds IO to the mix (total 4 actions running simultaneously).

$ numactl -C 0-31 ./hackbench -T -f 4 -g 8 -l 3000000
$ numactl -C 32-63 ./hackbench -T -f 4 -g 8 -l 3000000
# ./cyclictest --smp -p98 -m -D 40 -q  # run as root
# run repeatedly as root (up arrow, press return)
sync; echo 3 >/proc/sys/vm/drop_caches; time ls -R /usr

Clear LTS 6.1.53

hackbench1  31.802s
hackbench2  33.828s

cyclictest max latencies (no IO)
   25  31  18  17  31  38  22  17  20  15  19  23  22  23  15  12
   35  78  51  15  13  17  19  16  21  26  12  16  54  19  47  54
   18  22  19  19  14  63  10  11  10  14  15  16  13  14  14  15
    9  21  23  19  14  11  10  25  18  22  11  25  80  20 101  30

cyclictest max latencies (with IO, sync-echo-ls -R /usr)
  119 104  61 905  35  30  39  33 1442 80  24  29  21  33  52  17
   60  15  40  33  90  69  82  81  23  37  32  60  14  10  40  46
   30  23  37  28  12  59  21  30  21  17  15  66  47  30  14  17
   28  16  10  16  84  60  49  57  48  36 218  28  18 107  18  17

sync-ls time: 12.182s under load

XanMod LTS 6.1.53

hackbench1  33.589s
hackbench2  35.385s

cyclictest max latencies (no IO)
   25  19  30  24  20  24  19  23  14  16  21  14  18  20  15  24
   21  25  11  21  14  15  20  13   8  15  14  13  51  14  16  19
   18  17  19  15  18  13  26  15  14  16  13  20  71  10  12   8
   49  17  11  17  14  11  11   9  13  25   9  25   7  98  26  15

cyclictest max latencies (with IO, sync-echo-ls -R /usr)
   53  85  62  45  18  46  24 104  20  80  42  39  16  18  63  18
   29  21  16  18  48  16  17  34  29  16  21  71  18  65  73  22
   20  35  43  72  14  35  19  24  18  49  15  19  20  24  35  12
   35  31  56  14  18  17  45  40   9  29   9  15   8 100  27  21

sync-ls time: 11.406s under load

Two things to notice here. First, the XanMod kernel has full preemption enabled, yet with only a small impact on performance (hackbench takes roughly 5% longer). IO is faster, possibly because the kernel is built for x86-64-v3. Second, compare the latency spikes during IO: the XanMod kernel stays at low latency, whereas the Clear kernel spikes as high as 1442.
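To put numbers on the spike comparison, here is a minimal Python sketch that summarizes the two with-IO grids above. The values are copied from this post; the helper name `summarize` and the threshold of 100 are arbitrary choices for illustration, and the unit is assumed to be microseconds (the cyclictest default).

```python
from statistics import median

# Per-thread max latencies with IO, copied from the results above.
CLEAR_LTS_IO = """
  119 104  61 905  35  30  39  33 1442 80  24  29  21  33  52  17
   60  15  40  33  90  69  82  81  23  37  32  60  14  10  40  46
   30  23  37  28  12  59  21  30  21  17  15  66  47  30  14  17
   28  16  10  16  84  60  49  57  48  36 218  28  18 107  18  17
"""
XANMOD_LTS_IO = """
   53  85  62  45  18  46  24 104  20  80  42  39  16  18  63  18
   29  21  16  18  48  16  17  34  29  16  21  71  18  65  73  22
   20  35  43  72  14  35  19  24  18  49  15  19  20  24  35  12
   35  31  56  14  18  17  45  40   9  29   9  15   8 100  27  21
"""


def summarize(grid, threshold=100):
    """Parse a whitespace-separated grid and return simple statistics."""
    values = [int(token) for token in grid.split()]
    return {
        "threads": len(values),
        "max": max(values),
        "median": median(values),
        # Count of threads whose max latency exceeded the threshold.
        "spikes": sum(v > threshold for v in values),
    }


if __name__ == "__main__":
    for name, grid in (("Clear LTS", CLEAR_LTS_IO), ("XanMod LTS", XANMOD_LTS_IO)):
        print(name, summarize(grid))
```

On this data, the Clear LTS grid peaks at 1442 with six values above 100, while the XanMod LTS grid peaks at 104 with one value above 100. A single run per kernel is a small sample, so treat the summary as indicative only.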

To validate the above, I ran the os-scheduler-responsiveness test three times in a row. Here too, the XanMod kernel maintains lower latency. Regardless, CL does well.

Clear LTS 6.1.53

$ time python3 responsiveness.py -i1 -p60 --np 4 -t 30

  total response time:  0.07434658700026375s
   runs:  13  average:  0.005718968230789519s

  total response time:  0.07404825000048731s
   runs:  13  average:  0.005696019230806717s

  total response time:  0.1057417729991812s
   runs:  13  average:  0.008133982538398553s

  real    0m26.528s
  user   24m30.043s
  sys     0m 0.224s

XanMod LTS 6.1.53

$ time python3 responsiveness.py -i1 -p60 --np 4 -t 30

  total response time:  0.0540768100004243s
   runs:  13  average:  0.004159754615417254s

  total response time:  0.05782095499999684s
   runs:  13  average:  0.004447765769230527s

  total response time:  0.05449856700002442s
   runs:  14  average:  0.00389275478571603s

  real    0m26.525s
  user   24m30.127s
  sys     0m 0.088s
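The three passes per kernel can be pooled into one figure. The sketch below does that in Python with the totals and run counts copied from the output above; pooling by total time over total runs is one choice among several, and the function name is hypothetical.

```python
# (total response time in seconds, number of runs) for each pass above.
CLEAR_LTS = [
    (0.07434658700026375, 13),
    (0.07404825000048731, 13),
    (0.1057417729991812, 13),
]
XANMOD_LTS = [
    (0.0540768100004243, 13),
    (0.05782095499999684, 13),
    (0.05449856700002442, 14),
]


def pooled_average(passes):
    """Average response time per run across all passes, in seconds."""
    total_time = sum(t for t, _ in passes)
    total_runs = sum(n for _, n in passes)
    return total_time / total_runs


if __name__ == "__main__":
    clear = pooled_average(CLEAR_LTS)
    xanmod = pooled_average(XANMOD_LTS)
    print(f"Clear LTS:  {clear * 1000:.2f} ms per run")
    print(f"XanMod LTS: {xanmod * 1000:.2f} ms per run")
    print(f"XanMod / Clear ratio: {xanmod / clear:.2f}")
```

Pooled this way, XanMod LTS averages about 4.2 ms per run against about 6.5 ms for Clear LTS. Note that the third Clear LTS pass is noticeably slower than the first two and pulls its average up.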

The XanMod kernel performs similarly to the Clear kernel.

$ time rpmbuild -bb linux-xanmod.spec
CL-LTS 6.1.53:  6m30.297s
XanMod 6.1.53:  6m31.350s

Count prime numbers to 1e13.

$ perl algorithm3.pl 1e13
CL-LTS 6.1.53:  182.927 seconds
XanMod 6.1.53:  183.076 seconds

On Fedora Linux, I can use chrt to set a round-robin scheduling policy for Chromium, as a normal user, and not experience a frame drop while watching a video with background jobs running. Unfortunately, chrt doesn’t work on Clear Linux for normal users.
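For anyone who wants to check whether their own account may use a real-time policy, the Python sketch below makes a request similar to what chrt makes (switching to SCHED_RR) and prints the real-time priority limit. The thread does not say why chrt is refused on Clear Linux; a zero RLIMIT_RTPRIO for normal users is only one plausible cause and is an assumption here. The function names are made up for illustration, and the code is Linux-only.

```python
import os
import resource


def rtprio_limit():
    """Return the (soft, hard) real-time priority limit for this process."""
    return resource.getrlimit(resource.RLIMIT_RTPRIO)


def can_use_round_robin(priority=1):
    """Try SCHED_RR on the current process, then restore the old policy."""
    try:
        old_policy = os.sched_getscheduler(0)
        old_param = os.sched_getparam(0)
        os.sched_setscheduler(0, os.SCHED_RR, os.sched_param(priority))
    except OSError:
        # Usually PermissionError (EPERM) for an unprivileged user.
        return False
    # Put the process back the way it was so the probe has no side effects.
    os.sched_setscheduler(0, old_policy, old_param)
    return True


if __name__ == "__main__":
    soft, hard = rtprio_limit()
    print(f"RLIMIT_RTPRIO soft={soft} hard={hard}")
    print("SCHED_RR allowed:", can_use_round_robin())
```

If the second line prints False for a normal user, chrt with a round-robin policy will most likely be refused for that user as well.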

The XanMod kernel is a lot of fun. Stay tuned for a follow-up guide.

Note that for us the LTS kernel is “slow but steady”… the normal kernel is where we do performance work/etc…

I am looking forward to the follow-up guide. :pray:

Motivation: I’ve been running the XanMod kernel on Fedora Linux, with chrt applied to the Chromium browser, for some time. It’s quite nice. Later, I booted into CL and immediately missed the XanMod kernel and chrt. And so the journey began…

The XanMod kernel on Clear Linux OS is a success. I have four spec files.

  1. linux-xmlts, based on CL LTS
  2. linux-xmmain, based on CL Native
  3. linux-xmedge, reserved for bleeding edge
  4. linux-xmrt, based on CL Preempt-RT (testing today)

Consider phase one mostly completed, from an idea (run XanMod on Clear Linux) to completing spec files, validation (yesterday’s latency results), and stress testing. Not yet tested is the RT variant which I will try today. Phase two is writing the guide. Phase three is the GitHub repository and automation.

This is going to be interesting :wink:

I completed phase one, including the spec file for the XanMod RT variant.

Clear RT 6.1.38

hackbench1  36.323s
hackbench2  37.381s

cyclictest max latencies (no IO)
   94  95  96  89  87  95  94  87  56  81  73  30  97  21  35  88
   14  17  90  39  34  33  19  29  80  25  14  63  25  22  17  31
   14  65  41  14  11  11  81  13  12  16  12  15  14  11  21  15
   15  32  16  86  15  12  19  38 116  11  24 108  29  36  14  11

cyclictest max latencies (with IO, sync-echo-ls -R /usr)
  155 1014 117 188  93  99  92  91  74  73  93  35  87 117  83   80
   74   92  72 121  96 118  92  97  90  21  41  19  25  21  37 5791
   43   24  13  91  17  83  24  49  14  12  15  23  23  13  14   11
   18   81  18  28  12  13  17  50 333  13  52  12  13  12  19   51

I ran it again to see whether the high spikes repeat.
  393 2544 175 361  76 191  81  30 107  43  80 131  83  90 125  119
   82   97  74 172  70 103  57  71 104  77  88  41  19 118  42   60
   19   99  17  59  24  17  13  50  36  18  37 109  35  12  32   16
   14   13  33  23  88  14  15  15 260  32  16  15 107  15  65   23

sync-ls time: 12.265s under load

XanMod RT 6.1.46

hackbench1  72.931s
hackbench2  73.489s

cyclictest max latencies (no IO)
   17  14  19  29  17  23  22  36  22  15  20  13  25  18  17  14
   17  12  14  12  16  12  15  11  16  19  13  12  13  19  14  26
   12  21  22  11  11  43  45  10  12  11  17  13   9  25  16  11
   25   9  10  12  11  12  13  12  10   9  11  13  11  17  20  10

cyclictest max latencies (with IO, sync-echo-ls -R /usr)
   19  19  16  16  14  24  14  14  13  18  15  15  14  48  26  30
   13  12  14  19  20  12  13  13  13  14  15  23  11  12  14  17
   14  15  16  48  12  16  11  13  60  11  11  12  10  17  14  14
   12  12  12  12  12  15  12  11  10  12  16  14  10  24  12  16

sync-ls time: 12.742s under load

The XanMod RT hackbench runs take about twice as long. On the plus side, the low-latency results are consistent between runs, even with IO involved.
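The hackbench cost of each XanMod variant can be computed directly from the timings posted in this thread. A small Python sketch follows; the dictionary layout and the `slowdown` helper are illustrative only. Keep in mind that the two RT kernels are different point releases (6.1.38 and 6.1.46), so that pair is not a strict like-for-like comparison.

```python
# hackbench wall-clock times in seconds (hackbench1, hackbench2), from this thread.
HACKBENCH = {
    "Clear LTS 6.1.53": (31.802, 33.828),
    "XanMod LTS 6.1.53": (33.589, 35.385),
    "Clear RT 6.1.38": (36.323, 37.381),
    "XanMod RT 6.1.46": (72.931, 73.489),
}


def slowdown(kernel, baseline):
    """Summed hackbench time of `kernel` divided by that of `baseline`."""
    return sum(HACKBENCH[kernel]) / sum(HACKBENCH[baseline])


if __name__ == "__main__":
    lts = slowdown("XanMod LTS 6.1.53", "Clear LTS 6.1.53")
    rt = slowdown("XanMod RT 6.1.46", "Clear RT 6.1.38")
    print(f"XanMod LTS vs Clear LTS: {lts:.2f}x")
    print(f"XanMod RT  vs Clear RT:  {rt:.2f}x")
```

This gives roughly 1.05x for the LTS pair and 1.99x for the RT pair.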

For the Clear RT kernel, the high spikes occur occasionally, and I am unsure whether that is due to this being an AMD box. Unlike the XanMod RT kernel, its results are inconsistent between runs. Here is another run for the Clear RT kernel.

cyclictest max latencies (with IO, sync-echo-ls -R /usr)
  105  103 101  94 181  83  96 118  85 348  73  92  55 101  92   41
   62   18  35  31  23  33  23  72  79 115  16  78  97 122  57   17
   13   17  19  16  59  16  15  19  15  17  15  14  21  19  55   18
   17   78  58  15  13  15  14  13  33  14  65  51  29  19 244   37

Next, I captured os-scheduler-responsiveness test results, running the test three times in a row. Here, the Clear RT kernel comes close to the XanMod LTS variant. What’s not to like about the XanMod RT kernel? It runs consistently, with no high spikes throughout testing.

Clear RT 6.1.38

$ time python3 responsiveness.py -i1 -p60 --np 4 -t 30

  total response time:  0.06028524799967272s
   runs:  13  average:  0.004637326769205594s

  total response time:  0.056690662999926644s
   runs:  13  average:  0.004360820230763588s

  total response time:  0.0578470569998899s
   runs:  13  average:  0.0044497736153761465s

  real    0m26.515s
  user   24m40.944s
  sys     0m 0.170s

XanMod RT 6.1.46

$ time python3 responsiveness.py -i1 -p60 --np 4 -t 30

  total response time:  0.04861944200001744s
   runs:  14  average:  0.0034728172857155315s

  total response time:  0.04565400100000261s
   runs:  14  average:  0.003261000071428758s

  total response time:  0.04985732100021778s
   runs:  14  average:  0.00356123721430127s

  real    0m27.204s
  user   25m18.383s
  sys     0m 0.122s

The Clear LTS or XanMod LTS kernel is a good fit for the desktop or cloud.

The XanMod RT kernel, with its consistent low latency, would be a great choice for a web server, music production, trading, video conferencing, VoIP, or game streaming.

XanMod on GitHub: 14,500+ contributors, including Linus :wink:

I am keeping a close eye on this thread @marioroy with excitement. I look forward to you publishing the GitHub project so I can play with these kernels in CL.

Will the Nvidia driver play along nicely with the RT kernel? I really like Debian 12: a flawless Nvidia graphics driver, CUDA, and great stability for generative AI work. It has GCC 12, so CUDA plays along nicely. On Fedora, one needs to compile GCC 12, and my compilation always fails, making me realise how much of a newbie I am to Linux.

Only the XanMod LTS worked for me with working Nvidia graphics on Debian 12.

Yes.

By default, the NVIDIA installer aborts if the kernel is configured with PREEMPT_RT. NVIDIA provides a way around this: IGNORE_PREEMPT_RT_PRESENCE=1. Set it system-wide (remember to reboot afterwards) or pass the environment key-value pair to sudo.

sudo IGNORE_PREEMPT_RT_PRESENCE=1 echo "Unus pro omnibus, omnes pro uno"
sudo IGNORE_PREEMPT_RT_PRESENCE=1 echo "One for all, all for one"

$ ./xm-install rt
Installing XanMod rt.
Password: 
Verifying...                          ################################# [100%]
Preparing...                          ################################# [100%]
Updating / installing...
   1:linux-xmrt-license-6.1.46-3      ################################# [ 25%]
   2:linux-xmrt-6.1.46-3              ################################# [ 50%]
   3:linux-xmrt-extra-6.1.46-3        ################################# [ 75%]
   4:linux-xmrt-dev-6.1.46-3          ################################# [100%]
Building kernel drivers for NVIDIA graphics.

This is amazing!

I am just thinking about what new name we could give to a new NVIDIA-centric CL distro, like Fedora-Nobara! Clearvidia?? LOL. (I am only joking and hope not to offend anyone.)

Thank you, Aaron Lu and team. Running hackbench, the 6.4 and 6.5 kernels (using updated scale.patch) now perform similarly to the 6.1 LTS kernel.

XanVidia :slight_smile:

Running on Clear, of course.

LOL!

@Businux I saw that the nouveau repo maintainer had resigned. Also, Intel has an AI chip running LLAMA!! Fair enough, why would CL want to accommodate the rival’s proprietary driver :upside_down_face:

While reading about gaming Linux distros, I saw X11 being recommended over Wayland. Apparently, there is about a 15% performance hit with Wayland.

@marioroy, is it worth running the benchmarks again with X11?

There it is, running like a champ :llama:

Benchmark results apples-to-apples comparison on Clear Linux.

The RT variant will have two preempt flavors; one configured with PREEMPT_RT, the other PREEMPT_DYNAMIC.