What drivers should I use for CUDA on Clear Linux?

Hi all,

I just bought a new server with an NVIDIA RTX 4060 so I can start developing HPC applications with CUDA.

My main question is whether I need to install the proprietary NVIDIA driver, or whether the nouveau driver supports CUDA on Clear Linux.

Also, is this tutorial for installing the NVIDIA drivers up to date?

https://www.clearlinux.org/clear-linux-documentation/zh_CN/tutorials/nvidia.html

And finally, I found this topic

https://community.clearlinux.org/t/install-nvidia-drivers-on-clear-linux-os-server/7431

stating that there are issues with the NVIDIA drivers and swupd. Have these issues been solved?

Best,

Have you looked at @marioroy's:

It's archived and doesn't work with the 6.12 kernel, but it might be of interest for getting an idea of the setup requirements, etc.

I think you need to add CUDA as an extra; at least that's how it works in other distros.

Go ahead and try with Clear Linux, but I'd suggest setting aside a small partition (32 GB minimum, plus enough for your own data) and installing a distro that CUDA supports natively (Ubuntu or Mint are what I use) rather than dealing with everything from source or "manual installs". That's not to say it can't be done, but I believe in choosing the right tool for the job rather than being stubbornly dedicated to a particular build "just because".

I'm not saying it can't be done, but it's much easier to "apt install nvidia-cuda-toolkit" and be done with it rather than jumping through hoops. It's the same reason people stopped using Slackware and other source-based distros in favor of popular, tested distros. Technically they can all do the same thing, but some are ready out of the box while others require effort.

Completely your choice of course, and this doesn't reflect on how "good" or "bad" Clear Linux is, but it is just a suggestion from my own experience.

Thank you very much for your answer.

I am using the LTS kernel, that is:

$ uname -r
6.6.61-1430.ltscurrent

Do you believe it should work on this kernel?

The installation seems to work: the output from "nvidia-smi" looks fine and the output from "lsmod | grep nouveau" is empty.

I am just having a problem that I believe comes from gcc. When I try to compile the following code:

#include <stdio.h>
#include <cuda.h>

__global__ void cuda_hello(){
    printf("Hello World from GPU!\n");
}

int main() {
    cuda_hello<<<1,1>>>();
    cudaDeviceSynchronize();  // wait for the kernel so its printf is flushed before exit
    return 0;
}

with "nvcc hello.cu -o hello.x", I get a lot of errors like:

/usr/include/stdlib.h:141:8: error: '_Float32' does not name a type; did you mean 'float3'?
  141 | extern _Float32 strtof32 (const char *__restrict __nptr,
      |        ^~~~~~~~
      |        float3

As I understand it, the installation script uses gcc 11 for the compilation.

Even trying to compile with "CC=gcc-11 nvcc hello.cu -o hello.x" I get the same errors.

Is this a known issue?

Edit:

I found the source of this error: gcc 11 was used to build the drivers and CUDA, but gcc 14 was being used to compile the code. After adjusting this, the code compiles.
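For the record, nvcc lets you pin the host compiler directly on the command line via `-ccbin` (alias for `--compiler-bindir`); as far as I know, a `CC=` prefix has no effect on nvcc, which is why that attempt failed. A sketch, assuming gcc-11 is installed at /usr/bin/gcc-11:

```shell
# Tell nvcc to use gcc-11 as the host compiler instead of the default gcc
nvcc -ccbin /usr/bin/gcc-11 hello.cu -o hello.x
./hello.x
```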


I'm not sure if this tip works with the latest CUDA, but something you can try is to tell nvcc which compiler to use by making a symbolic link in the CUDA installation path.

sudo ln -sf /usr/bin/gcc-11 /opt/cuda/bin/gcc

Check your CUDA version before forcing a GCC version :wink:

cat /usr/local/cuda/version.txt

nvcc --version

gcc --version

Add the correct environment variables to your .bashrc:

export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH

Update with:

source ~/.bashrc

Use the CUDA samples to test your setup:

cd /usr/local/cuda/samples

make
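Note that the samples layout differs between CUDA releases (recent toolkits distribute the samples on GitHub rather than bundling them under /usr/local/cuda). With the older bundled layout, deviceQuery is a handy smoke test; the path below assumes that layout:

```shell
# Build and run the deviceQuery sample to confirm the GPU is visible
cd /usr/local/cuda/samples/1_Utilities/deviceQuery
make
./deviceQuery
```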

Thank you for the hint.

I believe this link is created by your script, but it seems it no longer works as-is. However, adding the /opt/cuda/bin path as the first entry of the $PATH variable worked. Maybe an update to the documentation in your repository would help, since it suggests adding this path as the last entry.
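Concretely, the ordering that worked for me looks like this in .bashrc (assuming the /opt/cuda prefix used by the install script):

```shell
# Prepend so the gcc/g++ symlinks in /opt/cuda/bin shadow the system compilers
export PATH=/opt/cuda/bin:$PATH
```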

The module file that I am using for this is

#%Module

proc ModulesHelp {} {
  puts stderr "\t Adds NVIDIA CUDA Toolkit 12.6 to your environment variables"
}

module-whatis "adds NVIDIA CUDA Toolkit 12.6 to your environment variables"

set CUDA                              /opt/cuda
setenv CUDA_HOME                      $CUDA
prepend-path PATH                     $CUDA/bin/
prepend-path LD_LIBRARY_PATH          $CUDA/lib64/

Edit:

I had to create the link to g++ too :).
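For completeness, the two links together look like this (a sketch; adjust the gcc-11/g++-11 locations and the /opt/cuda prefix to your install):

```shell
# Make nvcc pick up gcc-11/g++-11 via the CUDA bin directory
sudo ln -sf /usr/bin/gcc-11 /opt/cuda/bin/gcc
sudo ln -sf /usr/bin/g++-11 /opt/cuda/bin/g++
```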

Thank you very much for your help. The versions for nvcc and gcc that I am using are:

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Sep_12_02:18:05_PDT_2024
Cuda compilation tools, release 12.6, V12.6.77
Build cuda_12.6.r12.6/compiler.34841621_0

$ gcc --version
gcc (Clear Linux OS for Intel Architecture) 11.5.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

I could build CUDA samples and tried some of them. Everything seems to be working.

I also wrote down a module file in the answer for @marioroy.


Thank you for helping with this. The symbolic link worked for CUDA 12.2 but no longer works for 12.4 and 12.6. However, the symbolic links (I'll add g++) are helpful when the CUDA bin path comes first. I'll update the documentation.

Can you share more information about the location of the module file? So, this requires the modules bundle:

sudo swupd bundle-add modules

Any tips for making this work? Do you set MODULEPATH somewhere?

Your scripts helped me a lot with installing the NVIDIA driver. It was almost plug and play; I only needed to make some post-installation adjustments :slight_smile:

About the environment modules, yes I am using the modules bundle.

In my environment, I have a folder at /opt/modulefiles in which I create the module files that I need. Then I just add the following line to my .bashrc file:

module use --append /opt/modulefiles/
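For reference, `module use --append` essentially appends the directory to MODULEPATH, so setting the variable yourself is roughly equivalent:

```shell
# Append /opt/modulefiles to MODULEPATH (or set it, if currently unset)
export MODULEPATH=${MODULEPATH:+$MODULEPATH:}/opt/modulefiles
```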

That was helpful, @rfkspada. Thanks!

A mini-update: the install-cuda script now creates the cuda module and places it in /usr/share/modules/modulefiles. The full path is added to the picky_whitelist in /etc/swupd/config for safety from swupd repair.
