What drivers should I use for CUDA on Clear Linux?

Hi all,

I just bought a new server with an NVIDIA RTX 4060 so I can start developing HPC applications with CUDA.

My main question is whether I need to install the proprietary NVIDIA driver, or whether the nouveau driver supports CUDA on Clear Linux.

Also, is this tutorial for installing the NVIDIA drivers up to date?

https://www.clearlinux.org/clear-linux-documentation/zh_CN/tutorials/nvidia.html

And finally, I found this topic

https://community.clearlinux.org/t/install-nvidia-drivers-on-clear-linux-os-server/7431

stating that there are issues with the NVIDIA drivers and swupd. Have these issues been solved?

Best,

Have you looked at @marioroy's:

It's archived and doesn't work with the 6.12 kernel, but it might be of interest for getting an idea of the setup requirements, etc.

I think you need to add CUDA as an extra; at least that's how it works in other distros.

Go ahead and try with Clear Linux, but I'd suggest setting aside a small partition (32 GB minimum, plus enough for your own data) and installing a distro that CUDA supports natively (Ubuntu or Mint are what I use) rather than dealing with everything from source or "manual installs". That's not to say it can't be done, but I believe in choosing the right tool for the job rather than being stubbornly dedicated to a particular build "just because".

I'm not saying it can't be done, but it's much easier to "apt install nvidia-cuda-toolkit" and be done with it rather than jumping through hoops. It's the same reason people stopped using Slackware and other source-based distros in favor of popular, tested distros. Technically they can all do the same thing, but some are ready out of the box while others require effort.

Completely your choice of course, and this doesn't reflect on how "good" or "bad" Clear Linux is, but it is just a suggestion from my own experience.

Thank you very much for your answer.

I am using the LTS kernel, that is:

$ uname -r
6.6.61-1430.ltscurrent

Do you believe it should work on this kernel?

The installation seems to work: the output from "nvidia-smi" looks fine and the output from "lsmod | grep nouveau" is empty.

I am just having a problem that I believe comes from gcc. When I try to compile the following code:

#include <stdio.h>
#include <cuda.h>

__global__ void cuda_hello(){
    printf("Hello World from GPU!\n");
}

int main() {
    cuda_hello<<<1,1>>>();
    cudaDeviceSynchronize();  // wait for the kernel so its printf is flushed before exit
    return 0;
}

with "nvcc hello.cu -o hello.x", I get a lot of errors like:

/usr/include/stdlib.h:141:8: error: '_Float32' does not name a type; did you mean 'float3'?
  141 | extern _Float32 strtof32 (const char *__restrict __nptr,
      |        ^~~~~~~~
      |        float3

As I understand it, the installation script uses gcc 11 for the compilation.

Even trying to compile with "CC=gcc-11 nvcc hello.cu -o hello.x" I get the same errors.

Is this a known issue?

Edit:

I found the source of this error: gcc 11 was used to build the drivers and CUDA, but gcc 14 was being used to compile the code. After adjusting this, the code compiles.
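For the record, nvcc lets you pin the host compiler directly on the command line via `-ccbin` (alias for `--compiler-bindir`); as far as I know, a `CC=` prefix has no effect on nvcc, which is why that attempt failed. A sketch, assuming gcc-11 is installed at /usr/bin/gcc-11:

```shell
# Tell nvcc to use gcc-11 as the host compiler instead of the default gcc
nvcc -ccbin /usr/bin/gcc-11 hello.cu -o hello.x
./hello.x
```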


I'm not sure if this tip works with the latest CUDA, but something you can try is to tell nvcc which compiler to use by making a symbolic link in the CUDA installation path.

sudo ln -sf /usr/bin/gcc-11 /opt/cuda/bin/gcc

Check your CUDA version before forcing a GCC version :wink:

cat /usr/local/cuda/version.txt

nvcc --version

gcc --version

Add the correct environment variables to your .bashrc:

export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH

Update with:

source ~/.bashrc

Use the CUDA samples to test your setup:

cd /usr/local/cuda/samples

make
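Note that the samples layout differs between CUDA releases (recent toolkits distribute the samples on GitHub rather than bundling them under /usr/local/cuda). With the older bundled layout, deviceQuery is a handy smoke test; the path below assumes that layout:

```shell
# Build and run the deviceQuery sample to confirm the GPU is visible
cd /usr/local/cuda/samples/1_Utilities/deviceQuery
make
./deviceQuery
```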

Thank you for the hint.

I believe this link is created by your script, but it seems it no longer works as-is. However, adding the /opt/cuda/bin path as the first entry of the $PATH variable worked. Maybe an update to the documentation in your repository would help, since it suggests adding this path as the last entry.
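Concretely, the ordering that worked for me looks like this in .bashrc (assuming the /opt/cuda prefix used by the install script):

```shell
# Prepend so the gcc/g++ symlinks in /opt/cuda/bin shadow the system compilers
export PATH=/opt/cuda/bin:$PATH
```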

The module file that I am using for this is

#%Module

proc ModulesHelp {} {
  puts stderr "\t Adds NVIDIA CUDA Toolkit 12.6 to your environment variables"
}

module-whatis "adds NVIDIA CUDA Toolkit 12.6 to your environment variables"

set CUDA                              /opt/cuda
setenv CUDA_HOME                      $CUDA
prepend-path PATH                     $CUDA/bin/
prepend-path LD_LIBRARY_PATH          $CUDA/lib64/

Edit:

I had to create the link to g++ too :).
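For completeness, the two links together look like this (a sketch; adjust the gcc-11/g++-11 locations and the /opt/cuda prefix to your install):

```shell
# Make nvcc pick up gcc-11/g++-11 via the CUDA bin directory
sudo ln -sf /usr/bin/gcc-11 /opt/cuda/bin/gcc
sudo ln -sf /usr/bin/g++-11 /opt/cuda/bin/g++
```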

Thank you very much for your help. The versions for nvcc and gcc that I am using are:

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Sep_12_02:18:05_PDT_2024
Cuda compilation tools, release 12.6, V12.6.77
Build cuda_12.6.r12.6/compiler.34841621_0

$ gcc --version
gcc (Clear Linux OS for Intel Architecture) 11.5.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

I could build CUDA samples and tried some of them. Everything seems to be working.

I also wrote down a module file in the answer for @marioroy.


Thank you for helping with this. The symbolic link worked for CUDA 12.2 but no longer works for 12.4 and 12.6. However, the symbolic links (I'll add g++) are helpful when the CUDA bin path comes first. I'll update the documentation.

Can you share more information about the location of the module file? So, this requires the modules bundle:

sudo swupd bundle-add modules

Any tips for making this work? Do you set MODULEPATH somewhere?

Your scripts helped me a lot with installing the NVIDIA driver. It was almost plug and play; I only needed to make some post-installation adjustments :slight_smile:

About the environment modules, yes I am using the modules bundle.

In my environment, I have a folder at /opt/modulefiles in which I create the module files that I need. Then I just add the following line to my .bashrc file:

module use --append /opt/modulefiles/
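For reference, `module use --append` essentially appends the directory to MODULEPATH, so setting the variable yourself is roughly equivalent:

```shell
# Append /opt/modulefiles to MODULEPATH (or set it, if currently unset)
export MODULEPATH=${MODULEPATH:+$MODULEPATH:}/opt/modulefiles
```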

That was helpful, @rfkspada. Thanks!

A mini-update: the install-cuda script now creates the cuda module and places it in /usr/share/modules/modulefiles. The full path is added to the picky_whitelist in /etc/swupd/config for safety from swupd repair.
