Running Intel DPC++/C++ on Clear Linux

The message states that the oneAPI DPC++/C++ Compiler is untested on Clear Linux OS, even though both products are made by Intel.

The article lists the supported Linux operating systems.

I tested Intel’s C (icx) and C++ (icpx) compilers, including SYCL code targeting the NVIDIA GPU.

Although the oneAPI CUDA plugin prefers CUDA 12.x, I installed CUDA 11.8 because this one works with Numba and Codon.

$ ONEAPI_DEVICE_SELECTOR="cuda:*" SYCL_PI_TRACE=1 ./simple-sycl-app
SYCL_PI_TRACE[basic]: Plugin found and successfully loaded: libpi_cuda.so [ PluginVersion: 12.27.1 ]
SYCL_PI_TRACE[all]: Requested device_type: info::device_type::automatic
SYCL_PI_TRACE[all]: Selected device: -> final score = 1500
SYCL_PI_TRACE[all]:   platform: NVIDIA CUDA BACKEND
SYCL_PI_TRACE[all]:   device: NVIDIA GeForce RTX 3070
The results are correct!

No changes are needed to the CUDA installation described here.

Any performance gains with icx/icpx?

I came across this post because I want to start doing AI work on an NVIDIA GPU. I will only access this box over SSH. I would like to set up Clear Linux on it, because I believe it is the fastest, and skip the GUI install entirely.

Do I still need to install the NVIDIA driver if I only want to use the GPU for AI work, i.e., bytes in and bytes out, with no display use case?

I think I do, but I thought I’d ask.

Yes. The answer is found here.

  1. “To build an application, a developer has to install only the CUDA Toolkit and necessary libraries required for linking.”
  2. “In order to run a CUDA application, the system should have a CUDA enabled GPU and an NVIDIA display driver that is compatible with the CUDA Toolkit that was used to build the application itself.”

Intel® oneAPI DPC++/C++ Compiler, v. 2024.0.2 (download the Stand-Alone version).

Prerequisite: Install the c-basic and devpkg-boost bundles.

sudo swupd bundle-add c-basic devpkg-boost

Install oneAPI DPC++/C++ Compiler.

bash l_dpcpp-cpp-compiler_p_2024.0.2.29_offline.sh

Interestingly, the installer fails to detect g++, even though the c-basic bundle installs it, it is in our PATH, and it exists in /usr/bin.

Out of the box, OpenMP (the -fopenmp C/C++ option) is broken on Clear Linux: the compiler cannot find the omp.h header. Perform the following steps to fix it. Note: Adjust the paths accordingly for future versions.

cd ~/intel/oneapi/compiler/2024.0/lib/clang/17/include
ln -s ../../../../opt/compiler/include/omp.h .
ln -s ../../../../opt/compiler/include/omp_lib.h .
ln -s ../../../../opt/compiler/include/omp-tools.h .

Since release 39970, Clear Linux no longer provides libiomp5.so. Copy the OpenMP runtime library to /usr/local/lib64/.

cd ~/intel/oneapi/compiler/2024.0/lib
sudo mkdir -p /usr/local/lib64
sudo cp -a libiomp5.so /usr/local/lib64/.
sudo chmod 755 /usr/local/lib64/libiomp5.so
cd ~

Configure the dynamic linker to search the /usr/local paths. Skip this step if you have already done so.

sudo mkdir -p /etc/ld.so.conf.d

sudo tee "/etc/ld.so.conf.d/local.conf" >/dev/null <<'EOF'
/usr/local/lib64
/usr/local/lib
EOF

if ! grep -q '^include /etc/ld\.so\.conf\.d/\*\.conf$' /etc/ld.so.conf 2>/dev/null
then
  sudo tee --append "/etc/ld.so.conf" >/dev/null <<'EOF'
include /etc/ld.so.conf.d/*.conf
EOF
fi

Refresh the dynamic linker run-time cache.

sudo ldconfig

On non-Intel systems, the C/C++ -axCODE option does not work. Instead, use the -xHost option for best performance. “That tells the compiler to generate instructions for the highest instruction set available on the compilation host processor.”

. ~/intel/oneapi/setvars.sh --include-intel-llvm

clang++ -O3 -xHost ...
clang++ -O3 -xhaswell ...
clang++ -O3 -xcore-avx2 ...
clang++ -O3 -xskylake-avx512 ...
clang++ -O3 -xx86-64-v2  (or v3, v4) ...
clang++ -O3 -xcommon-avx512 ...

oneAPI for NVIDIA® GPUs, v. 2024.0.1 (click on the Download icon at the top of the page).

Prerequisite: Install CUDA 12.2.

bash ./install-cuda 12.2

Install oneAPI DPC++ CUDA® plugin.

bash ./oneapi-for-nvidia-gpus-2024.0.1-cuda-12.0-linux.sh

There is also oneAPI for AMD GPUs, to target AMD GPUs.


oneAPI Monte Carlo Pi Demonstration

Build a single binary targeting both the CPU (spir64) and the NVIDIA GPU (nvptx64-nvidia-cuda).

unset CFLAGS
unset CXXFLAGS
unset FCFLAGS
unset FFLAGS

. ~/intel/oneapi/setvars.sh --include-intel-llvm

icpx -fsycl -fsycl-targets=spir64,nvptx64-nvidia-cuda -O2 -g -DNDEBUG monte_carlo_pi.cpp -o monte_carlo_pi -lOpenCL -lsycl

By default, the binary runs on the GPU when both CPU and GPU targets are built. Set ONEAPI_DEVICE_SELECTOR to choose the device explicitly.

./monte_carlo_pi
ONEAPI_DEVICE_SELECTOR="ext_oneapi_cuda:*" ./monte_carlo_pi
ONEAPI_DEVICE_SELECTOR="opencl:cpu" ./monte_carlo_pi

Some success after all. 🙂

ONEAPI_DEVICE_SELECTOR="ext_oneapi_cuda:*" ./monte_carlo_pi 
Calculating estimated value of pi...

Running on NVIDIA GeForce RTX 3070
The estimated value of pi (N = 10000) is: 3.1388

Computation complete. The processing time was 0.0536164 seconds.
The simulation plot graph has been written to 'MonteCarloPi.bmp'