Not strictly related to CL the OS, but more of a question of building a package on CL.
I understand there is a pytorch bundle, but AFAIK it has no CUDA support, and it is not packaged standalone or for use alongside Anaconda.
I have tried building PyTorch from source, backed by Anaconda.
The specific steps I followed:
- Installed the Nvidia drivers using the provided guide. I added `/opt/nvidia/bin` to my `$PATH` in `~/.profile`, and also verified that `nvidia-smi` worked
- Installed the `c-extras-gcc8` bundle, and modified my Anaconda `activate.d` and `deactivate.d` so that the appropriate gcc/g++ would be used, following the CUDA guide from Clear Linux
- Installed CUDA from the Nvidia website, after which I added `/opt/cuda/lib64` to my `LD_LIBRARY_PATH` in `~/.profile`, and added `/opt/cuda/bin` to my `$PATH`
- `conda install -c pytorch magma-cuda102` to enable linalg support on the GPU
- Successfully built PyTorch from tag `v1.4.0` after applying the patch from FS#65202 : [python-pytorch-opt-cuda] incompatible nccl
- Fails at GPU tensor with `THCudaCheck FAIL file=../aten/src/THC/THCGeneral.cpp line=50 error=999 : unknown error`
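For anyone trying to reproduce this, here is a rough sketch of a sanity check for the environment changes listed in the steps. It only confirms that the directories I mentioned are present in `PATH` and `LD_LIBRARY_PATH` and shows which `nvidia-smi`, `nvcc`, and `gcc` the shell resolves. The helper names (`missing_entries`, `report`) are made up for illustration, and the directory lists assume the same install locations as mine.

```python
import os
import shutil

# Directories taken from the steps in this post; adjust if your install differs.
EXPECTED_PATH_DIRS = ["/opt/nvidia/bin", "/opt/cuda/bin"]
EXPECTED_LIB_DIRS = ["/opt/cuda/lib64"]


def missing_entries(value, expected):
    """Return the expected directories that are absent from a path-list variable."""
    present = {os.path.normpath(p) for p in value.split(os.pathsep) if p}
    return [d for d in expected if os.path.normpath(d) not in present]


def report(env=None):
    """Map each variable name to the expected directories it is missing."""
    env = os.environ if env is None else env
    return {
        "PATH": missing_entries(env.get("PATH", ""), EXPECTED_PATH_DIRS),
        "LD_LIBRARY_PATH": missing_entries(
            env.get("LD_LIBRARY_PATH", ""), EXPECTED_LIB_DIRS
        ),
    }


if __name__ == "__main__":
    for var, missing in report().items():
        print(var, "missing:", missing or "nothing")
    # Show which binaries are actually picked up (None means not found on PATH).
    for tool in ("nvidia-smi", "nvcc", "gcc"):
        print(tool, "->", shutil.which(tool))
```

Running it inside the activated conda environment should also show whether the `activate.d` hook changed which `gcc` is found first.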
This was puzzling, as I had managed to build and import PyTorch v1.4.0 on CentOS 7 just the past week. So my guess is that something in Clear Linux is causing this breakage.
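To compare the two machines, a small smoke test along these lines could be run on both. It is only a sketch: it assumes the failure shows up on the first GPU tensor allocation, and the function name `cuda_smoke_test` and the keys of the returned dictionary are my own invention. It captures the error instead of crashing, so the output can be pasted side by side.

```python
def cuda_smoke_test():
    """Try the smallest possible GPU operation and report what happened."""
    result = {"torch_importable": False, "cuda_available": False, "error": None}
    try:
        import torch
    except Exception as exc:  # missing package or a broken shared library
        result["error"] = repr(exc)
        return result

    result["torch_importable"] = True
    result["torch_version"] = torch.__version__
    result["built_with_cuda"] = torch.version.cuda  # None for CPU-only builds
    try:
        result["cuda_available"] = torch.cuda.is_available()
        # Allocating a tensor on the GPU forces CUDA initialisation.
        torch.zeros(1, device="cuda")
        result["device"] = torch.cuda.get_device_name(0)
    except Exception as exc:
        result["error"] = repr(exc)
    return result


if __name__ == "__main__":
    for key, value in cuda_smoke_test().items():
        print(f"{key}: {value}")
```

If the CentOS 7 build reports a device name while the Clear Linux build reports an error with the same `torch_version` and `built_with_cuda`, that would point at the driver or runtime setup rather than the build itself.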
Full details are in the issue I filed with PyTorch: Linking error in torch_shm_manager near end of compilation · Issue #34431 · pytorch/pytorch · GitHub