Docker with NVIDIA driver

cmack5644 · December 10, 2019, 7:06am

I am having an issue setting up a Docker container with the NVIDIA driver. I followed the instructions to set up the NVIDIA driver and it appears to be working fine. I’ve also installed the CUDA toolkit, though this is unnecessary with the newest versions of Docker. I run the following:

docker run --gpus all nvidia/cuda -base nvidia-smi

But get the following error:
docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]]

Has anyone run into or resolved this issue?

Here are the outputs that I believe show the driver is installed correctly:

ls -l /dev/nvidia*
crw-rw-rw- 1 root root 195, 0 Dec 8 18:49 /dev/nvidia0
crw-rw-rw- 1 root root 195, 1 Dec 8 18:49 /dev/nvidia1
crw-rw-rw- 1 root root 195, 255 Dec 8 18:49 /dev/nvidiactl
crw-rw-rw- 1 root root 195, 254 Dec 8 18:49 /dev/nvidia-modeset

lsmod | grep ^nvidia
nvidia_drm 45056 9
nvidia_modeset 1114112 12 nvidia_drm
nvidia 19931136 617 nvidia_modeset
nvidiafb 53248 0

doct0rHu · December 10, 2019, 2:13pm

Did you disable modprobe when you installed the NVIDIA driver?

cmack5644 · December 10, 2019, 6:14pm

I did. I thought that was necessary to make it work?

I followed the official guide on the Clear Linux site exactly.

doct0rHu · December 10, 2019, 6:26pm

Try not disabling modprobe. It causes problems with CUDA modules.

puneetse · December 10, 2019, 8:44pm

Cool! I didn’t know this so thanks for mentioning it.

This issue on their GitHub suggests the error means one of the prerequisites is missing (installing the alternative nvidia-container-toolkit). It looks like they only provide apt/yum repos, so you’ll have to compile it from source or dig into their packaging tools.
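A quick way to check for that prerequisite, as a rough sketch only: as far as I know, Docker's --gpus support looks for an executable named nvidia-container-runtime-hook on the daemon's PATH, and the nvidia-container-toolkit package is what provides it. The helper name have_cmd is made up for illustration, and your shell's PATH may differ from the one the Docker daemon sees.

```shell
#!/bin/sh
# have_cmd is a hypothetical helper: succeeds if the named command is on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Assumption: the toolkit installs a hook named nvidia-container-runtime-hook.
if have_cmd nvidia-container-runtime-hook; then
    echo "hook found: $(command -v nvidia-container-runtime-hook)"
else
    echo "hook not found; --gpus will likely fail until the toolkit is installed"
fi
```

If the hook is reported as missing, that would be consistent with the "could not select device driver" error above.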

Another thing to check is the default Docker runtime. Some bundles in Clear Linux currently add helpers that default to Kata Containers, which could be getting in the way:

$ sudo docker info | grep "Default Runtime"
Default Runtime: runc
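If you want to script that check, here is a minimal sketch. The function name default_runtime is hypothetical; it only parses the "Default Runtime" line out of whatever `docker info` prints, so it assumes the output format shown above.

```shell
#!/bin/sh
# default_runtime is a hypothetical helper: reads `docker info` output on
# stdin and prints the value of the "Default Runtime" line.
default_runtime() {
    sed -n 's/^[[:space:]]*Default Runtime:[[:space:]]*//p'
}

# Usage (needs a running Docker daemon):
#   sudo docker info 2>/dev/null | default_runtime
```

Anything other than runc in the output would be worth a closer look before blaming the driver.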

puneetse · December 10, 2019, 8:51pm

The job of nvidia-modprobe is to load the nvidia driver and create the /dev/nvidia* character devices if they don’t already exist, as tends to be the case on headless systems. Based on @cmack5644's output, it looks like those character devices are already there, so I don’t think it is the problem in this case. (I could be wrong!)
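For anyone who wants to verify the same thing on their own machine, a small sketch along these lines should do. The helper name missing_char_devices is hypothetical, and the two device paths are simply the ones from the ls output earlier in the thread; a system with more GPUs would have more nodes.

```shell
#!/bin/sh
# missing_char_devices is a hypothetical helper: prints each argument that is
# not an existing character device, and fails if any were missing.
missing_char_devices() {
    status=0
    for node in "$@"; do
        if [ ! -c "$node" ]; then
            echo "$node"
            status=1
        fi
    done
    return "$status"
}

# The device paths are taken from the ls output earlier in the thread.
if missing_char_devices /dev/nvidiactl /dev/nvidia0; then
    echo "device nodes present"
fi
```

If any paths are printed, the nodes were not created, which is the situation nvidia-modprobe is meant to handle.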

nvidia-modprobe is a relatively simple helper whose job could be done in a variety of other ways. If I remember right, the CUDA toolkit just happens to rely on nvidia-modprobe specifically. But since CUDA isn’t required on the host anymore, it should be a non-issue.

doct0rHu · December 10, 2019, 8:52pm

Okay. This makes sense.
