Enable NVIDIA GPU in Docker Container
Docker containers leverage your host's kernel while carrying their own operating system and software packages. Consequently, they do not inherently include the NVIDIA drivers required to interact with GPUs. Furthermore, Docker doesn't automatically incorporate GPUs into containers, rendering your hardware invisible with a standard docker run command.
To enable GPU functionality, the process generally involves two steps: first, installing the drivers within your Docker image, and then configuring Docker to include GPU devices in your containers during runtime.
Installing NVIDIA Drivers
Prior to proceeding with Docker configuration, ensure that the NVIDIA drivers are correctly installed and operational on your host system. Verify functionality by running nvidia-smi, where you should observe details such as your GPU's name, driver version, and CUDA version.
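For example:

```
# Confirm the driver is loaded; this prints the GPU name,
# driver version, and supported CUDA version.
nvidia-smi
```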
Adding the NVIDIA Container Toolkit
Next you need the NVIDIA Container Toolkit, which integrates with Docker Engine to expose your GPUs to containers. Add the toolkit's package repository to your system using the example command:
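On a Debian-based host, the repository setup has historically looked like the following; consult NVIDIA's current install guide for the exact commands for your distribution:

```
# Detect the host distribution (e.g. ubuntu20.04) and register
# NVIDIA's package repository and its signing key.
distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
    sudo tee /etc/apt/sources.list.d/nvidia-docker.list
```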
Next, install the nvidia-docker2 package on your host:
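```
# Assumes an apt-based distribution, matching the repository setup above
sudo apt-get update
sudo apt-get install -y nvidia-docker2
```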
Restart the Docker daemon to complete the installation:
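```
# On systemd-based hosts
sudo systemctl restart docker
```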
The Container Toolkit installation is now complete, and you're prepared to launch a test container.
Commencing a Container with GPU Access
Since Docker doesn't inherently grant access to your system's GPUs, you must create containers using the --gpus flag to make your hardware visible. You can specify particular devices for activation or utilize the "all" keyword.
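Both forms are shown below; note the extra quoting Docker expects around the device= syntax:

```
# Expose every GPU on the host to the container
docker run -it --gpus all ubuntu nvidia-smi

# Expose only specific devices, selected by index
docker run -it --gpus '"device=0,1"' ubuntu nvidia-smi
```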
The nvidia/cuda images come preloaded with CUDA binaries and GPU tools. Launch a container and execute the nvidia-smi command to verify GPU accessibility. The output should mirror what you observed when running nvidia-smi on your host. Note that the CUDA version may differ depending on the toolkit versions on your host and within your chosen container image.
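For example:

```
# Launch a CUDA container and check that the GPU is visible inside it
docker run -it --gpus all nvidia/cuda:11.4.0-base-ubuntu20.04 nvidia-smi
```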
Choosing a Base Image
Opting for one of the nvidia/cuda tags is the simplest and swiftest route to running your GPU workload within Docker. These tags come in many variants, combining operating systems, CUDA versions, and NVIDIA software options, and the images are built for multiple architectures.
Each tag follows this format:
11.4.0-base-ubuntu20.04
11.4.0 - CUDA version.
base - Image variant.
ubuntu20.04 - Operating system version.
Three distinct image variants are available. The base image offers a minimal setup with essential CUDA runtime binaries. The runtime variant is more comprehensive, encompassing CUDA math libraries and NCCL for inter-GPU communication. The third variant, devel, includes everything from runtime as well as headers and development tools for crafting customized CUDA images.
If any of these images align with your requirements, it's advisable to utilize it as the base in your Dockerfile. Subsequently, you can employ regular Dockerfile instructions to install your programming languages, import your source code, and configure your application. This approach streamlines the process, eliminating the need for manual GPU setup procedures.
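As a minimal sketch, a Dockerfile for a Python workload might look like this; the script name tensor-code.py is a placeholder for your own entrypoint, and heavier frameworks may require the runtime or devel variants to satisfy their library dependencies:

```
FROM nvidia/cuda:11.4.0-base-ubuntu20.04
# Install a Python toolchain on top of the CUDA base image
RUN apt-get update && \
    apt-get install -y --no-install-recommends python3 python3-pip && \
    rm -rf /var/lib/apt/lists/*
RUN pip3 install tensorflow
# Copy in and run your own GPU workload (placeholder name)
COPY tensor-code.py .
ENTRYPOINT ["python3", "tensor-code.py"]
```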
Constructing and executing this image with the --gpus flag will initiate your TensorFlow workload with GPU acceleration:
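```
# The image tag is arbitrary
docker build -t tensor-app .
docker run -it --gpus all tensor-app
```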
Manual Configuration of an Image
Should you opt for a different base image, you can manually incorporate CUDA support into your image. The most efficient approach is to refer to the official NVIDIA Dockerfiles.
Reproduce the instructions utilized to integrate the CUDA package repository, install the library, and establish linkage into your path. As these steps can vary depending on the CUDA version and operating system, it's crucial to follow the guidance provided in the official documentation.
Of particular importance are the environment variables at the conclusion of the Dockerfile, as these define how containers employing your image interact with the NVIDIA Container Runtime.
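In NVIDIA's official images these variables look like the following; they tell the runtime which GPUs and which driver capabilities (such as compute and utility) to expose:

```
ENV NVIDIA_VISIBLE_DEVICES all
ENV NVIDIA_DRIVER_CAPABILITIES compute,utility
```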
Once CUDA is installed and the environment variables are configured, your image should automatically detect your GPU. This approach grants you greater control over the composition of your image but may necessitate adjustments to the instructions as new CUDA versions are released.
How Does It Function?
The NVIDIA Container Toolkit comprises packages that encapsulate container runtimes like Docker, providing them with an interface to the NVIDIA driver on the host system. The libnvidia-container library is responsible for furnishing an API and CLI, which automatically expose your system's GPUs to containers via the runtime wrapper.
The nvidia-container-toolkit module implements a prestart hook for container runtimes. This implies that it is notified when a new container is set to commence. It examines the GPUs you wish to attach and invokes libnvidia-container to manage the creation of the container.
The hook is activated by nvidia-container-runtime, which wraps your primary container runtime such as containerd or runc, ensuring that the NVIDIA prestart hook is executed. Following the execution of the hook, your existing runtime proceeds with the container start process. Upon installation of the container toolkit, you will observe the NVIDIA runtime selected in your Docker daemon configuration file.
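Installing nvidia-docker2 writes an entry similar to this into /etc/docker/daemon.json:

```
{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
```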
Summary
To use your GPU with Docker, begin by adding the NVIDIA Container Toolkit to your host. The toolkit integrates into Docker Engine to automatically configure your containers for GPU support.