Setting up a deep learning environment on Ubuntu 20.04

Environment

  • OS: Ubuntu 20.04
  • GPU: GeForce RTX 3090

Install the NVIDIA driver

List the driver versions the system recommends and install one

sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update
sudo apt install ubuntu-drivers-common
ubuntu-drivers devices

terminal output:

== /sys/devices/pci0000:00/0000:00:03.2/0000:0d:00.0 ==
modalias : pci:v000010DEd00002204sv00001462sd00003881bc03sc00i00
vendor   : NVIDIA Corporation
driver   : nvidia-driver-460 - distro non-free recommended
driver   : nvidia-driver-460-server - distro non-free
driver   : xserver-xorg-video-nouveau - distro free builtin

Install the driver marked as recommended, then reboot:

sudo apt install nvidia-driver-460
sudo reboot
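The recommended package can also be picked out of the ubuntu-drivers output with a small filter. This is only a sketch: recommended_driver is a hypothetical helper name, and it assumes the output keeps the "driver : <package> - ... recommended" layout shown above.

```shell
# Print the first package that `ubuntu-drivers devices` marks as recommended.
# Reads the command output on stdin; prints nothing if no line matches.
recommended_driver() {
  awk '/^driver/ && /recommended/ { print $3; exit }'
}

# Usage on the target machine (not run here):
#   pkg=$(ubuntu-drivers devices | recommended_driver)
#   sudo apt install "$pkg"
```

Filtering the text instead of hard-coding nvidia-driver-460 keeps the same steps usable on a machine where a different version is recommended.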

After the machine comes back up, use the nvidia-smi command to confirm that the driver installed correctly

nvidia-smi

terminal output:

Thu Mar  4 19:22:19 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.39       Driver Version: 460.39       CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 3090    Off  | 00000000:0C:00.0 Off |                  N/A |
| 48%   59C    P0   118W / 350W |      0MiB / 24260MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  GeForce RTX 3090    Off  | 00000000:0D:00.0 Off |                  N/A |
|  0%   52C    P0    51W / 350W |      0MiB / 24268MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
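For scripted checks, the two version numbers in the banner can be extracted with sed. A minimal sketch, assuming the banner keeps the "Driver Version:" and "CUDA Version:" labels shown above; the function names are made up for illustration. Note that the CUDA Version field reports the newest CUDA release the driver supports, not an installed toolkit.

```shell
# Extract the driver version from nvidia-smi banner text on stdin.
smi_driver_version() {
  sed -n 's/.*Driver Version: \([0-9.]*\).*/\1/p' | head -n 1
}

# Extract the highest CUDA version the driver supports from the same banner.
smi_cuda_version() {
  sed -n 's/.*CUDA Version: \([0-9.]*\).*/\1/p' | head -n 1
}

# Usage on the target machine (not run here):
#   nvidia-smi | smi_driver_version
#   nvidia-smi | smi_cuda_version
```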

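Here I ran into a problem that did not occur when I installed on Ubuntu 18: after the driver was installed and the machine rebooted, a desktop came up on its own (the installed system is Ubuntu Server). My guess is that the newer driver package pulls in desktop packages. The workaround for now is to uninstall the gdm3 package.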

sudo apt remove gdm3

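I looked into this online afterwards. I have not had a chance to try it yet, but replacing the original sudo apt install nvidia-driver-460 with sudo apt install --no-install-recommends nvidia-driver-460 might keep the desktop packages from being installed.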
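One way to test this guess without installing anything is to let apt simulate the install and look for gdm3 in the plan. The sketch below assumes the simulation prints one "Inst <package> (...)" line per package to be installed; pulls_in_package is a hypothetical helper name.

```shell
# Succeed if the simulated install plan on stdin would install package $1.
pulls_in_package() {
  awk -v pkg="$1" '$1 == "Inst" && $2 == pkg { found = 1 } END { exit !found }'
}

# Usage on the target machine (not run here); compare the two plans:
#   apt-get --simulate install nvidia-driver-460 | pulls_in_package gdm3 \
#     && echo "default install would add gdm3"
#   apt-get --simulate install --no-install-recommends nvidia-driver-460 \
#     | pulls_in_package gdm3 || echo "no gdm3 without recommends"
```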

Install CUDA

https://developer.nvidia.com/cuda-downloads

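I chose the network installer type here (the local installer downloaded very slowly when I tried it before)

Once the options are selected, just run the commands generated below them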

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin
sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/7fa2af80.pub
sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/ /"
sudo apt-get update
sudo apt-get -y install cuda

Add the following lines to ~/.bashrc

~/.bashrc
# cuda
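export PATH=/usr/local/cuda-11.2/bin${PATH:+:${PATH}}
# keep this on one line: a backslash continuation followed by indentation splits the value into two words
export LD_LIBRARY_PATH=/usr/local/cuda-11.2/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}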
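The ${PATH:+:${PATH}} form expands to ":$PATH" only when PATH is already set and non-empty, so no stray colon is left behind. Because ~/.bashrc gets re-sourced, the same idea can be wrapped in a helper that also skips directories that are already listed. This is a sketch; prepend_path is a hypothetical name, not part of the CUDA instructions.

```shell
# Print the colon-separated list $2 with directory $1 prepended,
# unless $1 is already one of its entries.
prepend_path() {
  case ":$2:" in
    *":$1:"*) printf '%s\n' "$2" ;;            # already present: unchanged
    *)        printf '%s\n' "$1${2:+:$2}" ;;   # add ":" only if $2 is non-empty
  esac
}

# Possible use inside ~/.bashrc:
#   PATH=$(prepend_path /usr/local/cuda-11.2/bin "$PATH"); export PATH
```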

Reload the file

source ~/.bashrc

Check that the installation succeeded

nvcc -V

output:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Thu_Jan_28_19:32:09_PST_2021
Cuda compilation tools, release 11.2, V11.2.142
Build cuda_11.2.r11.2/compiler.29558016_0
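The toolkit release reported by nvcc should not be newer than the CUDA Version shown in the nvidia-smi output earlier. A rough sketch of that comparison follows; the helper names are made up, and version_le relies on sort -V from GNU coreutils.

```shell
# Extract the release number (e.g. 11.2) from `nvcc -V` output on stdin.
nvcc_release() {
  sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p'
}

# Succeed if version $1 is less than or equal to version $2.
version_le() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Usage on the target machine (not run here), with 11.2 taken from nvidia-smi:
#   version_le "$(nvcc -V | nvcc_release)" 11.2 && echo "toolkit fits the driver"
```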

Official documentation: https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html

Install cuDNN

Define the versions

cudnn_version=8.1.0.77
cuda_version=cuda11.2
OS=ubuntu2004

Enable the repository

wget https://developer.download.nvidia.com/compute/cuda/repos/${OS}/x86_64/cuda-${OS}.pin
sudo mv cuda-${OS}.pin /etc/apt/preferences.d/cuda-repository-pin-600
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/${OS}/x86_64/7fa2af80.pub
sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/${OS}/x86_64/ /"
sudo apt-get update

Install the cuDNN library

sudo apt-get install libcudnn8=${cudnn_version}-1+${cuda_version}
sudo apt-get install libcudnn8-dev=${cudnn_version}-1+${cuda_version}
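To confirm which cuDNN version ended up installed, the version macros in the development header can be read back. This is a sketch under assumptions: it expects the header to define CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL, and the path /usr/include/cudnn_version.h in the usage comment is a guess for a package install that may differ on your system.

```shell
# Print MAJOR.MINOR.PATCHLEVEL from cuDNN version header text on stdin.
cudnn_header_version() {
  awk '$1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
       $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
       $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
       END { print major "." minor "." patch }'
}

# Usage on the target machine (not run here; header path is an assumption):
#   cudnn_header_version < /usr/include/cudnn_version.h
```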

Official documentation: https://docs.nvidia.com/deeplearning/cudnn/install-guide/index.html

Install Docker

Update the apt package index and install packages to allow apt to use a repository over HTTPS

sudo apt-get update
sudo apt-get install \
  apt-transport-https \
  ca-certificates \
  curl \
  gnupg

Add Docker’s official GPG key

curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg

Use the following command to set up the stable repository

echo \
  "deb [arch=amd64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu \
  $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
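The echo above writes a single line to docker.list, with $(lsb_release -cs) replaced by the release codename (focal on Ubuntu 20.04). The sketch below builds the same line from explicit arguments so the result can be inspected before it is written; docker_repo_line is a hypothetical helper name.

```shell
# Print the apt source line for Docker's stable repository.
# $1 = path of the keyring file, $2 = Ubuntu codename (e.g. focal)
docker_repo_line() {
  printf 'deb [arch=amd64 signed-by=%s] https://download.docker.com/linux/ubuntu %s stable\n' "$1" "$2"
}

# Usage on the target machine (not run here):
#   docker_repo_line /usr/share/keyrings/docker-archive-keyring.gpg "$(lsb_release -cs)" \
#     | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
```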

Install docker engine

sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io

Manage Docker as a non-root user

sudo usermod -aG docker $USER

Log out and back in (or run newgrp docker) so the new group membership takes effect, then verify that you can run docker commands without sudo

docker run hello-world
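If docker still asks for sudo, the usual cause is that the current session has not picked up the docker group yet. A small sketch for checking that, assuming the group list comes from id -nG; in_group is a hypothetical helper name.

```shell
# Succeed if group $1 appears in the space-separated group list $2.
in_group() {
  case " $2 " in
    *" $1 "*) return 0 ;;
    *)        return 1 ;;
  esac
}

# Usage on the target machine (not run here):
#   in_group docker "$(id -nG)" || echo "log out and back in, then retry"
```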

Official documentation: https://docs.docker.com/engine/install/ubuntu/

Install nvidia docker

Install nvidia docker so that containers can use the GPU

curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | \
  sudo apt-key add -

distribution=$(. /etc/os-release;echo $ID$VERSION_ID)

curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
  sudo tee /etc/apt/sources.list.d/nvidia-docker.list

sudo apt-get update
sudo apt install -y nvidia-container-toolkit
sudo reboot
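The distribution variable above is just ID and VERSION_ID from /etc/os-release joined together (ubuntu20.04 on this machine), and it selects which repository list gets downloaded. The sketch below reproduces both steps as functions so they can be checked separately; the function names are made up for illustration.

```shell
# Print ID followed by VERSION_ID from an os-release style file ($1).
# Runs in a subshell so the sourced variables do not leak into the caller.
distribution_id() {
  ( . "$1" && printf '%s%s\n' "$ID" "$VERSION_ID" )
}

# Print the nvidia-docker repository list URL for distribution string $1.
nvidia_docker_list_url() {
  printf 'https://nvidia.github.io/nvidia-docker/%s/nvidia-docker.list\n' "$1"
}

# Usage on the target machine (not run here):
#   nvidia_docker_list_url "$(distribution_id /etc/os-release)"
# After the reboot, GPU access from a container can be checked with something like
# the line below, where $image is assumed to be a CUDA base image you have pulled:
#   docker run --rm --gpus all "$image" nvidia-smi
```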

Official documentation: https://nvidia.github.io/nvidia-docker/ and https://github.com/NVIDIA/nvidia-docker/wiki/Installation-(Native-GPU-Support)