Setting up a deep learning environment on Ubuntu 20.04
Environment
- OS: Ubuntu 20.04
- GPU: GeForce RTX 3090
Install the NVIDIA driver
List the driver versions recommended for this system and install one
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update
sudo apt install ubuntu-drivers-common
terminal output of ubuntu-drivers devices:
== /sys/devices/pci0000:00/0000:00:03.2/0000:0d:00.0 ==
modalias : pci:v000010DEd00002204sv00001462sd00003881bc03sc00i00
vendor : NVIDIA Corporation
driver : nvidia-driver-460 - distro non-free recommended
driver : nvidia-driver-460-server - distro non-free
driver : xserver-xorg-video-nouveau - distro free builtin
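The recommended entry can be pulled out of this listing automatically. The following is a minimal sketch, not part of the original steps: the helper name recommended_driver is hypothetical, and it assumes the listing keeps the "driver : NAME - ... recommended" layout shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `ubuntu-drivers devices` output on stdin and
# print the package name from the first line flagged "recommended".
recommended_driver() {
  awk '/recommended/ { print $3; exit }'
}

# Only query the system when the tool is actually available.
if command -v ubuntu-drivers >/dev/null 2>&1; then
  ubuntu-drivers devices | recommended_driver
fi
```

The printed name can then be passed to sudo apt install.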
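Install the recommended driver (sudo apt install nvidia-driver-460), reboot once the installation finishes, and run nvidia-smi to confirm that the driver is working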
terminal output:
Thu Mar  4 19:22:19 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.39       Driver Version: 460.39       CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 3090    Off  | 00000000:0C:00.0 Off |                  N/A |
| 48%   59C    P0   118W / 350W |      0MiB / 24260MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  GeForce RTX 3090    Off  | 00000000:0D:00.0 Off |                  N/A |
|  0%   52C    P0    51W / 350W |      0MiB / 24268MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
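For a scripted check, nvidia-smi can also print selected fields as CSV instead of the full table. The sketch below is an addition to the original steps; the function name check_gpu is hypothetical, and the queried fields are just one reasonable selection.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one CSV line per GPU, or return 1 when
# nvidia-smi is not on PATH (driver missing or not yet loaded).
check_gpu() {
  if ! command -v nvidia-smi >/dev/null 2>&1; then
    echo "nvidia-smi not found; is the driver installed?" >&2
    return 1
  fi
  nvidia-smi --query-gpu=index,name,driver_version,memory.total \
             --format=csv,noheader
}

check_gpu || echo "driver check failed"
```

On a working setup this should print one line per card.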
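I ran into a problem here that did not occur when I installed on Ubuntu 18: after installing the driver and rebooting, a desktop appeared on its own (this machine runs the server edition of Ubuntu). My guess is that the newer driver package pulls in desktop packages. The current workaround is to uninstall the gdm3 package.
I later looked around online. I have not had a chance to try it yet, but changing the original install command from sudo apt install nvidia-driver-460 to sudo apt install --no-install-recommends nvidia-driver-460 may prevent the desktop packages from being installed.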
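Before removing anything, it may help to confirm that gdm3 really was pulled in. This is a small sketch under that assumption; is_installed is a hypothetical helper built on dpkg-query, and the removal command is only printed, not executed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if dpkg reports the package as installed.
is_installed() {
  dpkg-query -W -f='${Status}' "$1" 2>/dev/null | grep -q "ok installed"
}

if is_installed gdm3; then
  echo "gdm3 is installed; remove it with: sudo apt remove gdm3"
else
  echo "gdm3 is not installed"
fi
```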
Install CUDA
https://developer.nvidia.com/cuda-downloads
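Here I chose the network installer type (the local installer downloaded very slowly when I tried it before).

After choosing the options, just run the commands generated below them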
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin
sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/7fa2af80.pub
sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/ /"
sudo apt-get update
sudo apt-get -y install cuda
Add the following lines to ~/.bashrc
# cuda
export PATH=/usr/local/cuda-11.2/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-11.2/lib64\
${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
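If the setup is scripted, the same lines can be appended only once so that repeated runs do not duplicate them. The following is a sketch, not the original procedure: add_cuda_env is a hypothetical helper, and it treats the "# cuda" comment line as the marker that the block is already present.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append the CUDA exports to the given rc file once.
# $1 = rc file, $2 = CUDA install prefix (defaults to the 11.2 path above).
add_cuda_env() {
  local rc_file="$1"
  local cuda_home="${2:-/usr/local/cuda-11.2}"
  # Skip if the marker line already exists in the file.
  grep -qxF "# cuda" "$rc_file" 2>/dev/null && return 0
  {
    echo "# cuda"
    echo "export PATH=${cuda_home}/bin\${PATH:+:\${PATH}}"
    echo "export LD_LIBRARY_PATH=${cuda_home}/lib64\${LD_LIBRARY_PATH:+:\${LD_LIBRARY_PATH}}"
  } >> "$rc_file"
}

# Usage (not run here): add_cuda_env ~/.bashrc
```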
Reload the file (source ~/.bashrc)
Check that the installation succeeded (nvcc --version)
output:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Thu_Jan_28_19:32:09_PST_2021
Cuda compilation tools, release 11.2, V11.2.142
Build cuda_11.2.r11.2/compiler.29558016_0
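To compare the toolkit version against what a framework expects, the release number can be extracted from this output. A minimal sketch, assuming the "release X.Y," wording shown above; cuda_release is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `nvcc --version` output on stdin and print
# only the release number, e.g. 11.2.
cuda_release() {
  sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p'
}

# Only call nvcc when it is on PATH.
if command -v nvcc >/dev/null 2>&1; then
  nvcc --version | cuda_release
fi
```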
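Official documentation https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html
Install cuDNN
Define the versions (the OS, cudnn_version, and cuda_version variables used by the commands below)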
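The original values for these variables are not shown, so the following is only a sketch. OS and cuda_version follow from the Ubuntu 20.04 and CUDA 11.2 setup above; the cudnn_version value is an assumed placeholder and should be replaced with a version that the repository actually offers for your CUDA release.

```shell
#!/usr/bin/env bash
# Values inferred from this setup (Ubuntu 20.04, CUDA 11.2).
OS=ubuntu2004
cuda_version=cuda11.2
# ASSUMPTION: placeholder cuDNN version; check the available versions with
# `apt-cache policy libcudnn8` once the repository is enabled.
cudnn_version=8.1.1.33

# Show the package specs that the install commands will request.
echo "libcudnn8=${cudnn_version}-1+${cuda_version}"
echo "libcudnn8-dev=${cudnn_version}-1+${cuda_version}"
```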
Enable the repository
wget https://developer.download.nvidia.com/compute/cuda/repos/${OS}/x86_64/cuda-${OS}.pin
sudo mv cuda-${OS}.pin /etc/apt/preferences.d/cuda-repository-pin-600
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/${OS}/x86_64/7fa2af80.pub
sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/${OS}/x86_64/ /"
sudo apt-get update
Install the cuDNN library
sudo apt-get install libcudnn8=${cudnn_version}-1+${cuda_version}
sudo apt-get install libcudnn8-dev=${cudnn_version}-1+${cuda_version}
Official documentation https://docs.nvidia.com/deeplearning/cudnn/install-guide/index.html
Install Docker
Update the apt package index and install packages to allow apt to use a repository over HTTPS
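The commands for this step are missing from the text above. The sketch below uses the package list from Docker's Ubuntu installation guide as I recall it from early 2021, which is an assumption and may differ from what was originally run. The run wrapper is a hypothetical helper that only prints each command unless DRY_RUN=0 is set.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the command; execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

# ASSUMPTION: prerequisite packages as listed in Docker's guide at the time.
run sudo apt-get update
run sudo apt-get install -y apt-transport-https ca-certificates curl gnupg lsb-release
```

Running the script with DRY_RUN=0 in the environment executes the commands for real.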
Add Docker’s official GPG key
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
Use the following command to set up the stable repository
echo \
"deb [arch=amd64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu \
$(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
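To see what the command above writes into docker.list, the repository line can be built separately. A small sketch: docker_repo_line is a hypothetical helper, and focal is the codename that lsb_release -cs reports on Ubuntu 20.04.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the apt source line for a given release codename.
# $1 = codename (e.g. focal), $2 = keyring path (defaults to the one above).
docker_repo_line() {
  local codename="$1"
  local keyring="${2:-/usr/share/keyrings/docker-archive-keyring.gpg}"
  echo "deb [arch=amd64 signed-by=${keyring}] https://download.docker.com/linux/ubuntu ${codename} stable"
}

docker_repo_line focal
```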
Install docker engine
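The install command for this step is not shown above. As a sketch, the helper below reports the Docker version if the engine is present and otherwise prints the install command from Docker's guide as I recall it (package names docker-ce, docker-ce-cli, containerd.io are an assumption about what was originally run). docker_status is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the Docker version, or a hint and return 1
# when the docker CLI is not on PATH.
docker_status() {
  if command -v docker >/dev/null 2>&1; then
    docker --version
  else
    # ASSUMPTION: package names taken from Docker's Ubuntu guide.
    echo "docker not found; install with: sudo apt-get update && sudo apt-get install docker-ce docker-ce-cli containerd.io"
    return 1
  fi
}

docker_status || true
```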
Manage Docker as a non-root user
Verify that you can run docker commands without sudo
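The commands for these two steps are also missing. Docker's post-installation guide uses sudo groupadd docker, sudo usermod -aG docker $USER, a re-login (or newgrp docker), and then docker run hello-world; whether exactly these were run here is an assumption. The sketch below only checks group membership and prints the next step; in_group is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if user $1 is a member of group $2.
in_group() {
  id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -qxF "$2"
}

user="$(id -un)"
if in_group "$user" docker; then
  echo "$user is in the docker group; try: docker run hello-world"
else
  echo "$user is not in the docker group; add with: sudo usermod -aG docker $user"
fi
```

Group changes only take effect in new login sessions.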
Install nvidia docker
Install nvidia docker so that containers can use the GPU
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | \
sudo apt-key add -
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
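The distribution variable decides which package list is downloaded, so it is worth checking what it resolves to. The sketch below wraps the same expression in a hypothetical helper, distribution_from, that accepts the os-release path as a parameter. The follow-up commands in the comments come from the nvidia-docker instructions as I recall them and are an assumption, since the text above stops at apt-get update.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print ID followed by VERSION_ID from an os-release
# style file, e.g. ubuntu20.04. Sourced in a subshell to keep the current
# shell's variables untouched.
distribution_from() {
  ( . "$1" && echo "${ID}${VERSION_ID}" )
}

if [ -r /etc/os-release ]; then
  distribution="$(distribution_from /etc/os-release)"
  echo "https://nvidia.github.io/nvidia-docker/${distribution}/nvidia-docker.list"
fi

# ASSUMPTION: remaining steps per the nvidia-docker instructions (not run here):
#   sudo apt-get install -y nvidia-docker2
#   sudo systemctl restart docker
#   docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
```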
Official documentation https://nvidia.github.io/nvidia-docker/ https://github.com/NVIDIA/nvidia-docker/wiki/Installation-(Native-GPU-Support)