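Fixing High Idle Power Draw on NVIDIA GPUs
I noticed that my graphics cards, which should draw very little power when nothing is running on them, were each pulling roughly 120 W while idle.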
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 Off |                  N/A |
|  0%   58C    P0   120W / 350W |      0MiB / 24576MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  Off  | 00000000:02:00.0 Off |                  N/A |
|  0%   58C    P0   126W / 350W |      0MiB / 24576MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
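The same idle check can be done from a script instead of reading the table by eye. The following is only a minimal sketch: it assumes nvidia-smi is on the PATH and uses its --query-gpu CSV output. The function names (read_gpu_status, parse_gpu_status, find_idle_hot_gpus) and the 50 W threshold are made up for this illustration and are not part of any NVIDIA tool.

```python
import subprocess

# Fields requested from nvidia-smi's --query-gpu interface.
QUERY_FIELDS = "index,pstate,power.draw,utilization.gpu,persistence_mode"


def read_gpu_status():
    """Run nvidia-smi and return its CSV text (requires an NVIDIA driver)."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY_FIELDS}",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_gpu_status(csv_text):
    """Convert the CSV text into one dict per GPU.

    Assumes every field is reported; a GPU that prints "[N/A]" for
    power.draw would make float() raise ValueError here.
    """
    gpus = []
    for line in csv_text.strip().splitlines():
        index, pstate, power, util, persistence = [
            field.strip() for field in line.split(",")
        ]
        gpus.append({
            "index": int(index),
            "pstate": pstate,
            "power_w": float(power),
            "util_pct": int(util),
            "persistence": persistence == "Enabled",
        })
    return gpus


def find_idle_hot_gpus(gpus, power_threshold_w=50.0):
    """Return indices of GPUs at 0% utilization that exceed the threshold.

    The 50 W default is an arbitrary value chosen for this example.
    """
    return [
        gpu["index"] for gpu in gpus
        if gpu["util_pct"] == 0 and gpu["power_w"] > power_threshold_w
    ]
```

On a machine with the driver installed, find_idle_hot_gpus(parse_gpu_status(read_gpu_status())) would list the GPUs that look idle but are still drawing a lot of power; for the readings shown above that would be both cards.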
Yet when I ran a program that loaded a model onto the GPUs, the power draw dropped below 20 W per card, so the idle readings above are clearly abnormal.
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 Off |                  N/A |
|  0%   55C    P8    11W / 350W |   2508MiB / 24576MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  Off  | 00000000:02:00.0 Off |                  N/A |
|  0%   51C    P8    16W / 350W |   2468MiB / 24576MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A    405385      C                                     2505MiB |
|    1   N/A  N/A    405385      C                                     2465MiB |
+-----------------------------------------------------------------------------+
I later found that turning persistence mode on (for example with nvidia-smi -pm 1, run as root) brings the idle power draw back to normal:
Enabled persistence mode for GPU 00000000:01:00.0.
Enabled persistence mode for GPU 00000000:02:00.0.
All done.
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:01:00.0 Off |                  N/A |
|  0%   47C    P8    23W / 350W |      1MiB / 24576MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  On   | 00000000:02:00.0 Off |                  N/A |
|  0%   36C    P8    15W / 350W |      1MiB / 24576MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
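The "Enabled persistence mode for GPU ..." messages above match what nvidia-smi prints for its -pm (persistence mode) option. As a rough sketch of how the switch could be scripted, the helper below builds and runs that command. The function names persistence_command and set_persistence_mode are hypothetical, and the sketch assumes nvidia-smi is on the PATH and that the script runs with root privileges, which changing this setting requires.

```python
import subprocess


def persistence_command(enable=True, gpu_index=None):
    """Build the nvidia-smi argument list for toggling persistence mode.

    -pm 1 enables and -pm 0 disables persistence mode; -i restricts the
    change to a single GPU. With gpu_index=None all GPUs are affected.
    """
    cmd = ["nvidia-smi", "-pm", "1" if enable else "0"]
    if gpu_index is not None:
        cmd += ["-i", str(gpu_index)]
    return cmd


def set_persistence_mode(enable=True, gpu_index=None):
    """Run the command and return nvidia-smi's output text.

    Raises subprocess.CalledProcessError if nvidia-smi fails, for
    example when the caller lacks root privileges.
    """
    result = subprocess.run(
        persistence_command(enable, gpu_index),
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Note that a setting made this way does not survive a reboot, so it would have to be reapplied at startup (NVIDIA also provides the nvidia-persistenced daemon for this purpose).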
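At this point I noticed that the value in the Perf column had changed from P0 to P8, so I looked up the difference between the two.
The following is excerpted from the NVIDIA documentation: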
The GPU performance state APIs are used to get and set various performance levels on a per-GPU basis. P-States are GPU active/executing performance capability and power consumption states.
P-States range from P0 to P15, with P0 being the highest performance/power state, and P15 being the lowest performance/power state. Each P-State maps to a performance level. Not all P-States are available on a given system. The definition of each P-States are currently as follows:
- P0/P1 - Maximum 3D performance
- P2/P3 - Balanced 3D performance-power
- P8 - Basic HD video playback
- P10 - DVD playback
- P12 - Minimum idle power consumption
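For quick reference when reading the Perf column, the list above can be kept as a small lookup table. This is just an illustrative sketch: the name describe_pstate is made up here, and the table contains only the states named in the excerpt, so any other state is reported as not described.

```python
# P-state descriptions copied from the documentation excerpt above.
PSTATE_DESCRIPTIONS = {
    "P0": "Maximum 3D performance",
    "P1": "Maximum 3D performance",
    "P2": "Balanced 3D performance-power",
    "P3": "Balanced 3D performance-power",
    "P8": "Basic HD video playback",
    "P10": "DVD playback",
    "P12": "Minimum idle power consumption",
}


def describe_pstate(pstate):
    """Return the excerpt's description for a P-state such as 'P8'."""
    key = pstate.strip().upper()
    return PSTATE_DESCRIPTIONS.get(key, "not described in the excerpt")
```

For example, describe_pstate("P0") gives "Maximum 3D performance" and describe_pstate("P8") gives "Basic HD video playback", the two states seen in the nvidia-smi output.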
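So the initial P0 means the GPU was in its maximum-performance state, which explains the 120-plus watts, while the P8 reported after enabling persistence mode is a basic low-power state that draws only about 15 to 23 W.
In other words, with persistence mode off, the GPUs on this machine report P0 while idle and consume a lot of power. Turning persistence mode on, which keeps the driver loaded even when no process is using the GPU, keeps them out of P0. Why an idle GPU ends up in the maximum-performance state in the first place still needs further investigation.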