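In Table 1.5.1, we discussed the rapid growth of computation over the past two decades. In a nutshell, GPU performance has increased by a factor of 1000 every decade since 2000. This offers great opportunities, but it also indicates how much demand there has been for such performance.
In this section, we begin to discuss how to harness this computational performance for your research. We start with a single GPU and then move on to multiple GPUs and multiple servers (each with multiple GPUs).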
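Specifically, we will discuss how to use a single NVIDIA GPU for computation. First, make sure you have at least one NVIDIA GPU installed. Then, download the NVIDIA driver and CUDA, and follow the prompts to set the appropriate path. Once these preparations are complete, the nvidia-smi command can be used to view the graphics card information: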
Fri Feb 10 06:11:13 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.106.00 Driver Version: 460.106.00 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla V100-SXM2... Off | 00000000:00:17.0 Off | 0 |
| N/A 35C P0 76W / 300W | 1534MiB / 16160MiB | 53% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 Tesla V100-SXM2... Off | 00000000:00:18.0 Off | 0 |
| N/A 34C P0 42W / 300W | 0MiB / 16160MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 2 Tesla V100-SXM2... Off | 00000000:00:19.0 Off | 0 |
| N/A 36C P0 80W / 300W | 3308MiB / 16160MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 3 Tesla V100-SXM2... Off | 00000000:00:1A.0 Off | 0 |
| N/A 35C P0 200W / 300W | 3396MiB / 16160MiB | 4% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 4 Tesla V100-SXM2... Off | 00000000:00:1B.0 Off | 0 |
| N/A 32C P0 56W / 300W | 1126MiB / 16160MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 5 Tesla V100-SXM2... Off | 00000000:00:1C.0 Off | 0 |
| N/A 40C P0 84W / 300W | 1522MiB / 16160MiB | 47% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 6 Tesla V100-SXM2... Off | 00000000:00:1D.0 Off | 0 |
| N/A 34C P0 57W / 300W | 768MiB / 16160MiB | 3% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 7 Tesla V100-SXM2... Off | 00000000:00:1E.0 Off | 0 |
| N/A 32C P0 41W / 300W | 0MiB / 16160MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 18049 C ...l-en-release-1/bin/python 1531MiB |
| 2 N/A N/A 41102 C ...l-en-release-1/bin/python 3305MiB |
| 3 N/A N/A 41102 C ...l-en-release-1/bin/python 3393MiB |
| 4 N/A N/A 44560 C ...l-en-release-1/bin/python 1123MiB |
| 5 N/A N/A 18049 C ...l-en-release-1/bin/python 1519MiB |
| 6 N/A N/A 44560 C ...l-en-release-1/bin/python 771MiB |
+-----------------------------------------------------------------------------+
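The same information can also be read programmatically instead of by eye. The following is a minimal sketch, assuming nvidia-smi is on the PATH and supports its CSV query mode (`--query-gpu` with `--format=csv,noheader,nounits`). The helper names `parse_gpu_csv` and `query_gpus` are hypothetical and chosen only for illustration.

```python
import shutil
import subprocess

# Fields requested from nvidia-smi; memory values are reported in MiB.
QUERY = "index,memory.used,memory.total,utilization.gpu"


def to_int(field):
    """Convert a CSV field to int, or None for values such as '[N/A]'."""
    field = field.strip()
    return int(field) if field.isdigit() else None


def parse_gpu_csv(text):
    """Parse the CSV text produced by the query into a list of dicts."""
    rows = []
    for line in text.strip().splitlines():
        index, used, total, util = line.split(",")
        rows.append({
            "index": to_int(index),
            "memory_used_mib": to_int(used),
            "memory_total_mib": to_int(total),
            "utilization_pct": to_int(util),
        })
    return rows


def query_gpus():
    """Return one dict per GPU, or an empty list if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True,
    )
    if result.returncode != 0:
        return []
    return parse_gpu_csv(result.stdout)


if __name__ == "__main__":
    for gpu in query_gpus():
        print(gpu)
```

On a machine like the one shown above, a sketch like this would yield one entry per GPU, which makes it easy to pick, for example, the device with the least memory in use before launching a job.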
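In PyTorch, every array (tensor) has a device, which we often refer to as its context. So far, by default, all variables and associated computation have been assigned to the CPU. Typically, the other contexts are the various GPUs. Things can get even trickier when we deploy jobs across multiple servers. By assigning arrays to contexts intelligently, we can minimize the time spent transferring data between devices. For example, when training a neural network on a server with a GPU, we usually prefer the model's parameters to live on the GPU.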
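To make the idea of a device concrete, here is a minimal sketch of placing a tensor and a small model on the same device. The helper `try_gpu` is an illustrative name, not something defined in the text above, and the code falls back to the CPU when no GPU is visible, so it should run on either kind of machine.

```python
import torch


def try_gpu(i=0):
    """Return GPU i if it exists, otherwise fall back to the CPU."""
    if torch.cuda.device_count() >= i + 1:
        return torch.device(f"cuda:{i}")
    return torch.device("cpu")


device = try_gpu()

# Tensors are created on the CPU unless a device is specified.
x = torch.ones(2, 3)

# Copy the input to the chosen device and create the model's parameters there,
# so the forward pass needs no further host-to-device transfers.
x_dev = x.to(device)
net = torch.nn.Linear(3, 1).to(device)
y = net(x_dev)

print(x.device, x_dev.device, y.device)
```

Keeping the inputs and the parameters on one device matters because PyTorch will not silently move operands between devices: an operation whose tensors live on different devices raises an error rather than copying data behind the scenes.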
Fri Feb 10 08:10:21 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.106.00 Driver Version: 460.106.00 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla V100-SXM2... Off | 00000000:00:17.0 Off | 0 |
| N/A 36C P0 56W / 300W | 1996MiB / 16160MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 Tesla V100-SXM2... Off | 00000000:00:18.0 Off | 0 |
| N/A 44C P0 59W / 300W | 2000MiB / 16160MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 2 Tesla V100-SXM2... Off | 00000000:00:19.0 Off | 0 |
| N/A 46C P0 59W / 300W | 1810MiB / 16160MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 3 Tesla V100-SXM2... Off | 00000000:00:1A.0 Off | 0 |
| N/A 43C P0 58W / 300W | 0MiB / 16160MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 4 Tesla V100-SXM2... Off | 00000000:00:1B.0 Off | 0 |
| N/A 37C P0 57W / 300W | 1834MiB / 16160MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 5 Tesla V100-SXM2... Off | 00000000:00:1C.0 Off | 0 |
| N/A 49C P0 60W / 300W | 0MiB / 16160MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 6 Tesla V100-SXM2... Off | 00000000:00:1D.0 Off | 0 |
| N/A 44C P0 59W / 300W | 1842MiB / 16160MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 7 Tesla V100-SXM2... Off | 00000000:00:1E.0 Off | 0 |
| N/A 37C P0 57W / 300W | 1806MiB / 16160MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 67249 C ...l-en-release-1/bin/python 1993MiB |
| 1 N/A N/A 67249 C ...l-en-release-1/bin/python 1997MiB |
| 2 N/A N/A 28134 C ...l-en-release-1/bin/python 1807MiB |
| 4 N/A N/A 75456 C ...l-en-release-1/bin/python 1831MiB |
| 6 N/A N/A 75456 C ...l-en-release-