Selecting GPUs in PyTorch.

gpu pytorch

I have a weird configuration: one older GPU that is no longer supported by PyTorch, and one newer GPU that is supported. With Docker, I was able to specify the correct GPU, and it worked.
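(For reference, pinning a container to one GPU with the NVIDIA container toolkit looks roughly like the line below; the image name is just a placeholder, not the exact container I used.)

$ docker run --rm --gpus '"device=1"' pytorch/pytorch python -c "import torch; print(torch.cuda.get_device_name(0))"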

Now I am using PyTorch directly, without the Docker interface, but ran into some snags specifying the GPU. This is not hard, but to lay it out super clearly for other newbies in this area, I'll go through it.

First, you can use nvidia-smi to view the GPUs on your system and the purported assignment of numbers to GPUs (more on that later).

Wed Apr 14 09:28:55 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.102.04   Driver Version: 450.102.04   CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro K2000        On   | 00000000:03:00.0  On |                  N/A |
| 30%   40C    P8    N/A /  N/A |    731MiB /  1998MiB |      1%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  GeForce RTX 208...  On   | 00000000:04:00.0 Off |                  N/A |
| 40%   28C    P8    15W / 260W |   2043MiB / 11019MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1540      G   /usr/lib/xorg/Xorg                 58MiB |
|    0   N/A  N/A      1778      G   /usr/bin/gnome-shell               51MiB |
|    0   N/A  N/A      2826      G   /usr/lib/xorg/Xorg                298MiB |
|    0   N/A  N/A      2942      G   /usr/bin/gnome-shell              249MiB |
|    0   N/A  N/A      5790      G   /usr/lib/firefox/firefox            0MiB |
|    0   N/A  N/A      8840      G   /usr/lib/firefox/firefox            0MiB |
|    0   N/A  N/A      9053      G   /usr/lib/firefox/firefox           58MiB |
|    1   N/A  N/A      9750      C   /usr/bin/python3                 2039MiB |
+-----------------------------------------------------------------------------+

(Side note: past me would have used a screenshot. Why not set it as code, bash style? Much better for readability, I think.)

So nvidia-smi is indicating that GPU 1 is the supported GPU. OK.
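(If you only want the index-to-name mapping and not the full table, nvidia-smi also has a -L flag that prints one line per GPU, which is handy for a quick check.)

$ nvidia-smi -L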

Instructions from various forums (e.g., the PyTorch forums) say to specify the GPU from the command line, such as

CUDA_VISIBLE_DEVICES=1

which I was aware of. BUT! You actually need to put it on the same line as the command:

CUDA_VISIBLE_DEVICES=1 python test.py

That environment variable will not persist through the session unless you do an export,

export CUDA_VISIBLE_DEVICES=1

and then you can run your test code with python test.py (or jupyter notebook). To make this variable stick across bash sessions, you can set it in your shell configuration (~/.bashrc, for instance), but I'll not go into that right now. I knew this in one part of my brain; I blame the pandemic.
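Another option is to set the variable from inside Python itself; the catch is that this has to happen before PyTorch initializes CUDA, so I would set it before importing torch. A minimal sketch:

import os

# Must be set before PyTorch initializes CUDA (so, before importing torch).
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

import torch

print("Device count?", torch.cuda.device_count())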

OK, so next topic: some good test code, my test.py, with many similarities to Chris Albon's.

import torch

# Basic CUDA sanity checks: is a GPU visible, and is cuDNN present?
print("Is cuda available?", torch.cuda.is_available())
print("Is cuDNN version:", torch.backends.cudnn.version())
print("cuDNN enabled? ", torch.backends.cudnn.enabled)

# Which devices PyTorch sees, and which one it will use by default.
print("Device count?", torch.cuda.device_count())
print("Current device?", torch.cuda.current_device())
print("Device name? ", torch.cuda.get_device_name(torch.cuda.current_device()))

# Small smoke test: create a random tensor (on the CPU by default).
x = torch.rand(5, 3)
print(x)
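Note that the tensor above lives on the CPU. To actually put work on the GPU you selected, here is a minimal sketch; the key detail is that device indices in PyTorch are relative to CUDA_VISIBLE_DEVICES, so cuda:0 always means the first GPU that PyTorch can see.

import torch

# cuda:0 is the first *visible* device, i.e., the one chosen via CUDA_VISIBLE_DEVICES.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Allocate a tensor directly on that device.
x = torch.rand(5, 3, device=device)
print(x.device)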

When the numbering in nvidia-smi is wonky

Running the test code helps find where things are going weird, and I got an indication from messages on the forum that sometimes the numbering in nvidia-smi does not match the numbering needed in CUDA_VISIBLE_DEVICES.

In my case, the numbering in CUDA_VISIBLE_DEVICES is flipped from nvidia-smi. YES.

First try:

$ CUDA_VISIBLE_DEVICES=1 python test.py

Is cuda available? True
Is cuDNN version: 8005
cuDNN enabled?  True
Device count? 1
/home/atabb/.local/lib/python3.6/site-packages/torch/cuda/__init__.py:81: UserWarning: 
    Found GPU0 Quadro K2000 which is of cuda capability 3.0.
    PyTorch no longer supports this GPU because it is too old.
    The minimum cuda capability that we support is 3.5.
    
  warnings.warn(old_gpu_warn % (d, name, major, capability[1]))
Current device? 0
Device name?  Quadro K2000
tensor([[0.1904, 0.3988, 0.5989],
        [0.5658, 0.7823, 0.9238],
        [0.8135, 0.3541, 0.9398],
        [0.6298, 0.7443, 0.5831],
        [0.0502, 0.8443, 0.5911]])

Oh no, that’s the old GPU.

Second try:

$ CUDA_VISIBLE_DEVICES=0 python test.py

Is cuda available? True
Is cuDNN version: 8005
cuDNN enabled?  True
Device count? 1
Current device? 0
Device name?  GeForce RTX 2080 Ti
tensor([[0.7686, 0.0573, 0.3836],
        [0.1975, 0.9561, 0.8107],
        [0.9169, 0.3892, 0.6475],
        [0.2461, 0.6731, 0.5082],
        [0.4824, 0.3800, 0.9623]])

CUDA_VISIBLE_DEVICES=0 for me, then! If you need to specify more than one GPU, use a comma.

$ CUDA_VISIBLE_DEVICES=0,1 python test.py

Is cuda available? True
Is cuDNN version: 8005
cuDNN enabled?  True
Device count? 2
/home/atabb/.local/lib/python3.6/site-packages/torch/cuda/__init__.py:81: UserWarning: 
    Found GPU1 Quadro K2000 which is of cuda capability 3.0.
    PyTorch no longer supports this GPU because it is too old.
    The minimum cuda capability that we support is 3.5.
    
  warnings.warn(old_gpu_warn % (d, name, major, capability[1]))
Current device? 0
Device name?  GeForce RTX 2080 Ti
tensor([[0.3730, 0.6497, 0.8243],
        [0.8727, 0.8672, 0.2272],
        [0.2291, 0.4075, 0.7580],
        [0.9754, 0.8914, 0.3489],
        [0.0208, 0.9947, 0.0432]])
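Why is the numbering flipped in the first place? As I understand it, the CUDA runtime orders devices fastest-first by default, while nvidia-smi orders them by PCI bus ID. Setting CUDA_DEVICE_ORDER=PCI_BUS_ID should make the two numberings agree, so something like

$ CUDA_DEVICE_ORDER=PCI_BUS_ID CUDA_VISIBLE_DEVICES=1 python test.py

should select the RTX 2080 Ti using the same index that nvidia-smi reports, though I will leave verifying that as an exercise.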

© Amy Tabb 2018 - 2023. All rights reserved. The contents of this site reflect my personal perspectives and not those of any other entity.