Introduction to PyTorch#
PyTorch is one of the preeminent machine-learning and optimization libraries currently available. It contains a number of powerful features that drastically simplify the task of fitting models and training neural networks. While we won’t have time in this tutorial to examine more than a few of the core features, there are many additional tutorials available online. This tutorial will be roughly similar to a few of the introductory PyTorch tutorials available at pytorch.org/tutorials.
PyTorch has many features and utilities, but at its core there are just a few pieces that contribute most to its impact on the AI/ML community:
Tensor Operations. PyTorch provides most of the same numerical processing utilities as the NumPy library (and, in fact, the core interface of PyTorch is very similar to NumPy’s, as we will see later in this lesson).
GPU Support. PyTorch allows one to write Python code that can be easily run on either a GPU, if available, or on the CPU.
Automatic Differentiation. A substantial portion of the field of numerical optimization is based on using the derivatives and gradients of a function to find its minimum, i.e., “gradient descent” optimization. Efficiently calculating the gradient of an arbitrary function associated with a model (such as a neural network) can be very difficult, however. PyTorch performs this calculation automatically for you, making many kinds of optimization much easier.
Tools for Model Training. Finally, PyTorch includes a variety of optimization tools and data management utilities for use in training models from data. These tools include classes that implement the computations of neural networks and convolutional neural networks.
In this lesson, we will take a look at each of these features and work through some examples of nonlinear models using the California Housing Dataset. In the next lesson, Introduction to Neural Networks, we will look at neural networks and convolutional neural networks in PyTorch, specifically.
Installing PyTorch#
If you’re using the Docker setup provided with the AI ABCs GitHub repository, then PyTorch should already be installed for you; however, you will not be able to execute PyTorch code on a GPU using the Docker image. (You do not need to be able to use a GPU to do anything in this course.)
To install PyTorch locally, it is strongly recommended that you use an environment manager like conda. PyTorch can be installed using conda via the command conda install -c pytorch pytorch. It can be installed using pip via the command pip install torch. The library itself is called torch.
Additionally, the website pytorch.org maintains a get-started page that contains installation instructions.
Basic Operations in PyTorch#
To start using PyTorch, we will first need to import it. The library is called torch in Python.
import torch
# We'll also want numpy to compare to:
import numpy as np
torch.__version__
'2.9.0+cpu'
PyTorch’s Tensor is a lot like NumPy’s ndarray.#
At first glance, PyTorch appears to be somewhat like NumPy in that it gives the user a set of classes and functions for interacting with a Tensor type that behaves much like NumPy’s ndarray type. Both NumPy and PyTorch, for example, define functions like log, sin, and mean that work with their respective array type. However, the Tensor and ndarray objects aren’t interchangeable because PyTorch Tensors are intended for use in optimization problems and thus potentially keep track of extra data. These data are critical for performing efficient gradient-descent parameter-tuning, which is generally required for optimization tasks such as training neural networks.
# Create a PyTorch Tensor object (like a numpy.ndarray object):
tens = torch.tensor([1.0, 2.5, 4.0])
print(type(tens))
tens
<class 'torch.Tensor'>
tensor([1.0000, 2.5000, 4.0000])
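In addition to torch.tensor, PyTorch provides construction functions much like NumPy’s. As a small sketch (all of these are standard PyTorch constructors):
# Constructors similar to NumPy's:
torch.zeros(3)      # tensor([0., 0., 0.])
torch.ones((2, 2))  # A 2x2 tensor of ones.
torch.arange(0, 3)  # tensor([0, 1, 2])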
# Tensors support basic arithmetic and comparison operations, like arrays:
print('+', tens + tens)
print('/', tens / tens)
print('==', tens == tens)
+ tensor([2., 5., 8.])
/ tensor([1., 1., 1.])
== tensor([True, True, True])
# PyTorch includes many of the same basic numerical functions as NumPy too:
print('exp', torch.exp(tens))
print('mean', torch.mean(tens))
print('sum', torch.sum(tens))
exp tensor([ 2.7183, 12.1825, 54.5981])
mean tensor(2.5000)
sum tensor(7.5000)
# Tensors can have many dimensions, just like NumPy arrays:
mtx = torch.reshape(torch.linspace(0, 1, 6), (2, 3))
mtx
tensor([[0.0000, 0.2000, 0.4000],
[0.6000, 0.8000, 1.0000]])
# Accumulation functions like sum and mean typically take axis options, just
# like in NumPy. PyTorch likes to call this 'dim' instead of 'axis', but it
# typically accepts either version:
print('axis=0', torch.sum(mtx, axis=0))
print('dim=1', torch.sum(mtx, dim=1))
axis=0 tensor([0.6000, 1.0000, 1.4000])
dim=1 tensor([0.6000, 2.4000])
# Tensors have shapes and dtypes like NumPy arrays, but unlike NumPy arrays,
# Tensors only support a specific set of PyTorch numeric dtypes. Shapes,
# dtypes, and elements are accessed in much the same way as with NumPy arrays:
print('shape:', mtx.shape)
print('dtype:', mtx.dtype)
print('column:', mtx[:, 1])
shape: torch.Size([2, 3])
dtype: torch.float32
column: tensor([0.2000, 0.8000])
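A dtype from this set can be requested explicitly when a tensor is created, and the to method converts an existing tensor to another dtype. For example (a minimal sketch using the illustrative name dbl):
# Request a specific dtype at creation:
dbl = torch.tensor([1, 2, 3], dtype=torch.float64)
dbl.dtype            # torch.float64
# The to method returns a converted copy:
dbl.to(torch.int32)  # tensor([1, 2, 3], dtype=torch.int32)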
Differences between Tensor and ndarray.#
Despite the broad similarities between Tensor and ndarray, there are a number of differences as well. First of all, Tensors can only store numerical data; they cannot store strings or Python objects the way NumPy arrays can. Additionally, the Tensor type has a device parameter that can be provided to the torch.tensor function and to a number of similar functions that return new PyTorch tensors. The device parameter exists because numerical data often must reside in particular memory buffers in order to be processed by a peripheral processing unit like a GPU. In such a case, the device is typically set to something like device='cuda', but this will depend on the system. The value device='cpu' can be used to allocate tensors explicitly for the CPU. Finally, Tensors have a parameter requires_grad that is used in gradient computations; we will discuss both of these options in upcoming sections of this lesson.
# PyTorch tensors can't be made of strings:
x = torch.tensor(['a', 'b', 'c']) # This will error!
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[8], line 2
1 # PyTorch tensors can't be made of strings:
----> 2 x = torch.tensor(['a', 'b', 'c']) # This will error!
ValueError: too many dimensions 'str'
# By default the device of a tensor will be the cpu and requires_grad will be
# False.
print('device:', tens.device)
print('requires_grad:', tens.requires_grad)
device: cpu
requires_grad: False
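Both options can also be set explicitly when a tensor is created. As a brief preview of requires_grad (covered properly later in this lesson), the sketch below asks PyTorch to track operations on a tensor and then computes a derivative automatically:
# Create a tensor on the CPU with gradient tracking enabled:
x = torch.tensor(2.0, device='cpu', requires_grad=True)
y = x**2 + 3*x   # y = x^2 + 3x
y.backward()     # Computes dy/dx automatically.
x.grad           # tensor(7.), since dy/dx = 2x + 3 = 7 at x = 2.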
PyTorch / NumPy Interface#
Although PyTorch has a very similar interface to NumPy and can interoperate with NumPy arrays, most of its functions aren’t compatible with NumPy arrays. In fact, most PyTorch functions don’t work unless their arguments are tensors. If you are used to NumPy’s broadly permissive approach to its arguments (for example, np.sum([1,2]) is perfectly valid, even though the argument is a list of numbers rather than a NumPy array), then PyTorch’s approach may initially feel very strict.
# This will raise an exception because the argument is not a Tensor.
torch.sum([1,2,3])
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[10], line 2
1 # This will raise an exception because the argument is not a Tensor.
----> 2 torch.sum([1,2,3])
TypeError: sum(): argument 'input' (position 1) must be Tensor, not list
# Similarly, NumPy arrays aren't PyTorch tensors:
torch.sum(np.array([1,2,3]))
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[11], line 2
1 # Similarly, NumPy arrays aren't PyTorch tensors:
----> 2 torch.sum(np.array([1,2,3]))
TypeError: sum(): argument 'input' (position 1) must be Tensor, not numpy.ndarray
# Tensors won't sum with lists (arrays will), so this will also error:
tens + [1,2,3]
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[12], line 2
1 # Tensors won't sum with lists (arrays will), so this will also error:
----> 2 tens + [1,2,3]
TypeError: unsupported operand type(s) for +: 'Tensor' and 'list'
# NumPy arrays and Tensors can be mixed in arithmetic operations, but the
# preferred way of doing this is to cast the array to a tensor (or vice
# versa) so that it is clear which type the result should be.
# Example NumPy array:
# Example NumPy array:
arr = np.array([1.0, 2.0, 3.0])
print('Cast array to tensor:', tens + torch.from_numpy(arr))
print('Cast tensor to array:', tens.numpy() + arr)
Cast array to tensor: tensor([2.0000, 4.5000, 7.0000], dtype=torch.float64)
Cast tensor to array: [2. 4.5 7. ]
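One detail worth knowing about these casts: torch.from_numpy does not copy the data. The tensor and the array share the same underlying memory (as does Tensor.numpy for CPU tensors), so modifying one modifies the other:
# from_numpy shares memory with the original array (no copy is made):
shared = torch.from_numpy(arr)
arr[0] = 100.0
shared[0]  # tensor(100., dtype=torch.float64): the change shows up in both.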
PyTorch and GPUs#
One of the great features of PyTorch is that it can flexibly be used with either the CPU or peripheral processors like GPUs. (Configuring PyTorch to use GPUs is beyond the scope of this course; we recommend PyTorch’s getting started page for help with GPU configuration.) If an operation or model runs on the CPU in PyTorch, you can be fairly confident that it will run on a GPU once PyTorch has been configured to use that GPU. GPUs can substantially speed up the training of many ML algorithms, especially the neural networks and convolutional neural networks that we will see in the next lesson.
PyTorch’s Tensors include a method named to that can be used to move a tensor from one device (like the CPU) to another (like a GPU). Typically this would look something like the following:
tens_cuda = tens.to('cuda')
# Or:
tens_cuda = tens.cuda()
If you’re on a system with multiple GPUs, then 'cuda:0' refers to the first GPU, 'cuda:1' refers to the second GPU, etc., with 'cuda' by itself referring to the currently selected GPU, which can be set using the function torch.cuda.set_device (e.g., torch.cuda.set_device(0) for 'cuda:0'). To find out how many GPUs are available on a given system, you can run torch.cuda.device_count().
# This will return 0 if there are no GPUs available.
torch.cuda.device_count()
0
# The function torch.cuda.is_available is basically a synonym for
# (torch.cuda.device_count() > 0):
torch.cuda.is_available()
False
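Putting these pieces together, a common device-agnostic pattern is to select a device once, based on availability, and move tensors (and, later, models) to it with the to method. A minimal sketch:
# Select whichever device is available, then move tensors to it:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
tens_on_device = tens.to(device)
tens_on_device.device  # device(type='cpu') here, since no GPU is available.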