20 PyTorch Concepts, Explained Simply
Understand PyTorch well and supercharge your AI engineering skills today.
PyTorch is one of the most important and most popular deep learning frameworks today.
It was developed at Meta (then Facebook) as a successor to the Lua-based Torch library and open-sourced in 2017.
Since its release, the library has been used to build many of the most important modern AI systems, from Tesla’s self-driving cars to OpenAI’s ChatGPT.
Let’s learn the 20 most important concepts to improve our understanding of PyTorch from the ground up.
Before we start, I want to introduce you to my book ‘LLMs In 100 Images’.
It is a collection of 100 easy-to-follow visuals that explain the most important concepts you need to master to understand LLMs today.
Grab your copy today at a special early bird discount using this link.
1. Tensor
A Tensor is the core data structure and computational unit of PyTorch.
PyTorch tensors are multi-dimensional arrays that contain values of the same data type. They can be thought of as NumPy arrays, but highly optimized and GPU-compatible.
A Tensor is created from a Python list as follows:
import torch
# Create tensor from a list
x = torch.tensor([1, 2, 3])
print(x)
# tensor([1, 2, 3])

An uninitialized Tensor (containing whatever values happen to be in memory) is created using torch.empty(shape):
# Create a 3x2 tensor with uninitialized values
x = torch.empty(3, 2)

Tensors initialized with zeros or ones are created using torch.zeros(shape) or torch.ones(shape):
# Create a 3x2 tensor with zeros
x = torch.zeros(3, 2)
print(x)
"""
tensor([[0., 0.],
[0., 0.],
[0., 0.]])
"""

# Create a 3x2 tensor with ones
x = torch.ones(3, 2)
print(x)
"""
tensor([[1., 1.],
[1., 1.],
[1., 1.]])
"""

Tensors initialized with random values from a uniform distribution on the interval of 0 to 1 (not including 1) are created using torch.rand(shape):
# Create a 2x2 tensor with random values from the uniform distribution
x = torch.rand(2,2)
print(x)
"""
tensor([[0.6051, 0.0569],
[0.7959, 0.0452]])
"""

Tensors initialized with random numbers from a normal distribution with mean 0 and variance 1 are created using torch.randn(shape):
# Create a 2x2 tensor with random values from the standard normal distribution
x = torch.randn(2,2)
print(x)
"""
tensor([[ 0.7346, -0.3198],
[ 0.9044, 0.0995]])
"""

torch.arange(start, end, step) is used to create a 1-D tensor with evenly spaced values within a given interval:
# Create a 1-D tensor
x = torch.arange(0, 10, 2)
print(x)
# tensor([0, 2, 4, 6, 8])
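Tensors are also GPU-compatible, as noted above. A minimal sketch of moving a tensor to the GPU when one is available (falling back to the CPU otherwise):

# Select a device and move the tensor onto it
device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.arange(0, 10, 2).to(device)
print(x.device)
# cuda:0 if a GPU is available, otherwise cpu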
2. Arithmetic operations on Tensors
Element-wise arithmetic operations on tensors are performed as follows:
x = torch.tensor([3, 2, 1])
y = torch.tensor([1, 2, 3])
# Element-wise arithmetic operations
add = x + y # Addition
subtract = x - y # Subtraction
multiply = x * y # Multiplication
divide = x / y # Division

All of the above operations create new tensors, leaving x and y unchanged.
In-place operations (which modify the original tensor) are performed using methods with a trailing underscore (_), as shown below.
# In-place operations (modifies x)
x = torch.tensor([10, 20, 30], dtype=torch.float32)
x.add_(2)
x.sub_(3)
x.mul_(5)
x.div_(2)
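A quick way to verify the difference is data_ptr(), which returns the memory address of a tensor’s first element; an out-of-place operation allocates new memory, while an in-place one reuses it:

x = torch.tensor([1., 2., 3.])
before = x.data_ptr()
y = x + 1   # out-of-place: result lives in new memory
x.add_(1)   # in-place: x keeps its memory
print(y.data_ptr() == before)  # False
print(x.data_ptr() == before)  # True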
3. Broadcasting
Broadcasting allows PyTorch to perform operations on tensors of different shapes.
This is done by automatically expanding the smaller tensor to match the larger one, without actually copying or duplicating its data in memory.
# Without broadcasting (manual)
a = torch.tensor([1, 2, 3])
b = torch.tensor([10])
b_repeated = b.repeat(3) # [10, 10, 10]
result = a + b_repeated # [11, 12, 13]
# With broadcasting (automatic)
a = torch.tensor([1, 2, 3])
b = torch.tensor([10])
result = a + b # [11, 12, 13]
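Broadcasting also works across multiple dimensions: any dimension of size 1 is (virtually) stretched to match the other tensor. For example, a column vector and a row vector broadcast into a matrix:

a = torch.tensor([[1], [2], [3]])     # shape: (3, 1)
b = torch.tensor([[10, 20, 30, 40]])  # shape: (1, 4)
result = a + b                        # shape: (3, 4)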
4. Reshaping Tensors
reshape and view are two important methods used to reshape tensors in PyTorch.
view returns a reshaped tensor but requires the original tensor to be contiguous in memory.
# Create tensor
x = torch.randn(2, 3, 4)
# Check if the tensor is contiguous in memory
print(x.is_contiguous()) # True
# Reshape using view
y = x.view(6, 4)

# Transpose the tensor, which returns a non-contiguous tensor
x_t = x.transpose(1, 2)
print(x_t.is_contiguous()) # False
# Reshape using view
y_2 = x_t.view(6, 4) # Throws an error

reshape returns a reshaped tensor even if the original tensor is non-contiguous in memory (copying the data when necessary):
# Reshape using 'reshape'
y_2 = x_t.reshape(6, 4)
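If you specifically need view semantics on a non-contiguous tensor, a common workaround (sketched below) is to call contiguous() first, which copies the data into a contiguous block of memory:

# Make the tensor contiguous first, then view
y_3 = x_t.contiguous().view(6, 4) # Works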
Apart from these, there are other methods that you will commonly encounter in PyTorch code. These are as follows:

unsqueeze and squeeze are used to add or remove a dimension of size 1, respectively.
# Create tensor of shape (3, 4)
x = torch.randn(3, 4)
# Add a new dimension at the front (dim=0)
x_unsq0 = x.unsqueeze(0)
# shape: (1, 3, 4)
# Add a new dimension in the middle (dim=1)
x_unsq1 = x.unsqueeze(1)
# shape: (3, 1, 4)
# Add a new dimension at the end (dim=2)
x_unsq2 = x.unsqueeze(2)
# shape: (3, 4, 1)

# Create a tensor
x = torch.randn(1, 3, 1, 4, 1)
# Remove all size-1 dimensions
x_sq_all = x.squeeze()
# shape: (3, 4)
# Remove only a specific dimension of size 1
x_sq0 = x.squeeze(0) # remove dim 0 → shape: (3, 1, 4, 1)
x_sq2 = x.squeeze(2) # remove dim 2 → shape: (1, 3, 4, 1)

flatten collapses a tensor (or a range of its dimensions) into one dimension, while unflatten splits a dimension into multiple dimensions.
# Create a tensor
x = torch.randn(2, 3, 4)
# Flatten into 1-D
y = x.flatten()
# shape: (24,)
# Flatten into 1-D starting from dim 1
y = x.flatten(start_dim=1) # shape: (2, 12)

# Create a 1-D tensor
x = torch.arange(24)
# shape: (24,)
# Unflatten the first dimension into (2, 3, 4)
y = torch.unflatten(x, dim=0, sizes=(2, 3, 4))
# shape: (2, 3, 4)

transpose swaps two specified dimensions of a tensor, while permute reorders all dimensions in any order.
# Create a tensor
x = torch.randn(2, 3, 4) # shape: (2, 3, 4)
# Transpose dim 1 and dim 2
y = x.transpose(1, 2) # shape: (2, 4, 3)

# Create a 2D tensor
m = torch.randn(3, 4) # shape: (3, 4)
# Shorthand for transpose applicable only to 2D tensors
m_t = m.t() # shape: (4, 3)

# Use 'permute' to reorder all dims as required
y = x.permute(2, 0, 1) # shape: (4, 2, 3)

expand broadcasts a tensor to a larger size without copying data, while repeat does so by copying the values.
# Create a tensor
x = torch.randn(1, 3)
y = x.expand(4, 3) # x's single row is broadcast 4 times
# shape: (4, 3)
# 'repeat' works by repeating each dimension by the given factor
y = x.repeat(4, 1) # x's dim 0 is duplicated 4 times in memory, dim 1 is duplicated once
# shape: (4, 3)
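You can verify this memory behavior with data_ptr(): the expanded view shares storage with the original tensor, while repeat allocates a fresh copy:

x = torch.randn(1, 3)
print(x.expand(4, 3).data_ptr() == x.data_ptr())  # True: shares memory
print(x.repeat(4, 1).data_ptr() == x.data_ptr())  # False: new copy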
stack combines tensors along a new dimension, while cat joins (concatenates) tensors along an existing dimension.

# Create tensors
a = torch.tensor([1, 2, 3])
b = torch.tensor([4, 5, 6])
# Stack along a new dimension 0
out = torch.stack([a, b], dim=0)
print(out.shape)
# torch.Size([2, 3])

# Concatenate along dimension 0
out = torch.cat([a, b], dim=0)
print(out.shape)
# torch.Size([6])

5. Autograd
Autograd is PyTorch’s automatic differentiation engine. It is used to automatically compute gradients of tensors with respect to other tensors.
This is an important operation in training deep learning neural networks, which involves computing the gradients of a loss function with respect to the model parameters.
To track operations on a tensor for gradient computation, we set requires_grad=True. In the example below, PyTorch will track all operations on x because it was created with requires_grad=True.
# Create a tensor that requires gradients
x = torch.tensor(2.0, requires_grad=True)

Operations on a tensor with requires_grad=True are recorded on a dynamic computational graph, which is a directed acyclic graph (DAG). As shown in the example below, PyTorch builds such a graph for the following function behind the scenes:
# Define a function (y = x^2 + 2x + 2)
y = x**2 + 2*x + 2
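Calling backward() then traverses this graph in reverse to compute the gradient, which is stored in the grad attribute of the input tensor. Here dy/dx = 2x + 2, which evaluates to 6 at x = 2:

# Compute gradients by traversing the graph backwards
y.backward()
print(x.grad)
# tensor(6.)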