A Friendly PyTorch Guide for Beginners!

Gal Hever
6 min read · May 27, 2020

Introduction

In this blog-post we will go over the basics of the PyTorch framework. I assume that you have never worked with PyTorch before, so we will start with a short intro and then dive deeper into how it works, the important features and components that PyTorch uses, and common commands and packages.

What is PyTorch?

PyTorch is an open-source Python library for deep learning projects developed by Facebook’s AI Research team. It is similar to the NumPy library but also exploits the power of GPUs and automatic computation of gradients.

Deep Learning Frameworks

In recent years, machine learning frameworks such as Caffe, CNTK, TensorFlow, and Theano have been developed that construct a static dataflow graph. A static graph improves performance and scalability, but this comes at the cost of ease of use and debugging. Some recent frameworks instead adopted a dynamic graph, also known as the “define-by-run” approach. This approach comes with drawbacks of its own, such as a performance cost (Chainer) or a training time cost (Torch, DyNet). PyTorch was developed with the goal of exploiting the advantages of the dynamic graph while compensating for its drawbacks, combining speed and usability in one package.

Why PyTorch?

PyTorch = Usability + Speed

  1. Dynamic tensor computations
  2. Automatic differentiation and GPU acceleration
  3. Maintaining performance
  4. Pythonic and easy to debug

Main Characteristics

  • Imperative Framework — A coding style in which each line is interpreted and executed step by step. This “on-the-fly” approach makes debugging easier.
  • Dynamic Graph — A “define-by-run” approach that constructs the computational graph anew during each run of the forward pass.

Tensors in PyTorch

What is a tensor?

A tensor is an n-dimensional data container, equivalent to a NumPy array, but unlike a NumPy array it was designed to take advantage of the parallel computation capabilities of a GPU.
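For example, here is a minimal sketch, with arbitrary values, of creating a tensor and moving it to a GPU when one is available:

import torch

x = torch.rand(2, 3)  # a 2x3 tensor of random values, like np.random.rand(2, 3)

# Use the GPU if one is available; otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
x = x.to(device)

y = x * 2 + 1  # operations on a GPU tensor run on the GPU
print(y.device)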

Similarities between PyTorch and NumPy

A PyTorch implementation is similar to a NumPy implementation, so if you have worked with NumPy before it will be really easy for you to start working with PyTorch. Let’s see how it works in practice.

Here are some useful commands that will show you the similarity between these two packages:

PyTorch — Basic Mathematical Operations
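A minimal sketch of such basic mathematical operations, with arbitrary values:

import torch

a = torch.tensor([1.0, 2.0, 3.0])
b = torch.tensor([4.0, 5.0, 6.0])

print(a + b)            # element-wise addition: tensor([5., 7., 9.])
print(a * b)            # element-wise multiplication
print(torch.dot(a, b))  # dot product: tensor(32.)
print(a.mean())         # mean of all elements: tensor(2.)
print(a.sum())          # sum of all elements: tensor(6.)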

PyTorch — Basic Commands
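A minimal sketch of common creation and reshaping commands, shown next to their NumPy counterparts:

import torch
import numpy as np

torch.zeros(2, 3)   # np.zeros((2, 3))
torch.ones(2, 3)    # np.ones((2, 3))
torch.rand(2, 3)    # np.random.rand(2, 3)

x = torch.arange(6)     # np.arange(6)
print(x.shape)          # ndarray.shape works the same way
print(x.reshape(2, 3))  # ndarray.reshape((2, 3))
print(x.view(3, 2))     # PyTorch-specific reshape that shares the same memory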

PyTorch & NumPy Conversion

Sometimes it is useful to convert a NumPy ndarray to a PyTorch tensor and vice versa. Use from_numpy() to convert a NumPy ndarray to a PyTorch tensor, and numpy() to convert back to a NumPy ndarray.

Let’s try it:

Create a numpy array with the values 1 and 2 (imports included so the snippets below are runnable):

import numpy as np
import torch

a = np.array([1, 2])

Convert the numpy array to a torch tensor:

b = torch.from_numpy(a)

Convert the torch tensor back to a numpy array:

b.numpy()
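One detail worth knowing: for CPU tensors, from_numpy() shares the underlying memory with the source array instead of copying it, so changing one changes the other:

a[0] = 7   # modify the NumPy array in place
print(b)   # tensor([7, 2]); the tensor sees the change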

How to implement a Neural Network with PyTorch?

Computation Graphs — What Do They Mean?

Let’s start with a short example:

Let’s say that we have a tiny neural network built from just five user-initialized values. The variables b, c, and d are created as the result of mathematical operations, whereas the variables a, w1, w2, w3, and w4 are initialized by the user. Since the latter are not created by any mathematical operator, the nodes corresponding to their creation are represented by their names. This is true for all the leaf nodes in the graph.

b = w1 * a

c = w2 * a

d = w3 * b + w4 * c

L = 10 - d

Now we want to compute the gradient of the loss L with respect to each of the learnable parameters w. We could compute the gradients of our network manually, as it is very simple, but imagine a network with 80 layers; it quickly becomes impractical.

The idea is to find a way to calculate the gradients smoothly, regardless of the type or size of the architecture, even when the network changes. We can do this by using a graph structure where each node represents a mathematical operation and the leaves represent the variables that are initialized by the user.
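Here is the same toy graph expressed in PyTorch, with arbitrary initial values, letting autograd compute the gradients for us:

import torch

# Leaf nodes, initialized by the user; requires_grad=True marks the learnable ones
a = torch.tensor(2.0)
w1 = torch.tensor(3.0, requires_grad=True)
w2 = torch.tensor(1.0, requires_grad=True)
w3 = torch.tensor(4.0, requires_grad=True)
w4 = torch.tensor(5.0, requires_grad=True)

# Nodes created by mathematical operations
b = w1 * a
c = w2 * a
d = w3 * b + w4 * c
L = 10 - d

L.backward()    # compute dL/dw for every learnable parameter
print(w1.grad)  # dL/dw1 = -w3 * a = tensor(-8.)
print(w2.grad)  # dL/dw2 = -w4 * a = tensor(-10.)
print(w3.grad)  # dL/dw3 = -b = tensor(-6.)
print(w4.grad)  # dL/dw4 = -c = tensor(-2.)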

Dynamic Graph

PyTorch creates a new graph on every run of the forward pass. In older versions of PyTorch, a newly created tensor had to be wrapped in a class called Variable so that every action performed on it would be kept in memory; since PyTorch 0.4, this functionality is built into torch.Tensor itself and is turned on with the requires_grad=True flag.

If we look at a tensor produced by a computation, such as h_next in a recurrent cell, we can see that it holds a pointer to the function that created it, that ADD function holds pointers to the functions that produced its inputs, and so on, forming a kind of tree that records the entire history of the computation graph. Because we have pointers to all the calculations performed earlier, the derivatives can be computed automatically. Finally, we call the backward function, which calculates the gradients for each node using the other nodes.
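These pointers can be inspected directly through the grad_fn attribute; a small sketch with illustrative tensors:

import torch

x = torch.ones(2, requires_grad=True)
y = x * 3
z = y + 2

print(z.grad_fn)                 # <AddBackward0 ...>, the op that created z
print(z.grad_fn.next_functions)  # pointers to the ops that created its inputs
print(y.grad_fn)                 # <MulBackward0 ...>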

Autograd — An Automatic Differentiation

PyTorch tensors can remember where they came from, in terms of the operations and parent tensors that originated them, and they can provide the chain of derivatives of those operations with respect to their inputs. Autograd is the PyTorch component that performs this automatic differentiation, and it is enabled by passing requires_grad=True to the tensor constructor, as below:

params = torch.tensor([1.0, 0.0], requires_grad=True)

This argument tells PyTorch to track the entire family tree of tensors resulting from operations on params. It also makes sure that the gradients are stored for this particular tensor whenever we perform an operation on it. Autograd records the operations during the forward pass, which makes applying the chain rule in the backward pass much faster than deriving the gradients by hand.

Let’s take a look at an example where the gradients for a tensor variable are tracked for each operation performed on it:

What does requires_grad = True mean in practice?

  • Tracks the entire family tree of operations on the tensor
  • Builds the graph needed to differentiate the parameters during the forward pass
  • Speeds up applying the chain rule compared with deriving gradients manually
  • Stores the gradients for this tensor for each operation

Let’s initialize a new tensor and perform some operations on it. We will add 5 to a sample matrix and then take its average, and we will see that grad_fn keeps track of every action we take and keeps it in memory.
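A minimal sketch of that sequence, with an arbitrary sample matrix:

import torch

x = torch.ones(2, 2, requires_grad=True)  # initialize a new tensor
y = x + 5                                 # add 5 to every element
z = y.mean()                              # take the average

print(y.grad_fn)  # <AddBackward0 ...>, the addition was recorded
print(z.grad_fn)  # <MeanBackward0 ...>, and so was the mean

z.backward()   # apply the chain rule through the recorded history
print(x.grad)  # each element contributes 1/4 to the mean, so all gradients are 0.25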

How Do PyTorch’s Graphs Differ from TensorFlow’s Graphs?
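In short, TensorFlow 1.x defines a static graph once and then executes it many times, whereas PyTorch rebuilds the graph on every forward pass. One practical consequence, sketched below with a made-up module, is that ordinary Python control flow can appear inside the model:

import torch
import torch.nn as nn

class DynamicNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 4)

    def forward(self, x):
        # Plain Python control flow: the graph is rebuilt on each call,
        # so the number of applied layers can differ from run to run
        for _ in range(torch.randint(1, 4, (1,)).item()):
            x = torch.relu(self.linear(x))
        return x

net = DynamicNet()
out = net(torch.rand(2, 4))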

Loss function and Optim Module

How to use a Loss Function in PyTorch?

First, let’s import the torch.nn package:

import torch.nn as nn

Now, choose the loss function that fits your task, for example:
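For instance, a minimal sketch using CrossEntropyLoss with random placeholder predictions and targets:

import torch
import torch.nn as nn

loss_fn = nn.CrossEntropyLoss()    # a common choice for classification

predictions = torch.randn(3, 5)    # 3 samples, 5 class scores each
targets = torch.tensor([1, 0, 4])  # the true class index for each sample

loss = loss_fn(predictions, targets)
print(loss)  # a scalar tensor; calling loss.backward() yields the gradients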

How to use an optimizer in PyTorch?

Import the optim module from the torch package:

from torch import optim

Now, choose the optimizer that fits your task, for example:
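For example, a minimal sketch of one training step with SGD, using a placeholder model and learning rate:

import torch
import torch.nn as nn
from torch import optim

model = nn.Linear(4, 1)                             # a placeholder model
optimizer = optim.SGD(model.parameters(), lr=0.01)  # or optim.Adam(...), etc.

optimizer.zero_grad()             # clear the gradients from the previous step
output = model(torch.rand(8, 4))  # forward pass on a random batch
loss = output.mean()              # a placeholder loss
loss.backward()                   # backward pass: compute the gradients
optimizer.step()                  # update the parameters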

Common Packages

  • The torch package is used to define tensors and the mathematical operations on them.
  • TorchText is a library that contains scripts for preprocessing text and provides a few popular NLP datasets.

End Notes

Now you have enough knowledge of the basics of PyTorch. If you wish to continue learning PyTorch, you can move on to my next blog-post, Sentiment Analysis with PyTorch, which explains how to build a sentiment analysis model.
