PyTorch Utilities

    In this note, I will discuss a few utilities provided by PyTorch. The utilities I cover are those I consider fundamental but not so easily grasped at first glance. The code will be written solely in Python; a note on the C++-based LibTorch library might be written in the future.

    The source of these programs is freely available for download. You may visit the official documentation of each utility via the links I provide in each section. A quick reference is appended at the end of this passage.

    Optimizer

    An optimizer is a PyTorch utility used for training models (torch.nn.Module). The base class for optimizers is torch.optim.Optimizer, and the basic usage of these classes generally follows the same pattern, which can be expressed as the following code.

    
    from models import Model  # hypothetical module that defines the model class
    from torch.optim import Optimizer

    class MyOptimizer(Optimizer):
        def __init__(self, params, defaults):
            super().__init__(params, defaults)  # let the base class build param_groups
            # impl
        def step(self, closure=None):
            # impl: perform one optimization step
            ...
        # ...

    default_args = dict(lr=1e-3)  # default hyperparameters
    model = Model()
    optimizer = MyOptimizer(model.parameters(), default_args)

    def train_one_epoch():  # Implement one epoch
        model.train()
        optimizer.zero_grad()  # clear gradients accumulated in the previous step
        y = model(x)  # let x be the input
        loss = criterion(y, truth)  # let criterion and truth be the loss fn and labels
        loss.backward()  # for gradient-based algorithms
        optimizer.step()  # update module params
        model.eval()
        # impl
                

    The use of PyTorch-based optimizers involves three stages. In the first stage, you implement your own optimization algorithm as a class that inherits from the base class torch.optim.Optimizer. The two most important methods are __init__ and step, where step performs one-step optimization, that is, a single step towards the model's optimal condition taken in each iteration. The second stage is to instantiate your optimizer class and pass the correct parameters. Note that the model parameters are generally passed by calling torch.nn.Module.parameters, which is why you need to make sure the parameters are correctly registered in your model using torch.nn.Parameter (see the sketch below). In the final stage, once the loss has been computed and backward propagation has been performed, we call the step method of the optimizer to make one-step progress. For gradient-free algorithms, step can be called without backward propagation.
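
    As a brief illustration of the registration requirement, here is a minimal sketch of a toy module (the name ScaleShift and its fields are made up for this example) whose tensors are wrapped in torch.nn.Parameter, so that Module.parameters() yields them and an optimizer can update them.

    import torch
    import torch.nn as nn

    class ScaleShift(nn.Module):
        """Toy module: wrapping tensors in nn.Parameter registers them."""
        def __init__(self, dim):
            super().__init__()
            # These tensors appear in self.parameters() and receive gradients.
            self.scale = nn.Parameter(torch.ones(dim))
            self.shift = nn.Parameter(torch.zeros(dim))

        def forward(self, x):
            return x * self.scale + self.shift

    module = ScaleShift(4)
    print([name for name, _ in module.named_parameters()])  # ['scale', 'shift']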

    Now let us dive into the implementation of optimizers, i.e. the derived classes of torch.optim.Optimizer, so that in the end we can implement our own.

    The Base Class

    Before implementing our own optimizer, we first need to look into the base class of every PyTorch-based optimizer, torch.optim.Optimizer (hereafter simply the Optimizer). We will first cover some important concepts and variables, and then discuss its structure and methods.

    Parameter & Param Groups

    Optimizers usually take a number of parameters to control the training process. These parameters generally fall into two categories: model parameters and hyperparameters. Model parameters are the learnable parts of a model, such as the weights of linear layers and the kernels of convolution blocks, while hyperparameters are not learned by the model but instead control how the optimization proceeds. Typical hyperparameters in gradient-based learning include the learning rate (used by gradient descent and its variants) and the momentum (used by momentum-based methods such as Adam). Some hyperparameters, such as the epoch count and batch size, are not required by the optimization algorithm itself, so they are not part of the optimizer.

    To store these parameters in an organized way, the Optimizer keeps a variable called param_groups. It is essentially a list of dictionaries, where each dictionary stores an independent set of parameters. A minimal param_groups could look like the following.

    
    from typing import Any, Dict, List

    # Assuming model, lr and momentum are defined elsewhere.
    param_groups: List[Dict[str, Any]] = [
        {  # the first set of parameters
            'params': [model.weight],  # model parameters (required)
            'lr': lr,  # learning rate (optional)
            'momentum': momentum,  # momentum (optional)
            # other hyperparameters (optional)
        },
        # the second, the third sets ...
    ]
                

    Since each dictionary in param_groups is an independent set of parameters, we can use different hyperparameters when training different parts of the model. This design allows more flexible control over the whole training process, as the sketch below shows.
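
    For instance, the built-in torch.optim.SGD accepts such a list of parameter groups directly. The following sketch (backbone and head are made-up stand-ins for two parts of a model) trains one part with a smaller learning rate while the other part falls back to the default value.

    import torch.nn as nn
    from torch.optim import SGD

    backbone = nn.Linear(16, 8)  # hypothetical feature extractor
    head = nn.Linear(8, 2)       # hypothetical classifier head

    optimizer = SGD(
        [
            {'params': backbone.parameters(), 'lr': 1e-4},  # fine-tune slowly
            {'params': head.parameters()},  # no 'lr' here, so the default below is used
        ],
        lr=1e-2,  # default hyperparameter for groups that do not set their own
    )
    print([group['lr'] for group in optimizer.param_groups])  # [0.0001, 0.01]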

    Defaults

    As mentioned above, the Optimizer provides param_groups to store several sets of parameters. But which values should be used when a group does not specify a hyperparameter of its own? This is where defaults comes in: the default set of hyperparameters. Any hyperparameter missing from a group is filled in with the corresponding value from defaults. Like a param group, defaults is a dictionary; the only difference is that defaults does not include any model parameters, since those are managed by param_groups.

    The following code is an example of a defaults variable.

    
    defaults = dict(lr=lr, momentum=momentum)
                
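    To see the relationship concretely, here is a small sketch using the built-in torch.optim.SGD: the base class stores the dictionary in its defaults attribute and copies missing hyperparameters into each param group.

    from torch import nn
    from torch.optim import SGD

    layer = nn.Linear(4, 2)
    opt = SGD(layer.parameters(), lr=0.1, momentum=0.9)

    print(opt.defaults['lr'], opt.defaults['momentum'])  # 0.1 0.9
    print(opt.param_groups[0]['lr'], opt.param_groups[0]['momentum'])  # 0.1 0.9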

    Initialization

    The initialization of the Optimizer is essentially a process of loading the optimizer parameters, including both model parameters and hyperparameters. The base class accepts two arguments during initialization: params and defaults. params is an iterable of tensors holding the model parameters; when you pass torch.nn.Module.parameters(), params is an iterator over torch.nn.Parameter objects. The second argument, defaults, is exactly the defaults we discussed in the previous section.
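
    Putting the pieces together, here is a minimal sketch of a custom optimizer implementing a plain gradient-descent rule, which I will call PlainSGD (a made-up name, not a PyTorch class). Its __init__ simply packs the hyperparameters into defaults and lets the base class build param_groups.

    import torch
    from torch.optim import Optimizer

    class PlainSGD(Optimizer):
        """Minimal sketch: p <- p - lr * grad for every registered parameter."""
        def __init__(self, params, lr=1e-3):
            defaults = dict(lr=lr)
            super().__init__(params, defaults)  # builds self.param_groups

        @torch.no_grad()  # parameter updates should not be tracked by autograd
        def step(self, closure=None):
            loss = None
            if closure is not None:
                with torch.enable_grad():
                    loss = closure()  # re-evaluate the model if a closure is given
            for group in self.param_groups:
                lr = group['lr']
                for p in group['params']:
                    if p.grad is None:
                        continue
                    p.add_(p.grad, alpha=-lr)  # in-place descent step
            return loss

    An instance can then be created with PlainSGD(model.parameters(), lr=1e-2) and used exactly like the training loop shown earlier.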

    Quick Reference

    1. PyTorch documentation on torch.optim.Optimizer.