Sentiment Analysis with Pytorch — Part 2 — Linear Model

Gal Hever
7 min read · Apr 7, 2020


Introduction

This post is the second part of the series Sentiment Analysis with Pytorch. In the previous part we looked at how to preprocess the data with TorchText before feeding it into the model. In this blog-post we will focus on modeling and training a simple Linear model with Pytorch.

If you wish to continue to the next parts in the series:

Sentiment Analysis with Pytorch — Part 3 — CNN Model

Sentiment Analysis with Pytorch — Part 4 — LSTM\BiLSTM Model

Sentiment Analysis with Pytorch — Part 5 — MLP Model

Building a Linear Model

The Linear model that we will build contains a single fully-connected layer with 100 units and no activation function. This model does not include an embedding layer, but in the next models we will see how to add one as well.

First, let’s define the hyper-parameters for the Linear model:

lr = 1e-4 
batch_size = 50
dropout_keep_prob = 0.5
embedding_size = 300
max_document_length = 100 # each sentence is limited to 100 words
dev_size = 0.8 # train/validation split ratio
max_size = 5000 # maximum vocabulary size
seed = 1
num_classes = 3
num_epochs = 10
hidden_size = 100

Linear Class

Now, we need to import the torch.nn package and use it to write the Linear Class:

import torch.nn as nn

class Linear(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super(Linear, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size, bias=True)
        self.fc2 = nn.Linear(hidden_size, num_classes, bias=True)

    def forward(self, text, text_lengths):
        text = text.float()  # the dense layer expects a float datatype
        x = self.fc1(text)
        preds = self.fc2(x)
        return preds

We will start by defining a new class called Linear. This class inherits from the Module class, which is the base class for all models in the torch package, i.e. every model must be a subclass of Module.

Now, we will add two functions to this class: __init__ and forward.

1. Init: the constructor. When a new instance of the Linear class is created, __init__ is in charge of initializing the new object. This is where we define all the layers that will be used in the model.

2. Forward: the forward method defines the forward pass of the inputs through the network.

Let’s focus on each row in the code above. The __init__ function calls the Module constructor via super. This means we take all the attributes of the Module class and add our own (in our case, the layers) to our subclass Linear.

super(Linear, self).__init__()

We added two more attributes to the class: two Linear layers. A Linear layer is also known as a fully-connected or dense layer, in which every neuron is connected to all the neurons in the next layer.

self.fc1 = nn.Linear(in_features=input_size, out_features=hidden_size, bias=True)
self.fc2 = nn.Linear(in_features=hidden_size, out_features=num_classes, bias=True)

In this step we need to define the in_features and out_features for each of the layers. The input_size in our case will be max_document_length, which is the fixed sequence length in each batch.

Note: In our case we set fix_length (in the previous blog-post) for all the batches, which is less efficient than dynamic padding.
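
As a quick sanity check of the shapes (a minimal sketch, assuming the Linear class and hyper-parameters defined above), a batch of 50 padded sentences of length 100 should come out as 50 rows of 3 class scores:

import torch

# a batch of 50 sentences, each padded/truncated to max_document_length = 100
dummy_batch = torch.randint(0, 5000, (50, 100))
model = Linear(input_size=100, hidden_size=100, num_classes=3)
preds = model(dummy_batch, None)  # text_lengths is unused by this model
print(preds.shape)  # torch.Size([50, 3]) - one score per class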

Loss Function

Now, we need to define the loss function. Since we have a multiclass problem with 3 classes, we will use the CrossEntropyLoss loss function.

loss_func = nn.CrossEntropyLoss()

If you are dealing with binary labels you can use BCELoss instead. You can read about all the loss functions here and pick the one that best fits your problem.

Note: nn.CrossEntropyLoss() applies a log softmax followed by a negative log likelihood loss over the output of the model. An nn.Softmax(dim=1) in the last layer is therefore unnecessary, because nn.CrossEntropyLoss() already includes the softmax.
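
To convince yourself of this, here is a small standalone check (a sketch with random logits, using torch.nn.functional) showing that cross_entropy is exactly log_softmax followed by nll_loss:

import torch
import torch.nn.functional as F

logits = torch.randn(4, 3)            # raw model outputs: 4 samples, 3 classes
targets = torch.tensor([0, 2, 1, 2])  # gold class indices

ce = F.cross_entropy(logits, targets)
nll = F.nll_loss(F.log_softmax(logits, dim=1), targets)
print(torch.allclose(ce, nll))  # True - the two computations are equivalent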

Train the model

The next function uses TorchText to create an iterator that groups the dataset into batches. We explained this step in detail in the previous tutorial.

from torchtext import data  # torchtext's data module, already used in the preprocessing part

def create_iterator(train_data, valid_data, test_data, batch_size, device):
    train_iterator, valid_iterator, test_iterator = data.BucketIterator.splits(
        (train_data, valid_data, test_data),
        batch_size=batch_size,
        sort_key=lambda x: len(x.text),
        sort_within_batch=True,
        device=device)
    return train_iterator, valid_iterator, test_iterator

We also defined an accuracy function for the multi-class problem:

def accuracy(probs, target):
    predictions = probs.argmax(dim=1)
    corrects = (predictions == target)
    accuracy = corrects.sum().float() / float(target.size(0))
    return accuracy
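
For example, on a toy batch of two samples (using the accuracy function above):

import torch

probs = torch.tensor([[0.1, 0.7, 0.2],   # argmax -> class 1 (correct)
                      [0.8, 0.1, 0.1]])  # argmax -> class 0 (wrong, target is 2)
target = torch.tensor([1, 2])
print(accuracy(probs, target))  # tensor(0.5000)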

Next, we will create two functions, train and evaluate, to train and validate our model. Let’s start with the train function:

def train(model, iterator, optimizer, criterion):
    epoch_loss = 0
    epoch_acc = 0
    model.train()  # set dropout/batch-norm layers to training mode
    for batch in iterator:
        optimizer.zero_grad()
        text, text_lengths = batch.text
        predictions = model(text, text_lengths)
        loss = criterion(predictions, batch.labels.squeeze())
        acc = accuracy(predictions, batch.labels)
        loss.backward()
        optimizer.step()
        epoch_loss += loss.item()
        epoch_acc += acc.item()
    return epoch_loss / len(iterator), epoch_acc / len(iterator)

We iterate over the training batches, initializing the gradients to zero at the start of each batch. The batch object contains the sentences (batch.text) as well as the targets (batch.labels). The text object holds two sub-attributes, text and text_lengths. text_lengths is needed when you want to pad the sequences in each batch dynamically (note that in our case we didn’t use this feature, but it’s better to do so).

The criterion in this case is CrossEntropyLoss, which computes the loss (how far the output is from being correct). The loss function is the guide to the terrain, telling the optimizer when it’s moving in the right or wrong direction. When we execute the line loss = criterion(predictions, batch.labels.squeeze()), a new computation graph is created during the forward pass.

squeeze returns the tensor with its size-1 dimensions removed (you can specify exactly which dimension with the dim argument); without it, a labels tensor of shape [batch_size, 1] would not match the [batch_size] shape that CrossEntropyLoss expects for its targets.
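
For example, on a toy tensor (purely for illustration):

import torch

t = torch.zeros(50, 1)
print(t.squeeze().shape)       # torch.Size([50]) - all size-1 dims are removed
print(t.squeeze(dim=1).shape)  # torch.Size([50]) - only dim 1 is considered
print(t.squeeze(dim=0).shape)  # torch.Size([50, 1]) - dim 0 is not of size 1, so nothing happens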

Next, we propagate the gradients back into the network’s parameters with loss.backward(). This function uses the graph to compute the gradient dloss/dx for every parameter x that has requires_grad=True, and accumulates it into x.grad. In pseudo-code:

x.grad += dloss/dx

Note: grad and requires_grad are built-in attributes of every tensor. The weights of our layers are registered as parameters of the Module when we define them in __init__, and they have requires_grad=True by default, so their grad attribute is populated during the backward pass.

optimizer.step() holds the current optimization state, iterates over all the parameters, and updates the value of each parameter x based on its internally stored gradient x.grad.

For example, the SGD optimizer performs:

x += -lr * x.grad

Note: It’s important to call optimizer.zero_grad() before loss.backward(), to clear x.grad for every parameter x in the optimizer. Otherwise, you’ll accumulate the gradients from multiple passes.
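
You can see this accumulation in a tiny standalone example (a sketch with a single dummy parameter w):

import torch

w = torch.ones(3, requires_grad=True)
for _ in range(2):
    loss = (w * 2).sum()
    loss.backward()
print(w.grad)   # tensor([4., 4., 4.]) - the gradients of both passes were accumulated
w.grad.zero_()  # this is what optimizer.zero_grad() does for every parameter
print(w.grad)   # tensor([0., 0., 0.])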

Evaluate the model

Some layers (e.g. dropout and batch normalization) behave differently during training and evaluation. With model.train() and model.eval() we can switch the model between the two states.

def evaluate(model, iterator, criterion):
    epoch_loss = 0
    epoch_acc = 0
    model.eval()  # switch dropout/batch-norm layers to evaluation mode
    with torch.no_grad():  # gradients are not needed for evaluation
        for batch in iterator:
            text, text_lengths = batch.text
            predictions = model(text, text_lengths)
            loss = criterion(predictions, batch.labels)
            acc = accuracy(predictions, batch.labels)
            epoch_loss += loss.item()
            epoch_acc += acc.item()
    return epoch_loss / len(iterator), epoch_acc / len(iterator)
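
To see what model.train() and model.eval() actually change, here is a small illustration with a dropout layer (our Linear model has no dropout, but the models in the next parts of the series do):

import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = torch.ones(1, 6)

drop.train()    # training mode: roughly half the values are zeroed, the rest are scaled by 1/(1-p)
print(drop(x))

drop.eval()     # evaluation mode: dropout is a no-op
print(drop(x))  # tensor([[1., 1., 1., 1., 1., 1.]])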

Find the best model

Now, we will use the previous functions to find the best model:

def run_train(epochs, model, train_iterator, valid_iterator, optimizer, criterion, model_type):
    best_valid_loss = float('inf')

    for epoch in range(epochs):

        # train the model
        train_loss, train_acc = train(model, train_iterator, optimizer, criterion)

        # evaluate the model
        valid_loss, valid_acc = evaluate(model, valid_iterator, criterion)

        # save the best model
        if valid_loss < best_valid_loss:
            best_valid_loss = valid_loss
            torch.save(model.state_dict(), 'saved_weights' + '_' + model_type + '.pt')

        print(f'\tTrain Loss: {train_loss:.3f} | Train Acc: {train_acc * 100:.2f}%')
        print(f'\t Val. Loss: {valid_loss:.3f} | Val. Acc: {valid_acc * 100:.2f}%')

Main Function

This is our complete main function. With torch.device we place the tensors on the GPU if one is available.

Before training we will need to set two variables:

  1. data_type: set to “token” or “morph”, depending on which data you want to train the model on.
  2. char_based: set to False to train on words and True for characters.
import torch
import os

if __name__ == "__main__":

    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    path = 'C:/Users/PycharmProjects/Sentiment_Analyzer'
    path_data = os.path.join(path, "data")
    model_type = "Linear"
    data_type = "token"  # or: "morph"
    char_based = True
    if char_based:
        tokenizer = lambda s: list(s)  # char-based
    else:
        tokenizer = lambda s: s.split()  # word-based

    # hyper-parameters:
    lr = 1e-4
    batch_size = 50
    dropout_keep_prob = 0.5
    embedding_size = 300
    max_document_length = 100  # each sentence is limited to 100 words
    dev_size = 0.8  # train/validation split ratio
    max_size = 5000  # maximum vocabulary size
    seed = 1
    num_classes = 3
    num_epochs = 10
    hidden_size = 100

    # Text, Label, train_data, valid_data and test_data come from the
    # preprocessing step described in the previous part of the series
    Text.build_vocab(train_data, max_size=max_size)
    Label.build_vocab(train_data)
    vocab_size = len(Text.vocab)

    train_iterator, valid_iterator, test_iterator = create_iterator(train_data, valid_data, test_data, batch_size, device)

    loss_func = nn.CrossEntropyLoss()
    linear_model = Linear(max_document_length, hidden_size, num_classes).to(device)  # move the model to the same device as the batches
    optimizer = torch.optim.Adam(linear_model.parameters(), lr=lr)
    run_train(num_epochs, linear_model, train_iterator, valid_iterator, optimizer, loss_func, model_type)

Test the model

You can load the saved weights with the load_state_dict function and test the model using the evaluate function, as shown in the code below.

# load weights
linear_model.load_state_dict(torch.load(os.path.join(path, "saved_weights_Linear.pt")))

test_loss, test_acc = evaluate(linear_model, test_iterator, loss_func)
print(f'Test Loss: {test_loss:.3f} | Test Acc: {test_acc * 100:.2f}%')

End Notes

In this post we learned how to build a simple Linear model with Pytorch. In the next parts we will go over more complicated models. If you wish to continue to the next part you can press here: Sentiment Analysis with Pytorch — Part 3 — CNN Model.

You can also find the full code for this tutorial on Github.

References

https://www.aclweb.org/anthology/C18-1190.pdf

Sentiment Analysis with Pytorch — Part 1 — Data Preprocessing
