Sentiment Analysis with Pytorch — Part 5 — MLP Model

Gal Hever
4 min read · Apr 8, 2020

Introduction

This post is the fifth part of the series Sentiment Analysis with Pytorch. In the previous part we built LSTM\BiLSTM models. In this blog post we will focus on a Multi-layer perceptron (MLP) architecture with Pytorch.

What is an MLP Model?

The Multi-layer perceptron (MLP) is a network composed of many perceptrons. A perceptron is a single neuron, and a row of neurons is called a layer. An MLP consists of three or more fully-connected layers (an input layer, an output layer and one or more hidden layers) with nonlinearly-activating nodes. We can add as many hidden layers as we want in order to make the model more complex, according to our task.
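
As a minimal sketch (the sizes here are arbitrary and only for illustration), a single perceptron can be written in Pytorch as one linear unit followed by a nonlinearity; an MLP simply stacks several such layers:

import torch
import torch.nn as nn

# a single perceptron: a weighted sum of the inputs plus a bias, followed by a nonlinearity
perceptron = nn.Sequential(nn.Linear(4, 1), nn.ReLU())

x = torch.randn(1, 4)       # one sample with 4 input features
print(perceptron(x).shape)  # torch.Size([1, 1])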

Building an MLP Model

Let’s code!

First, let’s define the hyper-parameters for the MLP model:

lr = 1e-4
batch_size = 50
dropout_keep_prob = 0.5 # passed to nn.Dropout as the drop probability
embedding_size = 300
max_document_length = 100 # each sentence has up to 100 words
dev_size = 0.8 # train/validation split ratio
max_size = 5000 # maximum vocabulary size
seed = 1
num_classes = 3
hidden_size1 = 256
hidden_size2 = 128
hidden_size3 = 64
num_epochs = 6

MLP Class

The MLP model that we will build in this tutorial contains three fully-connected hidden layers, the first with 256 units, the second with 128 units and the third with 64 units, followed by a final fully-connected output layer. We apply dropout with a rate of 0.5 after each hidden layer.

Constructor

We will define all of the attributes of the MLP class in __init__, and then we will define the forward pass in the forward function. In the previous post we explained in detail the general structure of the classes and the inheritance from nn.Module, so in this post we will focus on the MLP structure specifically.

import torch.nn as nn

class MLP(nn.Module):
    def __init__(self, vocab_size, embed_size, hidden_size1, hidden_size2, hidden_size3, output_dim, dropout, max_document_length):
        super().__init__()
        # embedding layer
        self.embedding = nn.Embedding(vocab_size, embed_size)
        self.relu = nn.ReLU()
        self.dropout = nn.Dropout(dropout)
        self.fc1 = nn.Linear(embed_size*max_document_length, hidden_size1) # dense layer
        self.fc2 = nn.Linear(hidden_size1, hidden_size2) # dense layer
        self.fc3 = nn.Linear(hidden_size2, hidden_size3) # dense layer
        self.fc4 = nn.Linear(hidden_size3, output_dim) # output layer

    def forward(self, text, text_lengths):
        # text shape = (batch_size, sent_len); text_lengths is not used by the MLP
        embedded = self.embedding(text)
        # embedded = [batch size, sent_len, emb dim]
        # flatten the embeddings; every sentence is padded to max_document_length
        x = embedded.view(embedded.shape[0], -1)
        x = self.relu(self.fc1(x))
        x = self.dropout(x)
        x = self.relu(self.fc2(x))
        x = self.dropout(x)
        x = self.relu(self.fc3(x))
        x = self.dropout(x)
        preds = self.fc4(x)
        return preds
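
As a quick sanity check (not part of the original code, and using the hyper-parameters defined above), we can instantiate the model and pass it a batch of random token indices to verify the output shape:

import torch

model = MLP(vocab_size=5000, embed_size=300, hidden_size1=256, hidden_size2=128,
            hidden_size3=64, output_dim=3, dropout=0.5, max_document_length=100)

dummy_text = torch.randint(0, 5000, (50, 100))  # (batch_size, max_document_length)
dummy_lengths = torch.full((50,), 100)          # unused by the MLP forward pass
print(model(dummy_text, dummy_lengths).shape)   # torch.Size([50, 3])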

Fully-Connected Layer

nn.Linear is also called a fully-connected layer or a dense layer, in which every neuron connects to all the neurons in the next layer. For each layer we need to define in_features (the size of the input) and out_features (the size of the output). The number of neurons in the hidden layers can be tuned, while in the last layer the number of output features must equal the number of classes.
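
For example, a short standalone sketch (with arbitrary sizes) shows how in_features and out_features determine the shape transformation:

import torch
import torch.nn as nn

fc = nn.Linear(in_features=300, out_features=256)
x = torch.randn(50, 300)   # (batch_size, in_features)
print(fc(x).shape)         # torch.Size([50, 256]) -> (batch_size, out_features)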

Dropout Layer

The dropout layer randomly drops out units in the network. Since we chose a rate of 0.5, about 50% of the activations are zeroed out during training (the remaining ones are scaled up to compensate). This operation acts as regularization and helps prevent over-fitting. nn.Dropout does not change the dimensions of the original input.
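
A small sketch (with an arbitrary input) illustrates both points: during training roughly half of the values are zeroed while the shape stays the same:

import torch
import torch.nn as nn

dropout = nn.Dropout(p=0.5)
x = torch.ones(1, 10)

dropout.train()   # dropout is only active in training mode
out = dropout(x)
print(out)        # roughly half the entries are 0, the survivors are scaled to 1/(1-p) = 2
print(out.shape)  # torch.Size([1, 10]) -- same shape as the input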

Training, Evaluation and Test

The training, evaluation and test procedures are exactly the same for all of the models in this series. We explained them in detail in the previous post, so you can read more about them here.

Main Function

import os

import torch
import torch.nn as nn

if __name__ == "__main__":
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    path = 'C:/Users/Gal/PycharmProjects/Sentiment_Analyzer'
    path_data = os.path.join(path, "data")

    model_type = "MLP"
    data_type = "token" # or: "morph"

    char_based = True
    if char_based:
        tokenizer = lambda s: list(s) # char-based
    else:
        tokenizer = lambda s: s.split() # word-based

    lr = 1e-4
    batch_size = 50
    dropout_keep_prob = 0.5
    embedding_size = 300
    max_document_length = 100 # each sentence has up to 100 words
    dev_size = 0.8 # train/validation split ratio
    max_size = 5000 # maximum vocabulary size
    seed = 1
    num_classes = 3
    hidden_size1 = 256
    hidden_size2 = 128
    hidden_size3 = 64
    num_epochs = 6

    # get_files, create_iterator, run_train and evaluate are the helper functions
    # defined in the previous parts of this series
    train_data, valid_data, test_data, Text, Label = get_files(path_data, dev_size, max_document_length, seed, data_type, tokenizer)

    Text.build_vocab(train_data, max_size=max_size)
    Label.build_vocab(train_data)
    vocab_size = len(Text.vocab)

    train_iterator, valid_iterator, test_iterator = create_iterator(train_data, valid_data, test_data, batch_size, device)

    loss_func = nn.CrossEntropyLoss()
    mlp_model = MLP(vocab_size, embedding_size, hidden_size1, hidden_size2, hidden_size3, num_classes, dropout_keep_prob, max_document_length)
    mlp_model = mlp_model.to(device) # make sure the model is on the same device as the data
    optimizer = torch.optim.Adam(mlp_model.parameters(), lr=lr)

    run_train(num_epochs, mlp_model, train_iterator, valid_iterator, optimizer, loss_func, model_type)

    # load the best weights saved during training and evaluate on the test set
    mlp_model.load_state_dict(torch.load(os.path.join(path, "saved_weights_MLP.pt")))
    test_loss, test_acc = evaluate(mlp_model, test_iterator, loss_func)
    print(f'Test Loss: {test_loss:.3f} | Test Acc: {test_acc * 100:.2f}%')

End Notes

In this section we built an MLP model with Pytorch for a sentiment analysis task. You can find the full code for this tutorial on GitHub.

References

https://www.aclweb.org/anthology/C18-1190.pdf

Sentiment Analysis with Pytorch — Part 1 — Data Preprocessing

Sentiment Analysis with Pytorch — Part 2 — Linear Model

Sentiment Analysis with Pytorch — Part 3 — CNN Model

Sentiment Analysis with Pytorch — Part 4 — LSTM\BiLSTM Model
