Transformer Encoder

Methods to build a transformer encoder block.

CONFIG_PATH = '../config.yml'
DATA_PATH = Path('../input')

Load parameters from the config file.

# use a context manager so the config file handle is closed after parsing
with open(CONFIG_PATH) as f:
    config = yaml.safe_load(f)
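The lookups on this page (`config["patch"]["in_ch"]`, `config['data']['hw']`, `config['encoder']['msa_heads']`, and so on) suggest a nested layout like the one below. This is only a sketch. The key names come from those lookups, and the values for `hw`, `out_ch`, and `n` match the outputs printed on this page. The values for `in_ch`, `size`, and `msa_heads` are assumptions based on the ViT-Base setup, not the contents of the actual `config.yml`.

```python
# Hypothetical stand-in for the parsed config.yml (not the real file)
example_config = {
    "data": {"hw": [224, 224]},          # matches the Resize output on this page
    "patch": {
        "in_ch": 3,                      # assumed: RGB input
        "out_ch": 768,                   # matches the printed embedding size
        "size": 16,                      # assumed: 16x16 patches
        "n": 196,                        # matches the printed seq_len
    },
    "encoder": {"msa_heads": 12},        # assumed: ViT-Base head count
}


def num_patches(hw, patch_size):
    """Number of non-overlapping square patches that tile an image."""
    h, w = hw
    return (h // patch_size) * (w // patch_size)


n = num_patches(example_config["data"]["hw"], example_config["patch"]["size"])

# Sanity checks tying the assumed values to the values seen in the outputs
assert n == example_config["patch"]["n"]
assert example_config["patch"]["out_ch"] % example_config["encoder"]["msa_heads"] == 0
```

Under these assumptions a 224 by 224 image split into 16 by 16 patches gives 14 patches per side, which is consistent with the `seq_len` of 196 reported further down.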

dset = datasets.CIFAR10(DATA_PATH, download=True)

Files already downloaded and verified

images, targets = dset.data, dset.targets
len(images), len(targets)

(50000, 50000)

Prepare a small batch of images to test the image processing.

images.shape

(50000, 32, 32, 3)

Sample three random indices and use them to select the images that form the test batch.

image_idx = np.random.randint(low=0, high=len(images), size=3)

# corresponding labels
targets = [targets[t] for t in image_idx]
targets

[8, 3, 5]

in_ch = config["patch"]["in_ch"]
out_ch = config["patch"]["out_ch"]

# size of each small patch
patch_size = config['patch']['size']
patch_size

images.shape[1:]

(32, 32, 3)

images = torch.Tensor(images[image_idx])
images = images/255.
images.shape

torch.Size([3, 32, 32, 3])

Resize the images to \(224\times 224\) to match the input size used in the ViT paper.

hw = config['data']['hw']
augs = T.Resize(hw)
augs

Resize(size=[224, 224], interpolation=bilinear, max_size=None, antialias=warn)

images = augs(images.permute(0, 3, 1, 2))
images.shape

torch.Size([3, 3, 224, 224])

Make Embedded Patches

patch_embed = PatchEmbedding(config)(images)
patch_embed.shape

torch.Size([3, 197, 768])
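The implementation of `PatchEmbedding` is not shown on this page. As a rough sketch of how an output of shape `(3, 197, 768)` can arise, the hypothetical module below uses a strided convolution to cut the image into patches, prepends a learnable class token, and adds a learnable position embedding. The class name `PatchEmbeddingSketch`, its default arguments, and the zero initialisation are all assumptions made for illustration.

```python
import torch
import torch.nn as nn


class PatchEmbeddingSketch(nn.Module):
    """Hypothetical patch embedding; not the PatchEmbedding used on this page."""

    def __init__(self, in_ch=3, out_ch=768, patch_size=16, n_patches=196):
        super().__init__()
        # A conv with kernel == stride projects each non-overlapping patch to out_ch
        self.proj = nn.Conv2d(in_ch, out_ch, kernel_size=patch_size, stride=patch_size)
        # Learnable class token and position embedding (zero init chosen for simplicity)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, out_ch))
        self.pos_embed = nn.Parameter(torch.zeros(1, n_patches + 1, out_ch))

    def forward(self, x):
        x = self.proj(x)                      # (B, out_ch, H / ps, W / ps)
        x = x.flatten(2).transpose(1, 2)      # (B, n_patches, out_ch)
        cls = self.cls_token.expand(x.shape[0], -1, -1)
        x = torch.cat([cls, x], dim=1)        # (B, n_patches + 1, out_ch)
        return x + self.pos_embed             # broadcast over the batch


sketch_out = PatchEmbeddingSketch()(torch.rand(3, 3, 224, 224))
print(sketch_out.shape)  # torch.Size([3, 197, 768])
```

The extra token explains why the sequence length is 197 rather than the 196 patches that tile the image.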

Prepare Transformer Layer

Apply LayerNorm over the embedding dimension, which in our case is \(768\).

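# number of image patches; patch_embed holds one extra token (197 = 196 + 1), presumably the class token
seq_len = config['patch']['n']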
embed_dim = config['patch']['out_ch']
seq_len, embed_dim

(196, 768)

x_ln = nn.LayerNorm(normalized_shape=embed_dim)(patch_embed)
x_ln.shape

torch.Size([3, 197, 768])
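To make "over the embedding dimension" concrete, the sketch below recomputes the normalisation by hand on a random stand-in tensor and compares it with `nn.LayerNorm`. Each token vector of length 768 is normalised independently using its own mean and biased variance. The tensor `tokens` is made-up data, not the `patch_embed` from above.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
tokens = torch.rand(3, 197, 768)  # stand-in for patch_embed: (batch, tokens, embed_dim)

ln = nn.LayerNorm(normalized_shape=768)
expected = ln(tokens)

# Manual version: statistics are taken over the last (embedding) dimension only
mean = tokens.mean(dim=-1, keepdim=True)
var = tokens.var(dim=-1, keepdim=True, unbiased=False)  # LayerNorm uses the biased variance
manual = (tokens - mean) / torch.sqrt(var + ln.eps)
manual = manual * ln.weight + ln.bias  # affine parameters start at 1 and 0

ln_matches = torch.allclose(expected, manual, atol=1e-5)
print(ln_matches)
```

Because the statistics are per token, the result does not depend on the batch size or on the other tokens in the sequence.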

num_heads = config['encoder']['msa_heads']

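# batch_first=True: x_ln is laid out as (batch, tokens, embed_dim); the default layout is (tokens, batch, embed_dim)
attn_output, attn_output_weights = nn.MultiheadAttention(embed_dim=embed_dim, num_heads=num_heads, batch_first=True)(x_ln, x_ln, x_ln)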

attn_output.shape

torch.Size([3, 197, 768])
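By default `nn.MultiheadAttention` reads its inputs as `(seq_len, batch, embed_dim)`, and as `(batch, seq_len, embed_dim)` only when `batch_first=True` is passed. The output shape equals the input shape either way, so checking `attn_output.shape` alone does not reveal which axis the attention ran over. The sketch below uses small made-up dimensions and inspects the attention weights instead, whose shape shows which axis was treated as the sequence.

```python
import torch
import torch.nn as nn

batch, tokens, dim, heads = 3, 5, 8, 2  # small illustrative sizes, not the page's values
x_demo = torch.rand(batch, tokens, dim)

# Default layout: the first axis (size 3) is treated as the sequence
mha_seq_first = nn.MultiheadAttention(embed_dim=dim, num_heads=heads)
out_sf, w_seq_first = mha_seq_first(x_demo, x_demo, x_demo)

# batch_first=True: the second axis (size 5) is treated as the sequence
mha_batch_first = nn.MultiheadAttention(embed_dim=dim, num_heads=heads, batch_first=True)
out_bf, w_batch_first = mha_batch_first(x_demo, x_demo, x_demo)

print(out_sf.shape, out_bf.shape)  # both torch.Size([3, 5, 8])
print(w_seq_first.shape)           # torch.Size([5, 3, 3]): attention across the 3 "batch" items
print(w_batch_first.shape)         # torch.Size([3, 5, 5]): attention across the 5 tokens
```

For a tensor shaped like `patch_embed`, attention across the 197 tokens of each image corresponds to the `batch_first=True` case.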

Prepare MSA block

MultiheadSelfAttn

 MultiheadSelfAttn (config)

x = MultiheadSelfAttn(config)(patch_embed)
x.shape

torch.Size([3, 197, 768])
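The body of `MultiheadSelfAttn` is not reproduced here. A minimal sketch of such a block, assuming it simply chains the LayerNorm and multi-head attention steps shown earlier, could look like the following. The class name `MSABlockSketch` and its constructor arguments are hypothetical (the real class takes `config`), the head count of 12 is an assumption, and the residual connection is left out because this page does not show whether it lives inside the block or in the surrounding encoder.

```python
import torch
import torch.nn as nn


class MSABlockSketch(nn.Module):
    """Hypothetical pre-norm self-attention block; not the page's MultiheadSelfAttn."""

    def __init__(self, embed_dim=768, num_heads=12):
        super().__init__()
        self.norm = nn.LayerNorm(embed_dim)
        # batch_first=True so inputs are read as (batch, tokens, embed_dim)
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

    def forward(self, x):
        x_ln = self.norm(x)
        # Self-attention: query, key, and value are all the normalised input
        attn_out, _ = self.attn(x_ln, x_ln, x_ln, need_weights=False)
        return attn_out


msa_out = MSABlockSketch()(torch.rand(3, 197, 768))
print(msa_out.shape)  # torch.Size([3, 197, 768])
```

Keeping the output shape identical to the input shape is what allows such blocks to be stacked and combined with residual connections.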

Prepare MLP block