ViT model

Putting together patch embeddings and transformer encoder.

import yaml
from pathlib import Path

import numpy as np
import torch
import torchvision.transforms as T
from torchvision import datasets

CONFIG_PATH = '../config.yml'
DATA_PATH = Path('../input')

Load parameters from the config file.

config = yaml.safe_load(open(CONFIG_PATH))
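The keys read in this notebook suggest a config.yml along these lines; this is an assumed layout, not the actual file, which likely carries more model hyperparameters.

# Assumed contents of config.yml, inferred from the keys used below:
# config['model']['n_classes'] and config['data']['hw'].
example_config = yaml.safe_load("""
model:
  n_classes: 10
data:
  hw: 224
""")
example_config['model']['n_classes'], example_config['data']['hw']
(10, 224)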
dset = datasets.CIFAR10(DATA_PATH, download=True)
Files already downloaded and verified
images, targets = dset.data, dset.targets
len(images), len(targets)
(50000, 50000)

Prepare a small batch of images to test the image processing.

images.shape
(50000, 32, 32, 3)

Sample a few random indices and select the corresponding images and labels for the test batch.

image_idx = np.random.randint(low=0, high=len(images), size=3)
# corresponding labels
targets = [targets[t] for t in image_idx]
targets
[3, 6, 2]
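As a quick sanity check, the integer labels can be mapped to human-readable names via the dataset's classes attribute:

# CIFAR-10 stores class names in label order, so indexing by the
# sampled targets recovers the names; [3, 6, 2] -> cat, frog, bird
[dset.classes[t] for t in targets]
['cat', 'frog', 'bird']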
n_classes = config["model"]["n_classes"]
n_classes
10

Putting together PatchEmbedding and TransformerEncoder

images = torch.Tensor(images[image_idx])
images = images / 255.  # scale uint8 pixel values to [0, 1]
hw = config['data']['hw']
augs = T.Resize(hw, antialias=True)  # antialias=True avoids the torchvision deprecation warning

# CIFAR images are channels-last (NHWC); PyTorch expects channels-first (NCHW)
images = augs(images.permute(0, 3, 1, 2))
images.shape
torch.Size([3, 3, 224, 224])
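With the standard ViT-Base patch size of 16 (an assumption about what the config specifies), a 224x224 input splits into 14 x 14 = 196 patches, which is the embedding sequence length we should see from the model below.

patch_size = 16             # assumed; the ViT-Base default
(224 // patch_size) ** 2    # number of patch tokens per image
196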

source

VisionTransformer

 VisionTransformer (config)

Base class for all neural network modules.

Your models should also subclass this class.

Modules can also contain other Modules, allowing them to be nested in a tree structure. You can assign the submodules as regular attributes:

import torch.nn as nn
import torch.nn.functional as F

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 20, 5)
        self.conv2 = nn.Conv2d(20, 20, 5)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        return F.relu(self.conv2(x))

Submodules assigned in this way will be registered, and will have their parameters converted too when you call to(), etc.

Note: as per the example above, an __init__() call to the parent class must be made before assignment on the child.

training (bool): whether this module is in training or evaluation mode.
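The cached attributes and shapes printed below suggest a forward pass along these lines. This is a minimal sketch assuming the PatchEmbedding and TransformerEncoder modules built in the earlier notebooks; the attribute names patch_embed, encoder, and head, and the hidden size of 768, are assumptions, not the repository's actual implementation.

import torch.nn as nn

class VisionTransformerSketch(nn.Module):
    """Minimal sketch only; the real VisionTransformer may differ."""

    def __init__(self, config):
        super().__init__()
        d_model = 768                                   # assumed hidden size
        self.patch_embed = PatchEmbedding(config)       # image -> (B, 196, 768)
        self.encoder = TransformerEncoder(config)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, d_model))
        self.head = nn.Linear(d_model, config['model']['n_classes'])

    def forward(self, x):
        x = self.patch_embed(x)                         # (B, 196, 768)
        self.embeddings_ = x                            # cached patch tokens
        cls = self.cls_token.expand(x.size(0), -1, -1)  # (B, 1, 768)
        x = self.encoder(torch.cat([cls, x], dim=1))    # (B, 197, 768)
        self.cls_tokens_ = x[:, 0]                      # CLS summary, (B, 768)
        return self.head(self.cls_tokens_)              # (B, n_classes)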

vit = VisionTransformer(config)
outs = vit(images)
outs.shape
torch.Size([3, 10])
vit.embeddings_.shape
torch.Size([3, 196, 768])
vit.cls_tokens_.shape
torch.Size([3, 768])
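The model is untrained, so the logits are arbitrary, but they can still be reduced to per-image class predictions to confirm the head behaves as expected:

# argmax over the class dimension yields one predicted label per image
preds = outs.argmax(dim=-1)
preds.shape
torch.Size([3])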