CONFIG_PATH = '../config.yml'
DATA_PATH = Path('../input')
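This section leans on imports made earlier in the notebook; a minimal set covering every name used below (Path, yaml, np, torch, T, datasets) would be:

# imports assumed from earlier sections of the notebook
from pathlib import Path
import numpy as np
import torch
import yaml
from torchvision import datasets
from torchvision import transforms as T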
ViT model
Load parameters from the config file.
config = yaml.safe_load(open(CONFIG_PATH))
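Only two keys of the config are read in this section, so the loaded dict presumably looks at least like the sketch below; the values shown are taken from outputs further down, and everything else in ../config.yml is unknown here.

# hypothetical structure of the loaded config; only these two keys are
# confirmed by the code and outputs in this section
example_config = {
    "model": {"n_classes": 10},   # read below as config["model"]["n_classes"]
    "data": {"hw": 224},          # read below as config["data"]["hw"]
}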
dset = datasets.CIFAR10(DATA_PATH, download=True)
Files already downloaded and verified
images, targets = dset.data, dset.targets
len(images), len(targets)
(50000, 50000)
Prepare a small batch of images to test the image processing.
images.shape
(50000, 32, 32, 3)
Sample a few random points and use them as indices to select images from the dataset.
image_idx = np.random.randint(low=0, high=len(images), size=3)
# corresponding labels
targets = [targets[t] for t in image_idx]
targets
[3, 6, 2]
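The targets are integer class indices. torchvision's CIFAR10 dataset exposes the class names as dset.classes, so the sampled labels can be decoded; with the standard ordering, [3, 6, 2] corresponds to cat, frog and bird (the exact draw depends on the random indices above).

# map the sampled integer labels back to CIFAR-10 class names
[dset.classes[t] for t in targets]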
n_classes = config["model"]["n_classes"]
n_classes
10
Putting together PatchEmbedding and TransformerEncoder
images = torch.Tensor(images[image_idx])
images = images / 255.
hw = config['data']['hw']
augs = T.Resize(hw)
images = augs(images.permute(0, 3, 1, 2))
images.shape
(torchvision emits a UserWarning here: the default of the antialias argument to the resizing transforms changes from None to True in v0.17; passing antialias=True explicitly silences it.)
torch.Size([3, 3, 224, 224])
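The batch now has shape (3, 3, 224, 224). Assuming a patch size of 16 (an assumption; the value lives in config.yml and is not printed in this section), PatchEmbedding slices each 224 x 224 image into a 14 x 14 grid, i.e. 196 patches, which matches the embedding shape seen at the end of this section.

patch_size = 16                  # assumed; not printed in this section
(224 // patch_size) ** 2         # 14 * 14 = 196 patches per image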
VisionTransformer
VisionTransformer (config)
Base class for all neural network modules.
Your models should also subclass this class.
Modules can also contain other Modules, allowing to nest them in a tree structure. You can assign the submodules as regular attributes:
import torch.nn as nn
import torch.nn.functional as F

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 20, 5)
        self.conv2 = nn.Conv2d(20, 20, 5)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        return F.relu(self.conv2(x))
Submodules assigned in this way will be registered, and will have their parameters converted too when you call to(), etc.

Note: as per the example above, an __init__() call to the parent class must be made before assignment on the child.

training (bool): whether this module is in training or evaluation mode.
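PatchEmbedding and TransformerEncoder themselves are defined in earlier sections and are not reproduced here. As a rough illustration of how VisionTransformer puts them together, here is a minimal sketch that uses nn.Conv2d and nn.TransformerEncoder as stand-ins; the patch size, depth and number of heads are assumptions, and only the 768-dim embeddings, 196 patches and 10 classes are confirmed by the shapes printed below.

import torch
import torch.nn as nn

class VisionTransformerSketch(nn.Module):
    # patch size 16, depth 6 and 8 heads are assumptions inferred from the
    # shapes below, not values read from config.yml
    def __init__(self, hw=224, patch=16, dim=768, depth=6, heads=8, n_classes=10):
        super().__init__()
        n_patches = (hw // patch) ** 2                      # 14 * 14 = 196
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, n_patches + 1, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        self.head = nn.Linear(dim, n_classes)

    def forward(self, x):                                   # x: (B, 3, 224, 224)
        x = self.patch_embed(x)                             # (B, 768, 14, 14)
        x = x.flatten(2).transpose(1, 2)                    # (B, 196, 768) patch embeddings
        self.embeddings_ = x                                # kept for inspection, like vit.embeddings_
        cls = self.cls_token.expand(x.size(0), -1, -1)      # (B, 1, 768)
        x = torch.cat([cls, x], dim=1) + self.pos_embed     # prepend CLS token, add positions
        x = self.encoder(x)                                 # (B, 197, 768)
        self.cls_tokens_ = x[:, 0]                          # (B, 768) final CLS representation
        return self.head(self.cls_tokens_)                  # (B, n_classes) logits

Like the repo's implementation, the sketch keeps the per-patch embeddings and the final CLS token around as embeddings_ and cls_tokens_, which is what the shape inspection below relies on.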
vit = VisionTransformer(config)
outs = vit(images)
outs.shape
torch.Size([3, 10])
vit.embeddings_.shape
torch.Size([3, 196, 768])
vit.cls_tokens_.shape
torch.Size([3, 768])
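Reading the three shapes together: outs is (batch, n_classes) logits, embeddings_ holds one 768-dim vector per patch (196 per image at the assumed patch size of 16), and cls_tokens_ is the final CLS representation that feeds the classification head. A quick sanity check:

assert outs.shape == (3, n_classes)            # 3 images, 10 CIFAR-10 classes
assert vit.embeddings_.shape == (3, 196, 768)  # 196 patch embeddings per image
assert vit.cls_tokens_.shape == (3, 768)       # one CLS vector per image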