Implementation of the Vision Transformer ([ Link ]) in PyTorch, from scratch (almost)!
I recommend first studying the original Transformer paper () and its implementation, since the Vision Transformer builds on the original architecture but operates on images instead of text. After that, study the Vision Transformer paper itself.
- Transformer explanation: [ Link ]
- Transformer implementation: [ Link ]
- Vision Transformer explanation: [ Link ]
Here is my GitHub repo for this implementation:
- [ Link ]
Please feel free to leave any feedback, corrections, or questions that you might have!
Outline:
0:00 - Introduction
3:24 - Imports and Hyperparameters
9:40 - Input Layer: Patchifying, Linear Projection, and Positional Encoding
25:36 - Encoder
36:30 - Vision Transformer
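For readers who want a compact picture of the pieces listed in the outline before watching, here is a minimal sketch, not the code from the video or the repo. It assumes 32x32 RGB inputs, 4x4 patches and small illustrative hyperparameters. The names `PatchEmbedding` and `TinyViT` are hypothetical. Unlike the from-scratch encoder in the video, it swaps in PyTorch's built-in `nn.TransformerEncoderLayer` (pre-norm, GELU) for the encoder blocks. The input layer patchifies with `reshape`/`permute`, projects each flattened patch with a linear layer, prepends a learnable [CLS] token and adds learned positional embeddings. The classifier reads the final [CLS] representation.

```python
import torch
import torch.nn as nn


class PatchEmbedding(nn.Module):
    """Hypothetical input layer: patchify, linear projection, [CLS] token, positional embedding."""

    def __init__(self, img_size=32, patch_size=4, in_channels=3, embed_dim=64):
        super().__init__()
        assert img_size % patch_size == 0, "image size must be divisible by patch size"
        self.patch_size = patch_size
        num_patches = (img_size // patch_size) ** 2
        # Each flattened patch of C*p*p pixel values is mapped to embed_dim.
        self.proj = nn.Linear(in_channels * patch_size * patch_size, embed_dim)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        # One learned position vector per patch, plus one for the [CLS] token.
        self.pos_embed = nn.Parameter(torch.randn(1, num_patches + 1, embed_dim) * 0.02)

    def forward(self, x):
        b, c, h, w = x.shape
        p = self.patch_size
        # (B, C, H, W) -> (B, C, H/p, p, W/p, p) -> (B, H/p, W/p, C, p, p)
        x = x.reshape(b, c, h // p, p, w // p, p).permute(0, 2, 4, 1, 3, 5)
        x = x.reshape(b, -1, c * p * p)  # (B, N, C*p*p) with N = (H/p) * (W/p)
        x = self.proj(x)  # (B, N, D)
        cls = self.cls_token.expand(b, -1, -1)  # (B, 1, D)
        x = torch.cat([cls, x], dim=1)  # (B, N+1, D)
        return x + self.pos_embed


class TinyViT(nn.Module):
    """Hypothetical ViT: patch embedding -> pre-norm encoder stack -> linear head on [CLS]."""

    def __init__(self, img_size=32, patch_size=4, in_channels=3, embed_dim=64,
                 depth=2, num_heads=4, mlp_dim=128, num_classes=10):
        super().__init__()
        self.embed = PatchEmbedding(img_size, patch_size, in_channels, embed_dim)
        # Built-in encoder layer used here instead of a hand-written one.
        layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads, dim_feedforward=mlp_dim,
            activation="gelu", batch_first=True, norm_first=True,
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.norm = nn.LayerNorm(embed_dim)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, x):
        x = self.encoder(self.embed(x))  # (B, N+1, D)
        return self.head(self.norm(x[:, 0]))  # classify from the [CLS] token


if __name__ == "__main__":
    model = TinyViT()
    images = torch.randn(8, 3, 32, 32)
    print(model.embed(images).shape)  # torch.Size([8, 65, 64]): 8x8 patches + [CLS]
    print(model(images).shape)  # torch.Size([8, 10])
```

A common equivalent for the patchify-plus-projection step is a single `nn.Conv2d` with `kernel_size` and `stride` both equal to the patch size. The explicit reshape version is used here because it shows the flattening directly.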