The Colab notebook:
[ Link ]
The GitHub repo:
[ Link ]
Support my learning journey on Patreon!
[ Link ]
Discuss this stuff with other Tunadorks on Discord
[ Link ]
All my other links
[ Link ]
Timestamps:
00:00 Intro
03:29 Setup
05:25 First Residual State
06:42 Precomputing RoPE Frequencies
11:13 Precomputing Causal Mask
12:58 RMSNorm
17:37 Initializing Multi-Query Attention
22:00 Rotary Positional Embeddings
24:22 Calculating Self-Attention
30:42 Residual Connection
32:25 SwiGLU Feedforward Network
36:15 Repeated Layers
36:48 Output Layer
39:00 Cross-Entropy Loss
40:25 Actually Functional Model Code
50:30 Train Your Own
54:20 Load a Pre-trained minLlama Model
55:00 Inference
59:00 Outro