How large language models work, a visual intro to transformers | Chapter 5, Deep Learning