Project page (with further readings): [ Link ]
Abstract: We divide "intelligence" into multiple dimensions (such as language structures, knowledge, and reasoning). For each dimension, we create synthetic data for LLM pretraining to understand the underlying theory and push the capabilities of LLMs to the extreme.
Unlike benchmarking, by controlling the synthetic data we aim to discover universal laws that hold for all LLMs, not just a specific model such as GPT or Llama. By tweaking hyperparameters such as data amount, type, difficulty, and format, we determine the factors that affect LLM performance and suggest improvements.
Unlike black-box training, we develop advanced probing techniques to examine the inner workings of LLMs and understand their hidden mental processes. This helps us gain a deeper understanding of how these AI models function and moves us closer to creating more powerful and transparent AI systems.
This talk covers language structures (Part 1), reasoning (Part 2), and knowledge (Part 3). These parts explain why and how language models succeed or fail on certain AI tasks, and provide practical suggestions for the changes needed in (1) model architecture, (2) data preparation, and (3) the training process to move us closer to AGI.
Timecodes
0:00 - Prelude
11:37 - Part 3: Knowledge
14:49 - Part 3.1: Knowledge Storage and Extraction
25:42 - Summary of Part 3.1
26:46 - Part 3.2: Knowledge Manipulation
35:19 - Summary of Part 3.2
37:00 - Part 3.3: Knowledge Capacity Scaling Laws
49:54 - Summary of Part 3.3
51:26 - Summary of Part 3
53:28 - Part 2.1: Grade-School Math and the Hidden Reasoning Process
1:18:57 - Summary of Part 2.1
1:20:37 - Part 2.2: How to Learn From Mistakes on Grade-School Math Problems
1:31:23 - Summary of Part 2.2
1:32:10 - Summary of Part 2
1:33:22 - Part 1: Hierarchical Language Structures
1:49:23 - Summary of Part 1