Sometimes, people talk about transformers as having "world models" as a result of being trained to predict text data on the internet. But what does this even mean? In this episode, I talk with Adam Shai and Paul Riechers about their work applying computational mechanics, a sub-field of physics that studies how to optimally predict stochastic processes, to neural networks.
Patreon: [ Link ]
Ko-fi: [ Link ]
The transcript: [ Link ]
Topics we discuss, and timestamps:
0:00:42 - What computational mechanics is
0:29:49 - Computational mechanics vs other approaches
0:36:16 - What world models are
0:48:41 - Fractals
0:57:43 - How the fractals are formed
1:09:55 - Scaling computational mechanics for transformers
1:21:52 - How Adam and Paul found computational mechanics
1:36:16 - Computational mechanics for AI safety
1:46:05 - Following Adam and Paul's research
Simplex AI Safety: [ Link ]
Research we discuss:
Transformers represent belief state geometry in their residual stream: [ Link ]
Transformers represent belief state geometry in their residual stream [LessWrong post]: [ Link ]
Why Would Belief-States Have A Fractal Structure, And Why Would That Matter For Interpretability? An Explainer: [ Link ]