Owain Evans is an AI alignment researcher and a research associate at the Center for Human-Compatible AI at UC Berkeley. He is now leading a new AI safety research group.
In this episode we discuss two of his recent papers, “Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs” ([Link]) and “Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data” ([Link]), along with some questions from Twitter.
Patreon: [Link]
Manifund: [Link]
Ask questions: [Link]
Owain Evans: [Link]
OUTLINE
00:00:00 Intro
00:01:12 Owain's Agenda
00:02:25 Defining Situational Awareness
00:03:30 Safety Motivation
00:04:58 Why Release A Dataset
00:06:17 Risks From Releasing It
00:10:03 Claude 3 on the Longform Task
00:14:57 Needle in a Haystack
00:19:23 Situating Prompt
00:23:08 Deceptive Alignment Precursor
00:30:12 Distribution Over Two Random Words
00:34:36 Discontinuing a 01 Sequence
00:40:20 GPT-4 Base On the Longform Task
00:46:44 Human-AI Data in GPT-4's Pretraining
00:49:25 Are Longform Task Questions Unusual?
00:51:48 When Will Situational Awareness Saturate?
00:53:36 Safety And Governance Implications Of Saturation
00:56:17 Evaluation Implications Of Saturation
00:57:40 Follow-up Work On The Situational Awareness Dataset
01:00:04 Would Removing Chain-Of-Thought Work?
01:02:18 Out-of-Context Reasoning: the "Connecting the Dots" paper
01:05:15 Experimental Setup
01:07:46 Concrete Function Example: 3x + 1
01:11:23 Isn't It Just A Simple Mapping?
01:17:20 Safety Motivation
01:22:40 Out-Of-Context Reasoning Results Were Surprising
01:24:51 The Biased Coin Task
01:27:00 Will Out-Of-Context Reasoning Scale?
01:32:50 Checking If In-Context Learning Works
01:34:33 Mixture-Of-Functions
01:38:24 Inferring New Architectures From arXiv
01:43:52 Twitter Questions
01:44:27 How Does Owain Come Up With Ideas?
01:49:44 How Did Owain's Background Influence His Research Style And Taste?
01:52:06 Should AI Alignment Researchers Aim For Publication?
01:57:01 How Can We Apply LLM Understanding To Mitigate Deceptive Alignment?
01:58:52 Could Owain's Research Accelerate Capabilities?
02:08:44 How Was Owain's Work Received?
02:13:23 Last Message