Speaker: James Zou - Assistant Professor of Biomedical Data Science and of Computer Science & Electrical Engineering, Stanford University
In this session, Stanford Associate Professor James Zou discusses how ChatGPT's behavior has changed over time, highlighting substantial shifts in its ability to follow instructions alongside improvements in safety. He also explains the trade-offs involved in continuously updating LLMs, emphasizing the need to balance instruction following against stronger safety measures.
#AIForward
00:00 Introductions
01:03 Understanding ChatGPT's Behavior Changes
01:24 Motivation for Studying Large Language Model Behavior Changes
03:38 Methodology for Studying Behavior Drift
05:00 Findings: Substantial Changes in ChatGPT's Behavior
07:28 Case Study: Math Problems and Subjective Questions
12:55 Case Study: Sensitive Questions and Code Generation
17:46 Hypothesis: Safety Training Impact on Instruction Following
19:15 Q&A
20:09 Experiments with Open Source Models
24:42 Conclusion: Need for Continuous Monitoring of AI Behavior