Discover how Reinforcement Learning from Human Feedback (RLHF) is transforming the training of large language models! In this insightful talk, Luis Serrano, PhD, breaks down the fundamentals of reinforcement learning and explores cutting-edge techniques like Proximal Policy Optimization (PPO) and Direct Preference Optimization (DPO). Learn how human feedback refines language models to generate better text, making this an essential watch for anyone interested in machine learning and NLP.
Luis Serrano, author of Grokking Machine Learning and creator of Serrano Academy, brings deep AI expertise from his work at Google, Apple, and Cohere. Don't miss this chance to dive into the world of AI with a true expert!
#AI #MachineLearning #ReinforcementLearning #NLP #DeepLearning #ArtificialIntelligence #DataScience #LanguageModels #TechTalk #MLCourses #LuisSerrano #SerranoAcademy #ODSC
Timecodes:
0:00 - Introduction
1:45 - Large Language Models (Transformers)
4:15 - How to fine-tune them with RLHF
15:50 - Quick intro to reinforcement learning
19:51 - PPO (a reinforcement learning technique to fine-tune LLMs)
26:04 - DPO (a non-reinforcement learning technique to fine-tune LLMs)
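For viewers who want a concrete feel for the DPO segment, here is a minimal sketch of the DPO preference loss in Python (PyTorch). The function and variable names are illustrative assumptions, not taken from the talk; it assumes you already have summed log-probabilities of the preferred and rejected responses under the model being tuned and under a frozen reference model.

# Minimal DPO loss sketch (illustrative, not the speaker's code)
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Direct Preference Optimization loss for a batch of preference pairs."""
    # How much more the tuned policy favors each response than the reference model does
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Logistic loss that pushes the margin (chosen - rejected) upward
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Toy usage with random log-probabilities for a batch of 4 preference pairs
torch.manual_seed(0)
loss = dpo_loss(torch.randn(4), torch.randn(4), torch.randn(4), torch.randn(4))
print(loss.item())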
Want to stay ahead in AI? Then subscribe to ODSC today 🚀🧠 - [ Link ]
You can also follow ODSC on:
LinkedIn - [ Link ]
Twitter - [ Link ]
Facebook - [ Link ]
Medium - [ Link ]