In this episode I introduce Policy Gradient methods for Deep Reinforcement Learning.
After a general overview, I dive into Proximal Policy Optimization: an algorithm developed at OpenAI that strikes a balance between sample efficiency and implementation complexity. PPO is the algorithm used to train the OpenAI Five system and is also applied to a wide range of other tasks, such as Atari games and robotic control.
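For reference, the core idea behind PPO is the clipped surrogate objective. Below is a minimal PyTorch-style sketch of that loss (the function and argument names are my own illustration, not taken from any of the implementations linked below):

import torch

def ppo_clipped_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    # Probability ratio between the updated policy and the policy that collected the data.
    ratio = torch.exp(new_log_probs - old_log_probs)
    # Unclipped and clipped surrogate terms.
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Taking the element-wise minimum keeps policy updates conservative; negate for gradient descent.
    return -torch.min(unclipped, clipped).mean()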
If you want to support this channel, here is my Patreon link:
[ Link ] --- You are amazing!! ;)
If you have questions you would like to discuss with me personally, you can book a 1-on-1 video call through Pensight: [ Link ]
Links mentioned in the video:
⦁ PPO paper: [ Link ]
⦁ TRPO paper: [ Link ]
⦁ OpenAI PPO blogpost: [ Link ]
⦁ Aurélien Géron: KL divergence and entropy in ML: [ Link ]
⦁ Deep RL Bootcamp - Lecture 5: [ Link ]
⦁ RL-Adventure PyTorch implementation: [ Link ]
⦁ OpenAI Baselines TensorFlow implementation: [ Link ]