In 2018, OpenAI made a breakthrough in Deep Reinforcement Learning. This breakthrough was made possible by a strong hardware architecture and by using the state-of-the-art algorithm: Proximal Policy Optimization.
The main idea of Proximal Policy Optimization is to avoid having too large a policy update. To do that, we use a ratio that tells us the difference between our new and old policy, and we clip this ratio to the range [0.8, 1.2] (that is, 1 ± ε with ε = 0.2). This ensures that the policy update will not be too large.
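To make the clipping idea concrete, here is a minimal sketch of the clipped surrogate objective in Python. It assumes NumPy; the function and variable names are illustrative and not taken from the tutorial code.

import numpy as np

def ppo_clip_objective(new_probs, old_probs, advantages, clip_epsilon=0.2):
    # Probability ratio between the new and old policy for each action taken.
    ratio = new_probs / old_probs
    # Unclipped surrogate objective.
    surrogate1 = ratio * advantages
    # Clipped surrogate: the ratio is kept inside [1 - eps, 1 + eps] = [0.8, 1.2].
    surrogate2 = np.clip(ratio, 1.0 - clip_epsilon, 1.0 + clip_epsilon) * advantages
    # PPO maximizes the element-wise minimum of the two, so the policy
    # gains nothing from moving outside the clipping range.
    return np.mean(np.minimum(surrogate1, surrogate2))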
In this tutorial, we'll dive into understanding the PPO architecture and implement a Proximal Policy Optimization (PPO) agent that learns to play Pong-v0.
However, if you want to understand PPO, you should first check my previous tutorials. I will use my A3C tutorial code as the backbone for this one.
Text version tutorial: [ Link ]
Full video playlist: [ Link ]
GitHub code: [ Link ]
✅ Support My Channel Through Patreon:
[ Link ]
✅ One-Time Contribution Through PayPal:
[ Link ]