is an algorithm in the field of reinforcement learning that trains a computer agent’s decision function to accomplish difficult tasks
was developed by John Schulman in 2017
become the default reinforcement learning algorithm at American artificial intelligence company OpenAI
In 2018 PPO had received a wide variety of successes, such as controlling a robotic arm, beating professional players at Dota 2, and excelling in Atari games
is classified as a policy gradient method for training an agent’s policy network