Proximal Policy Optimization (PPO)
  • is a reinforcement learning algorithm that trains a computer agent’s decision function to accomplish difficult tasks
  • was developed by John Schulman in 2017
  • became the default reinforcement learning algorithm at the American artificial intelligence company OpenAI
  • by 2018 had achieved a wide variety of successes, such as controlling a robotic arm, beating professional players at Dota 2, and excelling in Atari games
  • is classified as a policy gradient method for training an agent’s policy network

Introduction