Policy Gradient Methods
  • are a type of reinforcement learning technique that relies upon optimizing parametrized policies with respect to the expected return (long-term cumulative reward) by gradient descent

Introduction