Policy Gradient Methods are a type of reinforcement learning technique that relies upon optimizing parametrized policies with respect to the expected return (long-term cumulative reward) by gradient descent Introduction