Value Function
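
This is a sketch of the standard definition. It assumes finite-MDP notation that these notes do not spell out: policy π(a | s), discount γ, return G_t, rewards R_{t+1}, and dynamics p(s', r | s, a). The second equation is the Bellman expectation equation for v_π.

```latex
v_\pi(s) = \mathbb{E}_\pi\left[ G_t \mid S_t = s \right]
         = \mathbb{E}_\pi\left[ \sum_{k=0}^{\infty} \gamma^k R_{t+k+1} \,\middle|\, S_t = s \right]

v_\pi(s) = \sum_{a} \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a) \left[ r + \gamma \, v_\pi(s') \right]
```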

Action-Value Function
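
This is a sketch of the usual definition of action values. It assumes standard finite-MDP notation (policy π, discount γ, return G_t, dynamics p(s', r | s, a)). The last line shows how v_π is recovered by averaging q_π over the policy.

```latex
q_\pi(s, a) = \mathbb{E}_\pi\left[ G_t \mid S_t = s, A_t = a \right]
            = \sum_{s', r} p(s', r \mid s, a) \left[ r + \gamma \, v_\pi(s') \right]

v_\pi(s) = \sum_{a} \pi(a \mid s) \, q_\pi(s, a)
```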

Optimal Value Function
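
This is a sketch of the optimal state-value function and its Bellman optimality equation. It assumes standard finite-MDP notation (discount γ, dynamics p(s', r | s, a)) and takes the max over all policies π.

```latex
v_*(s) = \max_{\pi} v_\pi(s)
       = \max_{a} \sum_{s', r} p(s', r \mid s, a) \left[ r + \gamma \, v_*(s') \right]
```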

Optimal Action-Value Function
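
This is a sketch of the optimal action-value function. It assumes standard finite-MDP notation (discount γ, dynamics p(s', r | s, a)). The second line links it back to the optimal state values.

```latex
q_*(s, a) = \max_{\pi} q_\pi(s, a)
          = \sum_{s', r} p(s', r \mid s, a) \left[ r + \gamma \max_{a'} q_*(s', a') \right]

v_*(s) = \max_{a} q_*(s, a)
```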

Policy Evaluation

  • TODO
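
A minimal sketch of iterative policy evaluation follows. It assumes a tabular MDP stored as a dict `P[s][a]` of `(prob, next_state, reward)` triples and a stochastic policy `policy[s][a]`. The 3-state corridor MDP, the function name, and the threshold `theta` are all hypothetical choices for illustration. Each pass of the `for s in P` loop is one sweep, and values are overwritten in place.

```python
# Hypothetical 3-state corridor MDP (illustration only):
# P[s][a] is a list of (prob, next_state, reward); actions 0 = left, 1 = right.
# State 2 is terminal: it loops to itself with zero reward.
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 0.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 2, 1.0)]},
    2: {0: [(1.0, 2, 0.0)], 1: [(1.0, 2, 0.0)]},
}
GAMMA = 0.9


def policy_evaluation(policy, gamma=GAMMA, theta=1e-8):
    """Apply the Bellman expectation update to every state (one sweep)
    until the largest change in a sweep drops below theta.
    policy[s][a] is the probability of taking action a in state s."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:  # one sweep = one update of every state
            new_v = sum(
                policy[s][a] * prob * (r + gamma * V[s2])
                for a in P[s]
                for prob, s2, r in P[s][a]
            )
            delta = max(delta, abs(new_v - V[s]))
            V[s] = new_v  # in-place update
        if delta < theta:
            return V


# Evaluate the equiprobable random policy.
random_policy = {s: {a: 0.5 for a in P[s]} for s in P}
V = policy_evaluation(random_policy)
print({s: round(v, 4) for s, v in V.items()})  # → {0: 0.6475, 1: 0.7914, 2: 0.0}
```

On this toy MDP the exact values of the random policy are 90/139 for state 0 and 110/139 for state 1. The loop reproduces both to four decimals.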

Finding New Greedy Policy
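
Below is a sketch of the greedy improvement step. It assumes a hypothetical 3-state MDP in `P[s][a]` form and a hand-picked value estimate `V_est`, which is not computed from anything in these notes. In every state, the new policy picks an action that maximizes the one-step lookahead q(s, a) = Σ p(s', r | s, a) [r + γ V(s')]. The helper names are illustrative.

```python
# Hypothetical 3-state corridor MDP (illustration only):
# P[s][a] is a list of (prob, next_state, reward); actions 0 = left, 1 = right.
# State 2 is terminal: it loops to itself with zero reward.
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 0.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 2, 1.0)]},
    2: {0: [(1.0, 2, 0.0)], 1: [(1.0, 2, 0.0)]},
}
GAMMA = 0.9


def lookahead(V, s, gamma=GAMMA):
    """One-step lookahead: q(s, a) = sum over (s', r) of p * (r + gamma * V(s'))."""
    return {
        a: sum(prob * (r + gamma * V[s2]) for prob, s2, r in P[s][a])
        for a in P[s]
    }


def greedy_policy(V, gamma=GAMMA):
    """Deterministic policy that is greedy w.r.t. V.
    Ties go to the lowest action index, because max() keeps the first maximum."""
    policy = {}
    for s in P:
        q = lookahead(V, s, gamma)
        policy[s] = max(q, key=q.get)
    return policy


V_est = {0: 0.5, 1: 0.8, 2: 0.0}  # hypothetical value estimate
new_policy = greedy_policy(V_est)
print(new_policy)  # → {0: 1, 1: 1, 2: 0}
```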

Policy Iteration
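
The following is a sketch of policy iteration on a hypothetical 3-state MDP, stored as a dict `P[s][a]` of `(prob, next_state, reward)` triples with γ = 0.9. The loop evaluates the current deterministic policy to convergence, improves it greedily, and stops once the improvement step changes nothing. The sketch switches actions only on a strict improvement. This is a design choice, made so that ties between equally good actions cannot make the loop flip between policies forever.

```python
# Hypothetical 3-state corridor MDP (illustration only):
# P[s][a] is a list of (prob, next_state, reward); actions 0 = left, 1 = right.
# State 2 is terminal: it loops to itself with zero reward.
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 0.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 2, 1.0)]},
    2: {0: [(1.0, 2, 0.0)], 1: [(1.0, 2, 0.0)]},
}
GAMMA = 0.9


def lookahead(V, s, gamma=GAMMA):
    """q(s, a) for every action a, computed from the current V."""
    return {
        a: sum(prob * (r + gamma * V[s2]) for prob, s2, r in P[s][a])
        for a in P[s]
    }


def evaluate(policy, gamma=GAMMA, theta=1e-8):
    """Iterative policy evaluation for a deterministic policy (policy[s] = action)."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            new_v = lookahead(V, s, gamma)[policy[s]]
            delta = max(delta, abs(new_v - V[s]))
            V[s] = new_v
        if delta < theta:
            return V


def policy_iteration(gamma=GAMMA):
    policy = {s: 0 for s in P}  # arbitrary start: always move left
    while True:
        V = evaluate(policy, gamma)  # policy evaluation
        stable = True
        for s in P:  # policy improvement
            q = lookahead(V, s, gamma)
            best = max(q, key=q.get)
            # Switch only on a strict improvement so ties cannot cause endless flipping.
            if q[best] > q[policy[s]] + 1e-12:
                policy[s] = best
                stable = False
        if stable:
            return policy, V


policy, V = policy_iteration()
print(policy, V)  # → {0: 1, 1: 1, 2: 0} {0: 0.9, 1: 1.0, 2: 0.0}
```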

Value Iteration

  • policy evaluation is stopped after just one sweep (one update of each state), and the greedy improvement is folded into that sweep by taking the max over actions.
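
Here is a sketch of value iteration on a hypothetical 3-state MDP, stored as a dict `P[s][a]` of `(prob, next_state, reward)` triples with γ = 0.9. Each state gets a single backup per sweep, V(s) ← max_a q(s, a), which fuses one step of evaluation with greedy improvement. An explicit policy is read off only after the values converge. The names and the threshold `theta` are illustrative assumptions.

```python
# Hypothetical 3-state corridor MDP (illustration only):
# P[s][a] is a list of (prob, next_state, reward); actions 0 = left, 1 = right.
# State 2 is terminal: it loops to itself with zero reward.
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 0.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 2, 1.0)]},
    2: {0: [(1.0, 2, 0.0)], 1: [(1.0, 2, 0.0)]},
}
GAMMA = 0.9


def lookahead(V, s, gamma=GAMMA):
    """q(s, a) for every action a, computed from the current V."""
    return {
        a: sum(prob * (r + gamma * V[s2]) for prob, s2, r in P[s][a])
        for a in P[s]
    }


def value_iteration(gamma=GAMMA, theta=1e-8):
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:  # one sweep: a single max-backup per state
            new_v = max(lookahead(V, s, gamma).values())
            delta = max(delta, abs(new_v - V[s]))
            V[s] = new_v
        if delta < theta:
            break
    # Output a deterministic policy that is greedy w.r.t. the final values.
    policy = {}
    for s in P:
        q = lookahead(V, s, gamma)
        policy[s] = max(q, key=q.get)
    return V, policy


V, policy = value_iteration()
print(V, policy)  # → {0: 0.9, 1: 1.0, 2: 0.0} {0: 1, 1: 1, 2: 0}
```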

Asynchronous Dynamic Programming

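Asynchronous DP algorithms are in-place iterative DP algorithms that do not sweep systematically through the entire state set. Instead, they update states in any order, using whatever values of other states are available at the time. Examples include: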

  • value iteration that updates the value of only one state on each step
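
Below is a sketch of that example: in-place value iteration that backs up exactly one state per step. It assumes a hypothetical 3-state MDP in `P[s][a]` form. Picking states uniformly at random with a fixed seed is an arbitrary choice. Any selection order works, as long as every state keeps being updated.

```python
import random

# Hypothetical 3-state corridor MDP (illustration only):
# P[s][a] is a list of (prob, next_state, reward); actions 0 = left, 1 = right.
# State 2 is terminal: it loops to itself with zero reward.
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 0.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 2, 1.0)]},
    2: {0: [(1.0, 2, 0.0)], 1: [(1.0, 2, 0.0)]},
}
GAMMA = 0.9


def async_value_iteration(gamma=GAMMA, steps=1_000, seed=0):
    """In-place value iteration that backs up ONE state per step.
    States are chosen uniformly at random; convergence needs every state
    to keep being selected, so no state may be ignored forever."""
    rng = random.Random(seed)
    states = list(P)
    V = {s: 0.0 for s in P}
    for _ in range(steps):
        s = rng.choice(states)  # the only state updated on this step
        V[s] = max(
            sum(prob * (r + gamma * V[s2]) for prob, s2, r in P[s][a])
            for a in P[s]
        )
    return V


V = async_value_iteration()
print(V)
```

On this toy problem, the result matches what synchronous value iteration converges to: about 0.9 for state 0, 1.0 for state 1, and 0.0 for the terminal state.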

Generalized Policy Iteration (GPI)

Policy iteration consists of two interacting processes:

  • policy evaluation (PE)
  • policy improvement (PI)

GPI refers to the general idea of letting PE and PI interact, regardless of the granularity and other details of the two processes.

GPI is the family that includes policy iteration, value iteration, and asynchronous dynamic programming.
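
The sketch below illustrates the GPI idea with a hypothetical `gpi(k)` helper on a hypothetical 3-state MDP. Before each PI step, PE is cut off after `k` sweeps; this truncated scheme is sometimes called modified policy iteration. With small `k` the loop behaves much like value iteration, and with large `k` it approaches full policy iteration. On this toy problem, every setting ends at the same greedy policy.

```python
# Hypothetical 3-state corridor MDP (illustration only):
# P[s][a] is a list of (prob, next_state, reward); actions 0 = left, 1 = right.
# State 2 is terminal: it loops to itself with zero reward.
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 0.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 2, 1.0)]},
    2: {0: [(1.0, 2, 0.0)], 1: [(1.0, 2, 0.0)]},
}
GAMMA = 0.9


def lookahead(V, s, gamma=GAMMA):
    """q(s, a) for every action a, computed from the current V."""
    return {
        a: sum(prob * (r + gamma * V[s2]) for prob, s2, r in P[s][a])
        for a in P[s]
    }


def gpi(k, gamma=GAMMA, theta=1e-8, max_iters=10_000):
    """Alternate k in-place evaluation sweeps (PE) with one greedy improvement (PI)."""
    V = {s: 0.0 for s in P}
    policy = {s: 0 for s in P}  # arbitrary start: always move left
    for _ in range(max_iters):
        delta = 0.0
        for _ in range(k):  # PE, truncated to k sweeps
            for s in P:
                new_v = lookahead(V, s, gamma)[policy[s]]
                delta = max(delta, abs(new_v - V[s]))
                V[s] = new_v
        new_policy = {}
        for s in P:  # PI: greedy w.r.t. the current V (ties go to the lowest action)
            q = lookahead(V, s, gamma)
            new_policy[s] = max(q, key=q.get)
        # Stop once V barely changed under the policy and the policy is greedy w.r.t. V.
        if new_policy == policy and delta < theta:
            return policy, V
        policy = new_policy
    return policy, V


for k in (1, 3, 50):
    policy, V = gpi(k)
    print(k, policy, V)
```

The stopping test requires both a stable policy and a negligible value change. Under truncated evaluation, a policy that stops changing does not by itself mean V has caught up with it.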