Value-Function Approximation
individual update notation:
- π β¦ π’
where:
- π - state updated
- π’ - update target
for example:
|
MC |
|
|
TD(0) |
|
|
n-step TD |
|
|
DP policy evaluation |
|
Prediction Objective (Mean Squared Value Error (VE))
denoted as:
where:
Stochastic Gradient Descent (SGD)
Stochastic gradient-descent (SGD) methods do this by adjusting the weight vector after each example by a small amount in the direction that would most reduce the error on that example:
Indent
Gradient of π wrt π€:
Indent
π£π(ππ‘) is usually noise corrupted, so ππ‘:
Gradient MC Algorithm for estimating π£Λ ~ π£π:
Input: the policy π to be evaluated
Input: a differentiable function vΛ : S x β^d -> β
Algorithm parameter: step size πΌ > 0
Initialize value-function weights wββ^d arbitrarily (e.g., w = 0)
Loop forever (for each episode):
Generate an episode S0, A0, R1, S1, A1, ..., RT, ST using π
Loop for each step of episode, t = 0, 1, ..., T1:
w = w + πΌ [G_t - vΛ(S_t,w)] π»vΛ(S_t,w)
Semi-Gradient TD(0) for estimating π£Λ ~ π£π:
use the following as target:
Input: the policy π to be evaluated
Input: a differentiable function vΛ : S x β^d -> β such that vΛ(terminal,Β·) = 0
Algorithm parameter: step size πΌ > 0
Initialize value-function weights wββ^d arbitrarily (e.g., w = 0)
Loop for each episode:
Initialize S
Loop until S is terminal:
Choose A ~ π(Β·|S)
Take action A, observe R, S'
w = w + πΌ [R + πΎvΛ(S',w) - vΛ(S,w)] π»vΛ(S,w)
S = S'
Linear Methods
given an x(s) real-valued vector
in the linear case SGD update simplifies to the form:
---cognitive-computing---machine-intelligence/ai---subfields/machine-learning-(ml)---pattern-recognition/ml---models/reinforcement-learning-(rl)/rl-chapters/rl-chapter-9-(on-policy-prediction-with-approximation)/3.png)
Feature Construction for Linear Methods
- polynomials
- fourier basis
- coarse coding - it sits conceptually between:
- tabular representation (each state has its own feature)
- global approximations like polynomial or Fourier bases
- tile coding - a form of course coding
- radial basis functions
coarse coding vs tile coding
- coarse coding - arbitrary overlapping shapes, varying sizes
- tile coding - grids of tiles, arranged in βtilings,β each covering space evenly
---cognitive-computing---machine-intelligence/ai---subfields/machine-learning-(ml)---pattern-recognition/ml---models/reinforcement-learning-(rl)/rl-chapters/rl-chapter-9-(on-policy-prediction-with-approximation)/1.png)
---cognitive-computing---machine-intelligence/ai---subfields/machine-learning-(ml)---pattern-recognition/ml---models/reinforcement-learning-(rl)/rl-chapters/rl-chapter-9-(on-policy-prediction-with-approximation)/2.png)