／var／log marcus chiu

Q-Function

Created on Aug 24, 2024

The Q-function captures the expected total future reward that an agent in state 𝑠 can receive by executing action 𝑎:
- 𝑄(𝑠_𝑡,𝑎_𝑡) = 𝐄[𝑅_𝑡|𝑠_𝑡,𝑎_𝑡]
where:
- 𝑅_𝑡 - the total reward (return): the discounted sum of all rewards obtained from time 𝑡 onward, with discount factor 𝛾 ∈ [0, 1], defined as:
  - 𝑅_𝑡 = 𝑟_𝑡 + 𝛾𝑟_{𝑡+1} + 𝛾²𝑟_{𝑡+2} + …
- 𝑠_𝑡 - state
- 𝑎_𝑡 - action
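
As a rough illustration of the definitions above, the sketch below computes the discounted return for a finite list of rewards and then approximates 𝑄(𝑠,𝑎) as the average return over several sampled reward sequences. The function names `discounted_return` and `monte_carlo_q`, the finite horizon, and the sample numbers are assumptions made for this example; they are not part of the original notes.

```python
def discounted_return(rewards, gamma):
    """Compute r_t + gamma*r_{t+1} + gamma^2*r_{t+2} + ... for a finite reward list."""
    total = 0.0
    # Walk backwards so each step applies R_t = r_t + gamma * R_{t+1}.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


def monte_carlo_q(reward_sequences, gamma):
    """Estimate Q(s, a) = E[R_t | s, a] as the mean return over sampled episodes.

    Assumption: every sequence in reward_sequences was collected by starting
    in the same state s and taking the same first action a.
    """
    returns = [discounted_return(seq, gamma) for seq in reward_sequences]
    return sum(returns) / len(returns)


# Hypothetical sampled reward sequences for one (state, action) pair.
samples = [[1.0, 0.0, 2.0], [0.0, 0.0, 2.0]]
single_return = discounted_return(samples[0], gamma=0.5)
q_estimate = monte_carlo_q(samples, gamma=0.5)
```

With 𝛾 = 0.5, the first sequence gives 1 + 0.5·0 + 0.25·2 = 1.5, and the second gives 0.5, so the averaged estimate is 1.0. Real rewards usually extend over much longer (or unbounded) horizons; truncating to a finite list is a simplification.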

How to Act Given Q-Function

The agent needs a policy 𝜋(𝑠) to infer the best action to take given state 𝑠.

Given 𝑄(𝑠,𝑎), the policy 𝜋*(𝑠) picks the action with the highest Q-value in state 𝑠:

$\pi^{*}(s) = \arg\max_{a} Q(s, a)$
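
The argmax rule can be sketched for a small tabular case as follows. This is only a minimal example: the dictionary layout of `q_table`, the state and action names, and the Q-values are hypothetical, and it assumes a finite set of actions whose Q-values are already known.

```python
def greedy_policy(q_table, state):
    """Return the action a that maximizes Q(state, a)."""
    action_values = q_table[state]  # maps each action to its Q-value
    # max over the action names, ranked by their Q-value
    return max(action_values, key=action_values.get)


# Hypothetical Q-values for two states and two actions.
q_table = {
    "s0": {"left": 0.2, "right": 1.5},
    "s1": {"left": 0.7, "right": 0.1},
}

best_in_s0 = greedy_policy(q_table, "s0")
best_in_s1 = greedy_policy(q_table, "s1")
```

Here the policy chooses `"right"` in `s0` and `"left"` in `s1`, since those actions carry the larger Q-values. If two actions tie, `max` returns the first one encountered in the dictionary.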