A model-free reinforcement learning technique. Given states \( S \) and actions \( A \), the action value \( Q(s, a) \) is iteratively updated. From Wikipedia:
$$ Q_{t+1}(s_{t},a_{t}) = \underbrace{Q_t(s_t,a_t)}_{\rm old~value} + \underbrace{\alpha_t(s_t,a_t)}_{\rm learning~rate} \cdot \left( \overbrace{\underbrace{R_{t+1}}_{\rm reward} + \underbrace{\gamma}_{\rm discount~factor} \underbrace{\max_{a}Q_t(s_{t+1}, a)}_{\rm estimate~of~optimal~future~value}}^{\rm learned~value} - \underbrace{Q_t(s_t,a_t)}_{\rm old~value} \right) $$
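As a concrete illustration of this update rule, here is a minimal tabular Q-learning sketch in Python. The 5-state chain environment, the epsilon-greedy exploration policy, and the hyperparameter values are assumptions for the example, not part of the original:

```python
import random

# Hypothetical 5-state chain MDP (an assumption for illustration):
# states 0..4, actions 0 = left, 1 = right; entering state 4 pays +1.
N_STATES, N_ACTIONS, GOAL = 5, 2, 4

def step(s, a):
    """Deterministic transition; reward 1 only on entering the goal state."""
    s_next = max(0, s - 1) if a == 0 else min(GOAL, s + 1)
    reward = 1.0 if s_next == GOAL else 0.0
    done = s_next == GOAL
    return s_next, reward, done

# Q-table initialized to zero: Q[s][a]
Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]

alpha, gamma, epsilon = 0.1, 0.9, 0.1  # learning rate, discount, exploration

for episode in range(500):
    s, done = 0, False
    while not done:
        # Epsilon-greedy action selection (an assumed exploration policy)
        if random.random() < epsilon:
            a = random.randrange(N_ACTIONS)
        else:
            a = max(range(N_ACTIONS), key=lambda a_: Q[s][a_])
        s_next, r, done = step(s, a)
        # The update from the equation above:
        # Q(s,a) += alpha * (R + gamma * max_a' Q(s',a') - Q(s,a))
        Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])
        s = s_next

# State values max_a Q(s,a) should approach gamma^(GOAL - 1 - s) for s < GOAL.
print([round(max(row), 3) for row in Q])
```

Note that the update only requires sampled transitions \( (s_t, a_t, R_{t+1}, s_{t+1}) \), not the transition probabilities themselves, which is what makes the method model-free.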