Q-learning

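Q-learning is a model-free reinforcement learning technique. Given a set of states \( S \) and a set of actions \( A \), it iteratively updates an estimate \( Q(s, a) \) of the payoff for taking action \( a \) in state \( s \), using the reward observed at each step. The update rule, from Wikipedia: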

$$ Q_{t+1}(s_{t},a_{t}) = \underbrace{Q_t(s_t,a_t)}_{\rm old~value} + \underbrace{\alpha_t(s_t,a_t)}_{\rm learning~rate} \cdot \left( \overbrace{\underbrace{R_{t+1}}_{\rm reward} + \underbrace{\gamma}_{\rm discount~factor} \underbrace{\max_{a}Q_t(s_{t+1}, a)}_{\rm estimate~of~optimal~future~value}}^{\rm learned~value} - \underbrace{Q_t(s_t,a_t)}_{\rm old~value} \right) $$
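
The update rule above can be sketched in a few lines of Python for the tabular case. This is a minimal illustration, not a reference implementation: the function name `q_update`, the dictionary-based table, the state and action labels, and the values chosen for the learning rate and discount factor are all assumptions made for the example.

```python
from collections import defaultdict


def q_update(Q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Apply one Q-learning update in place and return the new Q(state, action).

    Q is a mapping from (state, action) pairs to estimated payoffs.
    alpha is the learning rate and gamma the discount factor (illustrative defaults).
    """
    old_value = Q[(state, action)]
    # Estimate of optimal future value: best Q over all actions in the next state.
    future_value = max(Q[(next_state, a)] for a in actions)
    # Learned value: observed reward plus discounted future estimate.
    learned_value = reward + gamma * future_value
    # Move the old value toward the learned value by a fraction alpha.
    Q[(state, action)] = old_value + alpha * (learned_value - old_value)
    return Q[(state, action)]


# Hypothetical two-state, two-action example; unseen entries default to 0.0.
Q = defaultdict(float)
actions = ["left", "right"]

first = q_update(Q, "s0", "right", 1.0, "s1", actions, alpha=0.5, gamma=0.9)
second = q_update(Q, "s1", "right", 0.0, "s0", actions, alpha=0.5, gamma=0.9)
```

In the second call the reward is zero, yet \( Q(s_1, \text{right}) \) still increases, because the \( \max_a \) term propagates the value already learned for \( s_0 \) back to the state that leads to it.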
