February 11, 2024

The convergence of the iteration of the optimal Bellman equation

The Bellman equation plays a vital role in reinforcement learning.

Iterative Policy Evaluation

$$ V_{k+1}(s) = \sum_{a, s'} \pi(a|s) p(s'|s,a) \{ r(s, a, s') + \gamma V_{k}(s') \} $$

\begin{flalign} & V_{k}: \text{Value function after the } k\text{-th iteration.} \\ & \end{flalign}
\begin{flalign} & \pi: \text{Policy, i.e. the probability of taking action } a \text{ in state } s. \\ & \end{flalign}
\begin{flalign} & p: \text{Probability of the next state } s' \text{ under state } s \text{ and action } a....