The Bellman equation plays a vital role in reinforcement learning.
Iterative Policy Evaluation
$$ V_{k+1}(s) = \sum_{a, s'} \pi(a|s)\, p(s'|s,a) \{ r(s, a, s') + \gamma V_{k}(s') \} $$

$$\begin{flalign}
& V_{k}: \text{Value function after the } k\text{-th iteration.} &\\
& \pi(a|s): \text{Policy; the probability of taking action } a \text{ in state } s. &\\
& p(s'|s,a): \text{Probability of the next state } s' \text{ given state } s \text{ and action } a. &\\
& r(s,a,s'): \text{Reward for state } s \text{, action } a \text{, and next state } s'. &\\
& \gamma: \text{Discount factor.} &
\end{flalign}$$
By iterating this update, \( V_k \) has been proven to converge to the true value function \( v_{\pi}(s) \) of the policy \( \pi \).
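As a concrete illustration, here is a minimal sketch of this update on a tabular MDP, written in Python. The data layout is an assumption chosen for the example, not something specified above: `P[s][a]` is a list of `(prob, next_state, reward)` tuples encoding \( p(s'|s,a) \) and \( r(s,a,s') \), `pi[s][a]` is \( \pi(a|s) \), and `theta` is a stopping threshold.

```python
import numpy as np

def iterative_policy_evaluation(P, pi, gamma=0.9, theta=1e-8):
    """Evaluate a fixed policy pi on a small tabular MDP.

    Assumed layout (illustrative, not from the text above):
      P[s][a]  -- list of (prob, next_state, reward) tuples, i.e. p(s'|s,a) and r(s,a,s').
      pi[s][a] -- probability of taking action a in state s.
    """
    n_states = len(P)
    V = np.zeros(n_states)                     # V_0(s) = 0 for every state
    while True:
        V_new = np.zeros(n_states)
        for s in range(n_states):
            # Bellman expectation backup:
            # V_{k+1}(s) = sum_{a,s'} pi(a|s) p(s'|s,a) [ r(s,a,s') + gamma * V_k(s') ]
            V_new[s] = sum(
                pi[s][a] * prob * (reward + gamma * V[s_next])
                for a in range(len(P[s]))
                for prob, s_next, reward in P[s][a]
            )
        if np.max(np.abs(V_new - V)) < theta:  # stop once a full sweep barely changes V
            return V_new
        V = V_new
```

Keeping a separate `V_new` array makes each sweep synchronous, mirroring the \( V_k \rightarrow V_{k+1} \) update above; updating `V` in place also converges and is a common variant.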
The related Bellman optimality equation characterizes the optimal value function \( V^* \):

$$ V^*(s) = \max_a \left[ R(s, a) + \gamma \sum_{s'} P(s' | s, a) V^*(s') \right] $$
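For completeness, here is a minimal sketch of applying this optimality backup repeatedly (value iteration). The array shapes for `P` and `R`, along with `gamma` and `theta`, are assumptions made for the illustration rather than anything given in the text.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, theta=1e-8):
    """Compute V* by repeatedly applying the Bellman optimality backup.

    Assumed layout (illustrative): P has shape (S, A, S) with P[s, a, s'] = P(s'|s,a),
    and R has shape (S, A) with R[s, a] = R(s, a).
    """
    n_states = P.shape[0]
    V = np.zeros(n_states)
    while True:
        # Q(s, a) = R(s, a) + gamma * sum_{s'} P(s'|s, a) V(s')
        Q = R + gamma * (P @ V)        # shape (S, A)
        V_new = Q.max(axis=1)          # V*(s) = max_a Q(s, a)
        if np.max(np.abs(V_new - V)) < theta:
            return V_new
        V = V_new
```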
