
The Markov Property, Chain, Reward Process and Decision Process
As seen in the previous article, we now know the general concept of Reinforcement Learning. But how do we actually approach our third challenge, "Temporal Credit Assignment"?
To solve this, we first need to introduce a generalization of our reinforcement models. When we look at these