Markov Decision Process

RL is based on the Markov Decision Process (MDP), a mathematical framework for modeling decision making in situations where outcomes are partly random and partly under the control of a decision maker. An MDP, and the RL methods built on it, consists of the following components:

  1. Environment – the space in which the RL model operates.

  2. State – summarizes all the information about the environment and past steps that is relevant to the future.

  3. Action – what the agent does.

  4. Reward – a positive or negative signal based on the agent's actions. The agent's goal is to maximize the cumulative reward over the long term.

  5. Observation – information about the state of the environment that is available to the agent at every step.
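
The components above can be sketched as data. Below is a minimal toy MDP: the state names, actions, transition probabilities, and rewards are all made up for illustration, not taken from any particular environment.

```python
import random

# P[state][action] -> list of (next_state, probability)
# R[state][action] -> immediate reward
P = {
    "s0": {"stay": [("s0", 0.9), ("s1", 0.1)], "go": [("s1", 1.0)]},
    "s1": {"stay": [("s1", 1.0)], "go": [("s0", 1.0)]},
}
R = {
    "s0": {"stay": 0.0, "go": 1.0},
    "s1": {"stay": 0.5, "go": -1.0},
}

def step(state, action):
    """Sample a next state from the transition table and return (next_state, reward)."""
    next_states, probs = zip(*P[state][action])
    next_state = random.choices(next_states, weights=probs)[0]
    return next_state, R[state][action]

# Taking "go" in s0 always lands in s1 and pays a reward of 1.0
next_state, reward = step("s0", "go")
```

Note that the transition depends only on the current state and action, not on how the agent got there; that independence from history is the Markov property.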


The MDP is a model for predicting outcomes. As the diagram above shows, the model predicts an outcome given only the information provided by the current state, while also incorporating the actions of the agent. At each step of the process, the agent may choose an action available in the current state; the model then moves to the next state and returns a reward.
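This step-by-step interaction loop can be sketched as follows. The environment below is a hypothetical two-state example, and the random policy stands in for whatever learned policy an RL agent would use; none of this is a specific library's API.

```python
import random

class ChainEnv:
    """A tiny two-state environment (illustrative assumption, not a real benchmark)."""
    def __init__(self):
        self.state = 0

    def step(self, action):
        # action 1 flips the state; action 0 keeps it
        if action == 1:
            self.state = 1 - self.state
        reward = 1.0 if self.state == 1 else 0.0
        observation = self.state  # fully observable in this toy case
        return observation, reward

env = ChainEnv()
total_reward = 0.0
obs = env.state
for t in range(10):
    action = random.choice([0, 1])  # a random policy stands in for a learned one
    obs, reward = env.step(action)  # environment moves to the next state, returns a reward
    total_reward += reward          # the agent's goal: maximize this over the long term
```

Each pass through the loop is one step of the MDP: observe, act, receive a reward, repeat.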