A model-free reinforcement learning algorithm that learns the value of taking each action in a given state (the Q-value). Because it does not require a model of its environment, it can handle stochastic transitions and rewards. Q-learning is an off-policy algorithm: it learns the value of the greedy policy regardless of the policy used to select actions during training.
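To make the definition concrete, here is a minimal sketch of the tabular form of the update, assuming a small discrete problem where the Q-values fit in a list of lists indexed as `Q[state][action]`. The names `q_update` and `epsilon_greedy`, and the defaults for the learning rate `alpha` and discount factor `gamma`, are illustrative assumptions, not part of any particular library.

```python
import random


def q_update(Q, state, action, reward, next_state, done, alpha=0.1, gamma=0.99):
    """Apply one tabular Q-learning update in place and return the TD error.

    Q is a list of lists: Q[state][action] holds the current value estimate.
    """
    # Off-policy target: bootstrap from the best action in the next state,
    # regardless of which action the behavior policy will actually take.
    best_next = 0.0 if done else max(Q[next_state])
    target = reward + gamma * best_next
    td_error = target - Q[state][action]
    Q[state][action] += alpha * td_error
    return td_error


def epsilon_greedy(Q, state, epsilon, rng=random):
    """Behavior policy: explore with probability epsilon, else act greedily."""
    if rng.random() < epsilon:
        return rng.randrange(len(Q[state]))
    row = Q[state]
    return max(range(len(row)), key=row.__getitem__)
```

Only observed `(state, action, reward, next_state)` samples enter the update, which is why no model of the transition or reward function is needed. The `max` in the target is what makes the method off-policy: actions may be chosen by `epsilon_greedy` or any other exploratory rule, while the values being learned are those of the greedy policy.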