Since an epoch often involves more training data than fits in memory, or than can be processed at once when computing the gradients, training is typically done in groups called batches; the gradients are then calculated on each batch and aggregated across those batches.
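A minimal sketch of this in PyTorch, where the dataset, model, batch size, and accumulation interval are all illustrative assumptions: the `DataLoader` splits the epoch into batches, and gradients (which PyTorch accumulates by default until cleared) are aggregated over several batches before each parameter update.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical dataset: 10,000 samples, too many to process in one step.
X = torch.randn(10_000, 20)
y = torch.randn(10_000, 1)
loader = DataLoader(TensorDataset(X, y), batch_size=64, shuffle=True)

model = nn.Linear(20, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

accum_steps = 4  # aggregate gradients over 4 batches before updating

for epoch in range(5):
    optimizer.zero_grad()
    for i, (batch_X, batch_y) in enumerate(loader):
        # Scale the loss so the accumulated gradients average over batches.
        loss = loss_fn(model(batch_X), batch_y) / accum_steps
        loss.backward()  # gradients from this batch add to the running total
        if (i + 1) % accum_steps == 0:
            optimizer.step()       # update weights with the aggregated gradients
            optimizer.zero_grad()  # clear gradients for the next group of batches
```

With `accum_steps = 1` this reduces to the common case of one update per batch; larger values trade update frequency for an effectively larger batch size without holding more data in memory at once.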