A data set of prepared data that is cleaned and has appropriate labels, if used for supervised learning, that is used to incrementally train a machine learning model to perform a particular task. Machine learning works by providing an algorithm with training data to generate a model that can then be used to predict future data.