Assumes all the [[dataset|data]] the algorithm needs is available from the start; the learner cannot interact with the environment to collect more. Also known as "full batch" learning. For the [[reinforcement learning]] version of this setting, see [[offline reinforcement learning]].
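
A minimal sketch of the idea, assuming a toy 1-D linear regression problem (the dataset, `fit_offline`, and its hyperparameters are all hypothetical, chosen only for illustration). The point is that every update reads the same fixed dataset and nothing new is ever sampled from an environment:

```python
# Hypothetical fixed dataset: (x, y) pairs gathered before training starts.
# Here the points happen to lie on y = 2x + 1.
DATASET = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]


def fit_offline(dataset, lr=0.05, epochs=2000):
    """Full-batch gradient descent for y ~ w*x + b on a fixed dataset."""
    w, b = 0.0, 0.0
    n = len(dataset)
    for _ in range(epochs):
        # Each step uses the whole dataset; no new samples are requested.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in dataset) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in dataset) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


w, b = fit_offline(DATASET)
```

An online learner would instead receive (or actively collect) new samples between updates, so the loop body would include a step that queries the environment.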