Assumes all the [[dataset|data]] required for the algorithm is accessible from the beginning;
we don't get to interact with the environment to collect more data.
aka "full batch"
see [[offline reinforcement learning]] for [[reinforcement learning]] problem