Suppose we have some [[trajectory dataset]] (e.g. lots of previous users of a platform or video game). How to find an (approximately) [[optimal policy]] *without* access to the environment? see also [[imitation learning]] when the data comes from an expert; see [[transfer learning in reinforcement learning]] for [[covariate shift]] issue another way is to treat rl as a [[stochastic process|time series]] / [[self supervised reinforcement learning]] task # sources [[Sergey Levine]] does a lot of work on this [Reinforcement Learning with Large Datasets: a Path to Resourceful Autonomous Agents - YouTube](https://youtu.be/D74DRVWLS7A)