how to train a computer to imitate an *expert* on a [[sequential decision making]] task?

suppose the expert collects a [[trajectory dataset]]. what to do with it?

prototypical algorithm: [[behaviour cloning]]: simple [[maximum likelihood estimation]]: given state, predict action.

often suffers from [[transfer learning in reinforcement learning|overfitting in imitation learning]] (in theory and practice): [[covariate shift]] from the training data to the learned policy's own [[trajectory distribution]]

possible insight from [[cognitive]] science: key part missing from [[optimization|machine learning method]]s: [[theory of mind|intention reading]], eg for [[language acquisition]]? look more

# sources

https://gwern.net/doc/reinforcement-learning/model-free/2015-bagnell.pdf

https://www.youtube.com/@rail7462/playlists #medium/video

[berkeley lecture](https://www.youtube.com/watch?v=kGc8jOy5_zY) | [part 2](https://youtu.be/06uB13C5pxw)

![[library.base#imitation learning]]
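a minimal sketch of behaviour cloning as maximum likelihood estimation, in the tabular case: the MLE policy for each state is just the empirical distribution of expert actions seen in that state. the corridor environment and dataset here are made up for illustration; the out-of-distribution lookup at the end hints at the covariate-shift failure mode — states the expert never visits get no training signal at all.

```python
from collections import Counter, defaultdict

def behaviour_cloning(dataset):
    """Tabular behaviour cloning via maximum likelihood.

    For a categorical policy, the MLE given (state, action) pairs is the
    empirical action distribution per state; here we return the greedy
    (most frequent) action for each observed state.
    """
    counts = defaultdict(Counter)
    for state, action in dataset:
        counts[state][action] += 1
    return {s: c.most_common(1)[0][0] for s, c in counts.items()}

# hypothetical expert data on a 5-state corridor: expert always steps right (+1)
expert_dataset = [(s, +1) for s in range(5) for _ in range(10)]
policy = behaviour_cloning(expert_dataset)
print(policy)  # -> {0: 1, 1: 1, 2: 1, 3: 1, 4: 1}

# covariate shift in miniature: if a single mistake drifts the learned policy
# into a state the expert never visited (say, -1), the cloned policy is
# undefined there -- the training distribution gave it nothing to imitate.
print(policy.get(-1))  # -> None
```

the same failure scales to function approximators: the network still outputs *something* off-distribution, but it was never fit there, so errors compound along the policy's own rollout.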