how to train a computer to imitate an *expert* on a [[sequential decision making]] task?
suppose the expert collects a [[trajectory dataset]] of (state, action) pairs.
what to do with it?
prototypical algorithm:
[[behaviour cloning]]: simple [[maximum likelihood estimation]]:
given state, predict action.
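a minimal sketch of this: behaviour cloning is just supervised MLE of p(action | state) on the expert's pairs. synthetic data and a logistic-regression policy are my assumptions here, not from the source.

```python
# behaviour cloning as plain maximum likelihood: fit p(action | state)
# on expert (state, action) pairs, exactly like any supervised classifier.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# hypothetical "expert" dataset: 1-d states; expert picks action 1 iff state > 0
states = rng.normal(size=(500, 1))
actions = (states[:, 0] > 0).astype(int)

# fitting logistic regression = minimizing cross-entropy = MLE of p(a|s)
policy = LogisticRegression().fit(states, actions)

def act(state):
    """cloned policy: given a state, predict the expert's action."""
    return int(policy.predict(np.asarray(state).reshape(1, -1))[0])

train_acc = policy.score(states, actions)
```

note this optimizes per-state prediction accuracy only; nothing in the objective knows about the sequential rollout.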
often suffers from [[transfer learning in reinforcement learning|compounding errors]] (in theory and in practice):
[[covariate shift]]: training states come from the expert's [[trajectory distribution]], but at test time the learned policy sees its *own* trajectory distribution, where small mistakes push it into states with no training data
possible insight from [[cognitive]] science:
key part missing from [[optimization|machine learning method]]s:
[[theory of mind|intention reading]],
e.g. for [[language acquisition]]? look into more
# sources
https://gwern.net/doc/reinforcement-learning/model-free/2015-bagnell.pdf
https://www.youtube.com/@rail7462/playlists
#medium/video [berkeley lecture](https://www.youtube.com/watch?v=kGc8jOy5_zY) | [part 2](https://youtu.be/06uB13C5pxw)
![[library.base#imitation learning]]