applying [[machine learning method]]s (mostly [[function approximation]] for [[approximate policy evaluation]] and [[policy learning]])
to the [[sequential decision making]] problem
i.e. in settings where [[model based|planning]] is hard or expensive.
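a minimal sketch of what "function approximation for approximate policy evaluation" can look like: semi-gradient TD(0) with two linear features, learning from sampled transitions only (no model, no planning). The environment is an assumption chosen for illustration: a 5-state random walk with reward 1 on exiting to the right, evaluated under the uniform random policy, where the true value of state $s$ is $s/6$. The names `features`, `value`, `td0_evaluate` and the step size are hypothetical.

```python
import random

N_STATES = 5   # non-terminal states 1..5; states 0 and 6 are terminal
GAMMA = 1.0    # undiscounted episodic task

def features(s):
    """Linear features [bias, centred position]; terminal states map to zeros."""
    if s == 0 or s == N_STATES + 1:
        return (0.0, 0.0)
    return (1.0, (s - 3) / 2.0)

def value(w, s):
    """Approximate value: dot product of weights and features."""
    phi = features(s)
    return w[0] * phi[0] + w[1] * phi[1]

def td0_evaluate(episodes=20000, alpha=0.002, seed=0):
    """Evaluate the uniform random policy from sampled transitions only."""
    rng = random.Random(seed)
    w = [0.0, 0.0]
    for _ in range(episodes):
        s = 3  # every episode starts in the middle state
        while 0 < s < N_STATES + 1:
            s_next = s + rng.choice((-1, 1))               # random policy: step left or right
            r = 1.0 if s_next == N_STATES + 1 else 0.0     # reward only on the right exit
            delta = r + GAMMA * value(w, s_next) - value(w, s)  # TD error
            phi = features(s)
            w[0] += alpha * delta * phi[0]                 # semi-gradient update
            w[1] += alpha * delta * phi[1]
            s = s_next
    return w

w = td0_evaluate()
estimates = [value(w, s) for s in range(1, N_STATES + 1)]
```

`estimates` should land close to $1/6, 2/6, \dots, 5/6$; two weights stand in for five table entries, which is the point of function approximation.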
**principle of reinforcement** in [[psychology and epistemology]] / [[behaviourism]]:
"behaviour that is rewarded will be repeated. behaviour that isn't rewarded won't be repeated."
aka *operant or instrumental conditioning*.
- analogous to [[natural selection]]; reward $\sim$ fitness
- see [[1972 Rescorla-Wagner rule]] for an [[artificial neural network]] model
![[associative learning#^operant]]
also [[2021SilverEtAlRewardEnough|reward hypothesis]] and [[2021VamplewEtAlScalarRewardNot]]
see [[reinforcement learning software]], [[challenges of reinforcement learning]]
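a toy sketch of the principle of reinforcement above, assuming a two-action learner that picks each action with probability proportional to a "propensity" and adds any reward received to the propensity of the action just taken. The reward probabilities, the name `simulate`, and the update rule are illustrative assumptions, not a model taken from the linked notes.

```python
import random

def simulate(trials=1000, reward_prob=(0.8, 0.0), seed=0):
    """Action 0 is rewarded 80% of the time, action 1 never."""
    rng = random.Random(seed)
    propensity = [1.0, 1.0]  # start indifferent between the two actions
    choices = []
    for _ in range(trials):
        p0 = propensity[0] / sum(propensity)           # choice probability for action 0
        a = 0 if rng.random() < p0 else 1
        r = 1.0 if rng.random() < reward_prob[a] else 0.0
        propensity[a] += r                             # rewarded behaviour becomes more likely
        choices.append(a)
    return propensity, choices

propensity, choices = simulate()
p_rewarded = propensity[0] / sum(propensity)
```

the unrewarded action is never strengthened, so its share of choices shrinks as the rewarded action accumulates propensity.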
# sources
- [[DeepMind]] has a LOT of open-source repositories available on GitHub
- [Deep Reinforcement Learning - Google DeepMind](https://www.deepmind.com/blog/deep-reinforcement-learning) (blog post)
- [Tweet on progress in RL for continuous control](https://twitter.com/araffin2/status/1575439865222660098)
- a review of RL progress
- [“What happened to Reinforcement Learning research and labs?” | x.com](https://twitter.com/hardmaru/status/1570742914602639360)
- [GitHub - aikorea/awesome-rl: Reinforcement learning resources curated](https://github.com/aikorea/awesome-rl)
- [GitHub - brianspiering/awesome-deep-rl: A curated list of awesome Deep Reinforcement Learning resources](https://github.com/brianspiering/awesome-deep-rl)
- [GitHub - kengz/awesome-deep-rl: A curated list of awesome Deep Reinforcement Learning resources.](https://github.com/kengz/awesome-deep-rl)
- [Sergey Levine: Robotics and Machine Learning | Lex Fridman Podcast #108 - YouTube](https://www.youtube.com/watch?v=kxi-_TT_-Nc)
- [David Silver: AlphaGo, AlphaZero, and Deep Reinforcement Learning | Lex Fridman Podcast #86 - YouTube](https://youtu.be/uPUEq8d73JI)
- [[2023WhiddenTrainingAIPlay|Training AI to Play Pokemon with Reinforcement Learning]]
- [1a3orn - LessWrong](https://www.lesswrong.com/users/1a3orn) (Has written lots of good explainer essays)
- [[2018SuttonBartoReinforcementLearningIntroduction|Reinforcement learning]] (the canonical Sutton and Barto text)
- [Teaching - David Silver](https://www.davidsilver.uk/teaching/) (UCL Course on RL)
- [[Sergey Levine]] is on like every single [[offline reinforcement learning]] paper
- [GitHub - mhd-medfa/IU-Reinforcement-Learning-22-lab: Innopolis University Master's students.](https://github.com/mhd-medfa/IU-Reinforcement-Learning-22-lab)
- [DeepMind x UCL | Reinforcement Learning Course 2018](https://www.youtube.com/watch?v=ISk80iLhdfU&list=PLqYmG7hTraZBKeNJ-JE_eyJHZ7XgBoAyb) (YouTube lecture series)
- [DeepMind x UCL | Introduction to Reinforcement Learning 2015](https://www.youtube.com/watch?v=2pWv7GOvuf0&list=PLqYmG7hTraZDM-OYHWgPebj2MfCFzFObQ) (YouTube lecture series)
- [Welcome to the 🤗 Deep Reinforcement Learning Course - Hugging Face Deep RL Course](https://huggingface.co/learn/deep-rl-course)
- [CS 542 Statistical Reinforcement Learning](https://nanjiang.cs.illinois.edu/cs542f21/)
- [What is the relation between online (or offline) learning and on-policy (or off-policy) algorithms? - Artificial Intelligence Stack Exchange](https://ai.stackexchange.com/a/10491)
- [[STAT 184]]
- [[CS STAT 184 textbook]]
- [[RL courses at Cornell]]
- [[2025MurphyReinforcementLearningOverview|Murphy RL textbook]]
- [[2019PowellReinforcementLearningOptimal]]
- [[2022AgarwalEtAlReinforcementLearningTheory|AJKS]] on theory
- https://youtu.be/RIkse0tJ0hE ([[Joseph Suarez]])
- [[2024AmiiRichSuttonsNew]]
- https://www.reddit.com/r/reinforcementlearning/comments/1kx7vd7/why_arent_llms_trained_with_reinforcement/