reinforcement learning

applying [[machine learning method]]s (mostly [[function approximation]] for [[approximate policy evaluation]] and [[policy learning]]) to the [[sequential decision making]] problem, i.e. where [[model based|planning]] is hard or expensive.

**principle of reinforcement** in [[psychology and epistemology]] / [[behaviourism]]: "behaviour that is rewarded will be repeated. behaviour that isn't rewarded won't be repeated." aka *operant or instrumental conditioning*.

- analogous to [[natural selection]]; reward $\sim$ fitness
- see [[1972 Rescorla-Wagner rule]] for [[artificial neural network]] model

![[associative learning#^operant]]

also [[2021SilverEtAlRewardEnough|reward hypothesis]] and [[2021VamplewEtAlScalarRewardNot]]

see [[reinforcement learning software]], [[challenges of reinforcement learning]]

# sources

- [[DeepMind]] has a LOT of open-source repositories available on GitHub
- [Deep Reinforcement Learning - Google DeepMind](https://www.deepmind.com/blog/deep-reinforcement-learning) (blog post)
- [Tweet on progress in RL for continuous control](https://twitter.com/araffin2/status/1575439865222660098)
    - a review of RL progress
- [“What happened to Reinforcement Learning research and labs?” | x.com](https://twitter.com/hardmaru/status/1570742914602639360)
- [GitHub - aikorea/awesome-rl: Reinforcement learning resources curated](https://github.com/aikorea/awesome-rl)
- [GitHub - brianspiering/awesome-deep-rl: A curated list of awesome Deep Reinforcement Learning resources](https://github.com/brianspiering/awesome-deep-rl)
- [GitHub - kengz/awesome-deep-rl: A curated list of awesome Deep Reinforcement Learning resources.](https://github.com/kengz/awesome-deep-rl)
- [Sergey Levine: Robotics and Machine Learning | Lex Fridman Podcast #108 - YouTube](https://www.youtube.com/watch?v=kxi-_TT_-Nc)
- [David Silver: AlphaGo, AlphaZero, and Deep Reinforcement Learning | Lex Fridman Podcast #86 - YouTube](https://youtu.be/uPUEq8d73JI)
- [[2023WhiddenTrainingAIPlay|Training AI to Play Pokemon with Reinforcement Learning]]
- [1a3orn - LessWrong](https://www.lesswrong.com/users/1a3orn) (Has written lots of good explainer essays)
- [[2018SuttonBartoReinforcementLearningIntroduction|Reinforcement learning]] (the canonical sutton and barto text)
- [Teaching - David Silver](https://www.davidsilver.uk/teaching/) (UCL Course on RL)
- [[Sergey Levine]] is on like every single [[offline reinforcement learning]] paper
- [GitHub - mhd-medfa/IU-Reinforcement-Learning-22-lab: Innopolis University Master's students.](https://github.com/mhd-medfa/IU-Reinforcement-Learning-22-lab)
- [DeepMind x UCL | Reinforcement Learning Course 2018](https://www.youtube.com/watch?v=ISk80iLhdfU&list=PLqYmG7hTraZBKeNJ-JE_eyJHZ7XgBoAyb) (YouTube lecture series)
- [DeepMind x UCL | Introduction to Reinforcement Learning 2015](https://www.youtube.com/watch?v=2pWv7GOvuf0&list=PLqYmG7hTraZDM-OYHWgPebj2MfCFzFObQ) (YouTube lecture series)
- [Welcome to the 🤗 Deep Reinforcement Learning Course - Hugging Face Deep RL Course](https://huggingface.co/learn/deep-rl-course)
- [CS 542 Statistical Reinforcement Learning](https://nanjiang.cs.illinois.edu/cs542f21/)
- [What is the relation between online (or offline) learning and on-policy (or off-policy) algorithms? - Artificial Intelligence Stack Exchange](https://ai.stackexchange.com/a/10491)
- [[STAT 184]]
- [[CS STAT 184 textbook]]
- [[RL courses at Cornell]]
- [[2025MurphyReinforcementLearningOverview|Murphy rl textbook]]
- [[2019PowellReinforcementLearningOptimal]]
- [[2022AgarwalEtAlReinforcementLearningTheory|AJKS]] on theory
- https://youtu.be/RIkse0tJ0hE ([[Joseph Suarez]])
- [[2024AmiiRichSuttonsNew]]
- https://www.reddit.com/r/reinforcementlearning/comments/1kx7vd7/why_arent_llms_trained_with_reinforcement/
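
# example

a minimal sketch of the principle of reinforcement stated at the top of this note, assuming a toy two-armed bandit that is not taken from any of the linked sources. arm 1 is always rewarded and arm 0 never is; the agent keeps a value estimate per arm and nudges it toward the observed reward with an error-correcting update, $V \leftarrow V + \alpha (r - V)$, which has the same form as the single-stimulus Rescorla-Wagner update. the names `run_bandit`, `alpha`, `epsilon` and the reward values are illustrative assumptions.

```python
import random


def run_bandit(steps=500, alpha=0.1, epsilon=0.1, seed=0):
    """Toy two-armed bandit (hypothetical setup): arm 1 is always rewarded, arm 0 never."""
    rng = random.Random(seed)
    rewards = [0.0, 1.0]  # assumed deterministic reward for each arm
    values = [0.0, 0.0]   # learned value estimate for each arm
    counts = [0, 0]       # how often each arm was chosen

    for _ in range(steps):
        if rng.random() < epsilon:
            # explore: pick an arm uniformly at random
            action = rng.randrange(2)
        else:
            # exploit: pick the highest-valued arm, breaking ties at random
            best = max(values)
            action = rng.choice([a for a in range(2) if values[a] == best])

        reward = rewards[action]
        # error-correcting update: move the estimate toward the observed reward
        values[action] += alpha * (reward - values[action])
        counts[action] += 1

    return values, counts


if __name__ == "__main__":
    values, counts = run_bandit()
    print("value estimates:", values)
    print("choice counts:", counts)
```

once the rewarded arm has been tried, its estimate rises above the other's and the greedy branch keeps choosing it, so the rewarded behaviour is the one that gets repeated. `epsilon` only controls how often the agent still samples an arm at random.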