the components of a [[reinforcement learning algorithm]] are highly intertwined: if you break one thing, everything breaks. the algorithm is not very [[symmetry|modular]], the [[Bayesian network|causal graph]] between its components is not [[sparse]], and so it is hard to [[debugging and errors|debug]].

see [[reinforcement learning debugging tips and implementation advice]]

![[2021JonesDebuggingReinforcementLearning]]

![[2016SchulmanNutsBoltsDeep]]

writeup on [[tutorial on implementing muzero]]

https://github.com/andyljones/reinforcement-learning-discord-wiki/wiki#debugging-advice

[[model based]]

![[rl diagnostics]]
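
because the components are so coupled, one way to narrow down a bug is to pull a single piece out and check it against a value worked out by hand, before it ever touches the rest of the training loop. the sketch below is my own illustration, not taken from the linked writeups: `discounted_returns` is a hypothetical helper, and the rewards and `gamma` are made-up numbers chosen so the expected answer is easy to compute on paper.

```python
def discounted_returns(rewards, gamma):
    """Compute G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # walk backwards so each return reuses the one after it
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# hand-computed check (assumed toy episode): rewards [1, 0, 2], gamma 0.5
# G_2 = 2
# G_1 = 0 + 0.5 * 2 = 1
# G_0 = 1 + 0.5 * 1 = 1.5
expected = [1.5, 1.0, 2.0]
actual = discounted_returns([1.0, 0.0, 2.0], gamma=0.5)
assert all(abs(a - e) < 1e-9 for a, e in zip(actual, expected))
```

if a check like this fails, the bug is in this one component and not somewhere in the tangle; if it passes, that is one node of the dependency graph that can be ruled out.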