the components of a [[reinforcement learning algorithm]] are highly intertwined.
if you break one thing,
everything breaks;
the algorithm is not very [[symmetry|modular]];
the [[Bayesian network|causal graph]] of dependencies between components is not [[sparse]];
hard to [[debugging and errors|debug]];
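
a minimal sketch of one way to cope with this: pull a single component out and check it in isolation against hand-computed values, so a failure points at that component rather than at the whole training loop. the function `discounted_returns`, its arguments, and the test case are hypothetical illustrations, not taken from the linked notes.

```python
def discounted_returns(rewards, dones, gamma):
    """Compute G_t = r_t + gamma * G_{t+1}, resetting at episode ends.

    Hypothetical helper for illustration: `dones[t]` is True when step t
    is the last step of an episode, so nothing after it is bootstrapped.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        if dones[t]:
            running = 0.0  # do not leak returns across episode boundaries
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def check_discounted_returns():
    # Two episodes back to back: steps 0-2, then step 3 alone.
    rewards = [1.0, 1.0, 1.0, 2.0]
    dones = [False, False, True, True]
    got = discounted_returns(rewards, dones, gamma=0.5)
    # Worked by hand: 1 + 0.5 * 1.5, 1 + 0.5 * 1, 1, 2
    want = [1.75, 1.5, 1.0, 2.0]
    assert got == want, f"expected {want}, got {got}"


if __name__ == "__main__":
    check_discounted_returns()
    print("discounted_returns ok")
```

a check like this only covers one node of the graph; it says nothing about how the pieces interact, which is where the intertwining bites.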
see [[reinforcement learning debugging tips and implementation advice]]
![[2021JonesDebuggingReinforcementLearning]]
![[2016SchulmanNutsBoltsDeep]]
writeup on [[tutorial on implementing muzero]]
https://github.com/andyljones/reinforcement-learning-discord-wiki/wiki#debugging-advice
[[model based]]
![[rl diagnostics]]