in a [[sequential decision making]] task where actions have **long-term effects**,
it becomes especially hard to tell which actions [[causal|caused]] which consequences.
this is one of the major [[challenges of reinforcement learning]], and it makes [[model free]] [[approximate policy evaluation]] in many [[Markov decision process|mdp]] tasks difficult and sample-inefficient, since the consequences of an action are only observed many steps after it is taken.
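a minimal sketch of what this looks like, assuming a hypothetical 10-state chain mdp with a single reward at the end: one-step TD(0) policy evaluation only moves credit back one state per episode, so the first state needs on the order of the chain length in episodes before its value reflects the delayed reward.

```python
# minimal sketch: TD(0) policy evaluation on an assumed 10-state chain with one
# delayed reward, showing how slowly credit for the final reward reaches s0
import numpy as np

n_states, gamma, alpha = 10, 0.99, 0.1
V = np.zeros(n_states + 1)          # V[n_states] is the terminal state, fixed at 0

def run_episode(V):
    """one forward pass along the chain, updating V with one-step TD(0)"""
    for s in range(n_states):
        s_next = s + 1
        r = 1.0 if s_next == n_states else 0.0   # the only reward is at the very end
        # credit moves back only one state per episode, so the first state
        # needs ~n_states episodes before it "feels" the delayed reward
        V[s] += alpha * (r + gamma * V[s_next] - V[s])

for episode in range(12):
    run_episode(V)
    print(f"episode {episode:2d}: V(s0) = {V[0]:.4f}")   # stays 0.0000 for ~10 episodes
```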
also see [[positional encoding and language model context length]].
how to address?
eg [[1997HochreiterSchmidhuberLongShorttermMemory|LSTM]]s as an improved [[recurrent neural network]]
and [[2017VaswaniEtAlAttentionAllYou|Transformer]] long-range attention mechanisms (sketched below)
see [[state space model]]s for a new approach
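a minimal numpy sketch of scaled dot-product attention, the mechanism the transformer line above refers to; the shapes, dimensions, and random weights are illustrative assumptions, and no causal mask is applied. the point is that the last time step can attend to any earlier step in a single hop rather than through a long chain of recurrent updates.

```python
# scaled dot-product attention over one sequence: a distant event is one
# attention hop away instead of T recurrent steps (dimensions are assumptions)
import numpy as np

rng = np.random.default_rng(0)
T, d = 64, 16                      # sequence length, feature dimension
X = rng.normal(size=(T, d))        # one sequence of hidden states

Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
Q, K, V = X @ Wq, X @ Wk, X @ Wv

scores = Q @ K.T / np.sqrt(d)                        # (T, T) pairwise relevance
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)       # softmax over all steps
out = weights @ V                                    # (T, d) context-mixed states

# the last step's output mixes in step 0 with weight weights[-1, 0],
# regardless of how far apart the two steps are in time
print(weights[-1, 0], out.shape)
```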
in [[human inductive bias]]: we assume future reward is related to recent context (Gershman & Niv, 2015)
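one way to make that recency assumption concrete (a sketch of an accumulating eligibility trace, not the paper's actual model) is to decay credit exponentially with how long ago a state was visited; reusing the chain mdp assumed in the sketch above, a single delayed reward now nudges even the early states after just one episode.

```python
# accumulating eligibility trace: on each TD error, every state is updated in
# proportion to how recently it was visited (chain mdp constants as assumed above)
import numpy as np

n_states, gamma, alpha, lam = 10, 0.99, 0.1, 0.9
V = np.zeros(n_states + 1)
trace = np.zeros(n_states + 1)

for s in range(n_states):
    s_next = s + 1
    r = 1.0 if s_next == n_states else 0.0
    trace *= gamma * lam            # older states fade exponentially
    trace[s] += 1.0                 # the current state is maximally "recent"
    td_error = r + gamma * V[s_next] - V[s]
    V += alpha * td_error * trace   # credit spreads back over recent states

print(np.round(V[:n_states], 4))    # after one episode, early states have already moved
```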
# sources
https://en.wikipedia.org/wiki/Hysteresis