in a [[sequential decision making]] task where actions have **long-term effects**,
it becomes especially hard to tell which actions [[causal|caused]] which consequences.
this is one of the major [[challenges of reinforcement learning]], and it makes [[model free]] [[approximate policy evaluation]] in many [[Markov decision process|mdp]] tasks difficult and sample-inefficient, since the consequences of an action are only observed many steps after it is taken.
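a minimal sketch of what this looks like, assuming a hypothetical 10-state chain mdp with a single reward at the end: one-step TD(0) policy evaluation only moves credit back one state per episode, so the first state needs on the order of the chain length in episodes before its value reflects the delayed reward.

```python
# minimal sketch: TD(0) policy evaluation on an assumed 10-state chain with one
# delayed reward, showing how slowly credit for the final reward reaches s0
import numpy as np

n_states, gamma, alpha = 10, 0.99, 0.1
V = np.zeros(n_states + 1)          # V[n_states] is the terminal state, fixed at 0

def run_episode(V):
    """one forward pass along the chain, updating V with one-step TD(0)"""
    for s in range(n_states):
        s_next = s + 1
        r = 1.0 if s_next == n_states else 0.0   # the only reward is at the very end
        # credit moves back only one state per episode, so the first state
        # needs ~n_states episodes before it "feels" the delayed reward
        V[s] += alpha * (r + gamma * V[s_next] - V[s])

for episode in range(12):
    run_episode(V)
    print(f"episode {episode:2d}: V(s0) = {V[0]:.4f}")   # stays 0.0000 for ~10 episodes
```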
also see [[positional encoding and language model context length]].
how to address?
eg [[1997HochreiterSchmidhuberLongShorttermMemory|LSTM]]s as an improved [[recurrent neural network]]
and [[2017VaswaniEtAlAttentionAllYou|Transformer]] long-range attention mechanisms (sketched below)
see [[state space model]]s for a new approach
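a minimal numpy sketch of scaled dot-product attention, the mechanism the transformer line above refers to; the shapes, dimensions, and random weights are illustrative assumptions, and no causal mask is applied. the point is that the last time step can attend to any earlier step in a single hop rather than through a long chain of recurrent updates.

```python
# scaled dot-product attention over one sequence: a distant event is one
# attention hop away instead of T recurrent steps (dimensions are assumptions)
import numpy as np

rng = np.random.default_rng(0)
T, d = 64, 16                      # sequence length, feature dimension
X = rng.normal(size=(T, d))        # one sequence of hidden states

Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
Q, K, V = X @ Wq, X @ Wk, X @ Wv

scores = Q @ K.T / np.sqrt(d)                        # (T, T) pairwise relevance
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)       # softmax over all steps
out = weights @ V                                    # (T, d) context-mixed states

# the last step's output mixes in step 0 with weight weights[-1, 0],
# regardless of how far apart the two steps are in time
print(weights[-1, 0], out.shape)
```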
in [[human inductive bias]]: we assume future reward is related to recent context (Gershman & Niv, 2015)
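one way to make that recency assumption concrete (a sketch of an accumulating eligibility trace, not the paper's actual model) is to decay credit exponentially with how long ago a state was visited; reusing the chain mdp assumed in the sketch above, a single delayed reward now nudges even the early states after just one episode.

```python
# accumulating eligibility trace: on each TD error, every state is updated in
# proportion to how recently it was visited (chain mdp constants as assumed above)
import numpy as np

n_states, gamma, alpha, lam = 10, 0.99, 0.1, 0.9
V = np.zeros(n_states + 1)
trace = np.zeros(n_states + 1)

for s in range(n_states):
    s_next = s + 1
    r = 1.0 if s_next == n_states else 0.0
    trace *= gamma * lam            # older states fade exponentially
    trace[s] += 1.0                 # the current state is maximally "recent"
    td_error = r + gamma * V[s_next] - V[s]
    V += alpha * td_error * trace   # credit spreads back over recent states

print(np.round(V[:n_states], 4))    # after one episode, early states have already moved
```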
# sources
https://en.wikipedia.org/wiki/Hysteresis