We say an (approximate) [[interpolate]]r **overfits**
if it achieves high [[generalization error]], i.e. fails to [[abstract|general]]ize.
(Used pretty heuristically, not a rigorous definition.)
We'd typically expect an [[dimensions per sample|overparameterized]] model class to overfit often, since it might capture spurious [[signal noise decomposition|noise]] in the training data rather than the actual desired signal.
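A toy illustration of this (hypothetical setup, not from this note): fitting polynomials of increasing degree to a handful of noisy samples of $\sin(2\pi x)$ with NumPy. Once the degree is high enough to interpolate, the train loss collapses while the test loss blows up, i.e. the fit has captured the noise.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    # Hypothetical ground truth: sine signal plus Gaussian noise.
    x = rng.uniform(0.0, 1.0, n)
    y = np.sin(2 * np.pi * x) + 0.3 * rng.normal(size=n)
    return x, y

x_train, y_train = make_data(10)   # tiny training set
x_test, y_test = make_data(200)    # held-out data

def poly_mse(degree):
    # Least-squares polynomial fit of the given degree.
    coefs = np.polyfit(x_train, y_train, degree)
    train = float(np.mean((np.polyval(coefs, x_train) - y_train) ** 2))
    test = float(np.mean((np.polyval(coefs, x_test) - y_test) ** 2))
    return train, test

for d in (0, 3, 9):
    train, test = poly_mse(d)
    print(f"degree {d}: train MSE {train:.4f}, test MSE {test:.4f}")
```

Degree 9 with 10 samples interpolates (train MSE near zero) yet its test MSE is far larger; degree 0 shows the opposite failure (large train loss).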
But instead, in many cases, we observe [[double descent]]
in which even more complex architectures actually generalize well (achieve low test error).
Why? See there.
We can also analyze the [[optimism]] of the training error and use the [[in-sample error]] as a less biased estimate of the [[generalization error]].
For many loss functions we can express
![[optimism#^optimism]]
Otherwise -- if the prediction rule achieves large train loss -- we say the method *underfits*,
in which case we might need to choose a more expressive model class, pick a more stable fitting method, use less regularization, etc.
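For the regularization point, a minimal sketch (hypothetical ridge-regression setup, not from this note): cranking the penalty $\lambda$ far up shrinks the weights toward zero and drives the training loss up, i.e. the fit underfits, so reducing $\lambda$ is one remedy.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical well-specified linear model: y = X w + noise.
n, d = 50, 5
X = rng.normal(size=(n, d))
w_true = np.arange(1.0, d + 1.0)
y = X @ w_true + 0.1 * rng.normal(size=n)

def ridge_train_mse(lam):
    # Closed-form ridge solution: w = (X'X + lam I)^{-1} X'y.
    w = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
    return float(np.mean((X @ w - y) ** 2))

print(ridge_train_mse(0.0))   # near the noise floor
print(ridge_train_mse(1e4))   # heavy shrinkage: large train loss (underfit)
```

The train loss of ridge regression is monotone nondecreasing in $\lambda$, so over-regularizing always moves the fit toward underfitting.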