We say an (approximate) [[interpolate]]r **overfits**
if it achieves high [[generalization error]], i.e. fails to [[abstract|general]]ize.
(Used pretty heuristically, not a rigorous definition.)
We'd typically expect an [[dimensions per sample|overparameterized]] model class to overfit often, since it might capture spurious [[signal noise decomposition|noise]] in the training data rather than the actual desired signal.
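A toy illustration of this (hypothetical setup, not from this note): fitting polynomials of increasing degree to a handful of noisy samples of $\sin(2\pi x)$ with NumPy. Once the degree is high enough to interpolate, the train loss collapses while the test loss blows up, i.e. the fit has captured the noise.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    # Hypothetical ground truth: sine signal plus Gaussian noise.
    x = rng.uniform(0.0, 1.0, n)
    y = np.sin(2 * np.pi * x) + 0.3 * rng.normal(size=n)
    return x, y

x_train, y_train = make_data(10)   # tiny training set
x_test, y_test = make_data(200)    # held-out data

def poly_mse(degree):
    # Least-squares polynomial fit of the given degree.
    coefs = np.polyfit(x_train, y_train, degree)
    train = float(np.mean((np.polyval(coefs, x_train) - y_train) ** 2))
    test = float(np.mean((np.polyval(coefs, x_test) - y_test) ** 2))
    return train, test

for d in (0, 3, 9):
    train, test = poly_mse(d)
    print(f"degree {d}: train MSE {train:.4f}, test MSE {test:.4f}")
```

Degree 9 with 10 samples interpolates (train MSE near zero) yet its test MSE is far larger; degree 0 shows the opposite failure (large train loss).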
But instead, in many cases, we observe [[double descent]]
in which even more complex architectures actually generalize well (achieve low test error).
Why? See there.
We can also analyze the [[optimism]] of the training error and use the [[in-sample error]] as a less biased estimate of the [[generalization error]].
For many loss functions we can express
![[optimism#^optimism]]
Otherwise -- if the prediction rule achieves large train loss -- we say the method *underfits*,
in which case we might need to choose a more expressive model class, pick a more stable fitting method, use less regularization, etc.
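For the regularization point, a minimal sketch (hypothetical ridge-regression setup, not from this note): cranking the penalty $\lambda$ far up shrinks the weights toward zero and drives the training loss up, i.e. the fit underfits, so reducing $\lambda$ is one remedy.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical well-specified linear model: y = X w + noise.
n, d = 50, 5
X = rng.normal(size=(n, d))
w_true = np.arange(1.0, d + 1.0)
y = X @ w_true + 0.1 * rng.normal(size=n)

def ridge_train_mse(lam):
    # Closed-form ridge solution: w = (X'X + lam I)^{-1} X'y.
    w = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
    return float(np.mean((X @ w - y) ** 2))

print(ridge_train_mse(0.0))   # near the noise floor
print(ridge_train_mse(1e4))   # heavy shrinkage: large train loss (underfit)
```

The train loss of ridge regression is monotone nondecreasing in $\lambda$, so over-regularizing always moves the fit toward underfitting.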