Consider some [[supervised|labelled]] [[dataset]] $\boldsymbol{X}, \boldsymbol{y}$.
We say a [[prediction rule]] $f : \mathcal{X} \to \mathcal{Y}$ *interpolates* the dataset if $f(\boldsymbol{x}_{n}) = y_{n}$ for all $n$
(i.e. zero [[train loss]] under [[squared error|L2 loss]] or [[zero-one loss]]).
Note that for a [[linear prediction rule]] (possibly with [[basis expansion]]), interpolating is equivalent to solving a [[system of linear equations]] in the weights.
See [[polynomial interpolator]] for an important case.
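A minimal sketch of the linear-system view (assuming NumPy; the data and basis are illustrative): with a polynomial basis expansion, interpolating $n$ points amounts to solving the square Vandermonde system for the coefficients.

```python
# Sketch: interpolating n points with a degree-(n-1) polynomial
# reduces to solving a square linear system in the coefficients.
import numpy as np

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(-1, 1, size=5))           # n = 5 distinct inputs
y = rng.normal(size=5)                            # arbitrary labels

Phi = np.vander(x, N=5, increasing=True)          # basis expansion: [1, x, x^2, x^3, x^4]
w = np.linalg.solve(Phi, y)                       # exact solve of the square system Phi w = y

f = lambda t: np.vander(np.atleast_1d(t), N=5, increasing=True) @ w
print(np.allclose(f(x), y))                       # True: zero train loss, i.e. an interpolator
```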
In general:
When will a [[optimization|machine learning method]] find an interpolator?
Depends on [[machine inductive bias]]:
1. [[prediction rule|prediction rule class]] must be [[model complexity|expressive enough]] to fit all $n$ training points exactly, not necessarily to contain the [[statistics|data generating process]] / ground truth (i.e. model complexity must pass the **interpolation threshold**)
2. the [[optimization algorithm|fitting method]] must actually find it
3. the [[regularization term]] must not be too large (otherwise we are minimizing a different objective, whose minimizer need not interpolate; see the sketch after this list)
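A small numerical sketch of points 1 and 3 (assuming NumPy; the data, basis, and penalty value are illustrative): too few basis functions cannot interpolate, enough can, and a large ridge penalty changes the objective so its minimizer no longer interpolates even when the class is expressive enough.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
x = np.sort(rng.uniform(-1, 1, size=n))
y = rng.normal(size=n)

def fit(degree, lam=0.0):
    Phi = np.vander(x, N=degree + 1, increasing=True)
    # ridge / ordinary least squares via normal equations: (Phi^T Phi + lam I) w = Phi^T y
    w = np.linalg.solve(Phi.T @ Phi + lam * np.eye(degree + 1), Phi.T @ y)
    return np.max(np.abs(Phi @ w - y))             # worst-case train residual

print(fit(3))            # degree 3 < n-1: below the interpolation threshold, residual > 0
print(fit(n - 1))        # degree n-1: expressive enough, residual ~ 0 (interpolates)
print(fit(n - 1, 1.0))   # same class but strong regularization: residual > 0 again
```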
[[linear mode connectivity]] suggests that all interpolators lie on some [[connected]] [[manifold]]
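For an over-parameterized linear model this is exact: the interpolators form an affine subspace $\{w : \Phi w = y\}$, so any segment between two interpolating solutions stays in the zero-train-loss set. A sketch (assuming NumPy; the dimensions are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 12                                      # more parameters than data points
Phi = rng.normal(size=(n, d))
y = rng.normal(size=n)

w1 = np.linalg.lstsq(Phi, y, rcond=None)[0]       # min-norm interpolator
null = np.linalg.svd(Phi)[2][n:].T                # basis for the null space of Phi
w2 = w1 + null @ rng.normal(size=d - n)           # a second, different interpolator

w_mid = 0.5 * (w1 + w2)                           # point on the segment between them
print(np.allclose(Phi @ w1, y), np.allclose(Phi @ w2, y), np.allclose(Phi @ w_mid, y))
# all True: the path between the two interpolators never leaves the interpolating set
```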
![[overfit]]
# sources
[[2018HeathScientificComputing]]