Consider some [[supervised|labelled]] [[dataset]] $\boldsymbol{X}, \boldsymbol{y}$. We say a [[prediction rule]] $f : \mathcal{X} \to \mathcal{Y}$ *interpolates* the dataset if $f(\boldsymbol{x}_{n}) = y_{n}$ for all $n$ (i.e. zero [[train loss]] under [[squared error|L2 loss]] or [[zero-one loss]]). Note that for a [[linear prediction rule]] (possibly with [[basis expansion]]) this is equivalent to solving a [[system of linear equations]]. See [[polynomial interpolator]] for an important case (a worked sketch appears at the end of this note).

In general, when will a [[optimization|machine learning method]] find an interpolator? This depends on the [[machine inductive bias]]:

1. the [[prediction rule|prediction rule class]] must be [[model complexity|expressive enough]] to contain the [[statistics|data generating process]] / ground truth (i.e. the model complexity must pass the **interpolation threshold**)
2. the [[optimization algorithm|fitting method]] must actually find it
3. the [[regularization term]] must not be too large (otherwise we are minimizing a different loss function)

[[linear mode connectivity]] suggests that all interpolators lie on some [[connected]] [[manifold]].

![[overfit]]

# sources

[[2018HeathScientificComputing]]
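
# sketch

As a concrete illustration of the claim above that, for a linear prediction rule with a basis expansion, interpolation reduces to solving a system of linear equations: a minimal sketch (assuming numpy; the data are arbitrary and purely illustrative) that fits a polynomial interpolator by solving the Vandermonde system and checks that the train loss is (numerically) zero.

```python
# Minimal sketch: a degree-(n-1) polynomial interpolator obtained by solving
# the linear system V w = y, where V is the Vandermonde matrix of the basis
# expansion phi(x) = (1, x, x^2, ..., x^{n-1}). Data here are illustrative.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 6)          # n = 6 training inputs
y = rng.normal(size=x.shape)           # arbitrary labels to interpolate

V = np.vander(x, increasing=True)      # n x n basis-expanded design matrix
w = np.linalg.solve(V, y)              # exact solve => f(x_n) = y_n for all n

def f(t):
    # evaluate the fitted polynomial at (possibly many) points t
    return np.vander(np.atleast_1d(t), N=len(w), increasing=True) @ w

train_loss = np.mean((f(x) - y) ** 2)  # squared-error train loss
print(train_loss)                      # ~0, up to floating-point error
```

With exactly $n$ basis functions and $n$ distinct inputs the system is square and (for a Vandermonde matrix) invertible, so the interpolator exists and is unique; with fewer basis functions the model is below the interpolation threshold and no exact solution need exist.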