An **optimization problem** is the task of minimizing some **objective function** $J(\boldsymbol{x}) \in \mathbb{R}$ over some [[domain]] $\mathcal{X}$, subject to **inequality constraints** $\boldsymbol{g}(\boldsymbol{x}) \in \mathbb{R}^{N}$ and **equality constraints** $\boldsymbol{h}(\boldsymbol{x}) \in \mathbb{R}^{M}$: ^problem

$$
\min_{\boldsymbol{x} \in \mathcal{X}} J(\boldsymbol{x}) \quad \text{subject to} \quad \boldsymbol{g}(\boldsymbol{x}) \le \boldsymbol{0} \quad \text{and} \quad \boldsymbol{h}(\boldsymbol{x}) = \boldsymbol{0}
$$
^program

We solve an optimization problem using some [[algorithm]]; see [[optimization algorithm]]. ("[[closed form solution]]" is really a subjective term.)

Note that equality constraints are technically *redundant*: we could express $\boldsymbol{h}(\boldsymbol{x}) = \boldsymbol{0}$ by the two inequality constraints $\boldsymbol{h}(\boldsymbol{x}) \le \boldsymbol{0}$ and $-\boldsymbol{h}(\boldsymbol{x}) \le \boldsymbol{0}$. But it is usually more illuminating to treat them separately.

# terminology

We call $\boldsymbol{x}$ **feasible** if it satisfies the constraints. We call $\boldsymbol{x}$ **strictly feasible** if $\boldsymbol{g}(\boldsymbol{x}) < \boldsymbol{0}$ and $\boldsymbol{h}(\boldsymbol{x}) = \boldsymbol{0}$.

We call $\boldsymbol{x}^{\star}$ a **global minimum** if $J(\boldsymbol{x}^{\star}) \le J(\boldsymbol{x})$ for all feasible $\boldsymbol{x}$.

We call $\widetilde{\boldsymbol{x}}$ a **local minimum** if $J(\widetilde{\boldsymbol{x}}) \le J(\boldsymbol{x})$ for all feasible $\boldsymbol{x}$ in some [[neighbourhood]] of $\widetilde{\boldsymbol{x}}$.

We say the $n$-th inequality constraint $g^{n}$ is **binding** at $\boldsymbol{x}$ if $g^{n}(\boldsymbol{x}) = 0$.

We say $\boldsymbol{\delta} \in \mathcal{X}$ is a **direction of decrease** at $\boldsymbol{x}$ if moving in that direction decreases $J$, i.e. the [[directional derivative]] $\partial J(\boldsymbol{x}) \boldsymbol{\delta} < 0$, i.e. $\boldsymbol{\delta}$ points away from the gradient $\nabla J(\boldsymbol{x})$. See [[loss landscape]].

[[Lagrangian duality]] is a powerful tool: it gives lower bounds on the optimal value in general, and (under mild conditions) solves [[convex]] problems exactly.

# [[machine learning]] perspective

![[machine learning#^goal]]

A "machine learning method" is a synonym for an "[[optimization]] problem and algorithm". How do we frame the optimization problem and choose an algorithm? We must consider *inductive biases*:

![[induct#^inductive-bias]]

Then we also need to conduct [[model assessment]] to estimate the [[generalization error]].

# [[archetype|model organism]]s

[[two-layer perceptron|2LP]]

[[Rosenbrock function]]
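
As a concrete instance of the general program above, here is a minimal sketch (assuming SciPy is available) that minimizes the [[Rosenbrock function]] subject to one illustrative inequality constraint and one illustrative equality constraint via `scipy.optimize.minimize` with the SLSQP method; the specific constraints are made up for demonstration, and note that SciPy's `"ineq"` convention is $g(\boldsymbol{x}) \ge 0$, the opposite sign of the convention used above.

```python
import numpy as np
from scipy.optimize import minimize, rosen  # rosen is the Rosenbrock function

# Illustrative constraints (arbitrary choices, not from the note):
#   inequality  g(x) = x0 + x1 - 1 <= 0
#   equality    h(x) = x0^2 + x1^2 - 1 = 0   (stay on the unit circle)
# SciPy's "ineq" convention is fun(x) >= 0, so we pass -g.
constraints = [
    {"type": "ineq", "fun": lambda x: -(x[0] + x[1] - 1.0)},
    {"type": "eq", "fun": lambda x: x[0] ** 2 + x[1] ** 2 - 1.0},
]

x0 = np.array([0.0, 1.0])  # feasible starting point: g(x0) = 0 (binding), h(x0) = 0
result = minimize(rosen, x0, method="SLSQP", constraints=constraints)

print(result.x, result.fun)  # a (local) minimum of the constrained problem
```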