for [[discrete random variable]]s, the joint distribution is described by a [[probability mass function]], exactly the same as the "scalar" case; consider an [[encoding]], e.g. [[vectorize|flatten]]ing

for [[real random variable]]s $X_{1:N}$: start with the joint [[cumulative distribution function]]

$
F(x_{1:N}) := \Pr(X_{1} \le x_{1}, \dots, X_{N} \le x_{N})
$

then take [[partial derivative]]s to get the [[probability density function]]

$
f(x_{1:N}) = \partial_{1:N} F(x_{1:N})
$

(the order of the partial derivatives doesn't matter.) integrating $f$ over a region gives that region's probability mass; $f$ must therefore integrate to $1$ over the whole space.

densities factor via the chain rule, $f(x, y) = f(x) \, f(y \mid x)$; if $X$ and $Y$ are [[statistically independent]] then $f(y \mid x) = f(y)$, so the joint factorizes as $f(x, y) = f(x) \, f(y)$

# [[normalization]]

Note that interpreting each element as an observation allows us to rewrite some properties in terms of concepts from [[vector space|linear algebra]]:

$
\begin{align*}
\frac{1}{N}\boldsymbol{1}^{\top} \boldsymbol{x} &= \overline{x} \\
\frac{1}{N} \|\boldsymbol{x}\|_{2}^{2} &= \mathop{\mathbb{E}}_{n} (x^{n})^{2} \\
&= \mathop{\mathrm{Var}}_{n}(x^{n}) + \overline{x}^{2} \\
\frac{1}{\sqrt{N}} \|\boldsymbol{x}\|_{2} &= \mathop{\mathrm{Std}}_{n}(x^{n}) \text{ where } \overline{x} = 0
\end{align*}
$

i.e. if $x^{1:N}$ are iid samples from a distribution with mean $0$ and variance $\sigma^{2}$ then the [[convergence of random variables|lln]] tells us that

$
\frac{1}{N} \|\boldsymbol{x}\|_{2}^{2} \to \sigma^{2}
$

i.e. heuristically $\|\boldsymbol{x}\|_{2} \approx \sigma \sqrt{N}$: the norm grows like $\sqrt{N}$.

Careful with scaling! Often we have $\sigma^{2} \propto 1/N$ so that $\|\boldsymbol{x}\|_{2}^{2}$ stays $O(1)$ (e.g. the [[Gaussian orthogonal ensemble|goe]]).

# sources

[[STAT 110]] 7.1.1, 7.1.2, 7.1.13, 7.1.21
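# numerical checks

A few sanity checks of the above; these are my own sketches, not from [[STAT 110]]. First, the CDF-to-PDF relationship on a concrete joint distribution (two independent $\mathrm{Exp}(1)$ variables, chosen purely for illustration), assuming `sympy` is available:

```python
import sympy as sp

x, y = sp.symbols("x y", positive=True)

# joint CDF of two independent Exp(1) variables: F(x, y) = Pr(X <= x, Y <= y)
F = (1 - sp.exp(-x)) * (1 - sp.exp(-y))

# the mixed partial derivative recovers the joint density
f = sp.diff(F, x, y)
assert sp.simplify(f - sp.exp(-x - y)) == 0

# the order of the partial derivatives doesn't matter
assert sp.simplify(sp.diff(F, x, y) - sp.diff(F, y, x)) == 0

# the density integrates to 1 over the whole support
assert sp.integrate(f, (x, 0, sp.oo), (y, 0, sp.oo)) == 1

# independence: the joint density factorizes as f(x) f(y)
assert sp.simplify(f - sp.exp(-x) * sp.exp(-y)) == 0
```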
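Next, the linear-algebra identities from the [[normalization]] section, checked on a random sample with `numpy` (the parameters $N = 10^{4}$, $\sigma = 2$ are arbitrary choices for the demo):

```python
import numpy as np

rng = np.random.default_rng(0)
N, sigma = 10_000, 2.0
x = rng.normal(loc=0.0, scale=sigma, size=N)

# (1/N) 1^T x = sample mean
assert np.isclose(np.ones(N) @ x / N, x.mean())

# (1/N) ||x||_2^2 = E[x^2] = Var(x) + mean^2
assert np.isclose(x @ x / N, x.var() + x.mean() ** 2)

# ||x||_2 / sqrt(N) = Std(x) once the sample is centered
xc = x - x.mean()
assert np.isclose(np.linalg.norm(xc) / np.sqrt(N), xc.std())

# lln heuristic: ||x||_2 ≈ sigma * sqrt(N)
print(np.linalg.norm(x), sigma * np.sqrt(N))  # both ≈ 200
```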
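Finally, the scaling caveat: shrinking the per-entry variance as $\sigma^{2} = 1/N$ (in the spirit of common [[Gaussian orthogonal ensemble|goe]] normalizations; the exact constant here is an assumption) keeps the squared norm $O(1)$ as $N$ grows:

```python
import numpy as np

rng = np.random.default_rng(0)

# with sigma^2 = 1/N the squared norm no longer grows with N
for N in (100, 1_000, 10_000):
    x = rng.normal(scale=1 / np.sqrt(N), size=N)
    print(N, x @ x)  # ≈ 1 for every N
```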