for [[discrete random variable]]s,
the joint distribution is described by a [[probability mass function]],
exactly the same as the "scalar" case;
consider [[encoding]] the index tuple as a single outcome, eg by [[vectorize|flatten]]ing
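a minimal numpy sketch (the two-dice joint pmf is an illustrative choice):
```python
import numpy as np

# joint pmf of two fair dice as a 2-d table: p[i, j] = Pr(X1 = i, X2 = j)
p = np.full((6, 6), 1 / 36)
assert np.isclose(p.sum(), 1.0)

# flattening the index pair (i, j) -> 6*i + j encodes the joint
# as an ordinary "scalar" pmf over 36 outcomes
p_flat = p.ravel()
assert np.isclose(p_flat.sum(), 1.0)
```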
for [[real random variable]]s $X_{1:N}$:
start with joint [[cumulative distribution function]]
$
F(x_{1:N}) := \Pr(X_{1} \le x_{1}, \dots, X_{N} \le x_{N})
$
then take [[partial derivative]] to get [[probability density function]]
$
f(x_{1:N}) = \partial_{1:N} F(x_{1:N})
$
(order of the partial derivatives doesn't matter.)
integrate $f$ over a region to get that region's probability mass;
over all of $\mathbb{R}^{N}$ it must therefore integrate to $1$.
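a sympy sanity check of this pipeline, taking two independent $\mathrm{Exp}(1)$ variables as an illustrative choice:
```python
import sympy as sp

x, y = sp.symbols("x y", positive=True)

# joint cdf of two independent Exp(1) variables (illustrative choice)
F = (1 - sp.exp(-x)) * (1 - sp.exp(-y))

# mixed partial derivative recovers the joint density
f = sp.diff(F, x, y)
# order of the partial derivatives doesn't matter
assert sp.simplify(f - sp.diff(F, y, x)) == 0

# integrating the density over the whole support gives 1
assert sp.integrate(f, (x, 0, sp.oo), (y, 0, sp.oo)) == 1
```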
chain rule: $f(x, y) = f(x) f(y \mid x)$;
if $x$ and $y$ are [[statistically independent]], this factors as $f(x, y) = f(x) f(y)$
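a quick sympy check of both factorizations, with $f(x, y) = x + y$ on the unit square as an illustrative (dependent) example:
```python
import sympy as sp

x, y = sp.symbols("x y", nonnegative=True)

# a dependent joint density on the unit square (illustrative choice)
f_xy = x + y

f_x = sp.integrate(f_xy, (y, 0, 1))   # marginal: x + 1/2
f_y_given_x = f_xy / f_x              # conditional density of y given x

# chain rule: f(x, y) = f(x) f(y | x)
assert sp.simplify(f_x * f_y_given_x - f_xy) == 0

# independence would need f(x, y) = f(x) f(y); here the product differs
f_y = sp.integrate(f_xy, (x, 0, 1))
assert sp.simplify(f_xy - f_x * f_y) != 0
```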
# [[normalization]]
Note that interpreting each element as an observation allows us to rewrite some properties in terms of concepts from [[vector space|linear algebra]]:
$
\begin{align*}
\frac{1}{N}\boldsymbol{1}^{\top} \boldsymbol{x} &= \overline{x} \\
\frac{1}{N} \|\boldsymbol{x}\|_{2}^{2} &= \mathop{\mathbb{E}}_{n} (x^{n})^{2} \\
&= \mathop{\mathrm{Var}}_{n}(x^{n}) + \overline{x}^{2} \\
\frac{1}{\sqrt{N}} \|\boldsymbol{x}\|_{2} &= \mathop{\mathrm{Std}}_{n}(x^{n}) \text{ where } \overline{x} = 0
\end{align*}
$
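these identities are easy to verify numerically (numpy; population convention `ddof=0` for the variance):
```python
import numpy as np

rng = np.random.default_rng(0)
N = 10_000
x = rng.normal(loc=1.0, scale=2.0, size=N)

# (1/N) 1^T x is the sample mean
assert np.isclose(np.ones(N) @ x / N, x.mean())

# (1/N) ||x||_2^2 is the empirical second moment ...
second_moment = np.linalg.norm(x) ** 2 / N
assert np.isclose(second_moment, np.mean(x**2))
# ... which splits as variance + mean^2
assert np.isclose(second_moment, x.var() + x.mean() ** 2)

# ||x||_2 / sqrt(N) is the empirical std once x is centered
xc = x - x.mean()
assert np.isclose(np.linalg.norm(xc) / np.sqrt(N), x.std())
```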
in particular, if $x^{1:N}$ are iid samples from a distribution with mean $0$ and variance $\sigma^{2}$, then the [[convergence of random variables|lln]] tells us that
$
\frac{1}{N} \|\boldsymbol{x}\|_{2}^{2} \to \sigma^{2}
$
i.e. heuristically $\|\boldsymbol{x}\|_{2} \approx \sigma \sqrt{N}$
Careful with scaling! Often we have $\sigma^{2} \propto 1/N$, so that $\|\boldsymbol{x}\|_{2}^{2}$ stays $O(1)$ (eg [[Gaussian orthogonal ensemble|goe]]).
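a numpy illustration of both scalings (plain gaussian samples as a stand-in; the goe case itself isn't reproduced here):
```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 3.0
for N in (10**2, 10**4, 10**6):
    # fixed variance: ||x||_2 grows like sigma * sqrt(N)
    x = rng.normal(0.0, sigma, size=N)
    print(N, np.linalg.norm(x) / np.sqrt(N))  # -> sigma

    # variance proportional to 1/N: the squared norm stays O(1)
    y = rng.normal(0.0, sigma / np.sqrt(N), size=N)
    print(N, np.linalg.norm(y) ** 2)          # -> sigma**2
```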
# sources
[[STAT 110]] 7.1.1, 7.1.2, 7.1.13, 7.1.21