an [[1991KramerNonlinearPrincipalComponent|autoencoder]] is a pair of deterministic mappings $\text{enc}:x \mapsto z$ and $\text{dec}:z \mapsto \hat{x}$.
not a [[generative model]];
the [[pullback and pushforward|pushforward]] of the data distribution $p_{*}$ under $\text{enc}$ could be arbitrary, so there is no principled choice of distribution from which to sample $z$ and decode.

*Image from [Machine Learning at Berkeley Blog](https://ml.berkeley.edu/blog/posts/vq-vae/)*
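a minimal sketch of this enc/dec pair in PyTorch (the 784-dimensional input, 32-dimensional latent, and layer sizes are placeholder assumptions, not anything fixed above):

```python
import torch
import torch.nn as nn

# deterministic mappings: enc : x -> z and dec : z -> x_hat
enc = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 32))
dec = nn.Sequential(nn.Linear(32, 256), nn.ReLU(), nn.Linear(256, 784))

x = torch.rand(64, 784)   # a batch of fake data points
z = enc(x)                # latent codes
x_hat = dec(z)            # reconstructions

# trained purely to reconstruct, e.g. with squared error
recon_loss = ((x - x_hat) ** 2).sum(dim=1).mean()
```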
in a [[variational autoencoder]]:
now $\text{enc} : \mathcal{X} \to \triangle(\mathcal{Z})$ and $\text{dec} : \mathcal{Z} \to \triangle(\mathcal{X})$,
and we add a [[regularization term]] so that $\text{enc}[x]$ becomes similar to some fixed prior $\pi \in \triangle(\mathcal{Z})$.
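a sketch of these new signatures using `torch.distributions` (the diagonal-Gaussian encoder, Bernoulli decoder, and all sizes are assumptions for illustration):

```python
import torch
import torch.nn as nn
from torch.distributions import Normal, Bernoulli

enc_net = nn.Linear(784, 2 * 32)   # predicts mean and log-variance of z
dec_net = nn.Linear(32, 784)       # predicts Bernoulli logits over x

def enc(x):
    """enc : X -> a distribution over Z (here a diagonal Gaussian)."""
    mu, log_var = enc_net(x).chunk(2, dim=-1)
    return Normal(mu, torch.exp(0.5 * log_var))

def dec(z):
    """dec : Z -> a distribution over X (here independent Bernoullis)."""
    return Bernoulli(logits=dec_net(z))

# the fixed prior pi over Z; the regularization term pushes enc(x) toward it
prior = Normal(torch.zeros(32), torch.ones(32))
```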
now we do [[variational inference]]:
$\text{enc}[x] \rightsquigarrow p_{*}[x]$, i.e. fit $\text{enc}[x]$ to the true posterior over $z$ given $x$.
we want $\text{enc}[x]$ to put mass only where $p_{*}[x]$ does (mode-seeking rather than mass-covering),
so we optimize the [[forward vs reverse kld|reverse kl]]
$\text{kl}(\text{enc}[x] \parallel p_{*}[x])$,
which is also the tractable direction: it only requires expectations under $\text{enc}[x]$.
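one way to see why the elbo shows up next: for any $q \in \triangle(\mathcal{Z})$ (a standard identity, writing $p(\cdot \mid x)$ for the posterior $p_{*}[x]$),
$$
\log p(x) \;=\; \underbrace{\mathbb{E}_{z \sim q}\!\left[\log \frac{p(z, x)}{q(z)}\right]}_{\text{elbo}} \;+\; \text{kl}\big(q \parallel p(\cdot \mid x)\big),
$$
and $\log p(x)$ does not depend on $q$, so minimizing the reverse kl over $q = \text{enc}[x]$ is the same as maximizing the elbo.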
recall the [[evidence lower bound|elbo]]
![[evidence lower bound#^identity-prior]]
substituting $q =\text{enc}[x]$ and $p(z, x) = \pi(z)\text{dec}[z](x)$:
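spelling that out (assuming the embedded identity above is the usual prior form of the elbo):
$$
\text{elbo}(x) \;=\; \mathbb{E}_{z \sim \text{enc}[x]}\big[\log \text{dec}[z](x)\big] \;-\; \text{kl}\big(\text{enc}[x] \parallel \pi\big)
$$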
first term maximizes reconstruction likelihood (typically estimated with a single sample from $\text{enc}[x]$);
second term pulls $\text{enc}[x]$ toward the prior $\pi$.
sometimes we assume $\text{range}\,\text{enc}$ lies in the subset of $\triangle(\mathcal{Z})$ consisting of product distributions,
aka [[mean field]]
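putting the pieces together, a sketch of the resulting loss for a mean-field Gaussian $\text{enc}[x]$ with a standard normal prior and a Bernoulli decoder (all of these modeling choices and sizes are assumptions for illustration):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

enc_net = nn.Linear(784, 2 * 32)   # mean and log-variance of the mean-field Gaussian enc[x]
dec_net = nn.Linear(32, 784)       # Bernoulli logits for dec[z]

def neg_elbo(x):
    mu, log_var = enc_net(x).chunk(2, dim=-1)

    # single-sample estimate of -E[log dec[z](x)] via the reparameterization trick
    z = mu + torch.exp(0.5 * log_var) * torch.randn_like(mu)
    recon = F.binary_cross_entropy_with_logits(dec_net(z), x, reduction="none").sum(dim=-1)

    # kl(enc[x] || N(0, I)) in closed form for a diagonal Gaussian
    kl = 0.5 * (mu ** 2 + log_var.exp() - log_var - 1).sum(dim=-1)

    return (recon + kl).mean()     # minimizing this maximizes the elbo

x = torch.rand(64, 784)            # fake batch with values in [0, 1]
neg_elbo(x).backward()
```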
https://colab.research.google.com/drive/1v0UiRwUiBi4IoZKXXnwZwHpVDsUIoeg0?usp=sharing
# sources
Wikipedia pages for [Evidence lower bound](https://en.wikipedia.org/wiki/Evidence_lower_bound), [Variational Bayesian methods](https://en.wikipedia.org/wiki/Variational_Bayesian_methods), [Variational autoencoders](https://en.wikipedia.org/wiki/Variational_autoencoder)
[Eric Jang: A Beginner's Guide to Variational Methods: Mean-Field Approximation](https://blog.evjang.com/2016/08/variational-bayes.html)
- Helpful visualization of forward vs reverse KL divergence.
[From Autoencoder to Beta-VAE | Lil'Log](https://lilianweng.github.io/posts/2018-08-12-vae/)
- Goes into more depth on related architectures (denoising, sparse, and contractive autoencoders) and later research including $\beta$-VAEs, vector quantized VAEs, and temporal difference VAEs.
[Princeton lecture notes from Professor David Blei](https://www.cs.princeton.edu/courses/archive/fall11/cos597C/lectures/variational-inference-i.pdf) ^princeton-notes
- Very in-depth and focuses on the optimization algorithms, which I've waved away in this post under the umbrella of "gradient ascent".
- Walks through a concrete example of a simple distribution whose posterior is hard to calculate: a mixture of Gaussians where the centroids are drawn from a Gaussian.
- Describes an improvement when $\mathcal{F}_{Z}$ is such that the distribution of each element, conditional on the others and on $x$, belongs to an exponential family.
[Tutorial - What is a variational autoencoder? – Jaan Altosaar](https://jaan.io/what-is-variational-autoencoder-vae-tutorial/)
- Great tutorial that draws the distinction between the deep learning perspective and the probabilistic model perspective
- [GitHub - altosaar/variational-autoencoder: Variational autoencoder implemented in tensorflow and pytorch (including inverse autoregressive flow)](https://github.com/altosaar/variational-autoencoder)
[Melody Mixer by Torin Blankensmith, Kyle Phillips - Experiments with Google](https://experiments.withgoogle.com/melody-mixer)
[Speech Interaction Technology at Aalto University / NSVQ · GitLab](https://gitlab.com/speech-interaction-technology-aalto-university/nsvq)
[Improving Variational Inference with Inverse Autoregressive Flow](https://arxiv.org/pdf/1606.04934.pdf)