I use "[[deep learning]]" as a synonym for "[[optimization]] over [[asymptotic|high dimensional]] [[domain]]s"
Some guiding questions:
- What makes a function easy or hard to approximate? How many samples do you need? (see the sketch after this list)
- What [[represent|latent representation]]s do models learn?
- For a given optimizer, can we predict what kinds of solutions it finds?
- Can we explain [[2019SuttonBitterLesson|The Bitter Lesson]]:
the unreasonable effectiveness of [[continuous optimization]] techniques on large models?
- Can we understand the [[computational complexity theory|complexity]] of these [[algorithm]]s?
Prove [[machine inductive bias|upper or lower bounds]]? (eg via the [[statistical query]] framework)
- Why do polytime algorithms exist for certain problems and not others?
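A minimal sketch of the first question, not from the note itself: the learner is assumed to be random ReLU features with a ridge readout, and the targets are just low- vs high-frequency sines. With the same learner, the high-frequency ("hard") target typically needs far more samples to reach comparable test error:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_feature_fit(x_train, y_train, x_test, width=500, ridge=1e-3):
    # random ReLU features phi(x) = max(w*x + b, 0), linear readout fit by ridge regression
    w = rng.normal(size=width)
    b = rng.uniform(-np.pi, np.pi, size=width)
    phi = lambda x: np.maximum(np.outer(x, w) + b, 0.0)
    A = phi(x_train)
    theta = np.linalg.solve(A.T @ A + ridge * np.eye(width), A.T @ y_train)
    return phi(x_test) @ theta

x_test = np.linspace(-np.pi, np.pi, 1000)
for freq in (1, 10):                   # low- vs high-frequency target
    for n in (10, 100, 1000):          # training-sample budget
        x_train = rng.uniform(-np.pi, np.pi, size=n)
        y_train = np.sin(freq * x_train)
        pred = random_feature_fit(x_train, y_train, x_test)
        mse = np.mean((pred - np.sin(freq * x_test)) ** 2)
        print(f"freq={freq:2d}  n={n:4d}  test MSE={mse:.4f}")
```

Swapping in a different learner (a small MLP trained by SGD, a kernel method) changes the constants, but the qualitative gap between easy and hard targets usually survives.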
# inductive bias
Can we understand [[induct|inductive]] bias of deep learning methods?
![[induct#^inductive-bias]]
Why do [[dimensions per sample|overparameterized]] models often **generalize well**? Equivalently:
- Why does [[double descent]] happen? (sketch at the end of this section)
- Why does [[neural network inductive bias|neural network training prefer simple functions]]?
- Or, when they don't: what causes models to fail outside their training distribution?
(See also [[covariate shift|inner alignment]])
- Or are, eg, [[foundation model]]s even [[dimensions per sample|overparameterized]] in the first place? See [[2025XiaoRethinkingConventionalWisdom]].
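A minimal double-descent sketch, again an assumption rather than anything from the sources below: minimum-norm least squares on random ReLU features stands in for an "overparameterized model", and the target, noise level, and widths are arbitrary choices. Test error typically peaks near the interpolation threshold (width roughly equal to the number of training points) and falls again beyond it:

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, noise = 40, 0.1

f = lambda x: np.sin(2 * np.pi * x)
x_train = rng.uniform(-1, 1, n_train)
y_train = f(x_train) + noise * rng.normal(size=n_train)
x_test = np.linspace(-1, 1, 500)

# one fixed pool of random ReLU features; wider models just use more of them
max_width = 400
W = rng.normal(size=max_width)
B = rng.uniform(-1, 1, max_width)
features = lambda x, p: np.maximum(np.outer(x, W[:p]) + B[:p], 0.0)

for p in (5, 10, 20, 35, 40, 45, 60, 100, 200, 400):
    A = features(x_train, p)
    # lstsq returns the minimum-norm solution once p > n_train (interpolation regime)
    theta, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    test_mse = np.mean((features(x_test, p) @ theta - f(x_test)) ** 2)
    print(f"width={p:4d}  test MSE={test_mse:.3f}")
```

The exact shape depends on the noise and the feature distribution; the point is only that adding parameters past the interpolation threshold does not monotonically hurt generalization.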
# sources
- [[2021GrosseCSC2541Winter2021]] (strongly recommended!)
- [[2016GoodfellowEtAlDeepLearning]]
- [ARENA](https://www.arena.education/)
- [MIT Deep Learning 6.S191](http://introtodeeplearning.com/)
    - MIT's intro program, a series of lectures. Probably quite good. [Lectures on YouTube across multiple years](https://www.youtube.com/playlist?list=PLtBw6njQRU-rwp5__7C0oIVt26ZgjG9NI)
- [[2023PrinceUnderstandingDeepLearning]]
- [[2024TelgarskyCulturalScientificProblems]]
- [[2014OlahNeuralNetworksManifolds]]