I use "[[deep learning]]" as a synonym for "[[optimization]] over [[asymptotic|high dimensional]] [[domain]]s" Some guiding questions: - What makes a function easy/hard to approximate? How many samples do you need? - What [[represent|latent representation]]s do models learn? - For a given optimizer, can we predict what kinds of solutions it finds? - Can we explain [[2019SuttonBitterLesson|The Bitter Lesson]]: the unreasonable effectiveness of [[continuous optimization]] techniques on large models? - Can we understand the [[computational complexity theory|complexity]] of these [[algorithm]]s? Prove [[machine inductive bias|upper or lower bounds]]? (eg via [[statistical query]] framework) - Why does there exist polytime algorithms for certain problems and not others? # inductive bias Can we understand [[induct|inductive]] bias of deep learning methods? ![[induct#^inductive-bias]] Why do [[dimensions per sample|overparameterized]] models often **generalize well**? Equivalently: - Why does [[double descent]] happen? - Why does [[neural network inductive bias|neural network training prefer simple functions]]? - Or not: What causes models to fail outside of their training distribution? (See also [[covariate shift|inner alignment]]) or are eg [[foundation model]]s even [[dimensions per sample|overparameterized]]? see [[2025XiaoRethinkingConventionalWisdom]] # sources - [[2021GrosseCSC2541Winter2021]] strongly recommend! - [[2016GoodfellowEtAlDeepLearning]] - [ARENA](https://www.arena.education/) - [MIT Deep Learning 6.S191](http://introtodeeplearning.com/) - MIT intro program, series of lectures. probably quite good. [Lectures on YouTube across multiple years](https://www.youtube.com/playlist?list=PLtBw6njQRU-rwp5__7C0oIVt26ZgjG9NI) - [[2023PrinceUnderstandingDeepLearning]] - [[2024TelgarskyCulturalScientificProblems]] - [[2014OlahNeuralNetworksManifolds]]